The disclosed subject matter relates to the field of graphics processing and, without limitation, systems and methods relating to tessellating in a graphics pipeline.
Graphics processing units (GPUs) have become important for processing data-parallel graphics tasks. Developers now recognize that non-graphics data-parallel tasks can also be handled by GPUs, taking advantage of their massively parallel capabilities. Vendors and standards organizations have created application programming interfaces (APIs) that make graphics data-parallel tasks easier to program. There are also low-level APIs (or libraries/frameworks, etc.) that reside closer to the hardware and are generally employed to consume the output of the higher-level APIs. In other words, the higher-level APIs generally prepare output for consumption by the lower-level APIs.
GPUs commonly use programs called shader programs or shaders. One common example of a shader is a program that operates on a pixel (or the computational equivalent). In addition to shaders, GPUs may execute programs generally called kernels. Like shaders, kernels are generally programs used in parallel execution, but kernels differ from shaders in that kernels are used for compute functions rather than simply shading.
Kernels or shaders may be used in a graphics pipeline as part of a tessellation process. In graphics, tessellation refers to the subdivision of graphics sections (e.g., “patches”) for rendering. If a section is more subdivided, the rendered graphic will be more refined and show more detail. Referring to
Referring to
Vertex Shader
A vertex shader 410 is a common type of 3D shader that operates on a single vertex, meaning that it takes a single vertex as input and produces a single vertex as output. Most commonly the purpose of a vertex shader is to transform a 3D point in virtual space (e.g. a model) to a 2D point (and potentially a depth value) that will appear on a screen. Vertex shaders are known in the art and generally allow control over graphics aspects such as position, movement, lighting and color. Vertex shaders do not create new vertices.
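The core transform a vertex shader performs can be sketched on the CPU as a matrix multiply followed by a perspective divide. This is an illustrative sketch only; the type and function names (Vec4, Mat4, transformVertex, toNDC) are assumptions of the example, not part of any API described in this disclosure.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// Multiply a model-space position by a model-view-projection matrix,
// producing a clip-space position (the essential job of a vertex shader).
Vec4 transformVertex(const Mat4& mvp, const Vec4& position) {
    Vec4 out{0.f, 0.f, 0.f, 0.f};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += mvp[row][col] * position[col];
    return out;
}

// Perspective divide: clip space to normalized device coordinates,
// yielding the 2D screen position (x, y) and a depth value (z).
Vec4 toNDC(const Vec4& clip) {
    return {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], 1.f};
}
```

Note that, consistent with the description above, this stage maps one input vertex to one output vertex and creates no new vertices.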
Hull Shader
A hull shader 411 is a programmable shader that is generally used to indicate how much tessellation should occur in a patch and where. A developer or a system uses tessellation factors to indicate the level of tessellation desired for the patch being processed and the areas in the patch where there should be more or less tessellation. Any number of tessellation factors may be used and many are known in the art. Some example tessellation factors are provided below in examples of embodiment implementations. A hull shader receives a patch (e.g. patch control points) as input and produces a patch (e.g. patch control points) as output. The hull shader may transform the input control points that define a low-order surface into the output control points that describe a patch. In some examples, the hull shader transforms basis functions from a base mesh to surface patches. The hull shader may also perform calculations and provide data (e.g. patch constant data) for later portions of the pipeline (e.g., the tessellator and the domain shader). In some examples, the hull shader receives a group of vertices or control points representing a patch (e.g. between 1 and 32 control points), and outputs a user-defined number (e.g. between 1 and 32) of control points that represent the output patch. For example, if there are more control points in the output patch, then more tessellation will be used on the patch.
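The per-patch data a hull shader's constant function produces can be sketched as a struct of tessellation factors plus a function that fills it from a desired detail level. This is a hedged illustration; the struct, function name, and the [1, 64] clamp range are assumptions of the sketch, not a prescribed interface.

```cpp
#include <algorithm>
#include <cassert>

// Per-patch tessellation factors for a quad patch: one factor per edge and
// two interior factors (one each for the u and v directions).
struct QuadTessFactors {
    float edge[4];
    float inside[2];
};

// Derive factors from a single desired detail level, clamped to a
// plausible hardware range of [1, 64].
QuadTessFactors makeQuadFactors(float detail) {
    float f = std::clamp(detail, 1.0f, 64.0f);
    QuadTessFactors out;
    for (int i = 0; i < 4; ++i) out.edge[i] = f;
    out.inside[0] = out.inside[1] = f;
    return out;
}
```

A real hull shader would typically compute different factors per edge (e.g. based on screen-space edge length) rather than one uniform level.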
Tessellator
Tessellator 412 is a fixed-function portion of the pipeline that creates a sampling pattern across a surface associated with a patch and generates primitives (triangles, lines, or points) that connect these samples. The purpose of the tessellator 412 is to divide a domain such as a line, triangle, or quad into smaller items to reflect more detail (e.g. small triangles). To be very clear, tessellator 412 does not transform the output patch from the hull shader 411. Rather, tessellator 412 uses tessellation factors to develop a tiled canonical domain (e.g. polygon) in a normalized (e.g. zero-to-one) coordinate system. For example, a quad domain (e.g.
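The tiled canonical domain the tessellator produces can be sketched as a grid of samples in the normalized zero-to-one coordinate system. The sketch below assumes a quad domain and a single integer tessellation factor n; real tessellators apply separate edge and inside factors, as discussed elsewhere in this disclosure.

```cpp
#include <cassert>
#include <vector>

struct UV { float u, v; };

// Produce an (n+1) x (n+1) grid of (u, v) samples across the normalized
// quad domain. Each grid cell can then be split into two triangles to form
// the tessellator's output primitives.
std::vector<UV> sampleQuadDomain(int n) {
    std::vector<UV> samples;
    for (int j = 0; j <= n; ++j)
        for (int i = 0; i <= n; ++i)
            samples.push_back({float(i) / n, float(j) / n});
    return samples;
}
```

Consistent with the text above, note that this stage never touches the patch's control points; it only generates the normalized sampling pattern that later stages evaluate.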
Domain Shader
The domain shader 413 is a programmable shader stage that takes as input the outputs of both the tessellator 412 and the hull shader 411. Thus, domain shader 413 has access to a low-order patch representing the appearance of the graphic (output of the hull shader 411), the patch data (also output of the hull shader), and information regarding how that low-order patch should be tessellated (output of the tessellator 412). Having these inputs, the domain shader may produce, as output, vertex data for each surface sample on the patch produced by the tessellation stage, where the output vertex data closely represents the appearance of the underlying graphic (e.g., the data may include positions, texture coordinates, attributes, etc.). The domain shader 413 may be called for each vertex generated by the tessellator 412 and may generate the final vertex data for the tessellated primitives (e.g. triangles). For example, the domain shader may modify a vertex position by sampling a texture for displacement mapping to add additional detail to the rendered geometry.
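The evaluation step a domain shader performs can be sketched as interpolating a patch's control points at a tessellator-generated (u, v) sample. The sketch below assumes a trivially low-order quad patch with four control points and plain bilinear interpolation; real domain shaders typically evaluate higher-order basis functions and may also apply displacement mapping.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

// Evaluate a four-control-point quad patch at one (u, v) sample from the
// tessellator, producing a final vertex position by bilinear interpolation.
Vec3 evalQuadPatch(const std::array<Vec3, 4>& cp, float u, float v) {
    Vec3 out;
    for (int i = 0; i < 3; ++i) {
        float bottom = (1 - u) * cp[0][i] + u * cp[1][i];
        float top    = (1 - u) * cp[2][i] + u * cp[3][i];
        out[i] = (1 - v) * bottom + v * top;
    }
    return out;
}
```

This makes concrete how the stage combines its two inputs: the control points come from the hull-shader side, and the (u, v) sample comes from the tessellator.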
Geometry Shader
A geometry shader 414 is a 3D shader that may generate new graphics primitives based upon the input primitives to the pipeline. The geometry shader may be used, for example, in point sprite generation, geometry tessellation, and shadow volume extrusion.
Rasterizer
Rasterizer portion 415 serves the purpose of converting vector graphics (e.g. mathematically described graphics) to fragments, which are often embodied as pixels. Thus, the rasterizer 415 generally accepts vertex data and outputs pixel information.
Fragment Shader
Fragment shader 416 shades the fragments, for example, adding color and other visible attributes to each pixel prior to its use in a frame buffer and ultimately for display on a display device (not shown in
Many embodiments of the disclosure relate to the use of software with graphics processing units (GPUs) for creating graphics that benefit from tessellation. Some embodiments employ a graphics pipeline to produce one or more graphic frames, the graphics pipeline including a tessellator, a domain shader, a rasterizer portion, and a fragment shader. Other embodiments may employ an alternative graphics pipeline, also to produce one or more graphic frames, the alternative pipeline including a tessellator, a post-tessellation vertex function, a rasterizer, and a fragment function. Furthermore, some embodiments of the aforementioned pipelines are preceded by a compute kernel or a patch kernel as explained herein.
Tessellation according to DX11 employs at least a six- or seven-stage pipeline. Embodiments of the disclosure offer simplified and more flexible tessellation pipelines by eliminating early pipeline stages, such as the vertex shader or the hull shader, that are not always necessary but consume resources whether or not they are needed. In some embodiments of the disclosure, graphics pipelines are proposed that do not employ the vertex shader and hull shader. Instead, the functions of the vertex shader and hull shader may be obviated by pre-supplied or otherwise supplied patches, patch data, and tessellation factors. In particular, patches, patch data, and tessellation factors may be supplied by the developer and stored in memory for retrieval at runtime. In addition, a compute kernel may be used to generate or retrieve any part of the necessary information that is not directly retrievable from memory.
In some embodiments of the disclosure, a scaling technique may be employed to derive new tessellation factors without traditional calculation of those factors. In particular, tessellation factors may be scaled according to the distance from the camera of the subject graphic—the closer the camera the higher the tessellation and vice versa.
This disclosure pertains to systems, methods, and computer readable media to improve the operation of graphics development systems and graphics systems. It also pertains to a variety of architectures for design and/or operation of a graphics development system and the rendering of related graphics on an end-user device. In general, many embodiments of this disclosure envision the use of tessellation in the graphics pipeline as embodied by the teaching and suggestions herein.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed concepts. As part of this description, some of this disclosure's drawings represent structures and devices in block diagram form in order to avoid obscuring the novel aspects of the disclosed concepts. In the interest of clarity, not all features of an actual implementation may be described. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to emphasize the inventive subject matter, so it may be necessary to resort to the claims to determine such inventive subject matter. Reference in this disclosure to "one embodiment" or to "an embodiment" or "embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosed subject matter, and multiple references to "one embodiment" or "an embodiment" should not be understood as necessarily all referring to the same embodiment. In addition, the use of the word "or" in this disclosure is intended to indicate an optional alternative (as in and/or) and not an exclusive alternative (as in or, but not both), unless the exclusivity is specifically noted. Furthermore, use of the word "include" and its various forms is intended to be illustrative of included items and is not intended to indicate that the included items are the only included matters.
It will be appreciated that in the development of any actual implementation (as in any software and/or hardware development project), numerous decisions must be made to achieve the developers' specific goals (e.g., compliance with system- and business-related constraints), and that these goals may vary from one implementation to another. It will also be appreciated that such development efforts might be complex and time-consuming, but would nonetheless be a routine undertaking for those having the benefit of this disclosure and being of ordinary skill in the design and implementation of user interface and response systems and/or gesture identification processing systems.
Exemplary Hardware and Software
The embodiments described herein may have application and use in and with respect to all types of devices, including single- and multi-processor computing systems and vertical devices (e.g., cameras, gaming systems, appliances, etc.) that incorporate single- or multi-processing computing systems. The discussion herein is made with reference to a common computing configuration that may be discussed as a software development system or an end-user system. This common computing configuration may have a CPU resource including one or more microprocessors (each having one or more processing cores) and a graphics resource including one or more GPUs (each having one or more processing cores). In many embodiments, the CPU(s) and GPU(s) work together to present graphic content on a display that may or may not be integral with a computing device that includes the processing resource and graphics resource. As discussed below, in many embodiments, the computing device may employ novel processes and hardware arrangements to improve graphics efficiency or performance through improved tessellation.
This discussion is only for illustration regarding sample embodiments and is not intended to confine application of the disclosed subject matter to the disclosed hardware. Other systems having other known or common hardware configurations (now or in the future) are fully contemplated and expected. With that caveat a typical hardware and software operating environment is discussed below. The hardware configuration may be found, for example, in a server computer system, a workstation computer system, a laptop computer system, a tablet computer system, a desktop computer system, a gaming platform (whether or not portable), a television, an entertainment system, a smart phone, a phone, or any other computing device, whether mobile or stationary.
Referring to
Returning to
When executed by processor 105 and/or graphics hardware 120, computer program code (e.g., shaders or kernels) may implement one or more of the methods or processes described herein. Communication interface 130 may include semiconductor-based circuits and be used to connect computer system 100 to one or more networks. Illustrative networks include, but are not limited to: a local network such as a USB network; a business's local area network; and a wide area network such as the Internet and may use any suitable technology (e.g., wired or wireless). Communications technologies that may be implemented include cell-based communications (e.g., LTE, CDMA, GSM, HSDPA, etc.) or other communications (Ethernet, WiFi, Bluetooth®, USB, Thunderbolt®, Firewire®, etc.). User interface adapter 135 may be used to connect keyboard 150, microphone 155, pointer device 160, speaker 165, and other user interface devices such as a touchpad and/or a touch screen (not shown). Display adapter 140 may be used to connect one or more display units 170.
Processor 105 may execute instructions necessary to carry out or control the operation of many functions performed by system 100 (e.g., evaluation, transformation, and compilation of graphics programs). Processor 105 may, for instance, drive display 170 and receive user input from user interface adapter 135 or any other user interfaces embodied by a system. User interface adapter 135, for example, can take a variety of forms, such as a button, a keypad, a dial, a click wheel, a keyboard, a display screen, and/or a touch screen. Processor 105 may be any type of computing device, such as one or more microprocessors working alone or in combination with one or more GPUs, DSPs, or system-on-chip devices such as those found in some mobile devices. Processor 105 may include one or more dedicated GPUs or graphics subsystems that accept program instructions to create or alter display information such as mathematical models or pixels. In addition, processor 105 may be based on reduced instruction-set computer (RISC) or complex instruction-set computer (CISC) architectures or any other suitable architecture and may include one or more processing cores. Graphics hardware 120 may be special purpose computational hardware for processing graphics and/or assisting processor 105 in performing computational tasks. In some embodiments, graphics hardware 120 may include CPU-integrated graphics and/or one or more programmable GPUs, which may be operated in serial or parallel cooperation. Graphics hardware, such as GPUs, may employ integrated memory such as SRAM, external memory such as memory 110 (either dedicated or shared), or a combination of both.
Output from the sensors 125 may be processed, at least in part, by processors 105 and/or graphics hardware 120, and/or a dedicated image processing unit incorporated within or without system 100. Information so captured may be stored in memory 110 and/or storage 115 and/or any storage accessible on an attached network. Memory 110 may include one or more different types of media used by processor 105, graphics hardware 120, and sensors 125 to perform device functions. Storage 115 may store data such as media (e.g., audio, image, and video files), metadata for media, computer program instructions, and other software, including database applications (e.g., a database storing avatar frames), preference information, device profile information, and any other suitable data. Memory 110 and storage 115 may be used to retain computer program instructions or code organized into one or more modules, either in compiled form or written in any desired computer programming language. When executed by, for example, processor 105 or one or more GPUs in the system, such computer program code may implement one or more of the acts or functions described herein (e.g., compiling shader code, generating executable code, executing executable code, executing shaders, executing kernels, or executing a tessellator software module).
In addition to the foregoing, in some embodiments, graphics hardware 120 may further include a hardware tessellator to perform the tessellator functions described below.
Client computers 215 (i.e., 215A, 215B, and 215C), which may take the form of any smartphone, gaming system, tablet computer system, desktop computer system, set top box, entertainment device/system, television, telephone, communications device, or intelligent machine, including embedded systems, may also be coupled to networks 205, and/or data server computers 210. In some embodiments, the network architecture may also include network printers such as printer 220 and storage systems such as 225, which may be used to store multi-media items or other data that are referenced herein. To facilitate communication between different network devices (e.g., data servers 210, end-user computers 215, network printer 220, and storage system 225), at least one gateway or router 230 may optionally be coupled therebetween. Furthermore, in order to facilitate such communication, each device employing the network may comprise a network adapter circuit and related software. For example, if an Ethernet network is desired for communication, each participating device must have an Ethernet adapter or embedded Ethernet-capable ICs. Further, the devices may carry network adapters for any network in which they might participate (including, but not limited to, PANs, LANs, WANs, and cellular networks).
As noted above, embodiments of the inventions disclosed herein include software. As such, a description of common computing software architecture is provided as expressed in a layer diagram in
Returning to
Referring again to
Software Raytracer 353 is software for creating image information based upon the process of tracing the path of light through pixels in the plane of an image. Pure Software Rasterizer 354 refers generally to software used to make graphics information such as pixels without specialized graphics hardware (e.g., using only the CPU). These libraries or frameworks shown within the O/S services layer 385 are only exemplary and intended to show the general level of the layer and how it relates to other software in a sample arrangement (e.g. kernel operations usually below and higher-level Applications Services 360 usually above). In addition, it may be useful to note that Metal 352 represents a published framework/library of Apple Inc. that is known to developers in the art. Furthermore, OpenGL 351 may represent a framework/library present in versions of software either currently or formerly distributed by Apple Inc.
Above the O/S services layer 385 there is an Application Services layer 380, which includes Sprite Kit 361, Scene Kit 362, Core Animation 363, and Core Graphics 364. The Application Services layer represents higher-level frameworks that are commonly directly accessed by application programs. In some embodiments of this disclosure the Application Services layer includes graphics-related frameworks that are high level in that they are agnostic to the underlying graphics libraries (such as those discussed with respect to layer 385). In such embodiments, these higher-level graphics frameworks are meant to provide developer access to graphics functionality in a more user/developer friendly way and allow developers to avoid work with shading and graphics primitives. By way of example, Sprite Kit 361 is a graphics rendering and animation infrastructure made available by Apple Inc. Sprite Kit 361 may be used to animate textured images or "sprites." Scene Kit 362 is a 3D-rendering framework from Apple Inc. that supports the import, manipulation, and rendering of 3D assets at a higher level than frameworks having similar capabilities, such as OpenGL. Core Animation 363 is a graphics rendering and animation infrastructure made available from Apple Inc. Core Animation 363 may be used to animate views and other visual elements of an application. Core Graphics 364 is a two-dimensional drawing engine from Apple Inc. that provides 2D rendering for applications.
Above the application services layer 380, there is the application layer 375, which may comprise any type of application program. By way of example,
In evaluating O/S services layer 385 and applications services layer 380, it may be useful to realize that different frameworks have higher- or lower-level application program interfaces, even if the frameworks are represented in the same layer of the
DirectX 11—DX11 Tessellation
Referring to
Referring again to
Referring again to
Pipeline Embodiments for Tessellation
Referring to
In embodiments or pipeline instances without compute kernel 601, one or more shaders or compute kernels may be used to generate the patch data or tessellation factors. In yet other embodiments, the patch data or tessellation factors may be generated offline and simply read from memory, obviating the need (at least partially) for the use of compute kernel 601. For example, tessellation factors may be provided for all patches or frames and simply accessed from memory 608. Alternatively, tessellation factors may be provided for one or more patches or frames, and, additionally, scaling factors associated with a variety of camera positions (e.g. distance) relative to those patches or frames may also be provided. Some embodiments simply use the CPU or a compute kernel to determine tessellation factors for every frame by using the provided factors along with an appropriate scaling factor. Importantly, the use of the term scaling factor is not intended to limit the use to simple mathematics. Differing embodiments may embody scaling factors as simple multipliers or complex functions. For example, a scaling factor may be an integer, a mathematical function, or even a programmatic sequence that includes both functions and conditions, so that the scaling effects may depend upon various factors including system events (e.g., application state, or screen or graphics settings).
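One simple realization of such a scaling factor, the camera-distance case mentioned above, can be sketched as follows. The inverse-distance falloff, the reference distance, and the clamp range are assumptions of this sketch, not a prescribed formula.

```cpp
#include <algorithm>
#include <cassert>

// Scale a base per-patch tessellation factor by camera distance: at the
// reference distance the base factor is used unchanged; closer patches get
// proportionally more tessellation, farther patches less. The result is
// clamped to a valid factor range.
float scaleTessFactor(float baseFactor, float cameraDistance,
                      float referenceDistance, float maxFactor) {
    float scale = referenceDistance / std::max(cameraDistance, 1e-6f);
    return std::clamp(baseFactor * scale, 1.0f, maxFactor);
}
```

A more elaborate embodiment could replace the single multiply with a programmatic sequence, as the text notes, e.g. switching formulas based on application state or graphics settings.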
Referring again to
The domain shader 605 produces vertex information that may be transformed into fragments or pixels by rasterizer 606. Notably, in some embodiments, domain shader 605 is the only vertex shader in the pipeline or in the tessellation portion of the pipeline, which is a significant efficiency as compared to the DX11 pipeline. After rasterization, the fragment shader may color or otherwise shade the fragments or pixels and store the result back in the graphics memory 608. In some embodiments, after shading, fragments or pixels are stored in a buffer such as a frame buffer and the fragments or pixels may be organized as frames for display on a target display device 620.
The arrows shown in
With reference to
With reference to
As suggested by
Referring again to
Referring again to
In some embodiments, tessellator 704 takes as input one or more of: (i) the number of patches to be processed; (ii) for each patch to be processed, the patch type (e.g. quad or triangle), or, if all patches are the same type, simply the patch type; (iii) a selected output primitive type (e.g., triangles) for each patch, or for all patches if the selected output primitive type is the same; (iv) a buffer (e.g. an address or pointer) that stores the per-patch tessellation factors for each patch to be tessellated, or, if the factors are the same for all the patches, a single buffer; and (v) the output primitive orientation (e.g. if the output primitive is a triangle). As discussed above, the tessellator 704 may produce a canonical domain as an output, which in some embodiments is bound to the post-tessellation vertex function 705 as an input.
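The five inputs enumerated above can be gathered into a single descriptor, sketched below. The enums, field names, and layout are illustrative assumptions, not an actual API.

```cpp
#include <cassert>
#include <cstdint>

enum class PatchType { Triangle, Quad };
enum class OutputPrimitive { Points, Lines, Triangles };
enum class Winding { Clockwise, CounterClockwise };

// Hypothetical bundle of tessellator inputs, one field per item (i)-(v).
struct TessellatorInputs {
    uint32_t patchCount;              // (i) number of patches to process
    PatchType patchType;              // (ii) quad or triangle
    OutputPrimitive outputPrimitive;  // (iii) output primitive type
    const void* factorBuffer;         // (iv) per-patch tessellation factors
    uint32_t factorBufferOffset;      //      byte offset into that buffer
    Winding outputWinding;            // (v) output primitive orientation
};
```

In an API such as Metal, these inputs are supplied piecemeal through the pipeline descriptor and draw calls rather than as one struct; the bundle here is only to make the enumeration concrete.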
Referring again to
In one or more embodiments, the post-tessellation vertex function 705 generates the final vertex data for the tessellated triangles. For example, to add additional detail (such as displacement mapping values) to the rendered geometry, the post-tessellation vertex function may sample a texture to modify the vertex position by a displacement value. In some embodiments, the post-tessellation vertex function 705 serves as the last or only vertex shader in the pipeline or in the tessellation portion of the pipeline. After processing by the post-tessellation vertex function 705, the post processed vertices represent the appearance of the graphic to be embodied in a frame or other visible embodiment of the graphic (e.g. produced at or after 711, post shading and ultimately sent to display device 720 for display).
Referring again to
Tessellator Primitive Generation
As suggested above, in one or more embodiments, tessellator 704 consumes input patch information and produces a new set of, for example, triangles reflecting the desired degree of tessellation. In some embodiments, these triangles are produced by subdividing the patch (quad or triangle) according to the per-patch tessellation factors discussed below. This subdivision may be performed in an implementation-dependent manner. For example, for triangle patches, the tessellator 704 may subdivide a triangle primitive into smaller triangles; and for quad patches, the primitive generator may subdivide a rectangle primitive into smaller triangles. In at least one embodiment, each vertex produced by the tessellator 704 may be expressed as (u, v, w) barycentric coordinates (for triangle patches) or (u, v) coordinates (for quad patches) in a normalized parameter space, with parameter values in the range [0, 1].
Quad Patches
In some programmed embodiments of the disclosure, per-patch tessellation factors may be declared for example as structs. With reference to
With reference to
Regarding
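The quad-patch factor declaration described in this section can be sketched as a plain-C++ struct. The layout mirrors Metal's MTLQuadTessellationFactorsHalf; using uint16_t to stand in for the 16-bit half-precision type is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Per-patch tessellation factors for a quad patch: four edge factors
// followed by two inside (interior) factors, all half precision. The
// resulting 12-byte size matches the quad-patch tessellation factor
// stride discussed later in this disclosure.
struct QuadTessellationFactorsHalf {
    uint16_t edgeTessellationFactor[4];
    uint16_t insideTessellationFactor[2];
};

static_assert(sizeof(QuadTessellationFactorsHalf) == 12,
              "quad factors occupy 12 bytes per patch");
```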
Triangle Patches
As stated above, in some programmed embodiments of the disclosure, per-patch tessellation factors may be declared for example as structs. With reference to
With reference to
Regarding edgeTessellationFactor: the value in index 0 provides the tessellation factor for the u==0 edge of the patch; the value in index 1 provides the tessellation factor for the v==0 edge of the patch; and the value in index 2 provides the tessellation factor for the w==0 edge of the patch.
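A matching sketch for triangle patches follows, with the index-to-edge mapping just described recorded in comments. The layout mirrors Metal's MTLTriangleTessellationFactorsHalf; again, uint16_t stands in for the half-precision type as an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Per-patch tessellation factors for a triangle patch: three edge factors
// and one inside factor, all half precision. The 8-byte size matches the
// triangle-patch tessellation factor stride discussed later.
struct TriangleTessellationFactorsHalf {
    uint16_t edgeTessellationFactor[3]; // [0]: u==0 edge, [1]: v==0 edge,
                                        // [2]: w==0 edge
    uint16_t insideTessellationFactor;
};

static_assert(sizeof(TriangleTessellationFactorsHalf) == 8,
              "triangle factors occupy 8 bytes per patch");
```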
Discarding Patches
Some embodiments of the disclosure contemplate discarding certain patches. For example, with reference to
In one programmatic embodiment, if the tessellation factor scale is enabled (e.g. tessellationFactorScaleEnabled in MTLRenderPipelineDescriptor), then the tessellator (e.g. 704) first multiplies the relevant edge and inside tessellation factors of the patch by the specified scale factor. In one embodiment, for quad patches, all four edge tessellation factors are relevant. In another embodiment, for triangle patches, only the first three edge tessellation factors are relevant.
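The ordering described above, cull check first, then scaling, then clamping to the maximum factor, can be sketched for a quad patch as follows. The function name and the cull rule used here (a non-positive or NaN edge factor discards the patch) are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Apply the tessellation-factor scale in the order the disclosure
// describes: cull check, then (if enabled) multiply the relevant factors
// by the scale, then clamp to maxFactor. For quad patches, all four edge
// factors are relevant. Returns false if the patch is culled.
bool applyQuadScale(float edge[4], float inside[2], bool scaleEnabled,
                    float scale, float maxFactor) {
    for (int i = 0; i < 4; ++i)
        if (!(edge[i] > 0.0f)) return false; // patch culled
    if (scaleEnabled) {
        for (int i = 0; i < 4; ++i) edge[i] *= scale;
        inside[0] *= scale;
        inside[1] *= scale;
    }
    for (int i = 0; i < 4; ++i) edge[i] = std::min(edge[i], maxFactor);
    inside[0] = std::min(inside[0], maxFactor);
    inside[1] = std::min(inside[1], maxFactor);
    return true; // patch kept
}
```

A triangle-patch variant would differ only in consulting the first three edge factors, per the embodiment above.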
Implementation Upon Prior Systems
One or more of the embodiments described herein may be conceived as altered versions of graphics development environments and frameworks that are currently commonly known. For example, many embodiments of this disclosure may relate to the Apple Metal programming environment and operation. Furthermore, many embodiments of the disclosure are particularly intended for implementation on hardware and systems suited for DX11. The following further description of embodiments and implementation details are intended to illustrate concepts often through the recitation of implementation examples and code examples. No limitation to the details shown is intended. The examples illustrate concepts regarding implementation, such as APIs and are also illustrative of the concepts discussed above.
APIs
As indicated above, this disclosure contemplates the use of a program interface for developers to manipulate the use of the tessellation pipeline embodiments taught and suggested herein. For example, tessellation properties may be manipulated by the developer using an application interface. In some embodiments associated with Apple's Metal programming paradigm, the interface may be associated with MTLRenderPipelineDescriptor. APIs may be provided to the developer to indicate or control one or more of the following:
Specific API Implementation Examples
In some embodiments associated with Apple's Metal programming paradigm, specific implementation examples may be as follows:
The post-tessellation vertex function may be specified as vertexFunction in MTLRenderPipelineDescriptor.
MTLRenderPipelineDescriptor Properties for Tessellation
The following new properties are added to MTLRenderPipelineDescriptor. Note, however, that in some examples, if the vertex function is not a post-tessellation vertex function, all of the following tessellation properties are ignored.
NSUInteger maxTessellationFactor specifies the maximum tessellation factor to be used by the tessellator when tessellating a patch (or patches).
The maximum tessellation factor is 64. The default is 16.
The maximum tessellation factor must be a power of 2 if tessellationPartitionMode is MTLTessellationPartitionModePow2.
The maximum tessellation factor must be an even number if tessellationPartitionMode is MTLTessellationPartitionModeFractionalOdd or MTLTessellationPartitionModeFractionalEven.
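The constraints just stated on maxTessellationFactor can be sketched as a validation check; the enum and function names are illustrative, not part of the API.

```cpp
#include <cassert>

enum class PartitionMode { Pow2, Integer, FractionalOdd, FractionalEven };

// Validate a candidate maxTessellationFactor against the rules above:
// at most 64; a power of two under the pow2 partitioning mode; an even
// number under either fractional partitioning mode.
bool isValidMaxTessellationFactor(unsigned factor, PartitionMode mode) {
    if (factor == 0 || factor > 64) return false;
    if (mode == PartitionMode::Pow2)
        return (factor & (factor - 1)) == 0;  // power of two
    if (mode == PartitionMode::FractionalOdd ||
        mode == PartitionMode::FractionalEven)
        return factor % 2 == 0;               // even
    return true;                              // Integer: any value in range
}
```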
BOOL tessellationFactorScaleEnabled indicates whether the tessellation factors are scaled. If scaling is enabled, the scale factor is applied to the tessellation factors after the patch cull check is performed (and only if the patch is not culled), and before the tessellation factors are clamped to the maxTessellationFactor. The default is NO.
MTLTessellationFactorFormat tessellationFactorFormat describes the format of the tessellation factors specified in the tessellation factor buffer.
tessellationFactorFormat must be one of the following values:
MTLTessellationControlPointIndexType tessellationControlPointIndexType describes the size of the control-point indices specified by the controlPointIndexBuffer in the drawIndexedPatches API.
tessellationControlPointIndexType must be one of the following values:
MTLTessellationFactorStepFunction tessellationFactorStepFunction specifies the step function used to determine the tessellation factors for a patch from the tessellation factor buffer. The default value is MTLTessellationFactorStepFunctionConstant.
MTLWinding tessellationOutputWindingOrder specifies the winding order of triangles output by the tessellator. The default value is MTLWindingClockwise.
MTLTessellationPartitionMode tessellationPartitionMode specifies the partitioning mode used by the tessellator to derive the number and spacing of segments used to subdivide a corresponding edge. tessellationPartitionMode is one of the following values:
The default value is MTLTessellationPartitionModePow2. (In the descriptions below, max is the maxTessellationFactor specified in the MTLRenderPipelineDescriptor.)
The following describes the tessellation factor range for the supported tessellation partitioning modes:
MTLTessellationPartitionModePow2, range=[1, max];
MTLTessellationPartitionModeInteger, range=[1, max]; MTLTessellationPartitionModeFractionalOdd, range=[1, max−1]; and MTLTessellationPartitionModeFractionalEven, range=[2, max].
If tessellationPartitionMode is MTLTessellationPartitionModePow2, the floating-point tessellation level is first clamped to the range [1, max]. The result is rounded up to the nearest integer m, where m is a power of 2, and the corresponding edge is divided into m segments of equal length in (u, v) space.
If tessellationPartitionMode is MTLTessellationPartitionModeInteger, the floating-point tessellation level is first clamped to the range [1, max]. The result is rounded up to the nearest integer n, and the corresponding edge is divided into n segments of equal length in (u, v) space.
If tessellationPartitionMode is MTLTessellationPartitionModeFractionalEven, the tessellation level is first clamped to the range [2, max] and then rounded up to the nearest even integer n. If tessellationPartitionMode is MTLTessellationPartitionModeFractionalOdd, the tessellation level is clamped to the range [1, max−1] and then rounded up to the nearest odd integer n. If n is 1, the edge is not subdivided. Otherwise, the corresponding edge is divided into n−2 segments of equal length, plus two additional segments of equal length that are typically shorter than the other segments. The length of the two additional segments relative to the others decreases monotonically with the value of n−f, where f is the clamped floating-point tessellation level. If n−f is zero, the additional segments are equal in length to the other segments. As n−f approaches 2.0, the relative length of the additional segments approaches zero. The two additional segments should be placed symmetrically on opposite sides of the subdivided edge. The relative location of these two segments is undefined, but must be identical for any pair of subdivided edges with identical values of f.
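The clamping and rounding rules above can be sketched as a small helper that computes the number of edge segments for each partitioning mode. This is an illustrative sketch, not the tessellator implementation; the `PartitionMode` enum and `segmentCount` function are hypothetical stand-ins, and `maxFactor` stands in for maxTessellationFactor.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: number of segments an edge is divided into for
// each partitioning mode, per the clamping/rounding rules described above.
enum class PartitionMode { Pow2, Integer, FractionalEven, FractionalOdd };

int segmentCount(PartitionMode mode, float level, float maxFactor) {
    switch (mode) {
    case PartitionMode::Pow2: {
        float f = std::clamp(level, 1.0f, maxFactor);
        int m = 1;
        while (m < f) m *= 2;   // round up to the nearest power of 2
        return m;
    }
    case PartitionMode::Integer: {
        float f = std::clamp(level, 1.0f, maxFactor);
        return static_cast<int>(std::ceil(f));
    }
    case PartitionMode::FractionalEven: {
        float f = std::clamp(level, 2.0f, maxFactor);
        int n = static_cast<int>(std::ceil(f));
        if (n % 2 != 0) n += 1; // round up to the nearest even integer
        return n;
    }
    case PartitionMode::FractionalOdd: {
        float f = std::clamp(level, 1.0f, maxFactor - 1.0f);
        int n = static_cast<int>(std::ceil(f));
        if (n % 2 == 0) n += 1; // round up to the nearest odd integer
        return n;               // n == 1 means the edge is not subdivided
    }
    }
    return 1;
}
```

For the fractional modes, the returned n includes the two shorter symmetric segments (i.e., n−2 equal-length segments plus the two additional ones).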
Specifying Tessellation Factors
The following MTLRenderCommandEncoder API specifies the per-patch tessellation factors:
The following MTLRenderCommandEncoder API specifies the per-patch tessellation scale factor:
- (void)setTessellationFactorScale:(float)scale;
With respect to specifying tessellation factors, in some embodiments, offset must be a multiple of 4 bytes, and scale may be converted to a half-precision floating-point value before it is multiplied by the tessellation factors. In many embodiments, scale must be a positive normal half-precision floating-point value; i.e., it must not be less than or equal to zero, denormal, infinite, or NaN.
In many embodiments, for quad patches, the tessellation factor stride is 12 bytes and, for triangle patches, the tessellation factor stride is 8 bytes.
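These strides follow from storing each factor as a 16-bit half-precision value, matching the layouts of Metal's MTLQuadTessellationFactorsHalf and MTLTriangleTessellationFactorsHalf types. A sketch of equivalent C structs (uint16_t used here as the raw storage for a half-precision value):

```cpp
#include <cstdint>

// Layouts mirroring Metal's half-precision tessellation factor structs.
// Each factor occupies 16 bits.
struct QuadTessellationFactorsHalf {
    uint16_t edgeTessellationFactor[4];   // one factor per quad edge
    uint16_t insideTessellationFactor[2]; // two inside factors
};                                        // 12 bytes total

struct TriangleTessellationFactorsHalf {
    uint16_t edgeTessellationFactor[3];   // one factor per triangle edge
    uint16_t insideTessellationFactor;    // single inside factor
};                                        // 8 bytes total
```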
MTLTessellationFactorStepFunction
The MTLTessellationFactorStepFunction is defined as:
If the step function is MTLTessellationFactorStepFunctionPerInstance or MTLTessellationFactorStepFunctionPerPatchAndPerInstance, instanceStride must be a value greater than 0. Otherwise, instanceStride must be 0.
If stepFunction is MTLTessellationFactorStepFunctionConstant, for all instances, the tessellation factor for all patches in drawPatches is at location offset in the tessellation factor buffer.
If stepFunction is MTLTessellationFactorStepFunctionPerPatch, for all instances, the tessellation factor for a patch in drawPatches is at location offset+(drawPatchIndex*tessellation factor stride) in the tessellation factor buffer.
If stepFunction is MTLTessellationFactorStepFunctionPerInstance, for a given instance ID, the tessellation factor for all patches in drawPatches is at location offset+(instance ID*instanceStride) in the tessellation factor buffer.
If stepFunction is MTLTessellationFactorStepFunctionPerPatchAndPerInstance, for a given instance ID, the tessellation factor for a patch in drawPatches is at location offset+(drawPatchIndex*tessellation factor stride+instance ID*instanceStride) in the tessellation factor buffer. (patchCount is either a direct or indirect argument to drawPatches.)
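The four addressing rules above can be condensed into one sketch. The `FactorStepFunction` enum and `factorLocation` helper are hypothetical names; the arithmetic follows the rules just described.

```cpp
#include <cstddef>

enum class FactorStepFunction {
    Constant, PerPatch, PerInstance, PerPatchAndPerInstance
};

// Hypothetical helper: byte location of a patch's tessellation factors
// in the tessellation factor buffer, per the step function rules above.
size_t factorLocation(FactorStepFunction stepFunction, size_t offset,
                      size_t drawPatchIndex, size_t factorStride,
                      size_t instanceID, size_t instanceStride) {
    switch (stepFunction) {
    case FactorStepFunction::Constant:
        return offset;
    case FactorStepFunction::PerPatch:
        return offset + drawPatchIndex * factorStride;
    case FactorStepFunction::PerInstance:
        return offset + instanceID * instanceStride;
    case FactorStepFunction::PerPatchAndPerInstance:
        return offset + drawPatchIndex * factorStride
                      + instanceID * instanceStride;
    }
    return offset;
}
```

For example, with the 12-byte quad factor stride, the third patch (drawPatchIndex 2) of a per-patch draw starting at offset 16 reads its factors at byte 40.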
Specifying Patch Control-Point and Per-Patch Data
The post-tessellation vertex function can read the patch control-point and any user per-patch data by either: indexing into one or more buffers that are passed as arguments to the post-tessellation vertex function using the patch ID; or accessing values that are directly passed in as an argument to the post-tessellation vertex function declared with the [[stage_in]] qualifier.
When directly passed in as an argument declared with the [[stage_in]] qualifier, the patch control-point data and per-patch data are declared as elements in a user-defined struct. The patch control-point data must be declared as a patch_control_point&lt;T&gt; templated type, where T is a user-defined struct that describes the patch control-point data. All other elements declared in this struct describe the per-patch data. Passing patch data using the [[stage_in]] qualifier allows developers to decouple the actual storage format of the patch data from the types declared in the post-tessellation vertex function (similar to the support for per-vertex data inputs to a regular vertex function).
All per-patch inputs to the post-tessellation vertex function declared with the [[stage_in]] qualifier must specify an attribute location using [[attribute(index)]]. The index value is an unsigned integer value that identifies the patch input data location that is being assigned. The MTLVertexDescriptor object is used to configure how the patch data stored in memory is mapped to patch data declared in a shader.
In some examples, the following new enums are added to MTLVertexStepFunction:
MTLVertexStepFunctionPerPatch
MTLVertexStepFunctionPerPatchControlPoint
If the step function is MTLVertexStepFunctionPerPatch, the shader fetches data based on the patch index of the patch.
If the step function is MTLVertexStepFunctionPerPatchControlPoint, the shader fetches data based on the control-point indices associated with the patch.
The patch control-point data layout is described in MTLVertexDescriptor with an MTLVertexStepFunctionPerPatchControlPoint step function. The per-patch data layout is described in MTLVertexDescriptor with an MTLVertexStepFunctionPerPatch step function.
The MTLVertexStepFunctionConstant and MTLVertexStepFunctionPerInstance step functions can also be used to describe per-patch or control-point data. However, the MTLVertexStepFunctionPerVertex step function cannot be used to describe patch control-point and per-patch data.
Specifying Per-Thread Compute Kernel Data
An app developer typically uses an MTLVertexDescriptor-like structure to describe the inputs to the DirectX/OpenGL vertex shader. MTLStageInputOutputDescriptor is introduced in MTLFeatureSet_OSX_GPUFamily1_v2 to enable using a descriptor similar to MTLVertexDescriptor to specify the actual format of the per-thread data (such as control-point or per-patch data) for a compute kernel at runtime. Although intended to support compute kernel generation of tessellation factors, this generic API approach to provide [[stage_in]] data (i.e., per-thread data) can be used for a number of use cases. The API changes are:
(A) MTLStageInputOutputDescriptor is added that is similar to MTLVertexDescriptor with the following differences:
Drawing Tessellated Primitives
To render a number of instances of tessellated patches, you can call the following drawPatches or drawIndexedPatches draw calls in MTLRenderCommandEncoder with patch data. The drawIndexedPatches calls (third and fourth calls below) support using a buffer of indices to indirectly reference the control-point indices of a patch. If the vertex function is a post-tessellation vertex function, only the drawPatches or drawIndexedPatches APIs from MTLRenderCommandEncoder can be called to render primitives. Calling the drawPrimitives or drawIndexedPrimitives APIs causes the validation layer to report an error. If the vertex function is not a post-tessellation vertex function, calling the drawPatches or drawIndexedPatches API from MTLRenderCommandEncoder causes the validation layer to report an error.
For all draw patch API calls, the per-patch data and an array of patch control points are organized for rendering in contiguous array elements, starting from baseInstance. The number of patch instances rendered is specified by instanceCount. numberOfPatchControlPoints refers to the number of control-points in a patch, which must be a value between 0 and 32, inclusive. The patchStart and patchCount arguments refer to the patch start index and the number of patches in each instance of the draw call, respectively.
The second and fourth draw patch calls listed above support the use of an MTLBuffer (indirectBuffer) that indirectly specifies the draw call parameters in the corresponding fields of the MTLDrawPatchIndirectArguments structure, defined as follows:
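A sketch of this structure, with fields mirroring the direct drawPatches parameters (field names per Apple's Metal framework headers):

```cpp
#include <cstdint>

// MTLDrawPatchIndirectArguments: four 32-bit fields matching the
// direct drawPatches parameters, read by the GPU from indirectBuffer.
typedef struct {
    uint32_t patchCount;    // number of patches in each instance
    uint32_t instanceCount; // number of patch instances to render
    uint32_t patchStart;    // patch start index
    uint32_t baseInstance;  // first instance to render
} MTLDrawPatchIndirectArguments;
```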
To render patch data, the drawPatches API fetches per-patch data and the control-point data. Patch data is typically stored together for all patches of one or more meshes in one or more buffers. A kernel is then run to generate the view-dependent tessellation factors. When generating the tessellation factors, factors should only be generated for patches that are not to be discarded, which means the patch IDs of the patches to be tessellated and rendered might not be contiguous.
A buffer index (drawPatchIndex) in the range [patchStart, patchStart+patchCount−1] is used to reference data. In cases where the patch indices used to fetch the patch control-point and per-patch data are not contiguous, drawPatchIndex can reference patchIndexBuffer. Each element of patchIndexBuffer contains a 32-bit patchIndex value that references the control-point and per-patch data. The patchIndex fetched from patchIndexBuffer is at the location: (drawPatchIndex*4)+patchIndexBufferOffset.
patchIndexBuffer also enables the patchIndex used to read the per-patch and patch control-point data to be different from the index used to read the patch tessellation factors. For the fixed-function tessellator, drawPatchIndex is directly used as an index to fetch patch tessellation factors.
If patchIndexBuffer is null, the drawPatchIndex and patchIndex are the same value.
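The lookup described above can be sketched as follows. The `fetchPatchIndex` helper is a hypothetical name; the addressing arithmetic and the null-buffer behavior follow the text directly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the patchIndex lookup: each element of
// patchIndexBuffer is a 32-bit index, so the element for drawPatchIndex
// lies at byte (drawPatchIndex * 4) + patchIndexBufferOffset.
uint32_t fetchPatchIndex(const uint8_t* patchIndexBuffer,  // may be null
                         size_t patchIndexBufferOffset,
                         uint32_t drawPatchIndex) {
    if (patchIndexBuffer == nullptr)
        return drawPatchIndex;  // no remapping: patchIndex == drawPatchIndex
    const uint8_t* p = patchIndexBuffer
                     + drawPatchIndex * 4 + patchIndexBufferOffset;
    return *reinterpret_cast<const uint32_t*>(p);
}
```

This indirection is what lets a culling kernel emit a compacted list of surviving patch IDs while the tessellation factors remain indexed directly by drawPatchIndex.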
In cases where control-points are shared across patches or the patch control-point data is not contiguous, use the drawIndexedPatches API. patchIndex references a specified controlPointIndexBuffer, which contains the control-point indices of a patch. (tessellationControlPointIndexType describes the size of the control-point indices in controlPointIndexBuffer and must be either MTLTessellationControlPointIndexTypeUInt16 or MTLTessellationControlPointIndexTypeUInt32.) The actual location of the first control-point index in controlPointIndexBuffer is computed as:
Several (numberOfPatchControlPoints) control-point indices must be stored consecutively in controlPointIndexBuffer, starting at the location of the first control-point index.
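Given the consecutive-storage rule above, the location of a patch's first control-point index can be sketched as below. This formula is an assumption derived from that rule (indices packed per patch, 2 bytes per UInt16 index or 4 bytes per UInt32 index), not a quotation of the elided expression.

```cpp
#include <cstddef>

// Sketch (assumed from the consecutive-storage rule above): byte
// location of a patch's first control-point index in
// controlPointIndexBuffer. indexSize is 2 (UInt16) or 4 (UInt32).
size_t firstControlPointIndexLocation(size_t controlPointIndexBufferOffset,
                                      size_t patchIndex,
                                      size_t numberOfPatchControlPoints,
                                      size_t indexSize) {
    return controlPointIndexBufferOffset
         + patchIndex * numberOfPatchControlPoints * indexSize;
}
```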
Implementation Examples for Porting DX11-Style Tessellation Shaders to Apple Metal
In DX11, the HLSL vertex shader is executed for each control-point of a patch. The HLSL hull shader is specified by two functions: a function that executes for each control-point of the patch and another that executes per-patch. The output of the vertex shader is input to these two functions that make up the hull shader. Below is a very simple HLSL vertex and hull shader example, which is translated to the Metal shading language later.
The HLSL vertex and hull shaders described above can be translated to Metal functions, and a compute kernel that calls these Metal functions can be created that executes these shader functions as a single kernel. The translated vertex and control-point hull functions are called per-thread in the compute kernel, followed by a threadgroup barrier, and then the per-patch hull function is executed by a subset of the threads in the threadgroup. Being able to directly call the translated vertex and hull functions in the kernel makes it straightforward for developers to port their vertex and hull shaders from DirectX or OpenGL to Metal. The HLSL vertex and hull shaders can be translated to the following Metal functions:
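The execution pattern just described can be sketched serially as follows. All types and functions here are hypothetical stand-ins, not the author's translated shaders; in a real Metal kernel, each loop iteration in phase 1 runs on its own threadgroup thread, with a threadgroup barrier between the two phases.

```cpp
#include <vector>

// Hypothetical stand-ins for the translated HLSL functions.
struct ControlPointIn  { float position; };
struct ControlPointOut { float position; };
struct PatchOut        { float tessellationFactor; };

// Translated vertex + control-point hull work for one control point.
ControlPointOut VertexAndControlPointHull(ControlPointIn cp) {
    return { cp.position * 2.0f };
}

// Translated per-patch hull function.
PatchOut PerPatchHull(const std::vector<ControlPointOut>& cps) {
    float sum = 0.0f;
    for (const auto& cp : cps) sum += cp.position;
    return { sum };
}

PatchOut PatchKernel(const std::vector<ControlPointIn>& patch) {
    std::vector<ControlPointOut> out;
    // Phase 1: per-thread vertex + control-point hull functions
    // (one threadgroup thread per control point in the real kernel).
    for (const auto& cp : patch) out.push_back(VertexAndControlPointHull(cp));
    // ---- threadgroup barrier would go here ----
    // Phase 2: per-patch hull function, run by a subset of the threads.
    return PerPatchHull(out);
}
```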
A compute kernel that calls these vertex and hull functions can be:
In PatchKernel, a MTLStageInputOutputDescriptor object can be used to describe the [[stage_in]] data for the input VertexIn struct:
It is to be understood that the above description is intended to be illustrative, and not restrictive. The material has been presented to enable any person skilled in the art to make and use the invention as claimed and is provided in the context of particular embodiments, variations of which will be readily apparent to those skilled in the art (e.g., many of the disclosed embodiments may be used in combination with each other). In addition, it will be understood that some of the operations identified herein may be performed in different orders. The scope of the invention, therefore, should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.”
Number | Date | Country
---|---|---
62349023 | Jun 2016 | US