A portion of the disclosure of this patent document contains material which is subject to (copyright or mask work) protection. The (copyright or mask work) owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all (copyright or mask work) rights whatsoever.
This disclosure relates generally to the field of computer programming. More particularly, but not by way of limitation, it relates to a programming language and compilation system for the programming language for programming kernels for execution on a graphical processor unit.
Graphics processor units (GPUs) have become more and more important for processing data-parallel graphics tasks. Developers have also recognized that non-graphics data-parallel tasks can be handled by GPUs, taking advantage of their massive parallel capabilities. Vendors and standards organizations have created application programming interfaces (APIs) that make graphics data-parallel tasks easier to program. Similarly, vendors and standards organizations have created different APIs that make compute or non-graphics data-parallel tasks easier to program. However, these high-level APIs have resulted in performance degradation, and have made combining graphics and compute data-parallel tasks less convenient, because of the need to use different APIs for each type of task or to write code in languages that are quite different from programming languages such as C++ that developers very commonly use for writing code on the CPU.
A compiler and library provide the ability to compile a programming language according to a defined language model into a programming language independent, machine independent intermediate representation, for conversion into an executable on a target programmable device. The language model allows writing programs that perform data-parallel graphics and non-graphics tasks.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without these specific details. In other instances, structure and devices are shown in block diagram form in order to avoid obscuring the invention. References to numbers without subscripts or suffixes are understood to reference all instances of subscripts and suffixes corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter. Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.
As used herein, the term “a computer system” can refer to a single computer or a plurality of computers working together to perform the function described as being performed on or by a computer system. Similarly, a machine-readable medium can refer to a single physical medium or a plurality of media that may together contain the indicated information stored thereon. A processor can refer to a single processing element or a plurality of processing elements, implemented either on a single chip or on multiple processing chips.
A graphics processor unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. A GPU is efficient at manipulating computer graphics and has a highly parallel structure that makes it more efficient than a general-purpose computer processor (CPU) where processing of large blocks of data is done in parallel. GPUs are also used for non-graphical parallel processing, sometimes referred to as “compute processing,” in addition to graphics processing.
Embodiments described in more detail below allow software developers to prepare applications using a programming language that complies with a language model that is designed to assist developers to write efficient multi-threaded programs that can perform both graphics and compute (non-graphics) processing on GPUs. The compiler for the programming language generates a machine-independent, programming language independent intermediate representation. The developer need not be concerned with the architecture of the specific GPU used, allowing the hardware on which the program may eventually execute to be changed without requiring the developer to rewrite the program. The intermediate representation may then be distributed without the source code for the program. To allow the program to execute on a target machine, the intermediate representation may be compiled using an embedded compiler for the intermediate representation on the target machine, producing a final executable that can be stored for repeated execution. Alternately, an intermediate system may be used to compile the intermediate representation into a machine-specific executable.
The programming language described in more detail below uses a language model that is unified for compute and graphics functionality and is designed for ahead-of-time compilation. Developers can use the language to write code that is executed on the GPU for graphics and general-purpose data-parallel computations. In one embodiment, the language model and programming language are based on the C++ programming language, with added features for GPU programming of parallel tasks. In one embodiment, the programming language is based on the C++11 Specification (a.k.a., the ISO/IEC JTC1/SC22/WG21 N3290 Language Specification) with specific extensions and restrictions. Other embodiments may be based upon other programming languages and language models, such as Objective C.
Traditional compilers break the compilation process into two steps: compile and link. The compiler compiles source code to an object file containing machine code and the linker combines object files with libraries to form an executable program. In a simple system, the linker typically does little more than concatenate the object files and resolve symbol references.
In recent years, however, the compilation process has been split into more phases. The compiler now parses the source code and generates an object code file, but instead of containing machine code specific to a target machine, the object code is in the form of an intermediate representation, a form of virtual instruction set. Typically, the intermediate representation is programming language independent and machine independent, being designed for a hypothetical or virtual machine. An optimizing linker then combines the object files, optimizes them, and integrates them into a native executable for the target machine. The optimizing linker may itself be split into separate phases, so that the optimization is done on the intermediate representation, and the final code generation for the target machine may be performed on a different computer system than the optimization and linking. The linker may be used by multiple compiler phases, allowing a single linker to be used with multiple programming languages.
By having the linker emit an optimized form of the intermediate representation, the creation of the native executable may be pushed to the target system. An embedded compiler on the target system can compile the intermediate representation into native code for direct execution on the target system. This compiler structure allows the developer to write source code in a machine-independent programming language, compile the source code into optimized intermediate representation code, then distribute the intermediate representation code to the target or end-user system for conversion into an executable form. Thus, source code need not be distributed, allowing protection of the source code from the end user system. In addition, if desired, instead of compiling the intermediate representation ahead of time and later executing the target-specific executable, the target machine may use a just-in-time compiler or interpreter to allow on-the-fly compilation or interpretation of the intermediate representation at execution time.
Turning now to
The application 130 may be delivered to the target machine 150 in any desired manner, including electronic transport over a network and physical transport of machine-readable media. This generally involves delivery of the application 130 to a server (not shown in
Upon installation of the application 130 as a collection of pipeline objects 140 that contain state information 142, fragment shaders 144, and vertex shaders 146, the application is compiled by an embedded GPU compiler 170 that compiles the intermediate representation into native binary code for the GPU 180, using a cache 160. The compiled native code may be cached in the cache 160 or stored elsewhere in the target system 150. Finally, the GPU 180 may execute the native binary code, performing the graphics and compute kernels for data parallel operations.
Referring now to
As illustrated in
The storage device 214 is typically a magnetic hard drive, an optical drive, a non-volatile solid-state memory device, or other types of memory systems, which maintain data (e.g. large amounts of data) even after power is removed from the system. While
Referring now to
Computing system 300 includes a CPU 310 and a GPU 330. In the embodiment illustrated in
In addition, computing system 300 also includes a system memory 340 that may be accessed by CPU 310 and GPU 330. In various embodiments, computing system 300 may comprise a supercomputer, a desktop computer, a laptop computer, a video-game console, an embedded device, a handheld device (e.g., a mobile telephone, smart phone, MP3 player, a camera, a GPS device, or other mobile device), or any other device that includes or is configured to include a GPU. Although not illustrated in
GPU 330 assists CPU 310 by performing certain special functions, such as graphics-processing tasks and data-parallel, general-compute tasks, usually faster than CPU 310 could perform them in software.
GPU 330 is coupled with CPU 310 and system memory 340 over link 350. Link 350 may be any type of bus or communications fabric used in computer systems, including a peripheral component interface (PCI) bus, an accelerated graphics port (AGP) bus, a PCI Express (PCIE) bus, or another type of link, including non-bus links. If multiple links 350 are employed, they may be of different types.
In addition to system memory 340, computing system 300 may include a local memory 320 that is coupled to GPU 330, as well as to link 350. Local memory 320 is available to GPU 330 to provide access to certain data (such as data that is frequently used) faster than would be possible if the data were stored in system memory 340. Local memory 320 may also be available to CPU 310 to provide access to data such as binaries stored in the local memory 320. In some embodiments, separate local memories may be used for the CPU 310 and GPU 330, instead of sharing a common local memory 320.
Although a single CPU 310 and GPU 330 are illustrated in
Turning now to
We now turn to the programming language and language model. The specific syntax illustrated below for the programming language is an example and by way of illustration only, and different syntax may be used as desired. The programming language complies with a language model that allows developers to use low-level data structures for programming both graphics and compute (non-graphics) data-parallel tasks or kernels on the GPU, without having to worry about the specific GPU that will eventually execute the program. The following description of the programming language and language model is Copyright 2014 Apple Inc.
Introduction
This document describes the language model for a Unified Graphics and Compute Language according to one embodiment. The language is a C++ based programming language that developers can use to write code that is executed on the GPU for graphics and general-purpose data-parallel computations. Since the language is based on C++, developers will find it familiar and easy to use. With the language, both graphics and compute programs can be written with a single, unified language, which allows tighter integration between the two.
The language is designed to work together with a framework, which manages the execution, and optionally the compilation, of the language code. In one embodiment, the framework and development environment uses clang and LLVM so developers get a compiler that delivers close to the metal performance for code executing on the GPU.
Organization of this Description
The description of the language model is organized into the following chapters:
This chapter, “Introduction,” is an introduction to this document and covers the similarities and differences between the language and C++11.
“Data Types” lists the language data types, including types that represent vectors, matrices, buffers, textures, and samplers. It also discusses type alignment and type conversion.
“Operators” lists the language operators.
“Functions, Variables, and Qualifiers” details how functions and variables are declared, sometimes with qualifiers that restrict how they are used.
“The Standard Library” defines a collection of built-in language functions.
“Compiler Options” details the options for the language compiler, including pre-processor directives, options for math intrinsics, and options that control optimization.
“Numerical Compliance” describes requirements for representing floating-point numbers, including accuracy in mathematical operations.
The language and C++11
The programming language is based on the C++11 Specification (a.k.a., the ISO/IEC JTC1/SC22/WG21 N3290 Language Specification), which is incorporated by reference herein in its entirety, with specific extensions and restrictions. Please refer to the C++11 Specification for a detailed description of the language grammar.
This section and its subsections describe modifications and restrictions to the C++11 language supported in the language.
For more information about the language pre-processing directives and compiler options, see the compiler options section of this document.
Overloading
The language supports overloading as defined by section 13 of the C++11 Specification. The function overloading rules are extended to include the address space qualifier of an argument. The language graphics and kernel functions cannot be overloaded. (For definition of graphics and kernel functions, see the Function Qualifiers section of this document.)
Templates
The language supports templates as defined by section 14 of the C++11 Specification.
Preprocessing Directives
The language supports the pre-processing directives defined by section 16 of the C++11 Specification.
Restrictions
The following C++11 features are not available in the language according to one embodiment (section numbers in this list refer to the C++11 Specification): lambda expressions (section 5.1.2); dynamic cast operator (section 5.2.7); type identification (section 5.2.8); new and delete operators (sections 5.3.4 and 5.3.5); noexcept operator (section 5.3.7); derived classes (section 10); member access control (section 11); special member functions (section 12); and exception handling (section 15).
The C++ standard library must not be used in the language code. Instead of the C++ standard library, the language has its own standard library that is discussed in the Standard Library section below.
In one embodiment, the language restricts the use of pointers: Arguments to the language graphics and kernel functions declared in a program that are pointers must be declared with the global, local or constant address space qualifier. (See the Address Space Qualifiers section of this document for more about the language address space qualifiers.) Function pointers are not supported.
Arguments to the language graphics and kernel functions cannot be declared as a pointer to a pointer(s).
Members of a struct or class must belong to the same address space. Bit-field struct members are not supported.
The goto statement is not supported.
Arguments to graphics and kernel functions cannot be declared to be of type size_t, ptrdiff_t, or a struct and/or union that contain members declared to be one of these built-in scalar types.
main is a reserved keyword in the language. A function cannot be called main.
The Pixel and Texel Coordinate System
In the language, the origin of the pixel coordinate system of a framebuffer attachment is defined at the top left corner. Similarly, the origin of the texel coordinate system of a framebuffer attachment is the top left corner.
Data Types
This chapter details the language data types, including types that represent vectors and matrices. Atomic data types, buffers, textures, samplers, arrays, and user-defined structs are also discussed. Type alignment and type conversion are also described.
Scalar Data Types
The language supports the scalar types listed in Table 1 (
half-precision floating-point, half
Support for the double data type is optional. The language does not support the C++11 standard long, unsigned long, long long, unsigned long long, and long double data types.
The language supports the C++11 standard f or F suffix to specify a single precision floating-point literal value (e.g., 0.5f or 0.5F). In addition, the language supports the h or H suffix to specify a half precision floating-point literal value (e.g., 0.5h or 0.5H). The language also supports the u or U suffix for unsigned integer literals.
Vector and Matrix Data Types
The language defines its own built-in data types for 2-, 3-, and 4-component vectors of Boolean, integer, and floating-point values. Vectors may be used to represent graphics constructs, such as colors, vertices, surface normals, or texture coordinates, but they are not limited to representing those. The vector type names supported are:
where n is 2, 3, or 4 representing a 2-, 3- or 4-component vector type.
doublen is optional. It is supported if double is supported.
The vec<T,n> templated type can also be used to define a vector. T is one of bool, char, short, int, uchar, ushort, uint, half, float, or double. n is 2, 3, or 4.
The language has built-in data types for matrices of floating-point values that have 2, 3, or 4 columns and rows. The supported matrix types are:
where n and m are the number of columns and rows, respectively. n and m can be 2, 3 or 4. doublenxm is optional. It is supported if double is supported.
The following template types can also be used to declare a matrix: matrix<T, n>, where n is both the number of columns and rows, and matrix<T,n,m>, where n and m are the number of columns and rows, respectively. n and m are each one of 2, 3 or 4. T is one of half, float, or double.
Accessing Vector Components
Vector components can be accessed using an array index. Array index 0 refers to the first component of the vector, index 1 to the second component, and so on. The following examples show various ways to access array components:
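By way of illustration only, the array-index rule can be sketched in standard C++ rather than in the language itself. The Float4 type and the sum_components helper below are hypothetical stand-ins introduced for this sketch and are not part of the language.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain-C++ stand-in for a 4-component float vector.
struct Float4 {
    float c[4];

    // Index 0 is the first component, index 1 the second, and so on.
    float& operator[](std::size_t i) { assert(i < 4); return c[i]; }
    const float& operator[](std::size_t i) const { assert(i < 4); return c[i]; }
};

// Adds up all four components using array-style indexing.
inline float sum_components(const Float4& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < 4; ++i) {
        total += v[i];
    }
    return total;
}
```

In this sketch, v[0] reads the first component and v[3] the last, matching the indexing order described above.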
The language supports using the period (.) as a selection operator to access vector components, using letters that may indicate coordinate or color data:
In the following code, the vector test is initialized, and then components are accessed using the .xyzw or .rgba selection syntax:
The component selection syntax allows multiple components to be selected.
The component selection syntax also allows components to be permuted or replicated.
The component group notation can occur on the left hand side of an expression. To form the lvalue, swizzling may be applied. The resulting lvalue may be either the scalar or vector type, depending on number of components specified. Each component must be a supported scalar or vector type. The resulting lvalue of vector type must not contain duplicate components.
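Standard C++ has no component selection syntax, so the following is only a rough emulation of the selection, permutation, replication, and lvalue rules described above. The swizzle and assign2 helpers, and the X, Y, Z, W index names, are assumptions made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Component indices standing in for .x/.r, .y/.g, .z/.b and .w/.a.
enum : std::size_t { X = 0, Y = 1, Z = 2, W = 3 };

// Hypothetical helper: reads four components in any order, with repeats
// allowed, emulating selections such as v.wzyx or v.xxyy.
inline Float4 swizzle(const Float4& v, std::size_t i0, std::size_t i1,
                      std::size_t i2, std::size_t i3) {
    return Float4{{v[i0], v[i1], v[i2], v[i3]}};
}

// Hypothetical helper: writes two components, emulating an lvalue such as
// v.xw = (a, b). Duplicate targets are rejected, mirroring the rule that a
// vector lvalue must not contain duplicate components.
inline void assign2(Float4& v, std::size_t i0, std::size_t i1, float a, float b) {
    assert(i0 != i1);
    v[i0] = a;
    v[i1] = b;
}
```

For instance, swizzle(v, W, Z, Y, X) reverses the components, while swizzle(v, X, X, Y, Y) replicates the first two.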
The following methods of vector component access are not permitted and result in a compile-time error in one embodiment:
Accessing components beyond those declared for the vector type is an error. 2-component vector data types can only access .xy or .rg elements. 3-component vector data types can only access .xyz or .rgb elements. For instance:
Accessing the same component twice on the left-hand side is ambiguous; for instance,
The .rgba and .xyzw qualifiers cannot be intermixed in a single access; for instance,
A pointer or reference to a vector with swizzles; for instance
The sizeof operator on a vector type returns the size of the vector, which is given as the number of components * size of each component. For example, sizeof(float4) returns 16 and sizeof(half3) returns 6.
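The size rule can be checked with a small standard C++ sketch. The storage structs below are hypothetical stand-ins; because standard C++ has no half type, a 16-bit unsigned integer is assumed as the storage cell for a half-precision component.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Hypothetical storage stand-ins: 4 x 32-bit float, and 3 x 16-bit cells
// standing in for half-precision components.
struct Float4Storage { float c[4]; };
struct Half3Storage { std::uint16_t c[3]; };

// Size rule from the text: number of components * size of each component.
template <typename T, std::size_t N>
constexpr std::size_t vector_size() {
    return N * sizeof(T);
}
```

Under these assumptions, vector_size<float, 4>() evaluates to 16 and vector_size<std::uint16_t, 3>() to 6, in line with the two values given above.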
Accessing Matrix Components
The floatnxm, halfnxm and doublenxm matrices can be accessed as an array of n floatm, n halfm, or n doublem entries.
The components of a matrix can be accessed using the array subscripting syntax. Applying a single subscript to a matrix treats the matrix as an array of column vectors. The top column is column 0. A second subscript would then operate on the resulting vector, as defined earlier for vectors. Hence, two subscripts select a column and then a row.
Accessing a component outside the bounds of a matrix with a non-constant expression results in undefined behavior. Accessing a matrix component that is outside the bounds of the matrix with a constant expression generates a compile-time error.
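The column-then-row subscripting order can be sketched in standard C++ as follows. Float3 and Float2x3 are hypothetical stand-ins for a 3-component vector and a matrix with 2 columns and 3 rows; the bounds assertions are a debugging aid of this sketch only and do not model the compile-time checks described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 3-component column vector.
struct Float3 {
    float c[3];
    float& operator[](std::size_t row) { assert(row < 3); return c[row]; }
};

// Hypothetical stand-in for a matrix with 2 columns and 3 rows,
// stored as an array of 2 column vectors.
struct Float2x3 {
    Float3 columns[2];

    // A single subscript selects a column; a second subscript on the
    // resulting vector then selects a row within that column.
    Float3& operator[](std::size_t col) { assert(col < 2); return columns[col]; }
};
```

Here m[1][2] names the component in column 1, row 2.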
Vector Constructors
Constructors can be used to create vectors from a set of scalars or vectors. When a vector is initialized, its parameter signature determines how it is constructed. For instance, if the vector is initialized with only a single scalar parameter, all components of the constructed vector are set to that scalar value.
If a vector is constructed from multiple scalars, one or more vectors, or a mixture of these, the vector's components are constructed in order from the components of the arguments. The arguments are consumed from left to right. Each argument has all its components consumed, in order, before any components from the next argument are consumed.
This is a complete list of constructors that are available for float4:
This is a complete list of constructors that are available for float3:
float3(float x);
This is a complete list of constructors that are available for float2:
The following examples illustrate uses of the constructors:
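As a non-authoritative sketch in standard C++, the construction rules described above (scalar replication and left-to-right consumption of arguments) might be emulated as shown. Float2 and Float4 are hypothetical stand-ins, and only a few of the possible constructor signatures are modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for a 2-component float vector.
struct Float2 {
    float x, y;
    Float2(float x_, float y_) : x(x_), y(y_) {}
};

// Hypothetical stand-in for a 4-component float vector.
struct Float4 {
    float x, y, z, w;

    // A single scalar is replicated into every component.
    explicit Float4(float s) : x(s), y(s), z(s), w(s) {}

    // Four scalars fill the components from left to right.
    Float4(float x_, float y_, float z_, float w_)
        : x(x_), y(y_), z(z_), w(w_) {}

    // Each vector argument is consumed completely, in order, before any
    // component is taken from the next argument.
    Float4(Float2 a, Float2 b) : x(a.x), y(a.y), z(b.x), w(b.y) {}
    Float4(Float2 a, float z_, float w_) : x(a.x), y(a.y), z(z_), w(w_) {}
};
```

Because no constructor in this sketch accepts fewer values than there are components (other than the single-scalar form), an under-initialized call such as Float4(1.0f, 2.0f) fails to compile.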
Under-initializing a vector constructor is a compile-time error.
Matrix Constructors
Constructors can be used to create matrices from a set of scalars, vectors or matrices. When a matrix is initialized, its parameter signature determines how it is constructed. For example, if a matrix is initialized with only a single scalar parameter, the result is a matrix that contains that scalar for all components of the matrix's diagonal, with the remaining components initialized to 0.0. For example, a call to
constructs a matrix with these initial contents:
A matrix can also be constructed from another matrix that is of the same size, i.e., has the same number of rows and columns. For example,
Matrix components are constructed and consumed in column-major order. The matrix constructor must have just enough values specified in its arguments to initialize every component in the constructed matrix object. Providing more arguments than are needed results in an error. Under-initializing a matrix constructor also results in a compile-time error.
A matrix of type T with n columns and m rows can also be constructed from n vec<T, m> vectors. The following examples are legal constructors:
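The scalar (diagonal) constructor and the column-vector constructor described above can be sketched in standard C++ as follows. Float2 and Float2x2 are hypothetical stand-ins, and the m[column][row] storage layout is an assumption chosen to mirror column-major order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2-component float vector.
struct Float2 { float x, y; };

// Hypothetical stand-in for a matrix with 2 columns and 2 rows.
struct Float2x2 {
    float m[2][2];  // m[column][row]

    // A single scalar fills the diagonal; all other components become 0.0.
    explicit Float2x2(float s) {
        for (std::size_t col = 0; col < 2; ++col) {
            for (std::size_t row = 0; row < 2; ++row) {
                m[col][row] = (col == row) ? s : 0.0f;
            }
        }
    }

    // One vector per column, consumed in column-major order.
    Float2x2(Float2 c0, Float2 c1) {
        m[0][0] = c0.x; m[0][1] = c0.y;
        m[1][0] = c1.x; m[1][1] = c1.y;
    }
};
```

No constructor taking a list of scalars is provided in this sketch, which echoes the restriction that a matrix cannot be built from multiple scalar values.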
The following are examples of matrix constructors that are not supported. A matrix cannot be constructed from multiple scalar values, nor from combinations of vectors and scalars.
Atomic Data Types
The language atomic data type is restricted for use by atomic functions implemented by the programming language, as described in the section on atomic functions. These atomic functions are a subset of the C++11 atomic and synchronization functions. The language atomic functions must operate on the language atomic data and must be lock-free.
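Since the atomic functions are described as a subset of the C++11 facilities, a host-side C++11 sketch may help convey the intended style of use. The helper names below are hypothetical, the choice of relaxed memory ordering is an assumption of this sketch, and the code uses the standard C++ library rather than the language's own library.

```cpp
#include <atomic>
#include <cassert>

// Adds `amount` to a shared counter and returns the value held before the add.
inline int add_to_counter(std::atomic<int>& counter, int amount) {
    return counter.fetch_add(amount, std::memory_order_relaxed);
}

// Reports whether the host implements this atomic object without locks.
inline bool counter_is_lock_free(const std::atomic<int>& counter) {
    return counter.is_lock_free();
}
```

On a host, counter_is_lock_free can be used to confirm the lock-free property that the language requires of its atomic functions.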
The atomic types are defined as:
Buffers
The language implements buffers as a pointer to a built-in or user defined data type described in the global or constant address space. (Refer to the Address Space Qualifiers section for a full description of these address qualifiers.) These buffers can be declared in program scope or passed as arguments to a function.
Examples:
Textures
The texture data type is a handle to one-, two-, or three-dimensional texture data that corresponds to all or a portion of a single mipmap level of a texture. The following templates define specific texture data types:
T specifies the color type returned when reading from a texture or the color type specified when writing to the texture. For texture types (except depth texture types), T can be half, float, int, or uint. For depth texture types, T must be float.
NOTE: If T is int, the data associated with the texture must use a signed integer format. If T is uint, the data associated with the texture must use an unsigned integer format. If T is half, the data associated with the texture must be either a normalized (signed or unsigned integer) or half precision format. If T is float, the data associated with the texture must be either a normalized (signed or unsigned integer), half or single precision format.
The access qualifier describes how the texture can be accessed. The supported access qualifiers are: sample—The texture object can be sampled. sample implies the ability to read from a texture with and without a sampler; read—Without a sampler, a graphics or kernel function can only read the texture object. (For multi-sampled textures, only the read qualifier is supported.); or write—A graphics or kernel function can write to the texture object.
The depth_format qualifier describes the depth texture format. The only supported value is depth_float. The following example uses these access qualifiers with texture object arguments.
(See the Attribute Qualifiers to Locate Resources section for description of the texture_index attribute qualifier.)
Samplers
The sampler type identifies how to sample a texture, including the following sampler state:
In one embodiment, an API allows the developer to create a sampler object and pass these as arguments to graphics and kernel functions. A sampler object can also be described in the program source instead of in the API. For these cases we only allow a subset of the sampler state to be specified: the addressing mode, filter mode, and normalized coordinates.
Table 2 (
Table 3 (
The enumeration types used by the sampler data type as described in Table 2 (
The language implements the sampler objects as follows:
Ts must be the enumeration types listed above that can be used by the sampler data type. If the same enumeration type is declared multiple times in a given sampler constructor, the last listed value will take effect.
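The "last listed value takes effect" rule can be sketched in standard C++ with a variadic helper. Everything named below (the Address, Filter, and Coord enumerations, their values, the default state, and make_sampler) is hypothetical and chosen only for illustration; the sketch also builds the state at run time, whereas samplers declared in program source are compile-time constants.

```cpp
#include <cassert>

// Hypothetical enumerations for addressing mode, filter mode, and
// normalized coordinates.
enum class Address { clamp_to_edge, repeat };
enum class Filter { nearest, linear };
enum class Coord { normalized, pixel };

struct SamplerState {
    Address address;
    Filter filter;
    Coord coord;
};

// One overload per enumeration type; each overwrites the matching field.
inline void set(SamplerState& s, Address a) { s.address = a; }
inline void set(SamplerState& s, Filter f) { s.filter = f; }
inline void set(SamplerState& s, Coord c) { s.coord = c; }

// Applies the settings from left to right, so a later value of the same
// enumeration type overwrites an earlier one.
inline void apply(SamplerState&) {}

template <typename T, typename... Rest>
void apply(SamplerState& s, T first, Rest... rest) {
    set(s, first);
    apply(s, rest...);
}

// Starts from assumed defaults and applies each listed setting in order.
template <typename... Ts>
SamplerState make_sampler(Ts... settings) {
    SamplerState s{Address::clamp_to_edge, Filter::nearest, Coord::normalized};
    apply(s, settings...);
    return s;
}
```

With this sketch, make_sampler(Filter::nearest, Address::repeat, Filter::linear) ends up with Filter::linear, because the second Filter argument is applied last.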
The following program source illustrates several ways to declare samplers. (The attribute qualifiers (sampler_index (n), buffer_index (n), and texture_index (n)) that appear in the code below are explained in the Attribute Qualifiers section.) Note that samplers or constant buffers declared in program source do not need these attribute qualifiers.
NOTE: Samplers that are initialized in the program source must be declared with the constexpr qualifier.
Arrays and Structs
Arrays and structs are fully supported, including arrays of vectors, matrices, textures, and samplers. The texture and sampler types cannot be declared in a struct.
An array of samplers can be passed as an argument to a function or declared in program scope. The sampler array must be a sized array. The sampler index value must be known at compile time; otherwise, the behavior is undefined. When the sampler index value is not known at compile time, the compiler may throw a compilation error.
Some examples of sampler code are below:
An array of texture types can only be passed as arguments to a function. The texture array must be a sized array. The texture index value must be known at compile time; otherwise, the behavior is undefined. When the texture index value is not known at compile time, the language compiler may throw a compilation error.
Alignment of Types
Table 4 (
The alignas alignment specifier can be used to specify the alignment requirement of a type or an object. The alignas specifier may be applied to the declaration of a variable or a data member of a struct or class. It may also be applied to the declaration of a struct, class, or enumeration type.
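Because alignas is a standard C++11 specifier, its use on a struct, a data member, and a variable can be sketched directly in standard C++. The Vertex, Uniforms, and scratch names, and the scratch_is_aligned helper, are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// alignas applied to the declaration of a struct.
struct alignas(16) Vertex {
    float position[3];
};

// alignas applied to a single data member of a struct.
struct Uniforms {
    alignas(16) float tint[3];
    float scale;
};

// alignas applied to the declaration of a variable.
alignas(32) float scratch[4];

// Checks at run time that the variable starts on a 32-byte boundary.
inline bool scratch_is_aligned() {
    return reinterpret_cast<std::uintptr_t>(scratch) % 32 == 0;
}
```

In this sketch the 12 bytes of Vertex data are padded to 16 bytes, since the size of a type is always a multiple of its alignment.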
The language compiler is responsible for aligning data items to the appropriate alignment as required by the data type. For arguments to a graphics or kernel function declared as a pointer to a data type, the language compiler can assume that the pointer is always appropriately aligned as required by the data type. The behavior of an unaligned load or store is undefined.
Packed Vector Data Types
The vector data types described in the Vector and Matrix Data Types section are aligned to the size of the vector. There are a number of use cases where developers require their vector data to be tightly packed. For example, a vertex struct may contain position, normal, tangent vectors and texture coordinates that are tightly packed and passed as a buffer to a vertex or vertex fetch function.
The packed vector type names supported are:
n is 2, 3, or 4 representing a 2-, 3- or 4-component vector type. Table 5 (
Packed vector data types can only be used as a data storage format. Loads and stores from a packed vector data type to an aligned vector data type and vice-versa, copy constructor and assignment operator are supported. The arithmetic, logical and relational operators are not supported for packed vector data types.
Examples:
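The difference between packed and aligned storage can be sketched in standard C++. The types and the load and store helpers below are hypothetical stand-ins: PackedFloat4 keeps the alignment of its float components, while Float4 is assumed to be aligned to the 16-byte size of the vector.

```cpp
#include <cassert>

// Hypothetical packed storage type: aligned like a single float.
struct PackedFloat4 { float c[4]; };

// Hypothetical aligned vector type: aligned to the size of the vector.
struct alignas(16) Float4 { float c[4]; };

// Load from packed storage into an aligned vector.
inline Float4 load(const PackedFloat4& p) {
    Float4 v{};
    for (int i = 0; i < 4; ++i) v.c[i] = p.c[i];
    return v;
}

// Store an aligned vector back into packed storage.
inline PackedFloat4 store(const Float4& v) {
    PackedFloat4 p{};
    for (int i = 0; i < 4; ++i) p.c[i] = v.c[i];
    return p;
}

// The same two fields occupy less space when the vector is packed.
struct PackedVertex { float u; PackedFloat4 position; };
struct AlignedVertex { float u; Float4 position; };
```

On a host with 4-byte floats, PackedVertex occupies 20 bytes while AlignedVertex occupies 32, which illustrates why tightly packed vertex data is useful as a storage format.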
Implicit Type Conversions
Implicit conversions between scalar built-in types (except void) are supported. When an implicit conversion is done, it is not just a re-interpretation of the expression's value but a conversion of that value to an equivalent value in the new type. For example, the integer value 5 is converted to the floating-point value 5.0.
All vector types are considered to have a higher conversion rank than scalar types. Implicit conversions from a vector type to another vector or scalar type are not permitted and a compilation error results. For example, the following attempt to convert from a 4-component integer vector to a 4-component floating-point vector fails.
Implicit conversions from scalar-to-vector types and scalar-to-matrix types are supported. The scalar value is replicated in each element of the vector. The scalar value is replicated in all components on the matrix's diagonal with the remaining components initialized to 0. The scalar may also be subject to the usual arithmetic conversion to the element type used by the vector or matrix.
For example:
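In standard C++, the scalar-to-vector and scalar-to-matrix conversions described above can be approximated with non-explicit converting constructors. Float4, Float2x2, and the sum helper are hypothetical stand-ins introduced for this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4-component float vector.
struct Float4 {
    float x, y, z, w;

    // Non-explicit on purpose: permits implicit scalar-to-vector conversion,
    // replicating the scalar in each element.
    Float4(float s) : x(s), y(s), z(s), w(s) {}
};

// Hypothetical stand-in for a matrix with 2 columns and 2 rows.
struct Float2x2 {
    float m[2][2];  // m[column][row]

    // Non-explicit on purpose: the scalar goes on the diagonal and the
    // remaining components are set to 0.
    Float2x2(float s) {
        m[0][0] = s;    m[0][1] = 0.0f;
        m[1][0] = 0.0f; m[1][1] = s;
    }
};

// Takes a vector by value, so a scalar argument is converted implicitly.
inline float sum(Float4 v) { return v.x + v.y + v.z + v.w; }
```

A call such as sum(3) first converts the integer 3 to the float element type and then replicates it, corresponding to the usual arithmetic conversion followed by scalar-to-vector widening.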
Implicit conversions from a matrix type to another matrix, vector or scalar type are not permitted and a compilation error results.
Implicit conversions for pointer types follow the rules described in the C++11 Specification.
Type Conversions and Re-interpreting Data
The static_cast operator is used to convert from a scalar or vector type to another scalar or vector type with no saturation and with a default rounding mode (i.e., when converting to floating-point, round to the nearest even number; when converting to integer, round toward zero).
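The two default rounding behaviors can be observed with standard C++ static_cast on a host, assuming the host uses IEEE-754 single precision with the default round-to-nearest-even mode. The to_int and to_float helpers are hypothetical names for this sketch; out-of-range conversions are deliberately not exercised.

```cpp
#include <cassert>
#include <cstdint>

// Float to integer: the fractional part is discarded (round toward zero).
inline std::int32_t to_int(float value) {
    return static_cast<std::int32_t>(value);
}

// Integer to float: values that cannot be represented exactly are rounded
// to the nearest representable float, ties going to the even significand
// (assumes an IEEE-754 host in its default rounding mode).
inline float to_float(std::int32_t value) {
    return static_cast<float>(value);
}
```

For example, 16777217 lies halfway between the representable floats 16777216 and 16777218, and under the stated assumption it converts to 16777216.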
The language adds an as_type<type-id> operator to allow any scalar or vector data type (that is not a pointer) to be reinterpreted as another scalar or vector data type of the same size. The bits in the operand are returned directly without modification as the new type. The usual type promotion for function arguments is not performed.
For example, as_type<float>(0x3f800000) returns 1.0f, which is the value of the bit pattern 0x3f800000 if viewed as an IEEE-754 single precision value. It is an error to use the as_type<type-id> operator to reinterpret data to a type of a different number of bytes.
Examples:
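The bit-reinterpretation behavior can be approximated in standard C++ as follows. This is a sketch for illustration, not the language's built-in operator; it assumes a platform where float is a 32-bit IEEE-754 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Emulation of the as_type<type-id> idea: return the operand's bits as another type.
template <typename To, typename From>
To as_type(const From& value) {
    // Reinterpreting to a type with a different number of bytes is an error.
    static_assert(sizeof(To) == sizeof(From), "as_type requires types of the same size");
    To result;
    std::memcpy(&result, &value, sizeof(To));  // copy the bits without modification
    return result;
}
```

With this helper, as_type<float>(0x3f800000) yields 1.0f, while a call such as as_type<double>(1.0f) is rejected at compile time because the sizes differ.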
Operators
This chapter lists and describes the language operators.
Scalar and Vector Operators
The arithmetic operators, add (+), subtract (−), multiply (*) and divide (/), operate on scalar and vector, integer and floating-point data types. All arithmetic operators return a result of the same built-in type (integer or floating-point) as the type of the operands, after operand type conversion. After conversion, the following cases are valid:
The two operands are scalars. In this case, the operation is applied, and the result is a scalar.
One operand is a scalar, and the other is a vector. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is performed component-wise, which results in a same size vector.
The two operands are vectors of the same size. In this case, the operation is performed component-wise, which results in a same size vector.
Division on integer types that results in a value that lies outside of the range bounded by the maximum and minimum representable values of the integer type does not cause an exception but results in an unspecified value. Division by zero with integer types does not cause an exception but results in an unspecified value. Division by zero for floating-point types results in ±infinity or NaN, as prescribed by the IEEE-754 standard. (For details about numerical accuracy of floating-point operations, see the section on Numerical Compliance.)
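The scalar-and-vector case can be illustrated with a small standard C++ sketch. Int4 and widen are hypothetical names for this example; the language itself performs the widening implicitly.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4-component integer vector.
struct Int4 { int c[4]; };

// Vector + vector: the operation is performed component-wise.
inline Int4 operator+(const Int4& a, const Int4& b) {
    Int4 r;
    for (int i = 0; i < 4; ++i) r.c[i] = a.c[i] + b.c[i];
    return r;
}

// Widen a scalar to a vector with the same number of components.
inline Int4 widen(int s) { return Int4{{s, s, s, s}}; }

// Scalar + vector and vector + scalar: widen the scalar, then add component-wise.
inline Int4 operator+(int s, const Int4& v) { return widen(s) + v; }
inline Int4 operator+(const Int4& v, int s) { return v + widen(s); }
```

For instance, adding the scalar 10 to the vector (1, 2, 3, 4) gives (11, 12, 13, 14) in this sketch.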
The operator modulus (%) operates on scalar and vector integer data types. All arithmetic operators return a result of the same built-in type (integer or floating-point) as the type of the operands, after operand type conversion. The following cases are valid:
The two operands are scalars. In this case, the operation is applied, and the result is a scalar.
One operand is a scalar, and the other is a vector. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is performed component-wise, which results in a same size vector.
The two operands are vectors of the same size. In this case, the operation is performed component-wise, which results in a same size vector.
The resulting value is undefined for any component computed with a second operand that is zero, while results for other components with non-zero operands remain defined.
If both operands are non-negative, the remainder is non-negative. If one or both operands are negative, results are undefined.
The arithmetic unary operators (+ and −) operate on scalar and vector, integer and floating-point types.
The arithmetic post- and pre-increment and decrement operators (−− and ++) operate on scalar and vector integer types. All unary operators work component-wise on their operands. The result is the same type they operated on. For post- and pre-increment and decrement, the expression must be one that could be assigned to (an lvalue). Pre-increment and pre-decrement add or subtract 1 to the contents of the expression they operate on, and the value of the pre-increment or pre-decrement expression is the resulting value of that modification. Post-increment and post-decrement expressions add or subtract 1 to the contents of the expression they operate on, but the resulting expression has the expression's value before the post-increment or post-decrement was executed.
The relational operators greater than (>), less than (<), greater than or equal (>=), and less than or equal (<=) operate on scalar and vector, integer and floating-point types. The result is a Boolean (bool type) scalar or vector. To test whether any or all elements in the result of a vector relational operator test true (for example, for use in the context of an if ( . . . ) statement), see the any and all built-in functions defined in the section on Relational Functions. After operand type conversion, the following cases are valid:
The two operands are scalars. In this case, the operation is applied, resulting in a bool.
One operand is a scalar, and the other is a vector. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is performed component-wise, which results in a Boolean vector.
The two operands are vectors of the same type. In this case, the operation is performed component-wise, which results in a Boolean vector.
The relational operators always return false if either argument is a NaN.
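A component-wise comparison and its reduction to a single Boolean can be sketched in standard C++ as follows. Float4, Bool4, less_than, and quiet_nan are hypothetical names for this illustration; any and all are modeled on the built-in functions the text refers to, not taken from them.

```cpp
#include <cassert>
#include <limits>

struct Float4 { float c[4]; };  // hypothetical 4-component float vector
struct Bool4  { bool  c[4]; };  // hypothetical 4-component Boolean vector

// Component-wise less-than. The C++ '<' on floats is false when either side is NaN.
inline Bool4 less_than(const Float4& a, const Float4& b) {
    Bool4 r;
    for (int i = 0; i < 4; ++i) r.c[i] = a.c[i] < b.c[i];
    return r;
}

// Reduce a Boolean vector to a scalar, e.g. for use in an if (...) condition.
inline bool any(const Bool4& b) { return b.c[0] || b.c[1] || b.c[2] || b.c[3]; }
inline bool all(const Bool4& b) { return b.c[0] && b.c[1] && b.c[2] && b.c[3]; }

// Helper that produces a NaN for experimentation.
inline float quiet_nan() { return std::numeric_limits<float>::quiet_NaN(); }
```

Comparing (1, 2, 3, NaN) with (2, 2, 4, 5) gives (true, false, true, false) here, so any is true and all is false.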
The equality operators, equal (==) and not equal (!=), operate on scalar and vector, integer and floating-point types. All equality operators result in a Boolean (bool type) scalar or vector. After operand type conversion, the following cases are valid:
The two operands are scalars. In this case, the operation is applied, resulting in a bool.
One operand is a scalar, and the other is a vector. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is performed component-wise, resulting in a Boolean vector.
The two operands are vectors of the same type. In this case, the operation is performed component-wise resulting in a same size Boolean vector.
All other cases of implicit conversions are illegal. If one or both arguments is “Not a Number” (NaN), the equality operator equal (==) returns false. If one or both arguments is “Not a Number” (NaN), the equality operator not equal (!=) returns true.
The bitwise operators and (&), or (|), exclusive or (^), not (~) operate on all scalar and vector built-in types except the built-in scalar and vector floating-point types. For built-in vector types, the operators are applied component-wise. If one operand is a scalar and the other is a vector, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is performed component-wise resulting in a same size vector.
The logical operators and (&&), or (||) operate on two Boolean expressions. The result is a scalar or vector Boolean.
The logical unary operator not (!) operates on a Boolean expression. The result is a scalar or vector Boolean.
The ternary selection operator (?:) operates on three expressions (exp1 ? exp2 : exp3). This operator evaluates the first expression exp1, which must result in a scalar Boolean. If the result is true, it selects to evaluate the second expression; otherwise, it evaluates the third expression. Only one of the second and third expressions is evaluated. The second and third expressions can be any type, as long as their types match, or there is a conversion described in the Implicit Type Conversions section that can be applied to one of the expressions to make their types match, or one is a vector and the other is a scalar, in which case the scalar is widened to the same type as the vector type. This resulting matching type is the type of the entire expression.
For the ones' complement operator (~), the operand must be of a scalar or vector integer type, and the result is the ones' complement of its operand.
The operators right-shift (>>), left-shift (<<) operate on all scalar and vector integer types. For built-in vector types, the operators are applied component-wise. For the right-shift (>>), left-shift (<<) operators, if the first operand is a scalar, the rightmost operand must be a scalar. If the first operand is a vector, the rightmost operand can be a vector or scalar.
The result of E1<<E2 is E1 left-shifted by the log2(N) least significant bits in E2 viewed as an unsigned integer value, where N is the number of bits used to represent the data type of E1, if E1 is a scalar, or the number of bits used to represent the type of E1 elements, if E1 is a vector. The vacated bits are filled with zeros.
The result of E1>>E2 is E1 right-shifted by the log2(N) least significant bits in E2 viewed as an unsigned integer value, where N is the number of bits used to represent the data type of E1, if E1 is a scalar, or the number of bits used to represent the type of E1 elements, if E1 is a vector. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the vacated bits are filled with zeros. If E1 has a signed type and a negative value, the vacated bits are filled with ones.
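The shift rules can be made concrete for a 32-bit E1 with the following standard C++ sketch. Here N = 32, so only the log2(32) = 5 least significant bits of E2 are used. The function names are hypothetical, and the explicit handling of negative values is a choice of this sketch so that it does not rely on implementation-defined C++ behavior.

```cpp
#include <cassert>
#include <cstdint>

// E1 << E2 for a 32-bit unsigned E1: the shift count is E2's 5 low bits.
inline std::uint32_t shift_left(std::uint32_t e1, std::uint32_t e2) {
    return e1 << (e2 & 31u);  // vacated bits are filled with zeros
}

// E1 >> E2 for a 32-bit signed E1: the shift count is E2's 5 low bits.
inline std::int32_t shift_right(std::int32_t e1, std::uint32_t e2) {
    const std::uint32_t s = e2 & 31u;
    if (e1 >= 0) return e1 >> s;  // nonnegative: vacated bits are filled with zeros
    return ~((~e1) >> s);         // negative: vacated bits are filled with ones
}
```

Under these rules a shift count of 33 behaves like a shift count of 1, because 33 and 1 share the same five low bits.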
The assignment operator behaves as described by the C++11 Specification. For the lvalue=expression assignment operation, if expression is a scalar type and lvalue is a vector type, the scalar is converted to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is performed component-wise, which results in a same size vector.
NOTE: Operators not described above that are supported by C++11 (such as sizeof (T), unary (&) operator, and comma (,) operator) behave as described in the C++11 Specification.
Matrix Operators
The arithmetic operators add (+), subtract (−) operate on matrices. Both matrices must have the same numbers of rows and columns. The operation is done component-wise resulting in the same size matrix. The arithmetic operator multiply (*) operates on: a scalar and a matrix, a matrix and a scalar, a vector and a matrix, a matrix and a vector, or a matrix and a matrix.
If one operand is a scalar, each component of the matrix is multiplied by the scalar value, resulting in the same size matrix. A right vector operand is treated as a column vector and a left vector operand as a row vector. For vector-matrix, matrix-vector and matrix-matrix multiplication, the number of columns of the left operand is required to be equal to the number of rows of the right operand. The multiply operation does a linear algebraic multiply, yielding a vector or a matrix that has the same number of rows as the left operand and the same number of columns as the right operand.
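The multiplication cases can be sketched for the 2x2 case in standard C++. Float2 and Float2x2 are hypothetical stand-ins, and the m[row][col] indexing is an assumption of this sketch rather than a statement about how the language stores matrices.

```cpp
#include <cassert>

struct Float2   { float c[2]; };     // hypothetical 2-component vector
struct Float2x2 { float m[2][2]; };  // hypothetical 2x2 matrix, indexed m[row][col] here

// scalar * matrix: every component is multiplied by the scalar.
inline Float2x2 operator*(float s, const Float2x2& a) {
    Float2x2 r;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) r.m[i][j] = s * a.m[i][j];
    return r;
}

// matrix * vector: the right operand is treated as a column vector.
inline Float2 operator*(const Float2x2& a, const Float2& v) {
    Float2 r;
    for (int i = 0; i < 2; ++i) r.c[i] = a.m[i][0] * v.c[0] + a.m[i][1] * v.c[1];
    return r;
}

// vector * matrix: the left operand is treated as a row vector.
inline Float2 operator*(const Float2& v, const Float2x2& a) {
    Float2 r;
    for (int j = 0; j < 2; ++j) r.c[j] = v.c[0] * a.m[0][j] + v.c[1] * a.m[1][j];
    return r;
}

// matrix * matrix: linear algebraic product.
inline Float2x2 operator*(const Float2x2& a, const Float2x2& b) {
    Float2x2 r;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
    return r;
}
```

With a matrix whose rows are (1, 2) and (3, 4) and the vector (5, 6), the matrix-vector product is (17, 39) while the vector-matrix product is (23, 34), which shows why the column-vector versus row-vector treatment matters.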
The examples below presume these vector, matrix, and scalar variables are initialized:
The following matrix-to-scalar multiplication
is equivalent to:
The following vector-to-matrix multiplication
is equivalent to:
The following matrix-to-vector multiplication
is equivalent to:
The following matrix-to-matrix multiplication
r=m*n;
is equivalent to:
Functions, Variables, and Qualifiers
This chapter describes how functions, arguments, and variables are declared. It also details how qualifiers are often used with functions, arguments, and variables to specify restrictions.
Function Qualifiers
The language supports the following qualifiers that restrict how a function may be used:
kernel—A data-parallel compute kernel.
vertex_fetch—A fetch shader that reads from resources and returns per-vertex inputs to a vertex shader.
vertex—A vertex shader that is executed for each vertex in the vertex stream and generates per-vertex output.
fragment—A fragment shader that is executed for each fragment in the fragment stream and their associated data and generates per-fragment output.
A vertex shader can also read from resources (buffers and textures) similar to a vertex fetch shader. The vertex fetch shader allows developers to decouple the data types used to declare the per-vertex inputs in the vertex shader from the data types used to declare the vertex data read from resources (such as vertex buffers). By decoupling these data types, a vertex shader can be paired with one or more fetch shaders and vice-versa. When a pipeline object is created using an appropriate API, a vertex fetch shader is optional.
A function qualifier is used at the start of a function, before its return type. The following example shows the syntax for a compute function.
For functions declared with the kernel qualifier, the return type must be void.
Only a graphics function can be declared with one of the vertex_fetch, vertex or fragment qualifiers. For graphics functions, the return type identifies whether the output generated by the function is either per-vertex or per-fragment. The return type for a graphics function (except vertex_fetch functions) may be void indicating that the function does not generate output.
Functions that use a kernel, vertex_fetch, vertex, or fragment function qualifier cannot call functions that also use these qualifiers, or a compilation error results.
Address Space Qualifiers for Variables and Arguments
The language implements address space qualifiers to specify the region of memory where a function variable or argument is allocated. These qualifiers describe disjoint address spaces for variables: global, local, constant, and private.
For more details on each, see the relevant section below. All arguments to a graphics or kernel function that are a pointer to a type must be declared with an address space qualifier. For graphics functions, an argument that is a pointer or reference to a type must be declared in the global or constant address space. For kernel functions, an argument that is a pointer or reference to a type must be declared in the global, local, or constant address space. The following example introduces the use of several address space qualifiers. (The local qualifier is supported here for the pointer l_data only if foo is called by a kernel function, as detailed below.)
The address space for a variable at program scope must be constant.
Any variable that is a pointer or reference must be declared with one of the address space qualifiers discussed in this section. If an address space qualifier is missing on a pointer or reference type declaration, a compilation error occurs.
global Address Space
The global address space name refers to buffer memory objects allocated from the global memory pool that are both readable and writeable.
A buffer memory object can be declared as a pointer or reference to a scalar, vector or user-defined struct. The actual size of the buffer memory object is determined when the memory object is allocated via appropriate API calls in the host code.
Some examples are:
Since texture objects are always allocated from the global address space, the global address qualifier is not needed for texture types. The elements of a texture object cannot be directly accessed. Functions to read from and write to a texture object are provided.
local Address Space
The local address space name is used for variables inside a kernel function that need to be allocated in local memory and are shared by all work-items of a work-group. Variables declared in the local address space cannot be used in graphics functions.
Variables allocated in the local address space inside a kernel function are allocated for each work-group executing the kernel and exist only for the lifetime of the work-group that is executing the kernel.
Variables declared in the local address space inside a kernel function must occur at function scope. In the function example below, the floating-point variable a and the array b are properly allocated in the local address space. However, the floating-point variable c is not declared at function scope, so its declaration is not allowed there. (The qualifier [[local_index (0)]] in the code below is explained in more detail below.)
Variables allocated in the local address space inside a kernel function cannot be declared and initialized at the same time. In the following example, the variable a is improperly initialized during its declaration, but variable b has its value properly set afterwards.
constant Address Space
The constant address space name must be used for variables in program scope, which are allocated in global memory and are accessed inside functions as read-only variables.
Variables in program scope have the same lifetime as the program, and their values persist between calls to any of the compute or graphics functions in the program. In a compute kernel function, read-only variables can be accessed by all (global) work-items of the kernel during its execution.
Variables in program scope must be declared in the constant address space and initialized during the declaration statement. The values used to initialize them must be a compile-time constant.
Pointers or references to the constant address space are allowed as arguments to functions.
Writing to variables declared in the constant address space is a compile-time error. Declaring such a variable without initialization is also a compile-time error.
private Address Space
Variables declared inside a function are in the private address space.
Function Arguments and Variables
All inputs (except for initialized variables in the constant address space and samplers declared in program scope) and outputs of graphics and kernel functions are passed as arguments. Arguments to graphics (vertex and fragment) and kernel functions can be one of the following:
global buffer—a pointer or reference to any data type in the global address space (see the Buffers section above)
constant buffer—a pointer or reference to any data type in the constant address space (see the Buffers section above)
texture object (see the Textures section above)
sampler object (see the Samplers section above)
local buffer (can only be used as arguments with kernel functions)—a pointer to a type in the local address space
arrays of constant buffers, global buffers, textures or samplers.
Buffers (global and constant) specified as argument values to a graphics or kernel function cannot alias, i.e. a buffer passed as an argument value cannot overlap another buffer passed to a separate argument of the same graphics or kernel function.
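The no-alias requirement amounts to an interval-overlap condition on the byte ranges of the buffers. The following standard C++ sketch shows how a host program might check that condition before passing two buffers to the same function; buffers_overlap is a hypothetical helper and is not part of any API described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side check: true if the byte ranges [a, a + a_len) and
// [b, b + b_len) share at least one byte.
inline bool buffers_overlap(const void* a, std::size_t a_len,
                            const void* b, std::size_t b_len) {
    const std::uintptr_t a0 = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t b0 = reinterpret_cast<std::uintptr_t>(b);
    return a0 < b0 + b_len && b0 < a0 + a_len;
}
```

Two adjacent halves of one allocation do not overlap under this test, whereas two windows that share elements do.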
The arguments to these functions are often specified with attribute qualifiers to provide further guidance on their use. Attribute qualifiers are used to specify:
the resource location for the argument (see the Attributes Qualifiers to Locate Resources section below),
built-in variables that support communicating data between fixed-function and programmable pipeline stages (see the Attributes Qualifiers for Built-in Variables section below),
which data is sent down the pipeline from vertex function to fragment function (see the stage_in Qualifier section below).
Attribute Qualifiers to Locate Resources
For each argument, an attribute qualifier must be specified to identify the location of a resource to use for this argument type. In one embodiment, a framework API uses this attribute to identify the location for the resource.
global and constant buffers—[[buffer_index (slot)]]
texture—[[texture_index(slot)]]
sampler—[[sampler_index(slot)]]
local buffer—[[local_index(slot)]]
arrays—[[buffer_index(slot, count)]], [[texture_index(slot, count)]], or [[sampler_index(slot, count)]].
The slot value is an unsigned integer that identifies the location of a resource that is being assigned. The proper syntax is for the attribute qualifier to follow the argument/variable name.
NOTE: The resource locations are shared between a vertex function and associated vertex fetch functions.
The example below is a simple kernel function, add_vectors, that adds two buffers in the global address space, inA and inB, and returns the result in the buffer out. The attribute qualifiers (buffer_index (slot)) specify the resource locations for the function arguments.
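To show the data-parallel behavior such a kernel has, the following is a CPU emulation in standard C++, not the kernel source itself. The id parameter stands in for the work-item's global ID, and run_add_vectors is a hypothetical host-side loop that takes the place of enqueuing the kernel; both are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by a single work-item: add one pair of elements.
inline void add_vectors(const float* inA, const float* inB, float* out, std::size_t id) {
    out[id] = inA[id] + inB[id];
}

// Hypothetical stand-in for dispatch: run the kernel body once per work-item, sequentially.
inline std::vector<float> run_add_vectors(const std::vector<float>& a,
                                          const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t id = 0; id < a.size(); ++id) {
        add_vectors(a.data(), b.data(), out.data(), id);
    }
    return out;
}
```

On a GPU the iterations of this loop would run as independent work-items, which is possible because each one reads and writes only its own index.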
The example below shows attribute qualifiers used for function arguments of several different types (a buffer, a texture, and a sampler):
The example below shows attribute qualifiers for function arguments that are array types (a buffer array, a texture array and a sampler array):
Vertex function example that specifies resources and outputs to global memory
The following example is a vertex function, render_vertex, which outputs to global memory in the array xform_pos_output, which is a function argument specified with the global qualifier (which was introduced in the global address space section above). All the render_vertex function arguments are specified with resource qualifiers (buffer_index (0), buffer_index (1), buffer_index (2), and buffer_index (3)), as introduced in the Attribute Qualifiers to Locate Resources section above. (The position qualifier shown in this example is discussed in the Attribute Qualifiers for Built-in Variables section below.)
Attribute Qualifiers for Built-in Variables
Some graphics operations occur in the fixed-function pipeline stages and need to provide values to or receive values from graphics functions. Built-in input and output variables are used to communicate values between the graphics (vertex and fragment) functions and the fixed-function graphics pipeline stages. Attribute qualifiers are used with arguments and the return type of graphics functions to identify these built-in variables.
Attribute Qualifiers for Vertex or Vertex Fetch Function Input
Table 6 (
Attribute Qualifiers for Vertex Function Output
Table 7 (
The example below describes a vertex function called process_vertex. The function returns a user-defined struct called VertexOutput, which contains a built-in variable that represents the vertex position, so it requires the [[position]] qualifier.
Attribute Qualifiers for Fragment Function Input
Table 8 (
Note: A vertex function must output a return type that is declared with the position qualifier if there is an associated fragment function.
A variable declared with the [[position]] attribute as input to a fragment function can use one of the following sampling and interpolation qualifiers: center_no_perspective or centroid_no_perspective. For [[color (m)]], m is used to specify the color attachment index when accessing (reading or writing) multiple color attachments in a fragment function. m is optional and can be a value from 0 to 7. If m is not specified, the color attachment index starts at 0. If there is only a single color attachment in a fragment function, then m cannot be used. (See examples of specifying the color attachment in the sections on Per-Fragment Functions and Programmable Blending below.)
Attribute Qualifiers for Fragment Function Output
The return type of a fragment function describes the per-fragment output. A fragment function can output one or more render-target color values, a depth value, and a coverage mask, which must be identified by using the attribute qualifiers listed in Table 9 (
The color attachment index m for fragment output is specified in the same way as it is for [[color (m)]] for fragment input (see discussion for Table 8 (
If a fragment function writes a depth value, the depth_qualifier must be specified with one of the following values:
The following example shows how color attachment indices can be specified. Color values written in clr_f write to color attachment index 0, clr_i to color attachment index 1, and clr_ui to color attachment index 2.
Attribute Qualifiers for Kernel Function Input
Table 10 (
Notes on kernel function attribute qualifiers:
The type used to declare [[global_id]], [[global_size]], [[local_id]], [[local_size]] and [[thread_group_id]] must be either a scalar type or a vector type. If it is a vector type, the number of components for the vector types used to declare these arguments must match. The data types used to declare [[global_id]] and [[global_size]] must match. The data types used to declare [[local_id]] and [[local_size]] must match. If [[local_id]] or [[local_size]] is declared to be of type uint, uint2 or uint3, [[linear_local_id]] must be declared to be of type uint.
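The relationship between these identifiers is not spelled out in the notes above. The standard C++ sketch below assumes the convention common to data-parallel programming models (global ID equals work-group ID times work-group size plus local ID, and a linear local ID that counts along x first). Both formulas and all names are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 2-component unsigned integer vector.
struct UInt2 { std::uint32_t x, y; };

// Assumed convention: global_id = thread_group_id * local_size + local_id, per component.
inline UInt2 global_id(UInt2 thread_group_id, UInt2 local_size, UInt2 local_id) {
    return UInt2{thread_group_id.x * local_size.x + local_id.x,
                 thread_group_id.y * local_size.y + local_id.y};
}

// Assumed convention: the linear local ID is a scalar, counting along x first.
inline std::uint32_t linear_local_id(UInt2 local_size, UInt2 local_id) {
    return local_id.y * local_size.x + local_id.x;
}
```

Note that the vector-typed IDs all share the same number of components here, while the linear local ID is a scalar unsigned integer, in line with the type rules above.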
stage_in Qualifier
The per-fragment inputs to a fragment function are generated using the output from a vertex function and the fragments generated by the rasterizer. Similarly, the per-vertex inputs to a vertex function can be generated using the output from a vertex fetch function. The per-fragment or per-vertex inputs:
must be the first argument to the fragment or vertex function,
and must be identified using the [[stage_in]] attribute qualifier.
Only one argument of the fragment or vertex function can be declared with the stage_in qualifier. For a user-defined struct declared with the stage_in qualifier, the members of the struct can be: a scalar integer or a scalar floating-point value, a vector of integer or floating-point values, a matrix of integer or floating-point values, or an array of scalars, vectors or matrices (that are or contain integer or floating-point values).
For a complete example of the use of the stage_in qualifier, see below.
Vertex and vertex fetch function example that uses the stage_in qualifier
The following example defines the vertex fetch function fetch_vertex that reads color and position data. fetch_vertex first unpacks and converts the color data into a 4-component vector of half-precision floating-point values. fetch_vertex ultimately returns output (a VertexInput struct per vertex) that is pipelined to be the first argument of the vertex function render_vertex, which uses the [[stage_in]] qualifier.
Fragment Function Example that Uses the stage_in Qualifier
An example in a section above previously introduces the process_vertex vertex function, which returns a VertexOutput struct per vertex. In the following example, the output from process_vertex is pipelined to become input for a fragment function called render_pixel, so the first argument of the fragment function uses the [[stage_in]] qualifier and must also be of the incoming VertexOutput type. (In render_pixel, the imgA and imgB 2D textures call the built-in function sample, which is introduced in the section on 2D Texture Functions below.)
Storage Class Specifiers
The language supports the static and extern storage class specifiers. The language does not support the thread_local storage class specifier. The extern storage-class specifier can only be used for functions and variables declared in program scope or variables declared inside a function. The static storage-class specifier is only for global variables declared in program scope (see the constant address space section) and is not for local variables in graphics or kernel functions. In the following example, the static specifier is incorrectly used by the local variables b and c in a kernel function.
Sampling and Interpolation Qualifiers
Sampling and interpolation qualifiers are only used for return types of vertex functions and arguments to fragment functions. The qualifier determines what sampling method the fragment function uses and how the interpolation is performed, including whether to use perspective-correct interpolation, linear interpolation, or no interpolation.
The sampling and interpolation qualifier can be specified on any structure member declared with the stage_in qualifier. The sampling and interpolation qualifiers supported are:
The following example is a user-defined struct that specifies how data in certain members are interpolated:
For integer and double types, the only valid interpolation qualifier is flat.
The sampling qualifier variants (sample_perspective and sample_no_perspective) interpolate at a sample location rather than at the pixel center. With one of these qualifiers, the fragment function or code blocks in the fragment function that use these variables execute per-sample rather than per-fragment.
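The difference between perspective-correct interpolation, linear (no-perspective) interpolation, and no interpolation can be sketched for a single attribute along one edge in standard C++. The function names, the use of the vertices' clip-space w values, and the choice of which vertex supplies the flat value are assumptions of this illustration; the formulas are the standard ones, not quoted from this text.

```cpp
#include <cassert>
#include <cmath>

// Linear interpolation in screen space (no perspective correction); t is in [0, 1].
inline float interp_no_perspective(float a0, float a1, float t) {
    return (1.0f - t) * a0 + t * a1;
}

// Perspective-correct interpolation: interpolate a/w and 1/w linearly, then divide.
// w0 and w1 are the (nonzero) clip-space w values of the two vertices.
inline float interp_perspective(float a0, float w0, float a1, float w1, float t) {
    const float num = (1.0f - t) * a0 / w0 + t * a1 / w1;
    const float den = (1.0f - t) / w0 + t / w1;
    return num / den;
}

// Flat: no interpolation; one vertex supplies the value (the first one in this sketch).
inline int interp_flat(int a0, int /*a1*/, float /*t*/) {
    return a0;
}
```

When both w values are equal the two interpolation modes agree; when they differ, the perspective-correct result is pulled toward the vertex with the smaller w.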
Per-Fragment Function vs. Per-Sample Function
The fragment function is typically executed per-fragment. The sampling qualifier identifies whether any fragment input is to be interpolated per-sample rather than per-fragment. Similarly, the [[sample_id]] attribute is used to identify the current sample index and the [[color (m)]] attribute is used to identify the destination fragment color or sample color (for a multi-sampled color attachment) value. If any of these qualifiers are used with arguments to a fragment function, the fragment function may execute per-sample instead of per-fragment. The implementation may decide to execute per-sample only the code that depends on the per-sample values, while the rest of the fragment function executes per-fragment.
Only the inputs with sample specified (or declared with the [[sample_id]] or [[color (m)]] qualifier) differ between invocations per-fragment or per-sample, whereas other inputs still interpolate at the pixel center.
The following example uses the [[color]] attribute to specify that this fragment function should be executed on a per-sample basis.
Programmable Blending
The fragment function can be used to perform per-fragment or per-sample programmable blending. The color attachment index identified by the [[color (m)]] attribute qualifier can be specified as an argument to a fragment function.
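The idea of programmable blending is that the fragment function receives the current destination color as an input and computes the blended result itself. The following standard C++ sketch emulates that on the CPU. The Color type, the function name, and the particular "source over" blend equation are choices of this illustration, not taken from the text.

```cpp
#include <cassert>

// Hypothetical RGBA color value.
struct Color { float r, g, b, a; };

// dst plays the role of the value a fragment function would read through [[color (m)]];
// the return value plays the role of the color written back to the same attachment.
inline Color blend_source_over(const Color& src, const Color& dst) {
    const float inv = 1.0f - src.a;
    return Color{src.r * src.a + dst.r * inv,
                 src.g * src.a + dst.g * inv,
                 src.b * src.a + dst.b * inv,
                 src.a + dst.a * inv};
}
```

Because the blend is ordinary code, any other equation could be substituted in the function body, which is the flexibility that fixed-function blending lacks.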
Below is a programmable blending example according to one embodiment:
Below is a programmable blending example for a fragment function according to one embodiment:
Graphics Function—Signature Matching
A graphics function signature is a list of parameters that are either input to or output from a graphics function.
Vertex—Fragment Signature Matching
There are two kinds of data that can be passed between a vertex and fragment function: user-defined and built-in variables.
The per-instance input to a fragment function is declared with the [[stage_in]] qualifier. These are output by an associated vertex function.
Built-in variables are declared with one of the attribute qualifiers defined in section 4.3.2. These are either generated by a vertex function (such as [[position]], [[point_size]], [[clip_distance]]), are generated by the rasterizer (such as [[point_coord]], [[front_facing]], [[sample_id]], [[sample_mask]]) or refer to a framebuffer color value (such as [[color]]) passed as an input to the fragment function.
The built-in variable [[position]] must always be returned. The other built-in variables ([[point_size]], [[clip_distance]]) generated by a vertex function, if needed, must be declared in the return type of the vertex function but cannot be accessed by the fragment function.
Built-in variables that are generated by the rasterizer or that refer to a framebuffer color value may also be declared as arguments of the fragment function with the appropriate attribute qualifier.
The attribute [[user(name)]] syntax can also be used to specify an attribute name for any user-defined variables.
A vertex and fragment function are considered to have matching signatures if:
There is no input argument with the [[stage_in]] qualifier declared in the fragment function.
For a fragment function argument declared with [[stage_in]], each element in the type associated with this argument can be one of the following: a built-in variable generated by the rasterizer, a framebuffer color value passed as input to the fragment function, or a user-generated output from a vertex function. For built-in variables generated by the rasterizer or framebuffer color values, there is no requirement for a matching type to be associated with elements of the vertex return type. For elements that are user-generated outputs, the following rules apply:
If the attribute name given by [[user(name)]] is specified for an element, then this attribute name must match with an element in the return type of the vertex function, and their corresponding data types must also match.
If the [[user(name)]] attribute name is not specified, then the argument name and types must match.
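The matching rules above can be sketched as a host-side check. This is only an illustration in C++, not part of the language: the Element struct, its fields, and signatures_match are hypothetical names, data types are compared as plain strings, and an empty user_attr stands for an element declared without [[user(name)]]. How an unnamed fragment element should be matched against a vertex element that does carry [[user(name)]] is not stated in the text; the sketch assumes a comparison by member name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one element of a vertex return type or of a
// fragment [[stage_in]] argument type (not an API of the language).
struct Element {
    std::string name;       // member name in the struct
    std::string type;       // data type spelled as text, e.g. "float4"
    std::string user_attr;  // name given by [[user(name)]]; empty if absent
    bool from_rasterizer_or_framebuffer;  // rasterizer built-in or color input
};

// Returns true if every user-generated element of the fragment input has a
// counterpart in the vertex output with the same key and the same data type.
// An empty fragment_in models a fragment function without [[stage_in]].
bool signatures_match(const std::vector<Element>& vertex_out,
                      const std::vector<Element>& fragment_in) {
    for (const Element& f : fragment_in) {
        // Rasterizer built-ins and framebuffer colors need no vertex match.
        if (f.from_rasterizer_or_framebuffer) continue;
        bool found = false;
        for (const Element& v : vertex_out) {
            // Key is the [[user(name)]] attribute when given, else the name.
            const bool same_key = f.user_attr.empty()
                ? (v.name == f.name)
                : (v.user_attr == f.user_attr);
            if (same_key && v.type == f.type) { found = true; break; }
        }
        if (!found) return false;
    }
    return true;
}
```

With this sketch, a fragment element tagged with a user attribute matches a vertex element carrying the same attribute even if the member names differ, while a type mismatch on either kind of key is rejected.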
Below is an example of compatible signatures:
my_vertex_shader and my_fragment_shader, or my_vertex_shader and my_fragment_shader2, can be used together to render a primitive.
Below is another example of compatible signatures:
Below is another example of compatible signatures:
Below is another example of compatible signatures:
Below is an example of incompatible signatures:
Below is another example of incompatible signatures:
Vertex Fetch—Vertex Signature Matching
A vertex fetch and a vertex function are considered to have matching signatures if the type of the argument to the vertex function declared with the [[stage_in]] qualifier matches the return type of the vertex fetch function. The following restrictions apply:
The return type of a vertex fetch function cannot specify any built-in vertex variables (such as [[position]], [[point_size]], [[clip_distance]]). Any input argument to a vertex fetch function cannot be declared with the [[stage_in]] qualifier.
Additional Attribute Qualifiers
The following additional attributes are supported by the language (besides the ones described in earlier sections).
The [[early_fragment_tests]] qualifier allows fragment functions to enable early fragment tests. If this attribute is specified with a fragment function, the per-fragment tests are performed prior to fragment function execution. Otherwise they are performed after fragment function execution. Fragment functions declared with the [[early_fragment_tests]] qualifier cannot output a depth value. The return type of the fragment function cannot contain an element declared with the [[depth (depth_qualifier)]] qualifier, or else a compilation error results.
If the work-group size is specified when the kernel is enqueued, then that work-group size is used. If the work-group size is not specified when a given kernel is enqueued, the [[thread_group_size (x, y, z)]] qualifier specifies the work-group size to use for the kernel.
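The precedence described above can be sketched in a few lines of host-side C++. The names Size3 and effective_thread_group_size are hypothetical and only illustrate the rule; they are not part of the language or of any API.

```cpp
#include <array>
#include <cassert>

using Size3 = std::array<unsigned, 3>;  // (x, y, z) work-group dimensions

// enqueue_size: size given when the kernel is enqueued, or nullptr if none.
// attribute_size: size from the [[thread_group_size(x, y, z)]] qualifier.
// The enqueue-time size wins whenever it is present.
Size3 effective_thread_group_size(const Size3* enqueue_size,
                                  const Size3& attribute_size) {
    const Size3 chosen = enqueue_size ? *enqueue_size : attribute_size;
    // Assumption of this sketch: every dimension must be at least 1.
    assert(chosen[0] > 0 && chosen[1] > 0 && chosen[2] > 0);
    return chosen;
}
```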
Additional Notes
Writes to a buffer or a texture by a fragment shader that has been invoked to process fragments or samples not covered by a primitive being rasterized have no effect. Such invocations can occur, for example, to help calculate derivatives for texture lookups.
The Standard Library
This chapter describes the functions supported by the standard library for the programming language according to one embodiment.
Namespace and Header Files
The standard library functions and enums are declared in a language-specific namespace. In addition to the header files described in the standard library functions, the <language_stdlib> header is available and provides access to all the functions supported by the standard library. (The term “language” in “language_stdlib” may be replaced by a name for the compiler in one embodiment.)
Common Functions
The functions in Table 11 (
Integer Functions
The integer functions in Table 12 (
Relational Functions
The relational functions in Table 13 (
Math Functions
The math functions in Table 14 (
There are two variants of math functions available: the precise and the fast variants.
The -ffast-math compiler option (refer to the Compiler Options section below) can be used to specify which variant to use when compiling source in the programming language. The precise and fast nested namespaces are also available to allow developers to explicitly select the fast or precise variant of these math functions.
Examples:
#include <language_stdlib>
using namespace language;
float x;
float a = fast::sin(x); // use fast version of sin()
float b = precise::cos(x); // use precise version of cos()
Matrix Functions
The functions in Table 15 (
Example:
Geometric Functions
The functions in Table 16 (
Compute Functions
The compute functions in this section and its subsections can only be called from a kernel function and are defined in the header <language_compute>. (The term “language” in “language_compute” may be replaced by a name for the compiler in one embodiment.)
Thread-Group Synchronization Functions
The work-group function in Table 17 (
The thread_group_barrier function must be encountered by all work-items in a work-group executing the kernel.
If thread_group_barrier is inside a conditional statement and if any work-item enters the conditional statement and executes the barrier, then all work-items must enter the conditional and execute the barrier.
If thread_group_barrier is inside a loop, for each iteration of the loop, all work-items must execute the thread_group_barrier before any work-items are allowed to continue execution beyond the thread_group_barrier. The thread_group_barrier function also queues a memory fence (reads and writes) to ensure correct ordering of memory operations to local or global memory.
The mem_flags argument in thread_group_barrier is a bit-field that can be set to one or more of the following flags, as described in Table 18 (
Graphics Functions
This section and its subsections list the set of graphics functions that can be called by fragment and vertex functions. These are defined in the header <language_graphics>. (The term “language” in “language_graphics” may be replaced by a name for the compiler in one embodiment.)
Fragment Functions
The functions in this section (listed in Table 19 (
Fragment Functions—Derivatives
The language includes the functions in Table 19 (
Fragment Functions—Samples
The language includes the following per-sample functions in Table 20 (
get_num_samples and get_sample_position return the number of samples for the color attachment and the sample offsets for a given sample index. For example, this can be used to shade per-fragment but do the alpha test per-sample for transparency super-sampling.
Fragment Functions—Flow Control
The language function in Table 21 (
Texture Functions
The texture functions are categorized into: sample from a texture, read (sampler-less read) from a texture, gather from a texture, write to a texture, and texture query functions.
These are defined in the header <language_texture>. (The term “language” in “language_texture” may be replaced by a name for the compiler in one embodiment.) The texture sample, sample_compare, gather, and gather_compare functions take an offset argument for a 2D texture, 2D texture array and 3D texture. The offset is an integer value that is applied to the texture coordinate before looking up each texel. This integer value can be in the range −8 to +7. The default value is 0.
Overloaded variants of texture sample and sample_compare functions for a 2D texture, 2D texture array, 3D texture, cubemap and cubemap array are available and allow the texture to be sampled using a bias that is applied to a mip-level before sampling or with user-provided gradients in the x and y direction.
NOTE: The texture sample, sample_compare, gather, and gather_compare functions require that the texture is declared with the sample access qualifier. The texture read functions require that the texture is declared with the sample or read access qualifier. The texture write functions require that the texture is declared with the write access qualifier.
1D Texture
The following built-in functions can be used to sample from a 1D texture.
The following built-in functions can be used to perform sampler-less reads from a 1D texture:
vec<T,4> read(uint coord, uint lod = 0) const
The following built-in functions can be used to write to a specific mip-level of a 1D texture.
The following built-in 1D texture query functions are provided.
1D Texture Array
The following built-in functions can be used to sample from a 1D texture array.
The following built-in functions can be used to perform sampler-less reads from a 1D texture array:
The following built-in functions can be used to write to a specific mip-level of a 1D texture array.
The following built-in 1D texture array query functions are provided.
2D Texture
The following data types and corresponding constructor functions are available to specify various sampling options:
The following built-in functions can be used to sample from a 2D texture.
lod_options must be one of the following types: bias, level, or gradient2d.
The following built-in functions can be used to perform sampler-less reads from a 2D texture:
The following built-in functions can be used to write to a 2D texture.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a 2D texture.
The following built-in 2D texture query functions are provided.
2D Texture Sampling Example
The following code shows several uses of the 2D texture sample function, depending upon its arguments.
2D Texture Array
The following built-in functions can be used to sample from a 2D texture array.
lod_options must be one of the following types: bias, level, or gradient2d.
The following built-in functions can be used to perform sampler-less reads from a 2D texture array:
The following built-in functions can be used to write to a 2D texture array.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a 2D texture array.
The following built-in 2D texture array query functions are provided.
3D Texture
The following data types and corresponding constructor functions are available to specify various sampling options:
The following built-in functions can be used to sample from a 3D texture.
lod_options must be one of the following types: bias, level, or gradient3d.
The following built-in functions can be used to perform sampler-less reads from a 3D texture:
The following built-in functions can be used to write to a 3D texture.
The following built-in 3D texture query functions are provided.
Cube-Map Texture
The following data types and corresponding constructor functions are available to specify various sampling options:
The following built-in functions can be used to sample from a cube-map texture.
lod_options must be one of the following types: bias, level, or gradientcube.
The following built-in functions can be used to write to a cube-map texture.
NOTE: Table 22 (
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a cube-map texture.
The following built-in cube-map texture query functions are provided.
Cube-Map Texture Array [Optional]
The following built-in functions can be used to sample from a cube-map texture array.
lod_options must be one of the following types: bias, level, or gradientcube.
The following built-in functions can be used to write to a cube-map texture array.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a cube-map texture array.
The following built-in cube-map texture array query functions are provided.
2D Multi-sampled Texture
The following built-in functions can be used to perform sampler-less reads from a 2D multi-sampled texture:
The following built-in 2D multi-sampled texture query functions are provided.
2D Depth Texture
The following data types and corresponding constructor functions are available to specify various sampling options:
The following built-in functions can be used to sample from a 2D depth texture.
lod_options must be one of the following types: bias, level or gradient2d.
The following built-in functions can be used to sample from a 2D depth texture and compare a single component against the specified comparison value:
lod_options must be one of the following types: bias, level or gradient2d. T must be a float type.
The following built-in functions can be used to perform sampler-less reads from a 2D depth texture:
The following built-in functions can be used to write to a 2D depth texture.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a 2D depth texture.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a 2D depth texture and comparing these samples with a specified comparison value.
T must be a float type.
The following built-in 2D depth texture query functions are provided.
2D Depth Texture Array
The following built-in functions can be used to sample from a 2D depth texture array.
lod_options must be one of the following types: bias, level or gradient2d.
The following built-in functions can be used to sample from a 2D depth texture array and compare a single component against the specified comparison value:
lod_options must be one of the following types: bias, level or gradient2d. T must be a float type.
The following built-in functions can be used to perform sampler-less reads from a 2D depth texture array:
The following built-in functions can be used to write to a 2D depth texture array.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a 2D depth texture array.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a 2D depth texture array and comparing these samples with a specified comparison value.
T must be a float type.
The following built-in 2D depth texture array query functions are provided.
Cube-Map Depth Texture
The following data types and corresponding constructor functions are available to specify various sampling options:
The following built-in functions can be used to sample from a cube-map depth texture.
lod_options must be one of the following types: bias, level or gradientcube.
The following built-in functions can be used to sample from a cube-map depth texture and compare a single component against the specified comparison value:
lod_options must be one of the following types: bias, level or gradientcube. T must be a float type.
The following built-in functions can be used to write to a cube-map depth texture.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a cube-map depth texture.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a cube-map depth texture and comparing these samples with a specified comparison value.
T must be a float type.
The following built-in cube-map depth texture query functions are provided.
Cube-Map Depth Texture Array [Optional]
The following built-in functions can be used to sample from a cube-map depth texture array.
lod_options must be one of the following types: bias, level, or gradientcube.
The following built-in functions can be used to sample from a cube-map depth texture array and compare a single component against the specified comparison value:
lod_options must be one of the following types: bias, level, or gradientcube. T must be a float type.
The following built-in functions can be used to write to a cube-map depth texture array.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a cube-map depth texture array.
The following built-in functions can be used to do a gather of four samples that would be used for bilinear interpolation when sampling a cube-map depth texture array and comparing these samples with a specified comparison value.
T must be a float type.
The following built-in cube-map depth texture array query functions are provided.
2D Multi-sampled Depth Texture
The following built-in functions can be used to perform sampler-less reads from a 2D multi-sampled depth texture:
The following built-in 2D multi-sampled depth texture query functions are provided.
Pack and Unpack Functions
This section lists the language functions for converting vector floating-point data to and from a packed integer value. The functions are defined in the header <language_pack>. (The term “language” in “language_pack” may be replaced by a name for the compiler in one embodiment.) Refer to the conversion sections below for details on how to convert from an 8-bit, 10-bit or 16-bit signed or unsigned integer value to a normalized single- or half-precision floating-point value and vice versa.
Unpack Integer(s); Convert to a Floating-Point Vector
Table 23 (
Convert Floating-Point Vector to Integers, then Pack the Integers
Table 24 (
Atomic Functions
The programming language implements a subset of the C++11 atomics and synchronization operations. For atomic operations, only a memory_order of memory_order_relaxed is supported in one embodiment. The memory scope of these atomic operations is a work-group if the atomic operation is to local memory and is a device if the atomic operation is to global memory.
There are only a few kinds of operations on atomic types, although there are many instances of those kinds. This section specifies each general kind.
These are defined in the header <language_atomic>. (The term “language” in “language_atomic” may be replaced by a name for the compiler in one embodiment.) NOTE: For iOS, atomic operations to global memory can only be performed inside a kernel function (a function declared with the kernel qualifier) or inside a function called from a kernel function. For OS X, atomic operations to global memory can be performed by graphics and kernel functions.
The memory_order enum is defined as:
enum memory_order {memory_order_relaxed};
Only memory_order_relaxed is supported in one embodiment of the language.
Atomic Store Functions
These functions atomically replace the value pointed to by obj (*obj) with desired.
Atomic Load Functions
These functions atomically obtain the value pointed to by obj.
Atomic Exchange Functions
These functions atomically replace the value pointed to by obj with desired and return the value obj previously held.
Atomic Compare and Exchange functions
These functions atomically compare the value pointed to by obj with the value in expected. If those values are equal, the function replaces *obj with desired (by performing a read-modify-write operation). The function returns the value obj previously held.
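Because the language implements a subset of the C++11 atomics, the behavior described above can be illustrated with standard C++ on the CPU. The sketch below is an analogy under that assumption, not the language's own function: compare_exchange_previous is a hypothetical wrapper around std::atomic<int>::compare_exchange_strong that returns the previously held value, as the text describes, rather than the bool that C++11 returns.

```cpp
#include <atomic>
#include <cassert>

// Compares *obj with expected and, if they are equal, stores desired.
// Returns the value *obj held before the call in either case.
int compare_exchange_previous(std::atomic<int>* obj, int expected, int desired) {
    assert(obj != nullptr);
    // On success, expected is left unchanged (it equals the old value).
    // On failure, compare_exchange_strong writes the current value of *obj
    // into expected. Either way, expected ends up holding the prior value.
    obj->compare_exchange_strong(expected, desired,
                                 std::memory_order_relaxed,
                                 std::memory_order_relaxed);
    return expected;
}
```

A caller can tell whether the exchange took place by comparing the returned value with the expected value it passed in.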
Atomic Fetch and Modify functions
The following operations perform arithmetic and bitwise computations. All of these operations are applicable to an object of any atomic type. The key, operator, and computation correspondence is given in Table 25 (
Atomically replaces the value pointed to by obj with the result of the computation of the value specified by key and arg. These operations are atomic read-modify-write operations. For signed integer types, arithmetic is defined to use two's complement representation with silent wrap-around on overflow. There are no undefined results. Returns the value obj held previously.
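The fetch-and-modify behavior, including the silent wrap-around for signed integers, matches what standard C++11 atomics provide, so it can be illustrated on the CPU. This is a sketch by analogy only: the wrapper names are hypothetical, and the language's own function names and Table 25 are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Atomically adds arg to *obj and returns the value held before the addition.
int fetch_add_relaxed(std::atomic<int>* obj, int arg) {
    assert(obj != nullptr);
    return obj->fetch_add(arg, std::memory_order_relaxed);
}

// Atomically ORs arg into *obj and returns the value held before the update.
unsigned fetch_or_relaxed(std::atomic<unsigned>* obj, unsigned arg) {
    assert(obj != nullptr);
    return obj->fetch_or(arg, std::memory_order_relaxed);
}

// Demonstrates the wrap-around: adding 1 to INT_MAX yields INT_MIN, with no
// undefined behavior, because atomic arithmetic is two's complement.
int increment_int_max() {
    std::atomic<int> value(INT_MAX);
    fetch_add_relaxed(&value, 1);
    return value.load(std::memory_order_relaxed);
}
```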
Compiler Options
The language compiler can be used online (i.e. using the appropriate APIs to compile the language sources) or offline. The language sources compiled offline can be loaded as binaries, using the appropriate APIs.
This chapter explains the compiler options supported by the language compiler according to one embodiment, which are categorized as pre-processor options, options for math intrinsics, options that control optimization, and miscellaneous options. Both the online and offline compilers support these options.
Pre-Processor Options
These options control a preprocessor that is run on each program source before actual compilation.
Predefine name as a macro, with definition 1.
The contents of definition are tokenized and processed as if they appeared in a #define directive. This option may receive multiple options, which are processed in the order in which they appear. This option allows developers to compile the language code to change which features are enabled or disabled.
-I dir—Add the directory dir to the list of directories to be searched for header files. This option is only available for the offline compiler.
Math Intrinsics Options
These options control compiler behavior regarding floating-point arithmetic. They trade off speed against correctness.
These options control how single precision, half precision and double precision denormalized numbers are handled. By default denorms are flushed to zero. These compiler options enable or disable (default) denorm support.
The -fdenorms option is ignored for single precision numbers if the device does not support single precision denormalized numbers. This option is ignored for half precision numbers if the device does not support half precision denormalized numbers.
This option is ignored for double precision numbers if the device does not support double precision
These options only apply to scalar and vector floating-point variables and computations on these floating-point variables inside a program. They do not apply to sampling, reading from or writing to texture objects.
These options enable (default) or disable the optimizations for floating-point arithmetic that may violate the IEEE 754 standard. They also enable or disable the high precision variant of math functions for single precision floating-point scalar and vector types.
Options Controlling the Language Version
The following option controls the version of unified graphics/compute language that the compiler accepts.
-std=
Determine the language revision to use. A value for this option must be provided. The possible values are:
ios-language10—support the unified graphics / compute language revision 1.0 programs for iOS.
osx-language10—support the unified graphics/compute language revision 1.0 programs for Mac OS X.
(The term “language” above may be replaced by a name for the compiler in one embodiment.)
Optional Features
When compiling the language source for a device, the following compiler options are used to identify the optional features to be enabled. These optional features are disabled by default.
Enable or disable (default) support for double-precision type and floating-point arithmetic operations.
If -std=ios-language10 is specified, the following optional features are available:
Framebuffer fetch which allows you to read colors from a color attachment in a fragment shader.
If -std=osx-language10 is specified, the following optional features are available:
Cubemap Arrays
Numerical Compliance
This chapter covers how the language represents floating-point numbers with regard to accuracy in mathematical operations. The language in one embodiment is compliant with a subset of the IEEE 754 standard.
INF, NaN and Denormalized Numbers
INF and NaNs must be supported for single precision and double precision floating-point numbers and are optional for half precision floating-point numbers. Support for signaling NaNs is not required.
Support for denormalized numbers with single precision and half precision floating-point is optional. Denormalized single or half precision floating-point numbers passed as input or produced as the output of single or half precision floating-point operations may be flushed to zero.
Double precision floating-point support is optional. If double precision is supported, support for denormalized numbers with double precision floating-point is required.
Rounding Mode
Either round to zero or round to nearest rounding mode may be supported for single precision and half precision floating-point operations. For devices that have full support for compute, round to nearest rounding mode is required for single precision floating-point operations.
For double precision floating-point, round to nearest rounding mode is required.
Floating-point Exceptions
Floating-point exceptions are disabled in the language. The result of a floating-point exception must match the IEEE 754 specification for the exceptions not enabled case.
Relative Error as ULPs
Table 26 (
Table 27 (
Edge Case Behavior in Flush To Zero Mode
If denormals are flushed to zero, then a function may return one of four results:
(1) Any conforming result for non-flush-to-zero mode.
(2) If the result given by (1) is a subnormal before rounding, it may be flushed to zero.
(3) Any non-flushed conforming result for the function if one or more of its subnormal operands are flushed to zero.
(4) If the result of (3) is a subnormal before rounding, the result may be flushed to zero.
In each of the above cases, if an operand or result is flushed to zero, the sign of the zero is undefined.
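Cases (3) and (4) can be illustrated with a small C++ model of a multiply evaluated in flush-to-zero mode. This is a sketch under stated assumptions: flush_denorm and mul_ftz are hypothetical helpers, the subnormal test is applied to the already rounded result rather than to the result before rounding, and the flushed value is always returned as +0.0f even though the text leaves the sign of the zero undefined.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns 0.0f for a subnormal (denormalized) input; zeros, normal numbers,
// infinities, and NaNs pass through unchanged.
float flush_denorm(float x) {
    const bool subnormal =
        x != 0.0f && std::fabs(x) < std::numeric_limits<float>::min();
    return subnormal ? 0.0f : x;
}

// One permitted evaluation of a * b when denormals are flushed to zero:
// subnormal operands are flushed first (case 3), and a subnormal product
// is flushed afterwards (case 4).
float mul_ftz(float a, float b) {
    return flush_denorm(flush_denorm(a) * flush_denorm(b));
}
```

Since a conforming non-flushed result is also allowed by cases (1) and (3), an implementation that returns the exact subnormal product would be equally valid; the sketch shows only the most aggressive flushing path.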
Texture Addressing and Conversion Rules
The texture coordinates specified to the sample, sample_compare, gather, gather_compare, read, and write functions cannot be INF or NaN. In addition, the texture coordinate must refer to a region inside the texture for the texture read and write functions. In the sections that follow, we discuss conversion rules that are applied when reading and writing textures in a graphics or kernel function.
Conversion rules for normalized integer pixel data types
In this section, we discuss converting normalized integer pixel data types to floating-point values and vice-versa.
Converting normalized integer pixel data types to floating-point values
For textures that have 8-bit, 10-bit or 16-bit normalized unsigned integer pixel values, the texture sample and read functions convert the pixel values from an 8-bit, 10-bit or 16-bit unsigned integer to a normalized single- or half-precision floating-point value in the range [0.0 ... 1.0].
For textures that have 8-bit or 16-bit normalized signed integer pixel values, the texture sample and read functions convert the pixel values from an 8-bit or 16-bit signed integer to a normalized single- or half-precision floating-point value in the range [−1.0 ... 1.0].
These conversions are performed as listed in the second column of Table 28 (
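The table itself is not reproduced in this text, so the following C++ sketch uses the conventional rule found in comparable graphics and compute APIs and should be read as an assumption: unsigned values are divided by the largest representable value, and signed values are divided by the largest positive value and clamped at −1.0. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned normalized integers map linearly onto [0.0, 1.0].
float unorm8_to_float(std::uint8_t c)   { return c / 255.0f; }
float unorm16_to_float(std::uint16_t c) { return c / 65535.0f; }

// A 10-bit channel is assumed to arrive in the low 10 bits of c.
float unorm10_to_float(std::uint16_t c) {
    assert(c <= 1023);
    return c / 1023.0f;
}

// Signed normalized integers map onto [-1.0, 1.0]. The most negative code
// (-128 or -32768) would fall slightly below -1.0, so it is clamped.
float snorm8_to_float(std::int8_t c)   { return std::max(c / 127.0f, -1.0f); }
float snorm16_to_float(std::int16_t c) { return std::max(c / 32767.0f, -1.0f); }
```

Under this rule two signed codes, for example −128 and −127 for 8-bit data, both convert to exactly −1.0.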
Converting floating-point values to normalized integer pixel data types
For textures that have 8-bit, 10-bit or 16-bit normalized unsigned integer pixel values, the texture write functions convert the single- or half-precision floating-point pixel value to an 8-bit, 10-bit or 16-bit unsigned integer.
For textures that have 8-bit or 16-bit normalized signed integer pixel values, the texture write functions convert the single or half-precision floating-point pixel value to an 8-bit or 16-bit signed integer.
The preferred methods to perform conversions from floating-point values to normalized integer values are listed in Table 29 (
The GPU may choose to approximate the rounding mode used in the conversions described above. If a rounding mode other than round to nearest even is used, the absolute error of the implementation-dependent rounding mode versus the result produced by the round to nearest even rounding mode must be <= 0.6.
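A reference conversion with round to nearest even, together with the 0.6 tolerance check, might look like the following C++ sketch. The table of preferred methods is not reproduced in this text, so the clamp-scale-round sequence, the treatment of NaN as 0, and all function names are assumptions. std::nearbyint is used because it rounds ties to even under the default floating-point rounding mode.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a float to an 8-bit unsigned normalized value:
// clamp to [0, 1], scale by 255, round to nearest even.
std::uint8_t float_to_unorm8(float f) {
    if (std::isnan(f)) return 0;  // assumption: NaN is written as 0
    const float scaled = std::min(std::max(f, 0.0f), 1.0f) * 255.0f;
    return static_cast<std::uint8_t>(std::nearbyint(scaled));
}

// Converts a float to an 8-bit signed normalized value:
// clamp to [-1, 1], scale by 127, round to nearest even.
std::int8_t float_to_snorm8(float f) {
    if (std::isnan(f)) return 0;  // assumption: NaN is written as 0
    const float scaled = std::min(std::max(f, -1.0f), 1.0f) * 127.0f;
    return static_cast<std::int8_t>(std::nearbyint(scaled));
}

// Checks an implementation-dependent rounding result against the
// round-to-nearest-even result using the 0.6 absolute error bound.
bool rounding_within_tolerance(float implementation_result, float reference_result) {
    return std::fabs(implementation_result - reference_result) <= 0.6f;
}
```

For example, 0.5 scales to 127.5, which round to nearest even takes to 128; an implementation that truncated to 127 would differ by 1 from that reference and fall outside the bound.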
Conversion rules for half precision floating-point pixel data type
For textures that have half-precision floating-point pixel color values, the conversions from half to float are lossless. Conversions from float to half round the mantissa using the round to nearest even or round to zero rounding mode. Denormalized numbers for the half data type which may be generated when converting a float to a half may be flushed to zero. A float NaN must be converted to an appropriate NaN in the half type. A float INF must be converted to an appropriate INF in the half type.
Conversion rules for floating-point channel data type
The following rules apply for reading and writing textures that have single-precision floating-point pixel color values.
NaNs may be converted to a NaN value(s) supported by the device.
Denorms may be flushed to zero.
All other values must be preserved.
Conversion rules for signed and unsigned integer pixel data types
For textures that have 8-bit or 16-bit signed or unsigned integer pixel values, the texture sample and read functions return a signed or unsigned 32-bit integer pixel value. The conversions described in this section must be correctly saturated.
Writes to these integer textures perform one of the conversions listed in Table 30 (
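The saturation requirement can be illustrated with a short C++ sketch. The table of conversions is not reproduced in this text, so the set of functions below is an assumption: each takes a 32-bit value, as returned by the read functions, and clamps it to the range of the narrower texture channel. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp a signed 32-bit value into the range of the destination channel.
std::int8_t saturate_to_int8(std::int32_t v) {
    return static_cast<std::int8_t>(
        std::min<std::int32_t>(std::max<std::int32_t>(v, INT8_MIN), INT8_MAX));
}

std::int16_t saturate_to_int16(std::int32_t v) {
    return static_cast<std::int16_t>(
        std::min<std::int32_t>(std::max<std::int32_t>(v, INT16_MIN), INT16_MAX));
}

// Unsigned values only need an upper clamp.
std::uint8_t saturate_to_uint8(std::uint32_t v) {
    return static_cast<std::uint8_t>(std::min<std::uint32_t>(v, UINT8_MAX));
}

std::uint16_t saturate_to_uint16(std::uint32_t v) {
    return static_cast<std::uint16_t>(std::min<std::uint32_t>(v, UINT16_MAX));
}
```

Clamping rather than truncating means an out-of-range value such as 300 is written as 255 to an 8-bit unsigned channel instead of wrapping to 44.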
Conversion rules for sRGBA and sBGRA Textures
Conversion from sRGB space to linear space is performed automatically when sampling from an sRGB texture. The conversion from sRGB to linear RGB is performed before the filter specified in the sampler is applied. If the texture has an alpha channel, the alpha data is stored in linear color space.
Conversion from linear to sRGB space is automatically done when writing to an sRGB texture. If the texture has an alpha channel, the alpha data is stored in linear color space.
The following is the conversion rule for converting a normalized 8-bit unsigned integer sRGB color value to a floating-point linear RGB color value (call it c) as per rules described above.
The resulting floating-point value, if converted back to an sRGB value without rounding to an 8-bit unsigned integer value, must be within 0.5 ulp of the original sRGB value.
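The conversion rule itself is not reproduced in this text. As a point of reference, the standard sRGB transfer function can be written as the following C++ sketch; treating it as the rule meant here is an assumption, and srgb8_to_linear is a hypothetical name. The sketch does not attempt to demonstrate the 0.5 ulp requirement.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a normalized 8-bit unsigned sRGB channel value to linear RGB
// using the standard piecewise sRGB decoding curve.
float srgb8_to_linear(std::uint8_t value) {
    const float c = value / 255.0f;          // normalize to [0, 1]
    if (c <= 0.04045f) return c / 12.92f;    // linear segment near black
    return std::pow((c + 0.055f) / 1.055f, 2.4f);  // power-law segment
}
```

With this curve the mid-gray code 128 decodes to roughly 0.216 in linear space, which is why filtering must happen after decoding to give correct results.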
The following are the conversion rules for converting a linear RGB floating-point color value (call it c) to a normalized 8-bit unsigned integer sRGB value.
The precision of the above conversion should be such that fabs(reference result − integer result) <= 0.6.
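The conversion rules are not reproduced in this text. The C++ sketch below uses the standard sRGB encoding curve with clamping and round to nearest even as a plausible reference; that choice, the treatment of NaN as 0, and the name linear_to_srgb8 are assumptions rather than the rules of the specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a linear RGB channel value to a normalized 8-bit unsigned sRGB
// value using the standard piecewise sRGB encoding curve.
std::uint8_t linear_to_srgb8(float c) {
    if (std::isnan(c)) c = 0.0f;              // assumption: NaN encodes as 0
    c = std::min(std::max(c, 0.0f), 1.0f);    // clamp to [0, 1]
    const float s = (c < 0.0031308f)
        ? 12.92f * c                                   // linear segment
        : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;  // power-law segment
    // nearbyint rounds ties to even under the default rounding mode.
    return static_cast<std::uint8_t>(std::nearbyint(s * 255.0f));
}
```

An implementation under test could be compared against such a reference by checking that its integer result differs from the unrounded reference value by at most 0.6.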
It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention therefore should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Number | Name | Date | Kind |
---|---|---|---|
5179702 | Spix | Jan 1993 | A |
5313614 | Goettelmann | May 1994 | A |
5724590 | Goettelmann | Mar 1998 | A |
6058438 | Diehl | May 2000 | A |
6665688 | Callahan, II | Dec 2003 | B1 |
7173623 | Calkins | Feb 2007 | B2 |
7434213 | Prakash | Oct 2008 | B1 |
7659901 | Toelle | Feb 2010 | B2 |
7746347 | Brown | Jun 2010 | B1 |
7800620 | Tarditi, Jr. | Sep 2010 | B2 |
8044951 | Brown | Oct 2011 | B1 |
8149242 | Langyel | Apr 2012 | B2 |
8274517 | Boyd | Sep 2012 | B2 |
8477143 | Harper | Jul 2013 | B2 |
8566537 | Ni | Oct 2013 | B2 |
8595701 | Li | Nov 2013 | B2 |
20030056083 | Bates | Mar 2003 | A1 |
20040160446 | Gosalia | Aug 2004 | A1 |
20050122330 | Boyd | Jun 2005 | A1 |
20050237330 | Stauffer | Oct 2005 | A1 |
20060012604 | Seetharamaiah | Jan 2006 | A1 |
20060080677 | Louie | Apr 2006 | A1 |
20060098018 | Tarditi, Jr. | May 2006 | A1 |
20070033572 | Donovan | Feb 2007 | A1 |
20070294666 | Papakipos | Dec 2007 | A1 |
20080001952 | Srinivasan | Jan 2008 | A1 |
20080303833 | Swift | Dec 2008 | A1 |
20090125894 | Nair | May 2009 | A1 |
20090217249 | Kim | Aug 2009 | A1 |
20090284535 | Pelton | Nov 2009 | A1 |
20090307699 | Munshi | Dec 2009 | A1 |
20100047510 | Couvillion | Feb 2010 | A1 |
20100277486 | Bhoovaraghavan | Nov 2010 | A1 |
20110004827 | Doerr | Jan 2011 | A1 |
20110063296 | Bolz | Mar 2011 | A1 |
20110087864 | Duluk | Apr 2011 | A1 |
20110246973 | Meijer | Oct 2011 | A1 |
20110314444 | Zhang | Dec 2011 | A1 |
20120131545 | Linebarger | May 2012 | A1 |
20120147021 | Cheng | Jun 2012 | A1 |
20120242672 | Larson | Sep 2012 | A1 |
20130007703 | Auerbach | Jan 2013 | A1 |
20130141443 | Schmit | Jun 2013 | A1 |
20130159630 | Lichmanov | Jun 2013 | A1 |
20130169642 | Frascati | Jul 2013 | A1 |
20130187935 | Wexler | Jul 2013 | A1 |
20130198494 | Grover | Aug 2013 | A1 |
20140040855 | Wang | Feb 2014 | A1 |
20140053161 | Sadowski | Feb 2014 | A1 |
20140337321 | Coyote | Nov 2014 | A1 |
20140354658 | Dotsenko | Dec 2014 | A1 |
20140362093 | Lorach | Dec 2014 | A1 |
20150109293 | Wang | Apr 2015 | A1 |
20150179142 | Lehtinen | Jun 2015 | A1 |
20150221059 | Baker | Aug 2015 | A1 |
20150310578 | You | Oct 2015 | A1 |
20160291942 | Hutchison | Oct 2016 | A1 |
20160350245 | Shen | Dec 2016 | A1 |
Number | Date | Country |
---|---|---|
103262038 | Aug 2013 | CN |
103392171 | Nov 2013 | CN |
2012082423 | Jun 2012 | WO |
Entry |
---|
OpenGL Reference Manual, The Official Reference Document for OpenGL, Release 1, 1994, pp. 1-257. |
Lattner, Chris, “The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks: LLVM,” Mar. 2011, Retrieved from the Internet: URL: http://www.aosabook.org/en/llvm.html [retrieved on Apr. 7, 2014]. |
LLVM Language Reference Manual, LLVM Project, Apr. 7, 2014, Retrieved from the Internet: URL: http://llvm.org/docs/LangRef.html [retrieved on Apr. 7, 2014]. |
Kuan-Hsu Chen et al., “An automatic superword vectorization in LLVM,” 2010, In 16th Workshop on Compiler Techniques for High-Performance and Embedded Computing, pp. 19-27, Taipei, 2010. |
“Metal Programming Guide Contents,” Mar. 9, 2015 (Mar. 9, 2015), pp. 1-74, XP055207633, Retrieved from the Internet: URL: https://developer.apple.com/library/ios/documentation/Miscellaneous/Conceptual/MetalProgrammingGuide/MetalProgrammingGuide.pdf [retrieved on Aug. 13, 2015]. |
Chris Lattner, “LLVM & LLVM Bitcode Introduction,” Jan. 1, 2013 (Jan. 1, 2013), XP055206788, Retrieved from the Internet: URL: http://pllab.cs.nthu.edu.tw/cs240402/lectures/lectures_2013/LLVM Bitcode Introduction.pdf [retrieved on Aug. 7, 2015]. |
Helge Rhodin, “A PTX Code Generator for LLVM,” Oct. 29, 2010 (Oct. 29, 2010), pp. 1-63, XP055208570, Saarbrucken, Germany, Retrieved from the Internet: URL: http://compilers.cs.uni-saarland.de/publications/theses/rhodin_bsc.pdf [retrieved on Aug. 19, 2015]. |
Ivan Nevraev: “Introduction to Direct3D 12”, Apr. 4, 2014 (Apr. 4, 2014), pp. 1-43, XP55203398, Retrieved from the Internet: URL:http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0CDEQFjABahUKEwiCmsP_w-nGAhUKPhQKHcqZAP8&url=http%3A%2F%2Famd-dev.wpengine.netdna-cdn.com%2Fwordpress%2Fmedia%2F2012%2F10%2FIntroduction-To-DX12-Ivan-Nevraev. |
Matt Sandy: “DirectX 12”, Mar. 20, 2014 (Mar. 20, 2014), XP002742458, Retrieved from the Internet: URL: http://blogs.msdn.com/b/directx/archive/2014/03/20/directx-12.aspx [retrieved on Jul. 20, 2015]. |
Shih-Wei Liao, “Android RenderScript on LLVM,” Apr. 7, 2011 (Apr. 7, 2011), XP055206785, Retrieved from the Internet: URL: https://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_llvm_liao.pdf [retrieved on Aug. 7, 2015]. |
First Office Action received in Chinese Patent Application No. 201580028375.1, dated May 14, 2018. |
Holk, Eric et al.; “GPU Programming in Rust: Implementing High-Level Abstractions in a Systems-Level Language”; IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum; May 20, 2013; pp. 315-324. |
Foley et al., “Spark: Modular, Composable Shaders for Graphics Hardware,” [Retrieved from the Internet on Mar. 13, 2019] <http://www.cs.cmu.edu/afs/cs.cmu.edu/afs/cs.cmu.edu/academic/class/15869-f11/www/readings/foley11_spark.pdf> 2011. |
McDonnel et al., “Towards Utilizing GPUs in Information Visualization: A Model and Implementation of Image-Space Operation,” [Retrieved from the Internet on Mar. 13, 2019] <https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5290718> 2009. |
Miranda et al., “Erbium: A Deterministic, Concurrent Intermediate Representation for Portable and Scalable Performance,” [Retrieved from the Internet on Mar. 13, 2019] <http://delivery.acm.org/10.1145/1790000/1787312/p119-miranda.pdf?ip=151.207.250.61&id=1787312&acc=ACTIVE%2> 2010. |
Number | Date | Country |
---|---|---|
20150347108 A1 | Dec 2015 | US |
Number | Date | Country |
---|---|---|
62005646 | May 2014 | US |