BRIEF DESCRIPTION
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views. While several embodiments are described in connection with these drawings, there is no intent to limit the disclosure to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications, and equivalents.
FIG. 1A is a flowchart illustrating stream data processing steps that can be taken in an exemplary vector processing unit.
FIG. 1B is a flowchart illustrating stream data processing steps that can be taken in an exemplary scalar processing unit, similar to the steps illustrated in FIG. 1A.
FIG. 1C is an exemplary stream processing SIMD structure with software implementation of complex mathematical functions.
FIG. 1D is an exemplary stream processing SIMD structure with hardware implementation of complex mathematical functions using a private special function unit (SFU) for each ALU.
FIG. 1E is an exemplary stream processing SIMD structure with hardware implementation of complex mathematical functions using a common SFU for all ALUs.
FIG. 1F is an exemplary stream processing SIMD structure with implementation of complex mathematical functions using a common SFU with interleaved access to the common SFU.
FIG. 1G is an exemplary illustration of an SIMD factor reduction in the case of a common SIMD structure for both vertex and triangle processing.
FIG. 2A is a flowchart illustrating steps that can be taken in an exemplary scalar processing unit, similar to the flowchart from FIG. 1, with an SIMD factor 4.
FIG. 2B is a flowchart illustrating steps that can be taken in an exemplary scalar processing unit, similar to the flowchart from FIG. 1, with an SIMD factor 1.
FIG. 2C is a flowchart illustrating steps that can be taken in an exemplary scalar processing unit, similar to the flowchart from FIG. 1, with an SIMD factor 8 for short data format.
FIG. 2D is a flowchart illustrating steps that can be taken in an exemplary processing unit, similar to the flowchart from FIG. 1, with an SIMD factor 4 for short data format.
FIG. 3 is an exemplary logical structure of paired scalar ALUs with dual format processing capabilities, illustrating stream ALU functionality and the processing characteristics from FIGS. 1 and 2A-2G.
FIG. 4 is an exemplary stream processing unit in long format processing mode with paired scalar ALUs, similar to the structure from FIG. 3, and showing an upper level of control and memory.
FIG. 5A is a table illustrating exemplary arithmetic functionality of paired scalar ALUs, such as the ALUs illustrated in FIGS. 3 and 4, which can be used as a base for numerical processing instruction set development.
FIG. 5B is a GPU structure where an exemplary stream processor pool is used as a computational core, where the stream processor has a scalable architecture and may contain from 2 to 16 ALUs combined with a reduced number of special function units.
FIG. 6 is an exemplary flow diagram and logical structure of a stream processor with 4 scalar ALUs and SFU interaction, similar to the ALUs from FIGS. 3 and 4.
FIG. 7A is a flowchart illustrating exemplary normalized vector difference processing in a vector ALU.
FIG. 7B is a flowchart of an exemplary processing routine in a proposed stream scalar ALU combined with an SFU.
FIG. 7C is a continuation of FIG. 7B.
FIG. 8 is an exemplary ALU module, implementing functionality of the ALUs from FIG. 6.
FIG. 9 is an exemplary modular stream processor with a combination of 4 ALU modules, similar to the ALUs from FIGS. 3 and 4.
FIGS. 10A-10C are diagrams illustrating exemplary logical structure and data formats for Multiply Accumulate units, such as the Multiply Accumulate Unit from FIG. 8.
FIG. 11 is an exemplary structure of a MACC unit, similar to the MACC unit from FIG. 8.
FIG. 12 is an exemplary diagram of a short exponent calculation, similar to the short exponent calculation from FIG. 11.
FIG. 13 is an exemplary diagram of a short exponent calculation combined with a mixed exponent, similar to the short exponent calculation from FIG. 11.
FIG. 14 is an exemplary diagram of a short mantissa path for various channels, describing details of the mantissa path illustrated in FIG. 11.
FIG. 15 is an exemplary diagram of a long exponent calculation, describing details of the exponent calculation block from FIG. 11.
FIG. 16 is an exemplary diagram of a long exponent calculation, for a paired ALU, describing details of the long exponent calculation block from FIG.
FIG. 17 is an exemplary diagram of a long mantissa data path, describing details of a data path illustrated in FIG. 11.
FIG. 18 is an exemplary diagram of a long mantissa data path for a paired ALU, similar to the data path illustrated in FIG. 11.
FIG. 19 is an exemplary diagram of a mixed exponent calculation, describing details of the mixed exponent calculation illustrated in FIG. 11.
FIG. 20 is an exemplary diagram of a mixed exponent calculation for a paired ALU, similar to a mixed exponent calculation illustrated in FIG. 19.
FIG. 21 is an exemplary diagram of a mixed mantissa data path, describing details of the data path illustrated in FIG. 11.
FIG. 22 is an exemplary diagram of a mixed mantissa data path for a paired ALU, similar to a data path illustrated in FIG. 21.
FIG. 23 is an exemplary diagram of a merged mantissa data path, which can process short and long data formats, describing details of a possible implementation of the data path illustrated in FIG. 11.
FIG. 24 is an exemplary diagram illustrating a merged mantissa data path, similar to a data path illustrated in FIG. 11.
FIG. 25A is an exemplary diagram illustrating merged shift and control logic, which can be applied in the MACC from FIGS. 23 and 24.
FIG. 25B is an exemplary diagram illustrating sign control logic, which can be applied in the MACC from FIGS. 23 and 24.
FIG. 26 is an exemplary table of complement shift input and output formats, which may be utilized in the MACC from FIG. 11.
FIG. 27A is an exemplary diagram of a mantissa addition path, which can be utilized in the MACC from FIGS. 23 and 24.
FIG. 27B is an exemplary diagram of processing formats that can be utilized in the MAD carry save adder tree units from FIGS. 23 and 24.
FIG. 27C is a continuation of the processing formats from FIG. 27B.
FIG. 28A is an exemplary diagram of a fence implementation in a CSA adder, which may be utilized in the MACC from FIGS. 23 and 24.
FIG. 28B is an exemplary diagram of a fence implementation in a CPA adder, which may be utilized in the MACC from FIGS. 23 and 24.
FIG. 29 is an exemplary diagram of a fence implementation in a complement shift unit, which may be utilized in the MACC from FIGS. 23 and 24.
FIG. 30A is an exemplary fence in a normalization shifter, which may be utilized in the MACC from FIGS. 23 and 24.
FIG. 30B is a more detailed view of the exemplary fence from FIG. 30A.
FIG. 31 is a flowchart illustrating an exemplary process that may be utilized for sending data to a functionally separated ALU.