This invention relates generally to processor architectures. More particularly, this invention relates to a Single Instruction Multiple Data (SIMD) processor architecture that processes vectors in the same manner regardless of the size of the vector.
SIMD is a computation technique that performs the same operation on multiple data elements simultaneously. This technique exploits data level parallelism.
A vector is an ordered set of homogeneous data elements, referred to herein as vector units. The vector units correspond to the “multiple data” associated with a single instruction in a SIMD processor. The number of vector units in a vector defines the vector's size or length. Typically, vector sizes are expressed in bits, as the sum of the bit counts of the vector's data elements.
Most SIMD instruction sets operate on a specific number of vector units. Therefore, if there is a change in processor architecture, say from 128-bit vectors to 256-bit vectors, a whole new instruction set is required. Consequently, all existing software needs to be re-written for the new architecture. There is an ongoing need for improved processing power, which results in an ongoing desire for larger vectors. It would be desirable to accommodate changing vector sizes without having to re-write software for each new vector size.
A processor has a special register to store a set of vector sizes up to a maximum size given by the implementation. An execution unit performs an operation on multiple vector units of a vector in the same manner regardless of the vector size.
A computer has a storage unit and a processor adapted to execute a single instruction on multiple vector units when a first value of the vector size is selected from the storage unit. The processor is also adapted to execute the same single instruction on multiple vector units when a second value of the vector size is selected from the storage unit.
A computer has a memory adapted to store a first plurality of instructions encoded for using a first vector size and a second plurality of instructions encoded for using a second vector size. An execution unit with a vector size greater than or equal to the first and the second vector sizes executes the first plurality of instructions and the second plurality of instructions.
The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
The invention utilizes a single instruction set for all vector sizes. The instruction set specifies a type of vector unit, also referred to herein as a data format. This vector unit is processed in the same manner by the execution unit, regardless of the number of units within the vector. The number of units within a vector is derived from the vector size value stored in a special register. This accessible value effectively defines the vector size. Because the instructions operate on vector units, changing vector sizes does not necessitate new instruction sets or the re-writing of computer code.
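The derivation of the unit count from the stored vector size may be sketched as follows. This is an illustrative model only, not the claimed hardware; the function name unit_count and its parameters are hypothetical.

```python
# Illustrative model (hypothetical names): the vector size is read from a
# register, and the number of vector units is derived from it instead of
# being fixed by the instruction encoding.

def unit_count(vector_size_bits: int, unit_bits: int) -> int:
    """Return how many vector units of unit_bits fit in the vector."""
    if unit_bits <= 0 or vector_size_bits % unit_bits != 0:
        raise ValueError("vector size must be a whole multiple of the unit size")
    return vector_size_bits // unit_bits


# The same word-oriented (32-bit) instruction sees 4 units in a 128-bit
# implementation and 8 units in a 256-bit implementation.
for size_bits in (128, 256):
    print(size_bits, unit_count(size_bits, 32))
```

Only the value passed as vector_size_bits changes between implementations; code written in terms of the unit width is unaffected.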
Table I illustrates a vector unit schema that may be utilized in accordance with an embodiment of the invention.
Table I defines vector units with different sizes or data element lengths. The associated abbreviation, e.g. “.b” for byte units, may be added to an instruction. For example, the instruction “add.b” specifies an add operation for all byte vector units. Any instruction may be augmented with the specified abbreviations. Consequently, instructions are defined in connection with a vector unit.
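As a sketch of how an abbreviation attached to an instruction could select the vector unit, consider the following decoder. The suffix “.b” and the 32-bit word suffix “.w” follow the text; the suffixes “.h” and “.d” and their widths, as well as the names UNIT_BITS and decode_mnemonic, are assumptions made for illustration, since the contents of Table I are not reproduced here.

```python
# Unit widths in bits. "b" (byte) and "w" (32-bit word) follow the text;
# "h" (16 bits) and "d" (64 bits) are assumed suffixes for illustration.
UNIT_BITS = {"b": 8, "h": 16, "w": 32, "d": 64}


def decode_mnemonic(mnemonic: str) -> tuple[str, int]:
    """Split a mnemonic such as 'add.b' into (operation, unit width in bits)."""
    operation, _, suffix = mnemonic.rpartition(".")
    if not operation or suffix not in UNIT_BITS:
        raise ValueError(f"no recognized vector unit suffix in {mnemonic!r}")
    return operation, UNIT_BITS[suffix]


print(decode_mnemonic("add.b"))
print(decode_mnemonic("addv.w"))
```

The decoded result carries a unit width but no vector length, which mirrors the point that instructions are defined in connection with a vector unit.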
A vector unit index code may also be defined to select individual elements within a vector. Table II illustrates an index scheme that may be used in accordance with an embodiment of the invention.
Consider the following example that operates on word vector units (i.e., 32-bit data elements) in a 128-bit vector architecture. The initial values of vector registers w1, w2, and of general purpose register r2 are shown below in Table III.
In this example, vector w1 has four word vector units. The first vector unit has a value of “d”, the second vector unit has a value of “c”, the third vector unit has a value of “b”, while the fourth vector unit has a value of “a”. Vector w2 has a first vector unit with a value of “D”, a second vector unit with a value of “C”, a third vector unit with a value of “B” and a fourth vector unit with a value of “A”. The register r2 has a 32-bit value of “E”.
Consider now the following instructions:
Execution of these instructions produces the following results of Table IV:
The first row instruction (1) specifies the addition (addv.w) of vectors w1 and w2, with the results being placed in vector w5. Table IV shows the result of this operation. For example, the upper right corner shows the value “d+D”, where the value “d” is from the first vector unit of w1 and the value “D” is from the first vector unit of w2, as shown in Table III.
The second row instruction (2) specifies the movement of the value in register r2 into vector w6. Table IV shows that the register value of “E” from r2 is placed in each vector unit of w6.
The third row instruction (3) specifies the addition of 17 to the values associated with the vector units of vector w1, with the result placed in vector w7. Table IV shows vector w7 with a first vector unit of “d+17”, a second vector unit of “c+17”, a third vector unit of “b+17” and a fourth vector unit of “a+17”.
The fourth row instruction (4) specifies the selection of index value 2 from vector units of vector w2, with the results placed in vector w8. Table IV shows the value “B” placed in each vector unit of vector w8. The value “B” is shown in Table III and corresponds to the value in the third vector unit of vector w2 (the indexing scheme specifies 0, 1, 2, 3, so the specification of unit 2 corresponds to the third vector unit).
This example demonstrates that the invention operates on vector units. Operations are performed in connection with individual vector units, regardless of the number of units in the vector. The same four instructions operate not only on the four-word, 128-bit vectors above, but also on eight-word, 256-bit vectors. Consequently, a single set of instructions may be used to process vectors that are of different sizes.
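The four operations described above may be modeled in software as follows. This is a minimal sketch, not the claimed implementation. Vectors are modeled as lists in which index 0 is the first vector unit; the function names are hypothetical; and wrapping results to 32 bits is an assumption about overflow behavior that the text does not state.

```python
MASK32 = 0xFFFFFFFF  # word units are 32 bits; wrap-around is an assumption


def add_units(wa, wb):
    """Unit-wise addition of two vectors, as in instruction (1)."""
    return [(x + y) & MASK32 for x, y in zip(wa, wb)]


def fill_units(count, value):
    """Copy a general purpose register value into every unit, as in (2)."""
    return [value & MASK32] * count


def add_immediate(wa, immediate):
    """Add an immediate value to every unit, as in instruction (3)."""
    return [(x + immediate) & MASK32 for x in wa]


def splat_unit(wa, index):
    """Replicate the unit selected by index into every unit, as in (4)."""
    return [wa[index]] * len(wa)


def run_example(w1, w2, r2):
    """Run the four operations; the unit count comes from the inputs only."""
    w5 = add_units(w1, w2)
    w6 = fill_units(len(w1), r2)
    w7 = add_immediate(w1, 17)
    w8 = splat_unit(w2, 2)
    return w5, w6, w7, w8


# The same routine handles four word units (128 bits) and eight (256 bits).
print(run_example([4, 3, 2, 1], [40, 30, 20, 10], 7))
print(run_example(list(range(8)), [10 * i for i in range(8)], 7))
```

None of the functions names a vector length. The unit count follows from the operands, which stands in here for the vector size value held in the register.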
An embodiment of the invention utilizes an instruction format that specifies the vector unit for a result produced by the instruction. For example, the signed dot product instruction
Table V shows that vector w9 has two double word vector units (each 64 bits), which are used to store the dot product operation on word vector units associated with vectors w1 and w2 of Table III.
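A result unit that is wider than the source units may be sketched as follows. The text states only that four word units yield two double word units; pairing adjacent word units (units 0 and 1, then units 2 and 3) is an assumption made for illustration, and the function name dot_product_pairs is hypothetical. Python integers are unbounded, so the sketch does not model 64-bit overflow.

```python
def dot_product_pairs(wa, wb):
    """Signed dot product of adjacent unit pairs.

    Each result unit is wa[i]*wb[i] + wa[i+1]*wb[i+1] for even i, so the
    result has half as many units as the inputs, each twice as wide.
    The adjacent-pair grouping is an illustrative assumption.
    """
    if len(wa) != len(wb) or len(wa) % 2 != 0:
        raise ValueError("inputs need equal, even unit counts")
    return [
        wa[i] * wb[i] + wa[i + 1] * wb[i + 1]
        for i in range(0, len(wa), 2)
    ]


# Four word units in, two double word units out.
print(dot_product_pairs([4, 3, 2, 1], [40, 30, 20, 10]))
```

With eight word units as input, the same function returns four double word units, so the result format scales with the vector size in the same way as the source format.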
The processor 100 includes an execution unit 102 connected to registers 104. At least one register stores the size of the vector.
The configuration of processor 100 is exemplary. The vector unit size agnostic processing may be implemented in any number of configurations. The common operation across all such configurations is the handling of vector units in a uniform manner, regardless of the vector size. The size is fetched from a register; it may be loaded at start-up or written by software.
The processing of the invention allows a single set of instructions to be used for vectors of any size. Consequently, vector sizes may be continuously changed without impacting installed software bases.
While various embodiments of the invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the scope of the invention. For example, in addition to using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on chip (“SOC”), or any other device), implementations may also be embodied in software (e.g., computer readable code, program code, and/or instructions disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs. Such software can be disposed in any known non-transitory computer usable medium such as semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.). It is understood that a CPU, processor core, microcontroller, or other suitable electronic hardware element may be employed to enable functionality specified in software.
It is understood that the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.