A processor may receive instructions to execute, and may comprise an instruction decoder to decode instructions into micro-operations (“u-ops”). The instruction decoder may comprise a programmable logic array (PLA) to generate u-op templates from instructions, and an aliasing mechanism, constructed from a field locator and an alias multiplexers array, to receive the u-op templates, to replace fields of u-op templates with fields extracted directly from the instruction, and to output the u-ops.
The frequency at which a PLA operates may depend upon the area of the PLA and the amount of information stored therein. The frequency at which the PLA operates may affect the ability of the processor as a whole to operate at a desired frequency.
Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals indicate corresponding, analogous or similar elements, and in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be understood by those of ordinary skill in the art that the embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the embodiments of the invention.
A processor may receive instructions to execute, and may comprise an instruction decoder to decode instructions into micro-operations (“u-ops”). The instruction decoder may comprise a programmable logic array (PLA) to generate u-op templates from instructions, and an aliasing mechanism, constructed from a field locator and an alias multiplexers array, to receive the u-op templates, to replace fields of u-op templates with fields extracted directly from the instruction, and to output the u-ops. As will be explained hereinbelow, u-ops decoded by the instruction decoder may be “simple” u-ops or “fused” u-ops.
In one embodiment of the invention, which will be explained with respect to
Embodiments of the invention will be described for particular examples of an instruction decoder. However, it should be understood that embodiments of the invention may be used in other instruction decoder designs as well.
Embodiments of the present invention may be used in any apparatus having a processor. For example, the apparatus may be a portable device that may be powered by a battery. A non-exhaustive list of examples of such portable devices includes laptop and notebook computers, handheld computers, mobile telephones, personal digital assistants (PDAs), and the like. Alternatively, the apparatus may be a non-portable device, such as, for example, a desktop computer or a server computer.
As shown in
Design considerations, such as, but not limited to, processor performance, cost and power consumption, may result in a particular processor design, and it should be understood that the design of processor 4 shown in
A non-exhaustive list of examples for processor 4 includes a central processing unit (CPU), a digital signal processor (DSP), a reduced instruction set computer (RISC), a complex instruction set computer (CISC) and the like. Moreover, processor 4 may be part of an application specific integrated circuit (ASIC) or may be part of an application specific standard product (ASSP).
A non-exhaustive list of examples for system memory 6 includes a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a flash memory, a double data rate (DDR) memory, RAMBUS dynamic random access memory (RDRAM) and the like. Moreover, system memory 6 may be part of an application specific integrated circuit (ASIC) or may be part of an application specific standard product (ASSP).
System memory 6 may store instructions to be executed by processor 4. System memory 6 may also store data for the instructions, or the data may be stored elsewhere. An instruction decoder 10 may receive instructions from system memory 6, and may decode those instructions into u-ops. An execution subsystem 12 may receive the u-ops from instruction decoder 10 and may receive the data for those u-ops from system memory 6 or elsewhere, and may execute the u-ops.
A u-op may comprise one or more sources and one or more op-codes, where “op-code” is a field of the u-op defining an operation to be performed on “operands”, and “source” is a field of the u-op that may contain an operand or may point to a location where an operand may be found.
The physical traces used to carry u-ops from instruction decoder 10 to execution subsystem 12 may comprise a number of signal groups.
In the exemplary processor of
“Simple” U-ops and “Fused” U-ops
Instruction decoder 10 may decode instructions into “simple” u-ops, and may decode instructions into “fused” u-ops.
In the exemplary design of processor 4, a “simple” u-op is a u-op that includes a single op-code. When instruction decoder 10 outputs a simple u-op, the “OP1” signal group may carry the op-code. In addition, signal group “OP2 VALID” may carry a value, for example the value “0”, to indicate that signal group “OP2” does not carry an op-code.
For example, a first group of instructions may define an “add” operation between two registers. The general form of instructions of the first group of instructions is shown in (1), and a particular example is shown in (1.a):
Instruction (1) may instruct processor 4 to perform an add operation between the value stored in the register defined in the “reg2” field and the value stored in the register defined in the “reg1” field, and to store the result in the register defined in the “reg1” field.
Instruction decoder 10 may decode instructions that belong to the first group of instructions into simple u-ops. When instruction decoder 10 outputs a simple u-op decoded from instruction (1), the physical traces used to carry u-ops from instruction decoder 10 to execution subsystem 12 may carry the values in the general form shown below in TABLE 1 at instruction (1.b). In the particular example of instruction (1.a), the “reg1” field defines a register named “eax”, and the “reg2” field defines a register named “ebx”, as shown below in TABLE 1 at instruction (1.c).
In the exemplary design of processor 4, a “fused” u-op is a u-op that combines the operations of two simple u-ops and includes two op-codes, one for each operation. When instruction decoder 10 outputs a fused u-op, the “OP1” signal group may carry one op-code, and the “OP2” signal group may carry the other op-code. In addition, signal group “OP2 VALID” may carry a value, for example the value “1”, to indicate that signal group “OP2” carries an op-code.
It should be noted that in other processor designs, a fused u-op may combine the operations of two or more simple u-ops and may include two or more op-codes.
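The distinction between simple and fused u-ops may be illustrated by the following minimal sketch. The class name `UOp`, its attribute names and the helper functions are hypothetical and are used for illustration only; the choice of "add" as the op-code carried by "OP1" in the fused case is likewise an assumption, since the text states only that "OP1" carries one op-code and "OP2" the other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UOp:
    """Hypothetical model of three of the signal groups named in the text."""
    op1: str                   # "OP1" signal group: first op-code
    op2: Optional[str] = None  # "OP2" signal group: second op-code, if any
    op2_valid: int = 0         # "OP2 VALID": 1 if "OP2" carries an op-code

    def is_fused(self) -> bool:
        return self.op2_valid == 1


def make_simple(op1: str) -> UOp:
    # A simple u-op includes a single op-code; "OP2" is marked as not valid.
    return UOp(op1=op1, op2=None, op2_valid=0)


def make_fused(op1: str, op2: str) -> UOp:
    # A fused u-op includes two op-codes, one for each combined operation.
    return UOp(op1=op1, op2=op2, op2_valid=1)


simple = make_simple("add")
fused = make_fused("add", "load")  # assumed assignment of op-codes
```

In this sketch, a consumer of the u-op needs to inspect only `op2_valid` to decide whether `op2` is meaningful, which mirrors the role of the "OP2 VALID" signal group.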
For example, a second group of instructions may define an “add” operation between one register and a value stored in a memory location. The general form of instructions of the second group of instructions is shown in (2), and a particular example is shown in (2.a):
Instruction (2) may instruct processor 4 to load a value from a memory location defined by the fields “base”, “index”, “scale” and “disp”, to perform an add operation between that value and the value stored in the register defined in the “reg1” field, and to store the result in the register defined in the “reg1” field. The “index” and “base” fields of instruction (2) specify registers that store the address index and address base values, respectively. The “scale” and “disp” fields of instruction (2) specify an address scaling factor and an address displacement, respectively.
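One way in which the four address fields may be combined into a memory location can be sketched as follows. The formula base + index × scale + disp is the conventional scaled-index addressing form and is an assumption here, since the text does not state how the fields are combined. The function name, the register contents, the 32-bit wrap-around and the treatment of the displacement as an unsigned value are likewise hypothetical.

```python
def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Assumed address calculation: base + index * scale + disp (32-bit wrap)."""
    return (base + index * scale + disp) & 0xFFFFFFFF


# Hypothetical register contents for the example of instruction (2.a),
# in which "base" is ecx, "index" is edx, "scale" is 2 and "disp" is FF2A.
regs = {"ecx": 0x1000, "edx": 0x10}
addr = effective_address(regs["ecx"], regs["edx"], scale=2, disp=0xFF2A)
```

With these assumed register values, `addr` evaluates to 0x10F4A.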
Instruction decoder 10 may decode instructions that belong to the second group of instructions into fused u-ops. When instruction decoder 10 outputs a fused u-op decoded from instruction (2), the physical traces used to carry u-ops from instruction decoder 10 to execution subsystem 12 may carry the values in the general form shown below in TABLE 2 at instruction (2.b). In the particular example of instruction (2.a), the “reg1” field defines a register named “eax”, the “base” field defines a register named “ecx”, the “index” field defines a register named “edx”, the “disp” field defines the value FF2A, and the “scale” field defines the number 2, as shown below in TABLE 2 at instruction (2.c).
The “OP2” signal group may carry the op-code “load”, which is common to all instructions of the second group of instructions.
Structure of Exemplary Instruction Decoder of
Instruction decoder 10 may comprise a programmable logic array (PLA) 14, a field locator 16, and an alias multiplexers group 18. Alias multiplexers group 18 may comprise multiplexers 22 and 26, and may optionally comprise a decoder 28. The outputs of multiplexers 22 and 26 are the signal groups OP2 and SRCF, respectively. Instruction decoder 10 may further comprise additional multiplexers, decoders or other logic elements, which for clarity are not shown in
Aliasing Fields
Field locator 16 may receive instructions as input, and for a received instruction, field locator 16 may output a group of fields denoted “aliasing fields”. An aliasing field may comprise bits that field locator 16 extracts directly from the instruction and/or bits that are encoded from the instruction and the architectural machine state. Additionally, an aliasing field may comprise bits derived from a field of a u-op template generated by PLA 14 (described below). A non-exhaustive list of examples of the content of an aliasing field includes a logical register, a code address size, a data address size, a data size, a stack address, a stack address size, immediate, scale and displacement data, branch information and a portion of various op-codes. In the exemplary processor of
When instruction decoder 10 receives an instruction from the first group of instructions, “AL1” and “AL4” may not carry relevant information.
When instruction decoder 10 receives an instruction from the second group of instructions, “AL1” may carry the op-code “load”. In the example of instruction (2.a), “AL1” may carry the op-code “load_with_scale_2”, while “AL4” may carry the value of the parameter “index”.
For clarity, the information carried by the aliasing fields “AL1” and “AL4” when instruction decoder 10 receives an instruction from the first group of instructions and when instruction decoder 10 receives an instruction from the second group of instructions is summarized in TABLE 3:
U-op Templates
PLA 14 may store u-op templates. PLA 14 may receive instructions as input, and for a received instruction, PLA 14 may output a particular u-op template. It should be noted that the same u-op template may be addressed by more than one instruction.
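The many-to-one relationship between instructions and u-op templates may be sketched as a lookup that ignores the operand fields of the instruction. The dictionary keys, the instruction representation and the function name `pla_lookup` are hypothetical; of the template fields, only "FUSED" and C-OP2 are taken from the description of the exemplary processor, and their values here follow the simple and fused templates discussed elsewhere in the text.

```python
# Hypothetical template contents; other template fields are omitted.
SIMPLE_TEMPLATE = {"FUSED": 0, "C-OP2": None}
FUSED_TEMPLATE = {"FUSED": 1, "C-OP2": "load"}

# Hypothetical PLA contents, keyed by (mnemonic, operand form).
PLA = {
    ("add", "reg,reg"): SIMPLE_TEMPLATE,
    ("add", "reg,mem"): FUSED_TEMPLATE,
}


def pla_lookup(instr: dict) -> dict:
    """Select a template from the instruction, ignoring its operand fields."""
    return PLA[(instr["mnemonic"], instr["form"])]


# Two different register-to-register adds address the same template.
first = {"mnemonic": "add", "form": "reg,reg", "reg1": "eax", "reg2": "ebx"}
second = {"mnemonic": "add", "form": "reg,reg", "reg1": "ecx", "reg2": "esi"}
same_template = pla_lookup(first) is pla_lookup(second)
```

Because the operand fields do not participate in the lookup, the PLA in this sketch needs one entry per instruction form rather than one per instruction.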
A u-op template may comprise fields that explicitly or implicitly define fields of the u-op. In the exemplary processor of
In the exemplary processor of
TABLE 4 summarizes the field content of the simple template and the fused template.
Determination of “OP2” Signal Group
In the exemplary processor of
Multiplexer 22 may receive some of its control input signals from bits of the C-OP2 field and some of its control input signals from bits of the “OP2 VALID” signal group. In addition, multiplexer 22 may receive a first group of data input signals from bits of the “AL1” aliasing field.
In an exemplary embodiment of the invention, the instructions of the first group of instructions may all address the same simple template, and the instructions of the second group of instructions may all address the same fused template.
When instruction decoder 10 receives an instruction of the first group of instructions, PLA 14 outputs the simple template, which has the value “0” for the “FUSED” field. Therefore, the “OP2 VALID” signal group carries the value “0”, and the value output by multiplexer 22 to be carried by the “OP2” signal group will be ignored by execution subsystem 12.
When instruction decoder 10 receives an instruction of the second group of instructions, PLA 14 outputs the fused template, which has the value “1” for the “FUSED” field. Therefore, the “OP2 VALID” signal group carries the value “1”. Having the value “1” carried by the “OP2 VALID” signal group and the value “load” in the C-OP2 field may result in multiplexer 22 outputting the value of the first group of data input signals into the “OP2” signal group.
In a specific example, the C-OP2 field may comprise a number of bits that implicitly define the op-code “load”, and the “AL1” field and the output of multiplexer 22 may comprise a larger number of bits that provide a full representation of the op-code “load”.
Consequently, a field (e.g. OP2) of a fused u-op having a particular number of bits may be generated using a u-op template field (e.g. C-OP2) having a lower number of bits.
Moreover, if PLA 14 stores two or more u-op templates that are addressed during decoding of instructions into fused u-ops, then the number of bits in each of the u-op templates that are used to select values for a particular field of the fused u-ops may be less than the maximal number of bits in that particular field.
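The selection performed by multiplexer 22 may be modeled by the following minimal sketch. The function name `mux22`, the use of strings for field values and the use of `None` for a value that is ignored by execution subsystem 12 are assumptions made for illustration only.

```python
from typing import Optional


def mux22(c_op2: Optional[str], op2_valid: int, al1: Optional[str]) -> Optional[str]:
    """Return the value to be carried by the "OP2" signal group."""
    if op2_valid == 1 and c_op2 == "load":
        # The short C-OP2 code only selects; the full op-code comes from AL1.
        return al1
    # Otherwise the output is ignored by the execution subsystem.
    return None


# Second group of instructions, example (2.a): fused template, AL1 from the
# field locator.
op2_fused = mux22(c_op2="load", op2_valid=1, al1="load_with_scale_2")

# First group of instructions: simple template, AL1 carries nothing relevant.
op2_simple = mux22(c_op2=None, op2_valid=0, al1=None)
```

In this sketch the template contributes only a selector, while the full op-code value is supplied by the aliasing field, which is why the template field may be narrower than the "OP2" signal group.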
Determination of “SRCF” Signal Group
In the exemplary processor of
Multiplexer 26 may receive some of its control input signals from bits of the “OP2 VALID” signal group. In addition, multiplexer 26 may receive a first group of data input signals from bits of the “AL4” aliasing field.
Having the value “1” in the “OP2 VALID” signal group may result in multiplexer 26 outputting the value of the first group of data input signals (bits of the “AL4” aliasing field) into the “SRCF” signal group. In the example of instructions from the second group of instructions, this value is “index”. Having the value “0” in the “OP2 VALID” signal group may result in multiplexer 26 outputting into the “SRCF” signal group a value that is ignored by execution subsystem 12.
As shown above, for instructions of the second group of instructions, the value of the “OP2 VALID” signal group is sufficient for selecting bits of aliasing field “AL4” to be outputted to the “SRCF” signal group. However, other instructions that are decoded into fused u-ops but do not belong to the second group of instructions may require other aliasing fields to be outputted to the “SRCF” signal group. Therefore, optional decoder 28 may decode the C-OP2 field and possibly other information to generate an optional group of signals 30 that, together with the “OP2 VALID” signal group, may control multiplexer 26 to select the appropriate aliasing field for each of these instructions. In another embodiment, optional decoder 28 may decode a field of the u-op template used to generate an operand of a u-op.
Consequently, a field (e.g. SRCF) of a fused u-op may be generated without having a respective field in the u-op template (e.g. there is no C-SRCF field in the u-op template).
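The selection performed by multiplexer 26 may be sketched as follows. The function name `mux26`, the dictionary of aliasing fields and the `select` parameter, which stands in for the optional group of signals 30 generated by optional decoder 28, are hypothetical; the actual encoding of those signals is not given in the text.

```python
from typing import Optional


def mux26(op2_valid: int, aliasing: dict, select: str = "AL4") -> Optional[str]:
    """Return the value to be carried by the "SRCF" signal group.

    `select` models optional signals 30; for the second group of
    instructions, "OP2 VALID" alone suffices and AL4 is selected.
    """
    if op2_valid == 1:
        return aliasing[select]
    # For a simple u-op the output is ignored by the execution subsystem.
    return None


# Example of instruction (2.a): AL4 carries the "index" register, edx.
srcf_fused = mux26(op2_valid=1, aliasing={"AL1": "load_with_scale_2", "AL4": "edx"})
srcf_simple = mux26(op2_valid=0, aliasing={"AL1": None, "AL4": None})
```

Note that no template field is consulted in the default case, which reflects the observation that "SRCF" may be generated without a corresponding field in the u-op template.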
Structure of Exemplary Instruction Decoder of
Aliasing Fields
In the exemplary processor of
When instruction decoder 11 receives an instruction from the first group of instructions, “AL2” may carry an identifier of the register in the “reg1” field of the instruction. In the example of instruction (1.a), “AL2” may carry the register identifier “eax”, while “AL1”, “AL3” and “AL4” may not carry relevant information.
When instruction decoder 11 receives an instruction from the second group of instructions, “AL1” may carry the op-code “load”. In the example of instruction (2.a), “AL1” may carry the op-code “load_with_scale_2”, while “AL3” and “AL4” may carry the values of the parameters “base” and “index”, respectively, and “AL2” may not carry relevant information.
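The behavior of the field locator for the two groups of instructions may be sketched as follows. The representation of an instruction as a dictionary, the function name `field_locator`, the use of `None` for a field that carries no relevant information and the test for a memory operand by the presence of a "base" entry are all assumptions made for illustration.

```python
def field_locator(instr: dict) -> dict:
    """Return the aliasing fields AL1..AL4 for an instruction (hypothetical model)."""
    if "base" not in instr:
        # First group: add between two registers; only AL2 is relevant.
        return {"AL1": None, "AL2": instr["reg1"], "AL3": None, "AL4": None}
    # Second group: add between a register and a memory location.
    return {
        "AL1": f"load_with_scale_{instr['scale']}",
        "AL2": None,
        "AL3": instr["base"],
        "AL4": instr["index"],
    }


# Example of instruction (1.a).
fields_1a = field_locator({"reg1": "eax", "reg2": "ebx"})

# Example of instruction (2.a).
fields_2a = field_locator(
    {"reg1": "eax", "base": "ecx", "index": "edx", "scale": 2, "disp": 0xFF2A}
)
```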
For clarity, the information carried by the aliasing fields when instruction decoder 11 receives an instruction from the first group of instructions and when instruction decoder 11 receives an instruction from the second group of instructions is summarized in TABLE 5:
U-op Templates
A u-op template may comprise fields that explicitly or implicitly define fields of the u-op. In the exemplary processor of
In the exemplary processor of
Decoder 20 may receive the “COLLAPSE” and “FUSED” u-op template fields from PLA 14, may additionally receive the “MOD” bits directly from the instruction, and may generate the “OP2 VALID” signal group. For a simple template or a fused template, decoder 20 may ignore the “MOD” bits and may generate the “OP2 VALID” signal group according to the value of the “FUSED” u-op template field.
In an exemplary embodiment of the present invention, PLA 14 may include a collapsed template to be addressed by instructions of both the first and second groups of instructions.
When instruction decoder 11 receives an instruction of the first group of instructions or an instruction of the second group of instructions, PLA 14 may output the same collapsed template. For a collapsed template, decoder 20 may output a value on the “OP2 VALID” signal group according to the value of the “MOD” bits.
The value of the “MOD” bits of an instruction from the first group of instructions may have a binary value, for example “11”, indicating an operation between two registers. Consequently, decoder 20 may output the value “0” on the “OP2 VALID” signal group to indicate that instruction decoder 11 outputs a simple u-op and that the “OP2” signal group does not carry an op-code.
However, the “MOD” bits of an instruction from the second group of instructions may have a binary value other than “11”, indicating an operation between a register and a memory location. Consequently, decoder 20 may output the value “1” on the “OP2 VALID” signal group to indicate that instruction decoder 11 outputs a fused u-op and that the “OP2” signal group carries an op-code.
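The generation of the "OP2 VALID" signal group by decoder 20 may be summarized by the following minimal sketch. The function name `decoder20` and the representation of the "COLLAPSE", "FUSED" and "MOD" inputs as integers are assumptions; the value of the "FUSED" field in a collapsed template is not stated in the text and is ignored by this sketch.

```python
def decoder20(collapse: int, fused: int, mod: int) -> int:
    """Return the value of the "OP2 VALID" signal group."""
    if collapse == 1:
        # Collapsed template: the MOD bits of the instruction decide.
        # "11" indicates an operation between two registers (simple u-op).
        return 0 if mod == 0b11 else 1
    # Simple or fused template: MOD is ignored, FUSED decides.
    return fused


valid_collapsed_reg = decoder20(collapse=1, fused=0, mod=0b11)
valid_collapsed_mem = decoder20(collapse=1, fused=0, mod=0b01)
valid_fused_template = decoder20(collapse=0, fused=1, mod=0b11)
valid_simple_template = decoder20(collapse=0, fused=0, mod=0b00)
```

In this sketch a single collapsed template yields either a simple or a fused u-op, depending only on bits taken directly from the instruction.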
TABLE 6 summarizes the field content of the simple template, the fused template and the collapsed template.
Determination of “OP2” Signal Group
The determination of the “OP2” signal group via control input signals for multiplexer 22 may occur as described hereinabove with respect to
Determination of “SRCF” Signal Group
The determination of the “SRCF” signal group via control input signals for multiplexer 26 may occur as described hereinabove with respect to
Determination of “SRC1” Signal Group
In the exemplary processor of
The value carried by the “SRC1” signal group for a simple u-op may differ from that for a fused u-op. If the simple u-op and the fused u-op are generated from the same collapsed template, then additional information may be needed in order to determine from which group of data input signals multiplexer 24 is to output a value to be carried by the “SRC1” signal group. As will now be described, that additional information is provided by the “OP2 VALID” signal group and bits of the C-SRC1 field.
Multiplexer 24 may receive some of its control input signals from bits of the C-SRC1 field and some of its control input signals from bits of the “OP2 VALID” signal group. In addition, multiplexer 24 may receive a first group of data input signals from bits of the “AL2” aliasing field and a second group of data input signals from bits of the “AL3” aliasing field.
When instruction decoder 11 receives an instruction of the first group of instructions, PLA 14 may output the collapsed template, and the value of the “MOD” bits is “11”. Therefore, the “OP2 VALID” signal group has the value “0”. Having the value “0” in the “OP2 VALID” signal group and the value “reg1” in the C-SRC1 field may result in multiplexer 24 outputting the value of the first group of data input signals (namely “reg1”) into the “SRC1” signal group. A similar result would have occurred if the instruction of the first group of instructions addressed a simple template in PLA 14.
When instruction decoder 11 receives an instruction of the second group of instructions, PLA 14 may output the collapsed u-op template, and the value of the “MOD” bits is different from “11”. Therefore, the “OP2 VALID” signal group has the value “1”. Having the value “1” in the “OP2 VALID” signal group and the value “base” in the C-SRC1 field may result in multiplexer 24 outputting the value of the second group of data input signals of multiplexer 24 (namely “base”) into the “SRC1” signal group. A similar result would have occurred if the instruction of the second group of instructions addressed a fused template in PLA 14.
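The selection performed by multiplexer 24 may be sketched as follows. The function name `mux24` and the string values used for the C-SRC1 field are hypothetical. In particular, the value "reg1_or_base" is an assumed encoding for the C-SRC1 field of a collapsed template, since the content of that field is not reproduced in the text.

```python
def mux24(c_src1: str, op2_valid: int, al2, al3):
    """Return the value to be carried by the "SRC1" signal group."""
    if c_src1 == "reg1":
        return al2  # simple template: first group of data input signals
    if c_src1 == "base":
        return al3  # fused template: second group of data input signals
    if c_src1 == "reg1_or_base":
        # Collapsed template (assumed encoding): "OP2 VALID" disambiguates.
        return al3 if op2_valid == 1 else al2
    raise ValueError(f"unknown C-SRC1 value: {c_src1!r}")


# Instruction (1.a) through the collapsed template: MOD is "11", OP2 VALID is 0.
src1_1a = mux24("reg1_or_base", op2_valid=0, al2="eax", al3=None)

# Instruction (2.a) through the collapsed template: OP2 VALID is 1.
src1_2a = mux24("reg1_or_base", op2_valid=1, al2=None, al3="ecx")
```

Under these assumptions the collapsed template produces the same "SRC1" values as the separate simple and fused templates would, namely "eax" for instruction (1.a) and "ecx" for instruction (2.a).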
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.