This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2014-068415, filed on Mar. 28, 2014, the entire contents of which are incorporated herein by reference.
The present invention relates to an arithmetic processing unit and a control method for an arithmetic processing unit.
A CPU (Central Processing Unit) serving as an arithmetic processing unit (an operation processing unit or a processor) employs various techniques for increasing processing speed. These techniques include, for example, a pipeline processing system in which consecutive instructions are divided into a plurality of stages or cycles and processed successively, a superscalar system in which operation processes are executed in parallel, and an out-of-order execution system in which instructions are executed as soon as the input data, operators, and the like used to execute them are ready, instead of being executed in order, or in other words in the sequence specified by a program.
The out-of-order execution system includes a register renaming technique in which output data obtained when execution of an instruction is complete are stored temporarily in a renaming register, and once instructions that come earlier in the processing sequence are completed, the output data are stored in a destination register specified by the instruction as a register in which to hold operation results.
An SIMD (Single Instruction Multiple Data) processing system, in which a plurality of data are processed in parallel in response to a single instruction, is available as a further technique for increasing processing speed by performing a plurality of processes in parallel. In the case of 4-SIMD, in which four sets of data are processed in parallel in response to a single instruction, the CPU that realizes the SIMD processing system decodes a single instruction code (operation code), reads data (source operand data) respectively from first to fourth source side registers identified by identical addresses, inputs the read data respectively into first to fourth operators (arithmetic logic units), and outputs four obtained operation results (arithmetic operation results) respectively to first to fourth destination side (storage destination) registers.
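The 4-SIMD data flow described above can be pictured with a minimal Python sketch. The function name `simd_execute`, the parameter names, and the use of nested lists to stand in for the four register files are illustrative assumptions and are not part of the described hardware.

```python
from operator import add

LANES = 4  # 4-SIMD: four data sets per instruction


def simd_execute(op, regs, ra, rb, rd):
    """Apply one decoded operation at identical register addresses in every lane.

    regs is a list of per-lane register files (hypothetical model);
    ra and rb are source addresses, rd is the destination address.
    """
    for lane in range(LANES):
        # Each lane reads its own source registers, feeds its own operator,
        # and writes its own destination register.
        regs[lane][rd] = op(regs[lane][ra], regs[lane][rb])


# Four lanes, each with a small 4-entry register file.
regs = [[float(lane * 10 + i) for i in range(4)] for lane in range(LANES)]
simd_execute(add, regs, ra=1, rb=2, rd=3)
```

A single call stands for a single instruction: the operation code is decoded once, while the data movement is repeated per lane.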
A CPU in which the out-of-order system and the SIMD processing system are incorporated realizes the out-of-order system by including both a destination register (a storage destination register) specified by an instruction as a register in which final processing results are stored, and a renaming register in which processing results are stored temporarily, and realizes the SIMD processing system by including sets of an operator (an arithmetic logic unit), a destination register, a renaming register, and a register renaming unit that stores associations between the destination registers and the renaming registers in a number of sets that can be processed in parallel by SIMD.
Japanese Laid-open Patent Publication No. 2011-34450 and Japanese Laid-open Patent Publication No. 2007-234011, for example, describe CPUs in which the out-of-order system and the SIMD processing system are incorporated.
A CPU in which the out-of-order system and the SIMD processing system are incorporated is preferably able to make effective use of the extended operators (arithmetic logic units) and registers, which are provided to process an SIMD instruction (also referred to as a multi-data instruction) for processing a plurality of data sets in response to a single instruction, when executing a non-SIMD instruction (also referred to as a non-multi-data instruction) for processing a single data set in response to a single instruction as well. The reason for this is that by making effective use of hardware resources, a larger number of non-SIMD instructions (or non-multi-data instructions) can be processed.
However, when an attempt is made to increase a degree of freedom of using hardware resources so that all of the plurality of sets of operators, destination registers, renaming registers, and register renaming units provided to process an SIMD instruction (or a multi-data instruction) can also be used to process a non-SIMD instruction, the circuit volume of the hardware circuits increases. An increase in the circuit volume of the register renaming units storing the associations between the registers is particularly noticeable since there is no need to reference the associations between all of the registers on maps provided in the register renaming units when processing an SIMD instruction (a multi-data instruction).
In other words, by increasing a degree of parallelism of the SIMD processing, an application that executes instructions to compute a large amount of data can be processed at higher speed, but when an attempt is made at the same time to secure a high degree of freedom in the use of hardware resources during processing of non-SIMD instructions (non-multi-data instructions), the hardware circuits increase in scale. Hence, it is desirable to increase the degree of parallelism of the SIMD processing while suppressing the scale of the hardware circuits to a reasonable level.
One aspect of embodiments is an arithmetic processing unit comprising:
an instruction decoder configured to decode an instruction;
three or more operators configured to, when the instruction decoded by the instruction decoder is a multi-data instruction in which plural data processing is implemented in parallel in response to a single instruction, process in parallel the plural data, and when the instruction decoded by the instruction decoder is a non-multi-data instruction in which singular data processing is implemented in response to a single instruction, process the singular data individually;
a plurality of storage destination register groups that are provided to correspond respectively to the plurality of operators and are configured to store operation results from the operators;
a plurality of renaming register groups that are provided to correspond respectively to the plurality of operators and are configured to store the operation results; and
a register renaming unit configured to store an association between a specified storage destination register specified by an instruction from the storage destination register group and an allocated renaming register allocated from the renaming register group,
wherein a register set having the storage destination register group and the renaming register group includes a basic register set used to operate the multi-data instruction and to operate the non-multi-data instruction, a first extended register set used to operate the multi-data instruction and to operate the non-multi-data instruction, and a second extended register set used to operate the multi-data instruction but not used to operate the non-multi-data instruction, and
the register renaming unit stores the association of the basic register set and the association of the first extended register set.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
The operation processing unit 20 includes, for example, four CPU cores (operation processing units) 30A to 30D, a secondary cache 24 shared by the four CPU cores, an input/output interface 26, and a memory access controller (MAC) 28 that controls access to the main memory 14.
More particularly, the CPU core 30 depicted in
The CPU core 30 of
The CPU core 30 also includes a register renaming unit REG_REN that stores associations between the storage destination registers and the renaming registers allocated thereto, a reservation station (Reservation Station for Address generate: RSA) for generating a main storage operand, a reservation station (Reservation Station for Execute: RSE) for a fixed point arithmetic operation, a reservation station (Reservation Station for Floating: RSF) for a floating point arithmetic operation, a reservation station (Reservation Station for Branch: RSBR) for branching, and a commit stack entry (CSE).
The respective reservation stations RS are queues of instructions issued by the instruction decoder 305, and are provided in association with execution units that execute the instructions. The fixed point arithmetic operation reservation station RSE and the floating point arithmetic operation reservation station RSF in particular issue the instructions to corresponding operators (arithmetic logic units) out of order, or in other words as soon as input data and operators for executing the instructions are ready. The commit stack entry CSE, meanwhile, determines instruction completion in relation to all instruction entries so that an instruction started out of order is completed in order.
The CPU core 30 further includes an operand data selection unit 310, an operand address generator 311, a primary data cache 312, and a storage buffer 313. Furthermore, the CPU core 30 includes an operator (an arithmetic logic unit) 320 that performs a fixed point arithmetic operation, an SIMD operator (an SIMD arithmetic logic unit) 330 that performs a floating point arithmetic operation, a fixed point renaming register 321, a floating point renaming register FR_REG, a fixed point register 322, a floating point SIMD register FS_REG, and the program counter PC.
The instruction fetch address generator 301 selects an instruction address on the basis of a count value of the program counter PC or information from the branch prediction unit 302, and issues an instruction fetch request to the primary instruction cache 303. The branch prediction unit 302 performs branch prediction on the basis of entries in the branch reservation station RSBR. The primary instruction cache 303 stores in the instruction buffer 304 an instruction read in response to the instruction fetch request. Instructions are then supplied from the instruction buffer 304 to the instruction decoder 305 in an instruction sequence specified by a program, or in other words in order, whereupon the instruction decoder 305 decodes the instructions supplied from the instruction buffer 304 in order.
The instruction decoder 305 creates a required entry in one of the four reservation stations RSA, RSE, RSF, and RSBR in accordance with the type of the decoded instruction. The instruction decoder 305 also creates entries corresponding to all of the decoded instructions in the commit stack entry CSE. Further, the instruction decoder 305 allocates a register in a renaming register 321, FR_REG to a register in an architecture register 322, FS_REG specified by the instruction.
When an entry is created in the reservation station RSA, RSE, or RSF, the register renaming unit REG_REN stores the address of the renaming register allocated to the architecture register specified by the instruction. An association between the specified architecture register and the allocated renaming register is registered in a renaming map stored in the register renaming unit REG_REN. The CPU core 30 includes the fixed point register 322 and the floating point SIMD register FS_REG as architecture registers. These registers are specified by the instruction as storage registers in which to store operation processing results. Further, the CPU core includes the fixed point renaming register 321 and the floating point renaming register FR_REG as renaming registers.
When the fixed point register 322 is used as a storage destination register, the instruction decoder 305 allocates the address of the fixed point renaming register 321 as the renaming register. Further, when the floating point SIMD register is used as the storage destination register, the instruction decoder 305 allocates the floating point renaming register FR_REG as the renaming register. The renaming register address allocated to the address of the storage destination register is output to the reservation station RSA, RSE, RSF corresponding to the instruction and the commit stack entry CSE as an association.
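The allocation and registration steps described above can be sketched as follows. This is a minimal model under stated assumptions: the class name `RenameMap`, its methods, and the simple free list are hypothetical and only illustrate how an association between a storage destination register and its allocated renaming register might be recorded and looked up.

```python
class RenameMap:
    """Hypothetical model of one renaming map held in REG_REN."""

    def __init__(self, num_renaming_regs):
        # Renaming registers that are not currently allocated (assumed policy).
        self.free = list(range(num_renaming_regs))
        # Architecture register number -> allocated renaming register number.
        self.map = {}

    def allocate(self, arch_reg):
        """Allocate a renaming register to the specified architecture register."""
        ren = self.free.pop(0)
        self.map[arch_reg] = ren
        return ren

    def lookup(self, arch_reg):
        """Return the allocated renaming register number, or None if no
        association is registered for the architecture register."""
        return self.map.get(arch_reg)


rename_map = RenameMap(num_renaming_regs=4)
first = rename_map.allocate(10)   # destination register 10
second = rename_map.allocate(3)   # destination register 3
```

The allocated number is independent of the architecture register number, so the map is what later instructions consult to find a temporarily stored result.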
The reservation stations RSA, RSE, RSF output the entries held therein as soon as resources required to process the entries, for example data and operators, are ready, whereupon processing corresponding to the entries is executed in later stage blocks such as operators. Accordingly, the instructions are initially executed out of order, and therefore processing results obtained in relation to the instructions are stored temporarily in the fixed point renaming register 321 or the floating point renaming register FR_REG.
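As a rough illustration of out-of-order issue from a reservation station, the following Python sketch releases an entry as soon as its source operands are available, regardless of its position in the queue. The entry layout, the `issue_ready` function, and the `ready` set are assumptions made for illustration; operator availability is not modeled.

```python
def issue_ready(station, ready):
    """Remove and return every entry whose source operands are all available."""
    issued = [e for e in station if all(src in ready for src in e["srcs"])]
    station[:] = [e for e in station if e not in issued]
    return issued


# Entries are queued in program order (hypothetical instruction records).
rsf = [
    {"op": "fmul", "srcs": ["f1", "f2"]},  # older, still waiting for f2
    {"op": "fadd", "srcs": ["f3"]},        # younger, operand already available
]
first = issue_ready(rsf, ready={"f1", "f3"})         # only the younger entry issues
second = issue_ready(rsf, ready={"f1", "f2", "f3"})  # the older entry issues later
```

Because the younger entry can leave the queue first, its result has to be held somewhere other than the architecture register until the older instruction completes, which is the role of the renaming registers.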
Entries corresponding to floating point arithmetic operation instructions, for example, are stored in the floating point reservation station RSF. The SIMD operator 330 selects input data to be computed on the basis of an entry from the reservation station RSF, and executes a floating point arithmetic operation thereon. During execution of the floating point instruction, an operation result from the SIMD operator 330 is stored temporarily in the floating point renaming register FR_REG.
Further, during execution of a floating point storage instruction, the SIMD operator 330 outputs data selected as an operation subject to the storage buffer 313. The storage buffer 313 specifies an operand address output from the operand address generator 311, and writes the data output from the SIMD operator 330 to the primary data cache 312.
The commit stack entry CSE holds entries corresponding to all of the instructions decoded by the instruction decoder 305, and manages execution conditions of the processing corresponding to the respective entries such that the instructions are completed in order. For example, when the commit stack entry CSE determines that the result of the processing corresponding to the entry to be completed next is stored in the fixed point renaming register 321 or the floating point renaming register FR_REG and that the instructions coming earlier in the sequence are completed, the commit stack entry CSE outputs the data stored in the renaming register to the fixed point register 322 or the floating point SIMD register FS_REG. As a result, the instructions executed out of order in the respective reservation stations are completed in order.
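The in-order completion performed by the commit stack entry can be sketched as a queue that retires only from its head. The entry fields (`dst`, `ren`, `done`) and the function `commit_in_order` are hypothetical names chosen for this illustration.

```python
from collections import deque


def commit_in_order(cse, renaming_regs, arch_regs):
    """Retire entries from the head of the queue while their results are ready.

    Each retired result is copied from its renaming register to the
    architecture register specified as the destination.
    """
    committed = []
    while cse and cse[0]["done"]:
        entry = cse.popleft()
        arch_regs[entry["dst"]] = renaming_regs[entry["ren"]]
        committed.append(entry["dst"])
    return committed


cse = deque([
    {"dst": 1, "ren": 0, "done": False},  # oldest instruction, still executing
    {"dst": 2, "ren": 1, "done": True},   # younger instruction finished first
])
renaming_regs = {0: None, 1: 7.5}
arch_regs = {}

first_pass = commit_in_order(cse, renaming_regs, arch_regs)  # head not done yet

cse[0]["done"] = True      # the oldest instruction now finishes
renaming_regs[0] = 2.5
second_pass = commit_in_order(cse, renaming_regs, arch_regs)
```

In the first pass nothing is written to the architecture registers even though one result is available, because the instruction at the head of the queue has not completed.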
The fixed point renaming register 321 and the floating point renaming register FR_REG include a plurality of registers in an identical number to or a smaller number than the number of entries in the commit stack entry CSE.
The SIMD operator 330 includes a basic operator and an extended operator. The basic operator includes an operation circuit that is capable of executing a large number of kinds of operations, for example. The extended operator includes an operation circuit that is capable of handling a part of the operations. In the case of 4-SIMD processing, for example, in which four data sets are processed in parallel by a single instruction, the SIMD operator 330 includes a single basic operator and three extended operators.
The floating point SIMD register FS_REG includes basic registers and extended registers in respectively identical numbers. Likewise, the floating point renaming register FR_REG includes basic renaming registers and extended renaming registers in respectively identical numbers.
In
The floating point reservation station RSF, the SIMD operator 330, the floating point SIMD register FS_REG, and the floating point renaming register FR_REG, which together constitute a floating point operation unit in
Likewise in response to a non-SIMD instruction, meanwhile, the processing result from the operator is stored temporarily in the floating point renaming register FR_REG, and when the commit stack entry CSE detects completion of the aforesaid instructions, the processing result stored temporarily in a register of the floating point renaming register FR_REG is stored in a register of the floating point SIMD register FS_REG.
[Problems Involved in Improving Degree of Parallelism in SIMD Processing and Degree of Freedom in Non-SIMD Processing]
Next, problems arising when an attempt is made to improve a degree of parallelism of the SIMD processing and improve a degree of freedom of the non-SIMD processing simultaneously will be described.
Similarly, the floating point renaming register FR_REG includes a single group of basic renaming registers BR_REG and a single group of extended renaming registers ER_REG. The groups of basic renaming registers BR_REG and extended renaming registers ER_REG respectively have an 8-byte width and include identical numbers of registers. In
The register renaming unit REG_REN, meanwhile, includes a single basic register renaming map BRRM. The basic register renaming map BRRM includes entries corresponding to register numbers 0 to 127 of the basic registers B_REG in the floating point SIMD register FS_REG, and holds register numbers or addresses of the basic renaming registers BR_REG allocated respectively to the basic registers B_REG. As described above, this basic renaming register allocation processing is performed by the instruction decoder 305.
In the 2-SIMD configuration depicted in
The register renaming processing performed during execution of a non-SIMD instruction, depicted in
Next, the register renaming processing performed during execution of an SIMD instruction, depicted in
Likewise in the floating point renaming register FR_REG, meanwhile, a basic renaming register BR_REG and an extended renaming register ER_REG having identical register numbers, among the register numbers 0 to a certain number, are used as a set. The basic renaming register BR_REG is used by the first of the two pieces or sets of 8-byte data processed in parallel, while the extended renaming register ER_REG having the same register number is used by the second piece or set of data.
In the register renaming unit REG_REN, the allocated register number in the floating point renaming register FR_REG is stored in the basic register renaming map BRRM in the entry that corresponds to the register number specified by the floating point SIMD register FS_REG. The allocated register number does not necessarily have to be identical to the register number of the floating point SIMD register.
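The single-map arrangement for 2-SIMD can be sketched as follows, assuming that one renaming register number selects both the basic and the extended renaming register. The variable and function names are illustrative, and the register count of eight is an arbitrary choice for the sketch.

```python
brrm = {}            # basic register renaming map: FS_REG number -> renaming number
br_reg = [None] * 8  # basic renaming registers (first piece of 8-byte data)
er_reg = [None] * 8  # extended renaming registers (second piece of 8-byte data)


def write_simd_result(fs_num, ren_num, first_data, second_data):
    """Register one association and store both results under the same number."""
    brrm[fs_num] = ren_num          # a single map entry covers both registers
    br_reg[ren_num] = first_data
    er_reg[ren_num] = second_data


def read_simd_result(fs_num):
    """Find both temporarily stored results through the single map entry."""
    n = brrm[fs_num]
    return br_reg[n], er_reg[n]


# FS_REG register number 0 is renamed to renaming register number 3.
write_simd_result(fs_num=0, ren_num=3, first_data=1.0, second_data=2.0)
```

Since the extended renaming register is always reached through the number held in the basic map, it cannot be allocated independently in this arrangement.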
In the example depicted in
In the register renaming unit REG_REN, meanwhile, the basic renaming register BR_REG and the extended renaming register ER_REG having the same register number “0” or a different register number are allocated to the basic register B_REG and the extended register E_REG having the register number “0”. In the example of
The basic and extended register renaming maps BRRM, ERRM of the register renaming unit REG_REN include entries corresponding to the basic registers B_REG 0 to 127 and entries corresponding to the extended registers E_REG in the floating point SIMD register FS_REG. The basic register renaming map BRRM holds the register numbers or addresses of the basic renaming registers BR_REG allocated respectively to the basic registers B_REG. Further, the extended register renaming map ERRM holds the register numbers or addresses of the extended renaming registers ER_REG allocated respectively to the extended registers E_REG.
In the 2-SIMD configuration of
The register renaming processing performed during execution of a non-SIMD instruction in
Meanwhile, in the register renaming unit REG_REN, the register number or address of a basic renaming register BR_REG or an extended renaming register ER_REG is allocated to the basic register B_REG or the extended register E_REG specified by the non-SIMD instruction. In the example of
An extended SIMD operator among the basic SIMD operators and the extended SIMD operators in the floating point SIMD operator 330 is then used and stores the processing result in the extended renaming register ER_REG having the register number “1”. When the processing is complete, the processing result is stored in the extended register E_REG having the register number “128”.
Hence, during execution of a non-SIMD instruction, the extended registers E_REG and the extended renaming registers ER_REG are also used, and as a result, the degree of freedom with which the non-SIMD instruction uses hardware resources is increased.
Next, the register renaming processing performed during execution of a 2-SIMD instruction in
Likewise in the floating point renaming register FR_REG, meanwhile, the register allocated from the basic renaming registers BR_REG and the register allocated from the extended renaming registers ER_REG are used as a set.
Accordingly, in the register renaming unit REG_REN, the register number of the basic renaming register BR_REG allocated to the basic register B_REG is stored in the entry of the basic register renaming map BRRM corresponding to the basic register B_REG, and the register number of the extended renaming register ER_REG allocated to the extended register E_REG is stored in the entry of the extended register renaming map ERRM corresponding to the extended register E_REG.
For example, when the register number “0” of the floating point SIMD register FS_REG is specified by the SIMD instruction as the destination operand, the CPU core executes identical processing in parallel on the two pieces or two sets of 8-byte data specified by the SIMD instruction. Two sets of processing result data are then written temporarily to the allocated basic and extended renaming registers BR_REG, ER_REG of the floating point renaming register FR_REG, and when the instruction is completed, the processing result data are written to the specified basic register B_REG and extended register E_REG of the floating point SIMD register FS_REG. In this case, in the floating point SIMD register, one piece of processed 8-byte data is stored in the basic register B_REG having the register number “0”, and the other piece of processed 8-byte data is stored in the extended register E_REG having the register number “0”.
In the register renaming unit, meanwhile, a basic renaming register BR_REG and an extended renaming register ER_REG having different register numbers may be allocated respectively to the basic register B_REG and the extended register E_REG having identical register numbers. For example, in the example of
Therefore, for example, when the register number “0” in the floating point SIMD register FS_REG is specified by the SIMD instruction as the destination operand and renaming register allocation is performed as depicted in
In the examples depicted in
Hence, a 3-SIMD configuration, in which the degree of parallelism of the SIMD instruction is even further improved, will now be described.
The basic register renaming map BRRM and the first and second extended register renaming maps ERRM1, ERRM2 of the register renaming unit REG_REN include entries corresponding to the basic registers B_REG having register numbers 0 to 127 and entries corresponding to the first and second extended registers E_REG1, E_REG2 having register numbers 128 to 255 and 256 to 383, respectively, in the floating point SIMD register FS_REG. The basic register renaming map BRRM holds the register numbers or addresses of the basic renaming registers BR_REG allocated respectively to the basic registers B_REG. Further, the two extended register renaming maps ERRM1, ERRM2 hold the register numbers or addresses of the extended renaming registers ER_REG1, ER_REG2 allocated respectively to the extended registers E_REG1, E_REG2.
In the 3-SIMD configuration of
During execution of a non-SIMD instruction in
During execution of an SIMD instruction in
When, as depicted in
It is therefore preferable to realize improvements in the degree of parallelism of the SIMD instruction and the degree of freedom with which hardware is used by the non-SIMD instruction while suppressing the circuit scale to a reasonable level.
The CPU core depicted in
In accordance with the three operators, the floating point renaming register FR_REG includes a single basic renaming register BR_REG and two extended renaming registers ER_REG1, ER_REG2. Similarly, the floating point SIMD register FS_REG serving as the architecture register includes a single basic register B_REG and two extended registers E_REG1, E_REG2.
Further, the primary data cache 312 includes, in addition to a cache memory and a cache control unit not depicted in the drawing, a single basic load register 312_B and two extended load registers 312_E1, 312_E2 for storing data loaded from the cache memory.
Input data input into the operator is selected from the data stored in any of the total of twelve registers including the three load registers in the primary data cache 312, the three basic result registers, the three floating point renaming registers, and the three floating point SIMD registers. Accordingly, the basic operand data selector B_SEL and the two extended operand data selectors E_SEL1, E_SEL2 select one of the twelve registers. When the number of pieces of data input into the operator is N, N selectors are provided in each operator.
Although the CPU core 30 in
Meanwhile, the instruction decoder 305 allocates the renaming register such that a third association between the address or register number of the second extended register E_REG2 and the address or register number of the second extended renaming register ER_REG2 allocated to the second extended register is the same as either the first association stored in the basic register renaming map BRRM or the second association stored in the extended register renaming map ERRM1. Hence, the floating point reservation station RSF obtains the address or register number of the register in the second extended renaming register ER_REG2 where the operation result obtained by the second extended operator E_EXC2 is temporarily stored, by referring to either the basic register renaming map BRRM or the extended register renaming map ERRM1.
To execute a 3-SIMD instruction, the CPU core of
To execute a non-SIMD instruction, on the other hand, the CPU core uses either the basic operator B_EXC or the first extended operator E_EXC1, either the basic renaming register BR_REG or the first extended renaming register ER_REG1, and either the basic register B_REG or the first extended register E_REG1. Hence, when a non-SIMD instruction is executed, the first extended renaming register ER_REG1 is used in addition to the basic renaming register BR_REG so that execution of the instruction is started out of order, and as a result, the degree of freedom of hardware use is improved.
Note, however, that when a non-SIMD instruction is executed, the second extended renaming register ER_REG2 is not used. Because of this restriction, only the single extended register renaming map ERRM1 need be provided in the register renaming unit REG_REN in addition to the basic register renaming map BRRM. The number of renaming maps is therefore reduced, and as a result, an increase in the circuit scale is suppressed.
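The two-map arrangement of this embodiment can be sketched as follows. The class and method names are hypothetical, register numbers are treated as local to each register group for simplicity, and the third association is assumed here to follow the basic register renaming map, which is one of the two options the description allows.

```python
class RegisterRenamingUnit:
    """Hypothetical model of REG_REN holding only BRRM and ERRM1."""

    def __init__(self):
        self.brrm = {}   # basic register number -> BR_REG number
        self.errm1 = {}  # first extended register number -> ER_REG1 number
        # No map is provided for E_REG2 / ER_REG2.

    def rename_simd(self, reg_num, br_num, er1_num):
        """Register the associations for a 3-SIMD destination register."""
        self.brrm[reg_num] = br_num
        self.errm1[reg_num] = er1_num
        # Assumption: ER_REG2 reuses the number allocated in BR_REG.
        return br_num, er1_num, br_num

    def rename_non_simd(self, reg_num, ren_num, use_first_extended=False):
        """A non-SIMD instruction uses the basic or the first extended set only."""
        target = self.errm1 if use_first_extended else self.brrm
        target[reg_num] = ren_num

    def lookup_simd(self, reg_num):
        """Return the (BR_REG, ER_REG1, ER_REG2) numbers for a SIMD register."""
        br = self.brrm[reg_num]
        return br, self.errm1[reg_num], br


unit = RegisterRenamingUnit()
allocated = unit.rename_simd(reg_num=0, br_num=2, er1_num=5)
unit.rename_non_simd(reg_num=1, ren_num=4, use_first_extended=True)
```

The second extended renaming register never needs a map entry of its own in this sketch, which is why two maps suffice for three operators.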
In this embodiment, as described above, the first extended renaming register ER_REG1 is used as a register for temporarily storing operation results during an SIMD instruction operation and a non-SIMD instruction operation, while the second extended renaming register ER_REG2 is used as a register for temporarily storing operation results during an SIMD instruction operation but not used as such a register during a non-SIMD instruction operation.
In other words, the CPU core according to this embodiment includes the following register sets for storing operation results, each formed from the floating point SIMD register FS_REG and the floating point renaming register FR_REG: a basic register set used during both an SIMD instruction operation and a non-SIMD instruction operation, a first extended register set used during both an SIMD instruction operation and a non-SIMD instruction operation, and a second extended register set used during an SIMD instruction operation but not used during a non-SIMD instruction operation.
Note that the register sets of the floating point SIMD register and the floating point renaming register are used as a register set including a basic register B_REG and a basic renaming register BR_REG, a register set including a first extended register E_REG1 and a first extended renaming register ER_REG1, and a register set including a second extended register E_REG2 and a second extended renaming register ER_REG2.
In
Register renaming processing performed during execution of a non-SIMD instruction in
Meanwhile, the register renaming unit REG_REN stores the register number or address of the basic renaming register BR_REG or the first extended renaming register ER_REG1 allocated to the basic register B_REG or first extended register E_REG1 of the floating point SIMD register FS_REG that is specified by the non-SIMD instruction.
In the example of
Register renaming processing performed during execution of an SIMD instruction in
As depicted in
In the register renaming unit REG_REN, meanwhile, a basic renaming register BR_REG and a first extended renaming register ER_REG1 having different register numbers are allocated respectively to the basic register B_REG and the first extended register E_REG1 having identical register numbers. Note, however, that the second extended renaming register ER_REG2 having the same number as the basic renaming register BR_REG is allocated to the second extended register E_REG2. It is therefore not possible to allocate a basic renaming register BR_REG and a second extended renaming register ER_REG2 having different register numbers.
In the example of
Hence, in a case where a register having the register number “0” in the floating point SIMD register FS_REG is specified by the SIMD instruction as the destination operand and three renaming registers BR_REG, ER_REG1, ER_REG2 in the floating point renaming register FR_REG are allocated, as depicted in
In the example of
The register renaming processing performed during execution of an SIMD instruction depicted in
[Operations of CPU Core According to this Embodiment]
Next, operations of the CPU core during execution of a floating point arithmetic operation instruction will be described specifically. Operations performed in relation to a floating point arithmetic operation instruction are described below as an example, but similar register renaming processing is performed in relation to a floating point load instruction and a floating point store instruction.
When the instruction decoder 305 decodes the floating point arithmetic operation instruction, the CPU core reads data from a register specified by a source operand, executes the operation instruction, and writes the operation result to the register specified by the destination operand.
In the case of a floating point arithmetic operation instruction, for example, it is assumed that the following instruction, which requires six cycles to execute the operation, is executed. An instruction code of a floating point SIMD instruction (referred to hereafter as an SIMD operation instruction) is described as follows, for example.
Simd-fmad %f127×%f100+%f50=%f10
In this instruction, three registers, namely %f127, %f100, and %f50, are specified as the source operands. Three pieces of 8-byte data are read from the specified registers, whereupon three-system multiplication and addition processing are executed thereon in parallel. In other words, three sets of data respectively including three pieces of data are read, whereupon the three sets of data are processed in parallel by operators of three systems. Respective operation results are then written to the floating point SIMD register FS_REG specified by %f10 serving as the destination operand.
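The arithmetic performed by the instruction above can be illustrated with a short Python sketch in which each of the three systems holds its own copy of registers 127, 100, 50, and 10. The function `simd_fmad`, the dictionary-based register model, and the sample values are assumptions for illustration; rounding behavior and the six-cycle latency are not modeled.

```python
def simd_fmad(fs_reg, a, b, c, d):
    """Compute reg[a] * reg[b] + reg[c] into reg[d] in every system (lane)."""
    for lane in fs_reg:
        lane[d] = lane[a] * lane[b] + lane[c]


# Three systems, each holding the registers named by the instruction operands.
fs_reg = [dict.fromkeys((127, 100, 50, 10), 0.0) for _ in range(3)]
for i, lane in enumerate(fs_reg):
    lane[127], lane[100], lane[50] = float(i + 1), 2.0, 0.5

# Source operands 127, 100, 50; destination operand 10.
simd_fmad(fs_reg, 127, 100, 50, 10)
```

Each system reads three pieces of source data and writes one result, so the three results land in the basic, first extended, and second extended registers that share the destination register number.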
An instruction code of a floating point non-SIMD instruction (referred to hereafter as a non-SIMD operation instruction), meanwhile, is described in an identical format to that described above, albeit with a different operation code. In response to this instruction, a single-system operation is performed on each of the registers specified by the source operand, whereupon an operation result is written to the register specified from the floating point SIMD register as the destination operand.
In the SIMD operation instruction of
A D cycle is an instruction decoding cycle. In the D cycle, the instruction decoder 305 decodes the floating point SIMD instruction and, on the basis of the decoding result, registers corresponding entries respectively in the commit stack entry CSE and the floating point reservation station RSF (S1, S2). Entries are registered in the commit stack entry CSE for all instructions, including instructions other than the floating point SIMD operation instruction. Further, an entry corresponding to a floating point instruction is registered in the floating point reservation station RSF.
The instruction decoder 305 mainly registers information relating to the write destinations of the operation results in the entries of the commit stack entry CSE. Further, the instruction decoder 305 allocates three registers in the floating point renaming register FR_REG to the three write destination registers in the floating point SIMD register FS_REG, and registers the associations between the three registers in the basic register renaming map BRRM and the extended register renaming map ERRM1 of the register renaming unit REG_REN (S3). More specifically, the instruction decoder 305 writes the register numbers or addresses of the allocated basic renaming register BR_REG and the first extended renaming register ER_REG1 in entries of the two maps BRRM, ERRM1 corresponding to the register numbers specified as the write destinations in the floating point SIMD register FS_REG. The instruction decoder 305 then registers the register numbers or addresses of the registered renaming registers in the entries of the commit stack entry CSE (S4).
Further, the instruction decoder 305 registers information relating to source data of the source operand in an entry of the floating point reservation station RSF. When an address of the source data of the source operand is a register in the floating point SIMD register FS_REG, for example, and data stored temporarily in the floating point renaming register allocated to the register are to be input and computed, the instruction decoder 305 obtains the address of the floating point renaming register by referring to the map in the register renaming unit, and registers the address in an entry in the RSF (S4).
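The decode-stage renaming described above can be sketched as follows. This is a minimal model under stated assumptions, not the circuit of the embodiment: the class `RenameUnit`, the free list, and the dictionary layout of the returned entries are hypothetical, and only the two maps named in this passage (BRRM and ERRM1) are modeled.

```python
# Hypothetical model of the register renaming unit REG_REN at decode time.
class RenameUnit:
    def __init__(self, num_renaming_regs):
        self.free = list(range(num_renaming_regs))  # assumed free list of renaming register numbers
        self.brrm = {}   # BRRM: destination register -> basic renaming register BR_REG
        self.errm1 = {}  # ERRM1: destination register -> first extended renaming register ER_REG1

    def allocate_dest(self, arch_reg):
        """Allocate renaming registers for a write destination and record them in both maps."""
        br = self.free.pop(0)
        er1 = self.free.pop(0)
        self.brrm[arch_reg] = br
        self.errm1[arch_reg] = er1
        # The allocated numbers are what the decoder would record in the CSE entry.
        return {"dest": arch_reg, "BR_REG": br, "ER_REG1": er1}

    def lookup_source(self, arch_reg):
        """Return the renaming registers mapped to a source register, or None if it is not renamed."""
        if arch_reg in self.brrm:
            return {"BR_REG": self.brrm[arch_reg], "ER_REG1": self.errm1[arch_reg]}
        return None  # the source is read from the register in FS_REG instead

unit = RenameUnit(8)
cse_entry = unit.allocate_dest("f10")      # destination of the current instruction
rsf_source = unit.lookup_source("f10")     # a later instruction reading %f10
rsf_unmapped = unit.lookup_source("f50")   # a source with no in-flight producer
```

A later instruction that names %f10 as a source obtains the renaming register numbers from the maps, which is the information the decoder registers in the RSF entry.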
A P cycle is a priority cycle. In the P cycle, the floating point reservation station RSF performs queuing control on the data in the registered entries. The RSF issues the oldest entry, from among the registered entries for which the required input data are ready, to the SIMD operator 330 (S10). Next, the processing advances to
A following B cycle is a buffer cycle. In the B cycle, the basic operand data selector B_SEL and the first and second extended operand data selectors E_SEL1, E_SEL2 select source operand data from any of the load registers 312_B, 312_E1, 312_E2, the result registers Br_reg, Er_reg1, Er_reg2, the renaming registers BR_REG, ER_REG1, ER_REG2, and the registers B_REG, E_REG1, E_REG2, and input the selected data into the corresponding operator B_EXC, E_EXC1, E_EXC2 (S11). When the input is an execution result relating to an instruction that has completed the load processing or the operation by the operator but has not yet undergone the completion processing by the CSE, the input data are input from the load registers, the result registers, or the renaming registers. Further, a processing result relating to an instruction that has completed execution is input from the registers B_REG, E_REG1, E_REG2.
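The selection performed in the B cycle can be sketched as a lookup across the four kinds of source named above. This passage does not state how a selector determines which source to use, so the `source_kind` tag carried with the issued entry is an assumption, as are the function name and the sample values.

```python
# Hypothetical model of one operand data selector (for example B_SEL) in the B cycle.
def select_operand(source_kind, index, load_regs, result_regs, renaming_regs, arch_regs):
    """Return operand data from the source indicated by the (assumed) source_kind tag."""
    sources = {
        "load": load_regs,          # load data that has not undergone completion processing
        "result": result_regs,      # operator output held in a result register
        "renaming": renaming_regs,  # result held in a renaming register
        "arch": arch_regs,          # result of an instruction that has completed execution
    }
    return sources[source_kind][index]

# Sample contents (assumed values) for the basic system.
load_regs = {"312_B": 1.5}
result_regs = {"Br_reg": 2.5}
renaming_regs = {"BR_REG": 3.5}
arch_regs = {"B_REG": 4.5}

from_result = select_operand("result", "Br_reg", load_regs, result_regs, renaming_regs, arch_regs)
from_arch = select_operand("arch", "B_REG", load_regs, result_regs, renaming_regs, arch_regs)
```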
X1 to X6 denote six operation execution cycles. In the X1 to X6 cycles, the basic operator B_EXC and the first and second extended operators E_EXC1, E_EXC2 execute operation processing on the input data selected by the operand data selectors. The respective operators then store operation results in the respective result registers Br_reg, Er_reg1, Er_reg2 (S12). Further, after storing the operation results in the result registers, the respective operators output an operation completion report to the commit stack entry CSE (S13).
A U cycle is an update cycle. In the U cycle, the operation results stored in the result registers are stored in the corresponding renaming registers BR_REG, ER_REG1, ER_REG2 (S14).
A C cycle is an instruction completion cycle. In the C cycle, the commit stack entry CSE determines that the SIMD operation instruction is complete on the basis of the operation completion report from the floating point SIMD operator 330 (S15).
Finally, a W cycle is a register update cycle. The commit stack entry CSE stores the operation results of the renaming registers BR_REG, ER_REG1, ER_REG2 in the three registers B_REG, E_REG1, E_REG2 of the floating point SIMD register FS_REG at a timing when the current SIMD operation instruction is ready to be completed on the basis of the instruction sequence (S16). The commit stack entry CSE then provides the renaming registers with information indicating the registers of the floating point SIMD register FS_REG in which the respective operation results held in the renaming registers should be stored.
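The completion and register update steps above can be sketched as in-order commitment from the oldest entry. This is a simplified model under stated assumptions: the queue of dictionaries, the `complete` flag set by the operation completion report, and the function name `commit_ready` are hypothetical, and timing between the C and W cycles is not modeled.

```python
from collections import deque

def commit_ready(cse, renaming_regs, arch_regs):
    """Commit completed entries from the oldest end of the CSE and return their destinations."""
    committed = []
    # Stop at the first entry that is not complete, so results are written in instruction order.
    while cse and cse[0]["complete"]:
        entry = cse.popleft()
        for lane, ren in entry["renaming"].items():
            # Copy the result from the renaming register to the register in FS_REG.
            arch_regs[(entry["dest"], lane)] = renaming_regs[ren]
        committed.append(entry["dest"])
    return committed

# Assumed renaming register contents, keyed by renaming register number.
renaming_regs = {0: 4.5, 1: 10.5, 2: 18.5, 3: 7.0}
arch_regs = {}
cse = deque([
    # Older SIMD instruction: three lanes, operation not yet reported complete.
    {"dest": "f10", "complete": False, "renaming": {"B_REG": 0, "E_REG1": 1, "E_REG2": 2}},
    # Younger non-SIMD instruction: one lane, already reported complete.
    {"dest": "f11", "complete": True, "renaming": {"E_REG1": 3}},
])

first = commit_ready(cse, renaming_regs, arch_regs)   # the older entry blocks the younger one
cse[0]["complete"] = True                             # completion report arrives for %f10
second = commit_ready(cse, renaming_regs, arch_regs)  # both entries now commit in order
```

The younger instruction finishes its operation first but is held in its renaming register until the older instruction completes, which is the purpose of the renaming registers in out-of-order execution.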
As described above, when a floating point SIMD operation instruction is executed, the three registers B_REG, E_REG1, E_REG2 of the floating point SIMD register FS_REG and the three renaming registers BR_REG, ER_REG1, ER_REG2 of the floating point renaming register FR_REG allocated thereto are used.
When a non-SIMD operation instruction is executed, the basic register B_REG or the first extended register E_REG1 of the floating point SIMD register FS_REG, and the basic renaming register BR_REG or the first extended renaming register ER_REG1 of the floating point renaming register FR_REG, allocated thereto, are used. The second extended register E_REG2 and the second extended renaming register ER_REG2 are not used. In the example of
In the D cycle, the instruction decoder 305 decodes the floating point non-SIMD instruction, and on the basis of the decoding result, registers corresponding entries respectively in the commit stack entry CSE and the floating point reservation station RSF (S1, S2). Further, the instruction decoder 305 allocates a first extended renaming register ER_REG1 of the floating point renaming register FR_REG to the write destination first extended register E_REG1 of the floating point SIMD register FS_REG, and registers the association between the registers in the extended register renaming map ERRM1 of the register renaming unit REG_REN (S3). The instruction decoder 305 then registers the register number or address of the registered renaming register in an entry of the commit stack entry CSE (S4). All other processing is similar to that performed in relation to the SIMD operation instruction in
In the P cycle, the floating point reservation station RSF issues the oldest entry, from among the registered entries for which the required input data are ready, to the SIMD operator 330 (S10). Next, the processing advances to
In the following B cycle, the first extended operand data selector E_SEL1 selects source operand data from any of the load registers 312_B, 312_E1, 312_E2, the result registers Br_reg, Er_reg1, Er_reg2, the renaming registers BR_REG, ER_REG1, ER_REG2, and the registers B_REG, E_REG1, E_REG2, and inputs the selected data into the first extended operator E_EXC1 (S11).
In the X1 to X6 cycles, the first extended operator E_EXC1 executes operation processing on the input data selected by the operand data selector E_SEL1. The first extended operator then stores an operation result in the result register Er_reg1 (S12). Further, after storing the operation result in the result register, the first extended operator outputs an operation completion report to the commit stack entry CSE (S13).
In the U cycle, the operation result stored in the result register Er_reg1 is stored in the corresponding first extended renaming register ER_REG1 (S14).
In the C cycle, the commit stack entry CSE determines that the non-SIMD operation instruction is complete on the basis of the operation completion report from the floating point SIMD operator 330 (S15).
Finally, in the W cycle, the commit stack entry CSE stores the operation result of the first extended renaming register ER_REG1 in the first extended register E_REG1 of the floating point SIMD register FS_REG at a timing when the current non-SIMD operation instruction is ready to be completed on the basis of the instruction sequence (S16).
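The cycle sequence shared by the SIMD and non-SIMD cases, and the difference in the resources each case uses, can be summarized in a short sketch. The function name and the table are hypothetical; the count assumes that each named cycle occupies one clock and that no waiting occurs in the reservation station or before completion, which the description does not guarantee.

```python
def pipeline_cycles(exec_cycles=6):
    """Return the named cycles in order: decode, priority, buffer, execution, update, completion, write."""
    return ["D", "P", "B"] + [f"X{i}" for i in range(1, exec_cycles + 1)] + ["U", "C", "W"]

# Operators used in the two walkthroughs above.
OPERATORS_USED = {
    "simd": ["B_EXC", "E_EXC1", "E_EXC2"],  # three systems operate in parallel
    "non_simd_example": ["E_EXC1"],         # the non-SIMD example uses the first extended system
}

cycles = pipeline_cycles()
```

Under these assumptions both cases pass through the same twelve named cycles, and the non-SIMD example differs only in that a single operator, result register, and renaming register are involved.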
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2014-068415 | Mar 2014 | JP | national |