1. Field
The present disclosure relates to an arithmetic processing unit provided with a register file of a register window scheme, and more particularly, to an arithmetic processing unit which can perform out-of-order execution.
2. Description of the Related Art
A processor implementing a RISC (Reduced Instruction Set Computer) architecture (hereinafter referred to as “RISC processor”) mainly performs register-register arithmetic. A RISC processor intends to accelerate processes by reducing memory accesses. Such architecture is referred to as “load-store architecture”. The RISC processor is provided with a large register file in order to make the register-register arithmetic more efficient. A register file of a register window scheme configured to reduce overhead of passing an argument (save/return of the argument) at the time of invoking a subroutine is known.
A register file 1000 shown in
Wk outs is used for passing an argument to a subroutine invoked by its own routine. Moreover, Wk ins is used for receiving an argument from a parent routine which has invoked its own routine. Since Wk ins and Wk+1 outs (k+1=0 if k=7) as well as Wk outs and Wk−1 ins (k−1=7 if k=0) are configured to overlap in the register file 1000, the passing of the argument and the securing of the register used for the argument can be accelerated at the time of a subroutine call. Wk locals is used as a working register set by each subroutine, that is, a child routine invoked by its parent routine.
Each subroutine uses any one of the 8 register windows W0 to W7 at the time of executing a process. Here, the register window Wk used by a running subroutine (referred to as “current window”) rotates clockwise (in a direction of a dashed arrow shown by “SAVE”) for two windows each time the subroutine call occurs, and rotates counterclockwise (in a direction of a dashed arrow shown by “RESTORE”) for two windows when the subroutine returns.
Each register window Wk in the register file 1000 is managed with a register window number (referred to as “window number”) assigned thereto. For example, a window number k is assigned to the register window Wk. The number k of the register window Wk used by the running subroutine is retained in Current Window Pointer CWP. A value of CWP is incremented with execution of a SAVE instruction or occurrence of a trap, and decremented with execution of a RESTORE instruction or returning from the trap with a RETT instruction. In
The register file 1000 shown in
However, a size and a speed of a circuit for reading the data from such a large register file 1000 could be a problem. An arithmetic processing unit having a configuration shown in
An arithmetic processing unit 2000 shown in
Generally, an increase in the number of the register windows in the register file of the register window scheme increases the number of included registers, which makes it difficult to supply an operand to the arithmetic device quickly. Consequently, a processor shown in
However, if the arithmetic processing unit 2000 has such a configuration, it is not possible to supply an operand required for an instruction following the window switching instruction from the WRF 2002 when the window switching instruction such as the SAVE instruction or the RESTORE instruction is executed, since the WRF 2002 retains only the data in the current window specified by CWP. Consequently, the necessary register window data must be transferred from the MRF 2001 to the WRF 2002, causing a problem in which execution of a subsequent instruction stalls until the transfer process is completed.
Moreover, an order in which instructions are executed in a processor provided with an out-of-order execution function is not limited to their order in a program. Processable instructions, rather, are executed first. However, the instruction following the window switching instruction cannot be executed until the necessary register window data is transferred to the WRF 2002, after the window switching instruction has been executed, even if the instruction following the window switching instruction becomes processable.
Such a constraint causes considerable performance deterioration in a processor of a superscalar scheme. A superscalar processor issues a large number of instructions simultaneously, and can perform out-of-order execution of the instructions. The out-of-order execution scheme increases throughput of instruction execution by fetching many instructions, accumulating them in a buffer, and executing them from the buffer starting with whichever instructions are executable, independently of their order in the program. A stall that blocks the instructions following the window switching instruction therefore forfeits this benefit.
Consequently, an arithmetic processing unit as shown in
The arithmetic processing unit 3000 can execute the instruction following the window switching instruction out of order by previously transferring data in register windows specified by CWP+1 and CWP−1 from the MRF 3001 to the WRF 3002 with forecast transfer. It should be noted that, in
It is assumed that CWP specifies the current register window W3. An arithmetic device 3003 can execute instructions using the register windows W2 to W4 since data in the register windows W2, W3 and W4 is retained in the WRF 3002. Subsequently, if the SAVE instruction is executed, the CWP is incremented to specify the register window W4. Then, data in the register window W5 is transferred from the MRF 3001 via the register group 3113 to the WRF 3002, and the data in the register windows W3 to W5 is retained in the WRF 3002. Thereby, the arithmetic device 3003 can execute instructions using the register windows W3 to W5.
However, it is necessary for the WRF 3002 to be provided with 64 registers since the WRF 3002 in the arithmetic processing unit 3000 retains three lines of register windows. Moreover, since a latch register group is provided with 8 registers, 72 registers are required in total. The WRF 2002 is provided with 32 registers since the WRF 2002 in the arithmetic processing unit 2000 of
Therefore, the arithmetic processing unit 3000 is provided with 40 more registers than those in the arithmetic processing unit 2000, making its circuit size larger. Furthermore, in the arithmetic processing unit 3000, an area, or a circuit size, of a selection circuit (not shown) for transferring the data to the WRF 3002 and the arithmetic device 3003 increases, and also a process of reading the data from the WRF 3002 by the arithmetic device 3003 slows down.
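The register counts quoted above can be reproduced with a short calculation. The following is a minimal sketch under an assumption that is not stated in the text: a work register file holding n consecutive register windows needs one group of 8 globals, n groups of 8 locals, and n+1 groups of 8 ins/outs, because adjacent windows share one ins/outs group. The function name `wrf_register_count` is hypothetical.

```python
REGS_PER_GROUP = 8  # globals, locals, ins and outs each hold 8 registers


def wrf_register_count(num_windows: int) -> int:
    """Registers needed to hold `num_windows` consecutive, overlapping windows.

    Assumption: neighbouring windows share one ins/outs group, so n windows
    need n locals groups and n + 1 ins/outs groups, plus one globals group.
    """
    globals_regs = REGS_PER_GROUP
    locals_regs = REGS_PER_GROUP * num_windows
    in_out_regs = REGS_PER_GROUP * (num_windows + 1)
    return globals_regs + locals_regs + in_out_regs


wrf_2000 = wrf_register_count(1)      # current window only
wrf_3000 = wrf_register_count(3)      # windows CWP-1, CWP and CWP+1
latch_regs = REGS_PER_GROUP           # latch register group of unit 3000
total_3000 = wrf_3000 + latch_regs
extra = total_3000 - wrf_2000

print(wrf_2000, wrf_3000, total_3000, extra)  # prints: 32 64 72 40
```

Under this assumption, the 40-register difference follows directly from holding two additional windows plus the latch register group.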
In order to solve this problem, the present applicant has focused on control of an instruction pipeline of the out-of-order execution scheme, and has devised an information processing apparatus having a configuration for transferring/retaining only any one of CWP+1 and CWP−1 with respect to the WRF (see Japanese Patent Laid-Open No. 2007-87108 [US Patent Application Publication 2007-067612]).
The CWR 4020 is provided with register groups 4021 to 4024 which retain data in windows globals (G), locals (L), ins (Io0) and outs (Io1) of the current window. Since each register group is provided with 8 registers, the CWR 4020 is provided with 32(=4×8) registers. The CRB 4030 is provided with register groups 4031 and 4032 which retain only data in windows which do not overlap the data retained in the CWR 4020, among the data in the register window following the current window. The register group 4031 retains data in the window locals (L) of the following register window, and the register group 4032 retains data in the window ins (Io0) or outs (Io1) of the following register window. Since each of the register groups 4031 and 4032 is provided with 8 registers, the CRB 4030 is provided with 16(=8×2) registers. Therefore, the WRF of the information processing apparatus 4000 is configured with 48 registers.
However, the WRF retains the copy of one register window in the MRF, or storage such as the CRB and the CWR, in both the arithmetic processing unit 3000 and the information processing apparatus 4000. This is costly in hardware and also makes the circuit size larger. Moreover, the information processing apparatus 4000 consumes power for transferring the data from the work buffer CRB 4030, which is provided between the MRF 4010 and the CWR 4020, to the CWR 4020.
It is an object of the present disclosure to realize an arithmetic processing unit which is provided with the register file of the register window scheme and can perform the out-of-order execution of the instruction following the window switching instruction, with a smaller circuit size and lower power consumption than the conventional unit.
According to one aspect of this disclosure, an arithmetic processing unit comprises a register file provided with multiple register windows, in which an arithmetic executor executes an instruction with data retained in said register file as an operand; a current window pointer retains address information specifying a register window which becomes a current window, among the multiple register windows included in said register file, and a controller controls such that said address information retained by said current window pointer is updated, when a window switching instruction for indicating switching of said current window has been decoded, and also, said arithmetic executor reads data in a first register window specified by the address information before being updated and data in a second register window specified by said updated address information from said register file, after the decoding of said window switching instruction has been started until commit of said window switching instruction is started.
The above-described embodiments are intended as examples, and all embodiments are not limited to including the features described above.
a is a diagram showing a configuration example of an MRF_RA1;
b is a diagram showing a configuration example of an MRF_RA2;
Reference may now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
An embodiment of an information processing apparatus will be described below with reference to the drawings.
An arithmetic processing unit according to this embodiment has a register file having register windows and is provided with an out-of-order execution function. Without providing a WRF, the unit enables the out-of-order execution of an instruction following a window switching instruction while securing a data reading speed in an arithmetic section, by devising the way data is read from an MRF. According to such a configuration, the arithmetic processing unit of this embodiment reduces power consumption both by reducing its circuit area and by eliminating data transfer between work buffers (between a CWR and a CRB).
The arithmetic processing unit of this embodiment differs from a conventional arithmetic processing unit in that it is not provided with the WRF. In this embodiment, as shown in
As shown in
By employing a circuit configuration discussed above, unless data retained in the MRF_RA1 or the MRF 10 is updated, the data is read in one cycle from the MRF_RA2 to the arithmetic section 30, and the out-of-order execution of the instruction in the arithmetic section 30 is enabled. Moreover, after the data retained in the MRF_RA1 or the MRF 10 is updated, if it is assumed that it takes N cycles for the update to have an effect on the data read out to the arithmetic section 30, the out-of-order execution of the instruction in the arithmetic section 30 is enabled in all cases by causing dispatch of a following instruction to be stalled for N-1 cycles after the data retained in the MRF_RA1 or the MRF 10 is updated.
Moreover, data in a register in which an arithmetic result has been once retained at an Update Buffer stage in the instruction pipeline, that is, a reorder buffer (ROB) 31 in
As described above,
An arithmetic processing unit 1 shown in
The arithmetic processing unit 1 of this embodiment accesses the register window in the MRF 10 specified by CWP to read/write the data with respect to the register window. The arithmetic processing unit 1 of this embodiment accesses the register window with a combinational circuit provided in the control section 20 and the registers provided in the MRF 10 (for example, the MRF_RA1 and the MRF_RA2). The control section 20 outputs a signal for indicating arithmetic execution of the instruction with respect to the arithmetic section 30.
The MRF 10 is a register file of the register window scheme. The register window in the MRF 10 is specified by the CWP register. The arithmetic section 30 reads the data from the register window in the MRF 10 and uses the read data to execute an arithmetic operation instruction, a logical operation instruction or the like. Then, a result of executing the instruction is written in the specified register window in the MRF 10.
First, a configuration of the instruction pipeline shown in
Functions of the respective stages are as follows.
The Fetch stage: read the instruction from a memory.
The Issue stage: decode the instruction and register a result of the decoding in a reservation station.
The Dispatch stage: issue the instruction from the reservation station.
The Operand Read stage: read an operand to an arithmetic device.
The Execute stage: execute the instruction.
The Update Buffer stage: Wait for an execution result.
The Commit stage: Complete the instruction.
The Fetch stage is a stage for reading the instruction from the memory, and the Issue stage is a stage for decoding the instruction and registering the result in the reservation station. The Fetch stage and the Issue stage are executed in order (IO.FD).
The Dispatch stage is a stage for issuing the instruction from the reservation station. Moreover, the Execute stage is a stage for executing the instruction issued from the reservation station. The Update Buffer stage is a stage for waiting for the result of the execution at the Execute stage, in order to realize in-order completion. The Dispatch stage, the Execute stage and the Update Buffer stage are executed out of order (OOO.PBXU).
The Commit stage is a stage for completing the instruction. At the Commit stage, the reservation station is used to realize the in-order completion (IO.W). The reservation station has saved information on whether or not the completion has been performed with respect to the instruction executed by the arithmetic device, or the execution result. At the Commit stage, the instruction is completed in order with reference to the reservation station.
In this way, the instruction pipeline of the arithmetic processing unit 1 has a configuration for executing out-of-order processes for the out-of-order instruction issuing/the in-order completion.
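The stage grouping described above can be summarized in a small data structure. The following is only an illustrative sketch: the one-letter codes P, B, X and U are assumed to correspond to the Dispatch, Operand Read, Execute and Update Buffer stages in that order (the text confirms only that D denotes decoding at the Issue stage and W denotes commit), and placing the Operand Read stage in the out-of-order group is likewise an assumption. All identifiers are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    FETCH = "F"
    ISSUE = "D"          # decode and register in the reservation station
    DISPATCH = "P"       # assumed letter
    OPERAND_READ = "B"   # assumed letter
    EXECUTE = "X"        # assumed letter
    UPDATE_BUFFER = "U"  # assumed letter
    COMMIT = "W"


# Grouping as labelled in the description: IO.FD, OOO.PBXU and IO.W.
IN_ORDER_FRONT = (Stage.FETCH, Stage.ISSUE)
OUT_OF_ORDER = (
    Stage.DISPATCH,
    Stage.OPERAND_READ,
    Stage.EXECUTE,
    Stage.UPDATE_BUFFER,
)
IN_ORDER_BACK = (Stage.COMMIT,)


def is_in_order(stage: Stage) -> bool:
    """Return True for stages that follow program order."""
    return stage not in OUT_OF_ORDER


for stage in Stage:
    print(stage.name, "in order" if is_in_order(stage) else "out of order")
```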
{Configuration of MRF 10}
As shown in
The control section 20 of
The register control section 210 is provided with a port assignment control section table 211, a SET register 212, the CWP register 213 and a set, cwp control device 214.
The port assignment control section table 211 is a table in which a value to be set to the MRF_RA1, that is, a port assignment state which will be described later, has been stored.
The MRF 10 of this embodiment is provided with 8 register windows which are logically configured in a ring shape, similarly to the MRF 4010 shown in
The readout ports I0 and I1 are ports for reading data in a local register of the register window specified in the MRF_RA1. A multiplexer 231 is provided at the readout port I0 and a multiplexer 232 is provided at the readout port I1. To the multiplexers 231 and 232, data in local registers of windows for respective local registers (W0 locals to W7 locals) of 8 register windows (W0 to W7) is inputted.
The readout ports io0, io1 and io2 are ports for reading data in an in-register or an out-register of the register window specified in the MRF_RA1. A multiplexer 241 is provided at the readout port io0 and a multiplexer 242 is provided at the readout port io1. Moreover, a multiplexer 243 is provided at the readout port io2. To the multiplexers 241 to 243, data in in-registers/out-registers of windows for respective in-registers/out-registers (W0 ins to W7 ins, W0 outs to W7 outs) of the 8 register windows (W0 to W7) is inputted.
The MRF_RA1 is a register which stores the value outputted from the control section 20, that is, the port assignment state which will be described later. The value to be set to the MRF_RA1 is updated at the Issue stage or the Commit stage of the instruction for updating the CWP register 213 provided in the register control section 210 (“window switching instruction” or “register window switching instruction”). The arithmetic processing unit 1 uses the value set to the MRF_RA1 to read the data in the register window specified by a current window pointer value of the CWP register 213, from the MRF 10.
The MRF_RA2 is a register which specifies the number of the register read out for each operand by the arithmetic device in the arithmetic section 30, and is controlled by the register control section 210. The MRF_RA2 has a value determined at the Dispatch stage in the instruction pipeline, and indicates the data in the register of the register window read out from the MRF 10, which is used by the arithmetic section 30 at the following Execute stage.
a is a diagram showing a configuration example of the MRF_RA1.
The MRF_RA1 shown in
b is a diagram showing a configuration example of the MRF_RA2.
The MRF_RA2 shown in
The multiplexer provided at each of the five readout ports I0, I1, io0, io1 and io2 of the MRF 10 has its output controlled with the port ID as a selection signal outputted from the MRF_RA1.
Data in the window 251 for the in-register/out-register of the register windows W0 to W7, that is, the data in the eight local registers is outputted to the readout ports io0, io1 and io2. Data in the window 252 for the local register of the specified register window is outputted to the readout ports I0 and I1. The window data output to these readout ports is selectively output by the multiplexers 241 to 243, 231 and 232 provided at the respective ports.
The window data in the register windows selectively outputted from the five multiplexers 231, 232 and 241 to 243 is inputted to a multiplexer 261. Data in the window 253 for the global register is also inputted to the multiplexer 261. According to the port specification address and the register specification address outputted from the MRF_RA2, the multiplexer 261 selects the data in one window among the data in the windows selectively outputted from the five multiplexers 231, 232 and 241 to 243 and the window 253 for the global register, and further selects one among the data in the eight registers included in the data in the selected window. Then, the multiplexer 261 outputs the data in the selected register to the arithmetic section 30.
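The two selection levels described above (a per-port multiplexer controlled by the MRF_RA1, followed by the multiplexer 261 controlled by the MRF_RA2) can be modelled as follows. This is a minimal software sketch, not the circuit itself: the dictionary layout, the port name "g" for the globals input, the use of plain window indices as port IDs, and the modelling of the in-register/out-register windows as eight banks are all assumptions made for illustration.

```python
LOCAL_PORTS = ("I0", "I1")
IN_OUT_PORTS = ("io0", "io1", "io2")
GLOBAL_PORT = "g"  # hypothetical name for the globals input of multiplexer 261


def read_operand(mrf, mrf_ra1, mrf_ra2):
    """Select one register value through the two multiplexer levels.

    mrf     : dict with "globals" (8 values) and "locals" / "in_outs"
              (each a list of windows, each window a list of 8 values)
    mrf_ra1 : dict mapping a readout port name to a window index, or None
              when the port has no address specification
    mrf_ra2 : (port, register) pair, i.e. port and register specification
    """
    port, reg = mrf_ra2
    if port == GLOBAL_PORT:
        window = mrf["globals"]
    elif port in LOCAL_PORTS or port in IN_OUT_PORTS:
        index = mrf_ra1.get(port)
        if index is None:
            raise ValueError(f"readout port {port} has no address specification")
        bank = mrf["locals"] if port in LOCAL_PORTS else mrf["in_outs"]
        window = bank[index]      # first level: multiplexers 231, 232, 241 to 243
    else:
        raise ValueError(f"unknown readout port {port}")
    return window[reg]            # second level: multiplexer 261


# Example contents chosen so that each value encodes its window and register.
mrf = {
    "globals": list(range(8)),
    "locals": [[100 * w + r for r in range(8)] for w in range(8)],
    "in_outs": [[1000 * w + r for r in range(8)] for w in range(8)],
}
mrf_ra1 = {"I0": 2, "I1": None, "io0": 3, "io1": None, "io2": 2}

print(read_operand(mrf, mrf_ra1, ("I0", 5)))   # local register 5 of window 2
```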
The MRF 10 is further provided with a multiplexer 271. This multiplexer 271 outputs write data such as an arithmetic result with respect to the MRF 10, which is inputted from the arithmetic section 30, to the selected register. The multiplexer 271 is controlled by the register control section 210, and outputs the write data to the register specified by the arithmetic section 30.
If the register file 100 included in the MRF 10 is configured with the eight register windows as in this embodiment, the value of the CWP register 213 (hereinafter referred to as “cwp”) is changed by a window switching instruction such as a SAVE instruction or a RESTORE instruction. The window for the out-register (outs) of the current window before being switched and the window for the in-register (ins) of the current window after being switched are physically the same register, and are assigned to the same readout port of the MRF 10. The minimum number of states which cover all combinations of such assignments is 24. Since cwp takes eight values of “0” to “7”, the state returns to an original state when cwp has gone around the values of “0” to “7” three times. Consequently, cwp going around “0” to “7” three times is regarded as one set. Therefore, “0” to “2” are assigned as values in the SET register 212 (hereinafter referred to as “set”), and these values are changed cyclically each time cwp wraps around as a result of the window switching instruction.
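One way to see why 24 states suffice is as a least common multiple. The sketch below assumes, consistently with the port rotation described elsewhere in this embodiment, that the local register port alternates between two ports and the in-register port advances cyclically among three ports on every window switch, while cwp cycles through eight values. The helper `lcm` is defined locally for illustration.

```python
from math import gcd

NUM_WINDOWS = 8        # values taken by cwp
NUM_LOCAL_PORTS = 2    # I0 and I1
NUM_IN_OUT_PORTS = 3   # io0, io1 and io2


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# The combined state repeats only when all three cycles realign.
num_states = lcm(lcm(NUM_WINDOWS, NUM_IN_OUT_PORTS), NUM_LOCAL_PORTS)
num_sets = num_states // NUM_WINDOWS

print(num_states, num_sets)  # prints: 24 3
```

The quotient 3 corresponds to the three values “0” to “2” held in the SET register 212.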
As shown in
As shown in
The port assignment state 215 is configured with the five port IDs corresponding to the five readout ports I0, I1, io0, io1 and io2 of the MRF 10 shown in
% I is set to the port IDs I0 and I1, and % i or % o is set to the port IDs io0 to io2. % I, % i and % o are addresses for specifying the window for the local register, the window for the in-register and the window for the out-register of the register window of the MRF 10 specified by cwp, respectively. % I is the address for specifying the window for the local register of any one of the eight lines of register windows W0 to W7 included in the MRF 10. % i is the address for specifying the window for the in-register of any one of the eight lines of register windows W0 to W7. Moreover, % o is the address for specifying the window for the out-register of any one of the eight lines of register windows W0 to W7.
In each state, two fields are blank among five fields in the port assignment state 215. These blanks represent “no address specification”. % I is inputted as a selection signal to the multiplexers 231 and 232 provided at the respective readout ports I0 and I1 of the local register in the MRF 10. % i and % o are inputted as selection signals to the multiplexers 241 to 243 provided at the respective readout ports io0, io1 and io2 of the in-register/out-register in the MRF 10.
Therefore, for example, when the port assignment state 215 of (set,cwp)=(0,2) is set to the MRF_RA1, the window for the local register Wk locals, the window for the in-register Wk ins, and the window for the out-register Wk outs, which are specified by % I, % i and % o, are outputted from the readout ports I0, io2 and io0 of the MRF 10, respectively. In this condition, if the window switching instruction is decoded and cwp is incremented by “1” to transit to (set,cwp)=(0,3), the window for the local register Wk locals, the window for the in-register Wk ins, and the window for the out-register Wk outs, which are specified by % I, % i and % o, are outputted from the readout ports I1, io0 and io1 of the MRF 10, respectively. In this case, the window for the local register Wk locals and the window for the in-register Wk ins, which have been specified by the port assignment state 215 of (set,cwp)=(0,2), are outputted from the readout ports I0 and io2 of the MRF 10, respectively. This enables the out-of-order execution of an instruction preceding the window switching instruction, which uses the register window W2 specified by cwp=2, and an instruction following the window switching instruction, which uses the register window W3 specified by cwp=3. Subsequently, when the commit of the window switching instruction is started, only the port assignment state 215 of (set,cwp)=(0,3) becomes valid, and the readout ports I0 and io2 of the MRF 10 are closed. This prohibits the execution of the instruction preceding the window switching instruction. This is because the instruction pipeline of this embodiment has the in-order completion.
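The two states quoted above are consistent with a simple modular rule, sketched below. This rule is an assumption: the actual contents of the port assignment control section table 211 are given in a drawing that is not reproduced here, and the formula is only known to match the states (set,cwp)=(0,2) and (0,3) described in the text. The function name `port_assignment` is hypothetical.

```python
NUM_WINDOWS = 8
NUM_SETS = 3


def port_assignment(set_value: int, cwp: int) -> dict:
    """Assumed readout port for each window group in state (set, cwp)."""
    step = set_value * NUM_WINDOWS + cwp   # position within the 24 states
    return {
        "locals": f"I{step % 2}",          # alternates between I0 and I1
        "ins": f"io{step % 3}",            # cycles through io0, io1, io2
        "outs": f"io{(step + 1) % 3}",     # port the next window's ins will use
    }


before = port_assignment(0, 2)
after = port_assignment(0, 3)
print(before)
print(after)

# The outs of the window before switching and the ins of the window after
# switching are physically the same registers, so they share a readout port.
assert before["outs"] == after["ins"]
```

Under this rule the old window keeps its local register port and in-register port after the switch, which is what allows instructions on both sides of the window switching instruction to read their operands.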
In this embodiment, two readout ports of the local register are provided in the MRF 10, and each time the switching of the register window occurs, the local register of the current window is read out alternately from these two readout ports I0 and I1. Moreover, in the MRF 10, three readout ports of the in-register/out-register are provided. In this case, since the out-register of the current window before being switched and the in-register of the current window after being switched are physically the same register, these registers perform reading from the same readout port, and each time the window switching instruction is executed, the readout port of the in-register is switched cyclically as io0→io1→io2→io0→io1. In this embodiment, control of reading the window data from the five readout ports of the MRF 10 is enabled by storing the port assignment state 215 in the 24 entries of the port assignment control section table 211 in a form as shown in
The set, cwp control device 214 controls setting of the values of the SET register 212 and the CWP register 213. Window switching information is inputted to the register control section 210 from the instruction control section 220. This information is, for example, information showing whether the instruction to be decoded is the SAVE instruction or the RESTORE instruction. If the instruction to be decoded is the SAVE instruction, the set, cwp control device 214 increments cwp by “1”. By incrementing cwp, if cwp becomes “8”, cwp is reset to “0” and set is incremented by “1”. By incrementing set, if set becomes “3”, set is reset to “0”.
First, a flowchart of
The set, cwp control device 214 checks the window switching information inputted from the instruction control section 220 and determines whether or not the instruction to be decoded is the SAVE instruction (S11). If the instruction to be decoded is not the SAVE instruction, the process is completed. On the other hand, if it is determined that the instruction to be decoded is the SAVE instruction at operation S11, cwp is incremented by “1” and subsequently a result of the increment is divided by the number of the register windows (eight in the case of this embodiment) to obtain the remainder. Then, the remainder is set to cwp and cwp is updated (S12).
Next, it is determined whether or not cwp is “0” (S13), and if cwp is not “0”, the process is completed. If it is determined that cwp is “0” at operation S13, set is incremented by “1” and next a result of incrementing set is divided by “3”. Then, the remainder is set to set, set is updated (S14), and the process is completed.
Next, a flowchart of
The set, cwp control device 214 checks the window switching information inputted from the instruction control section 220 and determines whether or not the instruction to be decoded is the RESTORE instruction (S21). If the instruction to be decoded is not the RESTORE instruction, the process is completed. On the other hand, if it is determined that the instruction to be decoded is the RESTORE instruction at operation S21, cwp is decremented by “1” and subsequently a result of the decrement is divided by the number of the register windows to obtain the remainder. Then, the remainder is set to cwp and cwp is updated (S22).
Next, it is determined whether or not cwp is “7” (S23), and if cwp is not “7”, the process is completed. If it is determined that cwp is “7” at S23, set is decremented by “1” and next a result of decrementing set is divided by “3”. Then, the remainder is set to set, set is updated (S24), and the process is completed.
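The two flowcharts can be expressed compactly in software. The following is a minimal sketch, assuming that “remainder” means the non-negative remainder so that decrementing “0” wraps to “7” (for cwp) or “2” (for set); Python's `%` operator behaves this way, whereas an actual circuit would implement the wrap in its own manner. The function names are hypothetical, and the comments refer to the operation labels used above.

```python
NUM_WINDOWS = 8
NUM_SETS = 3


def on_save(set_value: int, cwp: int) -> tuple:
    """Update (set, cwp) when a SAVE instruction is decoded."""
    cwp = (cwp + 1) % NUM_WINDOWS                 # S12
    if cwp == 0:                                  # S13
        set_value = (set_value + 1) % NUM_SETS    # S14
    return set_value, cwp


def on_restore(set_value: int, cwp: int) -> tuple:
    """Update (set, cwp) when a RESTORE instruction is decoded."""
    cwp = (cwp - 1) % NUM_WINDOWS                 # S22, wraps 0 to 7
    if cwp == NUM_WINDOWS - 1:                    # S23
        set_value = (set_value - 1) % NUM_SETS    # S24, wraps 0 to 2
    return set_value, cwp


state = (0, 0)                 # initial values of set and cwp
for _ in range(NUM_WINDOWS * NUM_SETS):
    state = on_save(*state)
print(state)                   # prints: (0, 0)
```

After 24 consecutive SAVE instructions the pair returns to its initial value, and a RESTORE applied after a SAVE returns the pair to the state before the SAVE.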
Initial values of set and cwp are “0”. According to the processes of
When the value of the SET register 212 (set) and the value of the CWP register 213 (cwp) of
Next, a configuration of the instruction control section 220 will be described.
The instruction control section 220 is provided with an execution timing control function 221 for the instruction following the window switching instruction, a rename register release control function 222 and an MRF_RA2 control function 223.
The execution timing control function 221 is a control function of causing the decoding of the instruction following the window switching instruction to be stalled until the MRF_RA1 is updated and the reading of the register file from the MRF 10 is enabled. The instruction control section 220 uses this control function to control the arithmetic section 30 to perform the stalling. This control will be described in detail later.
The rename register release control function 222 is a control function of releasing a resource of a rename register 31 with the completion of the instruction and causing the released resource to be available to an instruction to be newly decoded. The instruction control section 220 uses this control function to control the arithmetic section 30 to execute the release of the resource of the rename register.
The MRF_RA2 control function 223 is a function of interpreting an operand register number included in the instruction. Via the register control section 210, the instruction control section 220 controls the MRF_RA2 to cause the data in the register specified by the operand register number to be selectively outputted from the multiplexer 261.
The arithmetic section 30 is provided with an instruction pipeline mechanism of
If transfer of the register data from the MRF 10 to the arithmetic section 30 requires multiple cycles, there is a period during which the register data in the newly switched register window cannot be read.
In
When the arithmetic section 30 executes three instruction columns in the instruction pipeline, the arithmetic section 30 executes the instructions in order of the instruction of cwp=3, the SAVE instruction and the instruction of cwp=4, until IO.FD. At this time, when the arithmetic section 30 decodes the SAVE instruction at the D stage (if the arithmetic section 30 executes the Issue stage in a period of b of
In cycle 1, the window switching instruction is decoded (D) at the arithmetic section 30, and in cycle 2, a signal for indicating modification of the MRF_RA1 is transmitted from the instruction control section 220 to the register control section 210 (a of
In this embodiment, if the transfer of the data in the register from the MRF 10 to the arithmetic section 30 requires multiple cycles, writing the data to the MRF 10 triggers a period during which the arithmetic section 30 cannot read the data. In the case where the arithmetic section 30 cannot read the data, a pipeline bubble occurs in the instruction pipeline if there is no other instruction to be assigned with an execution right. In this embodiment, this pipeline bubble is suppressed by controlling the release of the rename register 31.
It is assumed that the arithmetic section 30 executes an instruction column of instructions A to F shown in
In the execution of the instruction column, a result of executing the instruction A is stored in the rename register 31 in cycle 4 and stored in the MRF 10 in cycle 5. Consequently, the result of executing the instruction A can be read from the MRF 10 in cycle 6 or later. Therefore, the instruction B following the instruction A uses the result of executing the preceding instruction A (a) by bypassing it in cycle 3, and the following instruction C uses the result of executing the preceding instruction A by reading it from an arithmetic result register (b) in cycle 4. Moreover, the following instruction D uses the result of executing the preceding instruction A by reading it from the rename register (c) in cycle 5, and the following instructions E and F use the result of executing the preceding instruction A by reading it from the MRF 10 (d) in cycles 6 and 7, respectively.
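The choice of operand source per cycle in this example can be tabulated as follows. This is only an illustrative sketch that hard-codes the cycle numbers quoted above for the result of the instruction A; the function name `operand_source` and the returned labels are hypothetical.

```python
def operand_source(read_cycle: int) -> str:
    """Where a consumer reading in `read_cycle` obtains the result of instruction A."""
    if read_cycle < 3:
        raise ValueError("the result of instruction A is not yet available")
    if read_cycle == 3:
        return "bypass"                       # (a)
    if read_cycle == 4:
        return "arithmetic result register"   # (b)
    if read_cycle == 5:
        return "rename register"              # (c)
    return "MRF"                              # (d), cycle 6 or later


# Read cycles of the consumers in the example instruction column.
consumers = {"B": 3, "C": 4, "D": 5, "E": 6, "F": 7}
for name, cycle in consumers.items():
    print(name, cycle, operand_source(cycle))
```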
Next,
In this embodiment, after the data has been written in the MRF 10, if the data cannot be read from the MRF 10 for a certain period of time, the data is controlled to be read from the rename register 31 instead of the MRF 10 during the period.
As shown in
In this way, in this embodiment, although the timing at which the data can first be read from the MRF 10 is delayed, the associated problems are prevented by delaying the release of the rename register 31.
This embodiment has been applied to the control of reading the data from the MRF 10 before and after switching the window, by using the port assignment control section table 211, with respect to the MRF 10 provided with the eight lines of register windows.
A method of reading the register from the readout ports of the MRF 10 in this embodiment will be described with reference to
Although the MRF 10 is provided with the local register, the in-register, the out-register and the global register, since the global register is common to all register windows and the window switching has no effect on it, the global register will be omitted in the following description.
The MRF 10 of this embodiment is provided with two local register ports (I0 and I1) for reading the data in the local register, and three in-register/out-register ports (io0, io1 and io2) for reading the data in the in-register/out-register.
One local register port and two in-register/out-register ports are used with respect to one CWP. Consequently, one remaining local register port and one remaining in-register/out-register port are not used. When the value of CWP is switched, in order to read the local register and the out-register (in-register) of the register window specified by a new value of CWP, the unused readout ports of the MRF 10 are assigned to the respective registers.
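The assignment of the unused readout ports at a window switch can be sketched as below. This is a hedged illustration, not the port assignment control of the embodiment: the function name `assign_on_switch` and the starting assignment are hypothetical, and the sketch assumes that the register set overlapping with the destination window keeps the in-register/out-register port it already uses.

```python
LOCAL_PORTS = ("I0", "I1")         # local register readout ports
IO_PORTS = ("io0", "io1", "io2")   # in-register/out-register readout ports


def assign_on_switch(current_local, current_io, shared_io):
    """Return (local port, io ports) for the window selected by the new CWP.

    current_local: local port used by the current window.
    current_io:    the two io ports used by the current window.
    shared_io:     the io port of the register set that overlaps with the
                   destination window (assumed to stay assigned).
    """
    current_io = frozenset(current_io)
    if len(current_io) != 2 or not current_io <= set(IO_PORTS):
        raise ValueError("the current window must use exactly two io ports")
    if shared_io not in current_io:
        raise ValueError("the shared port must be in use by the current window")
    # The one local port and the one io port left unused are handed to the
    # register window specified by the new value of CWP.
    (new_local,) = [p for p in LOCAL_PORTS if p != current_local]
    (free_io,) = [p for p in IO_PORTS if p not in current_io]
    return new_local, frozenset({shared_io, free_io})


if __name__ == "__main__":
    # Hypothetical starting point: current window reads through I0, io1, io2.
    print(assign_on_switch("I0", {"io1", "io2"}, shared_io="io1"))
```

With this hypothetical starting point, the ports left to the old window alone are I0 and io2, which become free again once the instructions of the old window no longer read through them.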
As shown in
Now, as shown in table A of
In this condition, if the SAVE instruction is executed ((1) of
Furthermore, when the SAVE instruction is committed (W), the register reading from the readout port I0 and the readout port io2 of the MRF 10 is not performed. This is because the instruction of cwp=2 preceding the SAVE instruction in a program does not subsequently refer to the register since the commit is performed in order. The present disclosure is not limited thereto, and the reading can also be continuously performed depending on the situation.
Subsequently, when the RESTORE instruction is executed, cwp is decremented by “1” to transit to a condition of (set,cwp)=(0,2). The contents of the MRF_RA1 are modified at a timing of decoding (D) this RESTORE instruction, and until this RESTORE instruction is completed (table D of
As described above, the MRF_RA1 is controlled by the register control section 210, and the out-of-order execution of multiple instructions existing before and after the window switching instruction in the program is enabled. It should be noted that although OOO.PBXU is shown by one line in
An arithmetic processing unit 300 shown in
There are two lines of fixed-point arithmetic pipelines. One pipeline is provided with an ALU (Arithmetic Logic Unit), a SHIFT arithmetic device (SFT), a multiplier (MPY), a divider (DVD) and a VIS (Visual Instruction Set) arithmetic device, and the other pipeline is provided with the ALU and the SHIFT arithmetic device. Moreover, there are two lines of address arithmetic pipelines separately from the fixed-point pipelines.
The MRF 10 is provided with the eight lines of register windows. The program performs tasks on the register belonging to the current window, and the window switching is mainly performed by the window switching instruction at the time of invoking and returning of a subroutine. The data in the current window has been previously selected from the MRF 10, and when the arithmetic is executed, source data (source operand) can be supplied to the arithmetic device in one cycle. Furthermore, with the decoding of the window switching instruction as a trigger, data in a register window of a switching destination is also controlled to be previously selected from the MRF 10, and the instructions are not delayed even in the case of invoking the subroutine.
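The window number that is selected in advance when a window switching instruction is decoded can be sketched as follows. This is a minimal sketch: it assumes, consistently with the description, that CWP is incremented by a SAVE instruction and decremented by a RESTORE instruction, wrapping around over the eight register windows. The function name `destination_window` is hypothetical.

```python
NUM_WINDOWS = 8  # the MRF is provided with eight lines of register windows


def destination_window(cwp: int, instruction: str) -> int:
    """Window number to preselect when `instruction` is decoded."""
    if not 0 <= cwp < NUM_WINDOWS:
        raise ValueError("cwp out of range")
    if instruction == "SAVE":
        return (cwp + 1) % NUM_WINDOWS   # wraps from 7 to 0
    if instruction == "RESTORE":
        return (cwp - 1) % NUM_WINDOWS   # wraps from 0 to 7
    return cwp                           # not a window switching instruction


if __name__ == "__main__":
    print(destination_window(2, "SAVE"))     # window reached from cwp=2 by SAVE
    print(destination_window(0, "RESTORE"))  # wrap-around case
```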
The configuration of the arithmetic processing unit 300 will be described in more detail.
An MRF 301 is a block showing the eight lines of register windows shown particularly in
The arithmetic processing unit 300 shown in
Data retained in the register 321 is outputted to a multiplexer 341, and data retained in the register 322 is outputted to a multiplexer 342. Moreover, data retained in the register 323 is outputted to a multiplexer 343, and data retained in the register 324 is outputted to a multiplexer 344. To the multiplexers 341 to 344, the data in the primary data cache is also inputted from the register 320. To the multiplexers 341 and 342, the arithmetic results retained in the registers 361 and 362 are also inputted.
The multiplexer 341 selects one of three input data and outputs it as operand data to an ALU/SFT/VIS arithmetic device 331, a multiplier (MPY) 332 or a divider (DVD) 333. The multiplexer 342 selects one of three input data and outputs the selected data as the operand data to an ALU/SFT arithmetic device 334. The multiplexer 343 selects any one of two input data and outputs it to an address generator (AGEN) 335. The multiplexer 344 selects any one of two input data and outputs it to an address generator (AGEN) 336.
The ALU/SFT/VIS arithmetic device 331, the multiplier 332 and the divider 333 output the arithmetic results to a multiplexer 351. The ALU/SFT arithmetic device 334 outputs the arithmetic result to a multiplexer 352. The address generator 335 outputs the arithmetic result (address) to a multiplexer 353. The address generator 336 outputs the arithmetic result (address) to a multiplexer 354.
The multiplexer 351 inputs the arithmetic results of the ALU/SFT/VIS arithmetic device 331, the multiplier 332 and the divider 333, selects one of those arithmetic results, and outputs the selected arithmetic result to the register 361. The multiplexer 352 inputs the arithmetic result of the ALU/SFT arithmetic device 334 and outputs the arithmetic result to the register 362. The multiplexer 353 inputs the arithmetic result of the address generator 335 and outputs the arithmetic result to a register 363. The multiplexer 354 inputs the arithmetic result of the address generator 336 and outputs the arithmetic result to a register 364.
The register 361 outputs the arithmetic result inputted from the multiplexer 351 to the ROB 31, the multiplexers 311 to 314, and the multiplexers 341 and 342. The register 362 outputs the arithmetic result inputted from the multiplexer 352 to the ROB 31, the multiplexers 311 to 314, and the multiplexers 341 and 342.
The register 363 outputs the arithmetic result inputted from the multiplexer 353 as the address to the primary data cache. The register 364 outputs the arithmetic result inputted from the multiplexer 354 as the address to the primary data cache.
The selectively outputted data of the multiplexers 341 and 342 are outputted to a multiplexer 371. The multiplexer 371 outputs the selectively outputted data to a register 381. The register 381 retains the selectively outputted data and outputs it as the data to the primary data cache.
Incidentally, the registers 321, 322, 323 and 324 provided between the multiplexers 311, 312, 313 and 314 and the arithmetic devices 331 to 333, the arithmetic device 334, the arithmetic device 335 and the arithmetic device 336 are provided in order to separate the B stage from the X stage in the instruction pipeline shown in
The register file is not limited to a register file of the overlap window scheme such as the MRF 10. For example, the present disclosure can also be applied to a large register file having a flat configuration as shown in
A register file 400 shown in
In this way, the register file 400 can be used as an alternative to the MRF 10 by dividing the register file 400 having the flat configuration into multiple sequential windows.
As described above, the arithmetic processing unit 1 of this embodiment can quickly supply the operand data from the register file 100 of the overlap window scheme in the MRF 10 to the arithmetic section 30 without providing the MRF or the CRB and the CWR as in the conventional arithmetic processing unit. Moreover, this embodiment realizes this high-speed reading of the data from the register file 100 by providing the MRF_RA1, the MRF_RA2 and the readout ports io0 to io2, I0 and I1 within the MRF 10, and by providing, outside the MRF 10, a control circuit for reading the data in the register from the MRF 10.
The register control section 210 is configured with the port assignment control section table 211, the SET register 212, the CWP register 213 and the set, cwp control device 214. Of these, the CWP register 213 corresponds to the CWP that is also included in the conventional arithmetic processing unit, and the MRF_RA1, the MRF_RA2, the port assignment control section table 211 and the SET register 212 can be constructed with smaller circuits than the storage included in the conventional arithmetic processing unit.
Moreover, the set, cwp control device 214 can be realized with a combinational circuit, and its circuit size can be small. The execution timing control function 221 for the instruction following the window switching instruction, the rename register release control function 222 and the MRF_RA2 control function 223 included in the instruction control section 220 can likewise be realized with small combinational circuits. In addition, the number of readout ports provided in the MRF 10 is small, namely five lines (io0 to io2, I0 and I1). Therefore, considering the unit as a whole, the arithmetic processing unit 1 of this embodiment can have a smaller circuit size than that of the conventional arithmetic processing unit.
Moreover, the arithmetic processing unit 1 of this embodiment also has lower power consumption since power consumption is not required for transferring the data in the register window between the CRB and the CWR. Moreover, this embodiment has a lower hardware cost than that of the conventional arithmetic processing unit.
It should be noted that the present disclosure is not limited to the above described embodiment, and can be variously transformed and implemented in a range not deviating from the gist of the present disclosure.
Therefore, the register file to which the present disclosure can be applied is not limited to the above described register file. For example, the present disclosure can also be applied to a register file of the overlap window scheme provided with multiple lines of windows for the global register. Moreover, the present disclosure can also be applied to a register file having a configuration in which the address of the current window specified by the current window pointer is randomly updated, instead of being serially updated, each time the window switching instruction is executed.
Although a few preferred embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2007-069613 | Mar 2007 | JP | national |