The invention relates to a method and to an apparatus for pipeline processing a chain of processing instructions, in particular to instruction scheduling and result forwarding logic of Reduced Instruction Set Computer (RISC) architectures.
Processor instruction pipelines, which split the processing of individual instructions into several (sub)stages and thus reduce the complexity of each stage while simultaneously increasing the clock speed, are typical features of RISC architectures. Such a pipeline has a throughput of one instruction per cycle but a latency of several, or ‘n’, cycles per instruction. This behaviour has two implications relevant for the invention:
RAW hazards can be avoided by using a ‘scoreboard’, which typically features an individual entry per address of the above register file. Once an instruction enters the pipeline, a flag is set in the scoreboard at the destination address (i.e. the result address) of this particular instruction. This flag signals that an instruction inside the pipeline intends to write its result to the respective register address. Hence the result is unavailable as long as the flag is set. The flag is cleared after the instruction has successfully written its result into the register file. Any subsequent instruction that is to enter the pipeline must check whether the flag is set for at least one of its source (i.e. operand) register addresses. The instruction is not allowed to enter the pipeline until all of these flags are cleared. Therefore the scoreboard must be accessed every cycle.
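The conventional single-flag scheme described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from the cited literature; the class name `FlagScoreboard` and its method names are hypothetical.

```python
class FlagScoreboard:
    """Conventional scoreboard sketch: one busy flag per register address.

    All names here are hypothetical and chosen for illustration only.
    """

    def __init__(self, num_registers):
        # False means no instruction in the pipeline will write this register.
        self.busy = [False] * num_registers

    def can_issue(self, source_regs):
        # An instruction may enter the pipeline only if none of its
        # source (operand) registers has its flag set.
        return not any(self.busy[r] for r in source_regs)

    def issue(self, dest_reg):
        # Set the flag at the destination address when the instruction enters.
        self.busy[dest_reg] = True

    def write_back(self, dest_reg):
        # Clear the flag once the result has been written to the register file.
        self.busy[dest_reg] = False
```

Because `can_issue` only yields a yes/no answer, the issue logic has to repeat the check every cycle until the flag is cleared.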
E.g. in John L. Hennessy, David A. Patterson: “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann Publishers, ISBN: 1558605967, 3rd edition 15 May 2002, scoreboard architectures are described in detail.
A disadvantage of known scoreboard solutions is that they use comparatively costly and communication-intensive low-speed implementations of the forwarding and instruction scheduling logic. To implement such forwarding, it must be checked, for each operand of each instruction intending to enter the pipeline, whether the operand address shows up as a destination register in one of the pipeline stages that follow the generation of results. Especially in the case of processing units featuring differing delays, quite a few pipeline stages carry results suitable for forwarding. The known forwarding implementation requires concurrent communication with all of them.
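The per-operand check against every result-carrying stage can be sketched as follows. This is a software model under stated assumptions, not the circuitry of any particular processor: the function name `snoop_forward`, the list `stage_dest_regs` (destination register held by each stage, or `None`) and the parameter `first_result_stage` are hypothetical. In hardware the comparisons would run concurrently, one comparator per stage and operand; the loop merely enumerates them.

```python
def snoop_forward(operand_reg, stage_dest_regs, first_result_stage):
    """Return the stage whose result should be forwarded, or None.

    stage_dest_regs[i] is the destination register of the instruction
    currently in stage i (None if the stage is empty). Only stages at or
    after first_result_stage already carry a result.
    """
    for stage in range(first_result_stage, len(stage_dest_regs)):
        # One comparison per result-carrying stage; the lowest matching
        # stage holds the most recently issued writer of the register.
        if stage_dest_regs[stage] == operand_reg:
            return stage
    return None
```

The cost grows with the number of stages that carry results, which is the communication overhead the paragraph above refers to.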
According to the invention, the scoreboard (or register file) entry at the destination address (i.e. result address) of a particular instruction (or operand) stores not merely a single flag but the number, or a corresponding codeword, of the pipeline stage that currently carries the instruction that wants to write its result (or operand) to that register file address, together with the type of the respective instruction (or operand). This type can be a binary encoded code word. On the one hand this feature requires slightly more storage space in the scoreboard, but on the other hand it simplifies RAW hazard detection and in particular result forwarding. In other words, while known scoreboard architectures use a single bit for marking that a particular destination register address is being used by an instruction currently processed within the instruction pipeline, the invention employs a more complex data item designating the number of the current pipeline stage of the respective instruction and the type of that instruction. Advantageously, this information item can be used to calculate the number of stall cycles necessary to prevent a RAW hazard and/or to determine the pipeline stage from which the result (or operand) can be forwarded. Otherwise the results (or operands) of all pipeline stages used for forwarding would need to be monitored, and the issue logic would need to access the scoreboard each cycle to check whether the respective flag is set. The logic and wiring required for such purposes would be costly and the processing speed slow.
A problem to be solved by the invention is to facilitate increased processing speed in pipeline processing.
Advantageously, costly and potentially low-speed bus snooping logic used for result forwarding in RISC architectures becomes obsolete. The efficiency of Read after Write (RAW) pipeline hazard detection is also increased.
In principle, the inventive method is suited for pipeline processing a chain of processing instructions, including the step:
In principle the inventive apparatus is suited for pipeline processing a chain of processing instructions and includes:
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
In
The forwarding of the outputs of stages STG3 to STGn-1 to bus FWDB is controlled by respective stage output control signals STG3OC to STGN-1OC, which are provided by scoreboard SCB. Because of the general principles of pipeline processing, it normally makes no sense for stages STG1 and STG2 to forward any intermediate or partial results to bus FWDB. However, depending on the application as mentioned above, any of stages STG1, STG2, STG4, STG5, . . . , may or may not additionally be accompanied by respective stage output control signals STG1OC, STG2OC, STG4OC, STG5OC, . . . .
For example, a value ‘0’ is written at the address of the destination register in the scoreboard SCB when an instruction enters the pipeline (pipeline stage STG0). All stage counter entries related to destination register addresses of instructions that have previously entered the first pipeline stage are incremented every new cycle, provided the pipeline is not stalled, e.g. due to a RAW hazard. Therefore the current stage number is always kept up to date. When the corresponding instruction leaves the pipeline (pipeline stage STGn-1), the counter is incremented to the value ‘n’. An entry with value ‘n’ is not incremented further.
In other words, the current pipeline stage counting number is kept up to date, and when a processed instruction leaves the last pipeline stage STGn-1 of the chain of pipeline stages, the pipeline stage counting number is set to an end value that is not incremented any further.
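The counter behaviour described above can be modelled in a few lines. The following is a sketch under assumptions rather than the claimed circuit: the names `Entry`, `StageScoreboard`, `issue` and `tick` are hypothetical, the instruction type is represented by a plain string, and every entry is assumed to start at the end value ‘n’ (no pending write).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    stage: int       # current stage of the pending writer; n means none pending
    instr_type: str  # hypothetical type tag, e.g. "load" or "fpop"


class StageScoreboard:
    """Scoreboard sketch that stores a stage counter and a type per register."""

    def __init__(self, num_registers, n_stages):
        self.n = n_stages
        # Assumption: all entries start at the end value n.
        self.entries = [Entry(n_stages, "none") for _ in range(num_registers)]

    def issue(self, dest_reg, instr_type):
        # Value 0 is written when the instruction enters stage STG0.
        self.entries[dest_reg] = Entry(0, instr_type)

    def tick(self, stalled=False):
        # Counters advance once per cycle unless the pipeline is stalled.
        if stalled:
            return
        for entry in self.entries:
            # An entry that has reached n is not incremented further.
            if entry.stage < self.n:
                entry.stage += 1
```

Calling `tick()` once per clock cycle stands in for the individual incrementers per register address.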
This kind of processing can be carried out by using an individual incrementer within CTRL for each register address. Control stage CTRL provides the control signals STG3OC to STGN-1OC mentioned above in connection with
Let x be the final number of the pipeline stage that generates the results. This number depends on the instruction type and is also stored in the scoreboard SCB.
Let y be the scoreboard entry of an operand address of an instruction intended for entering the pipeline. Then the number of required stall cycles can be calculated simply by subtracting y from x. If the result is smaller than or equal to ‘0’, no stall is required. If y does not equal n, forwarding is required. The pipeline stage that actually forwards the result is directly pointed to by y, i.e. by signal OC-STGy.
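The rule for x, y and n can be written out as a short sketch. The function names `stall_cycles`, `forwarding_stage` and `check_operand` are hypothetical. The code mirrors the text literally and does not model how the entry is re-evaluated once a stall has elapsed.

```python
from typing import Optional, Tuple


def stall_cycles(x: int, y: int) -> int:
    """Stall count x - y; a result of 0 or less means no stall is required."""
    return max(0, x - y)


def forwarding_stage(y: int, n: int) -> Optional[int]:
    """Stage to forward from, or None when y == n.

    When y equals n the writing instruction has left the pipeline, so the
    operand is read from the register file and no forwarding is needed.
    """
    return y if y != n else None


def check_operand(x: int, y: int, n: int) -> Tuple[int, Optional[int]]:
    """Combine both decisions for one operand of an instruction at issue."""
    return stall_cycles(x, y), forwarding_stage(y, n)
```

For instance, with n = 8, x = 5 and y = 2 the call `check_operand(5, 2, 8)` yields three stall cycles and points at stage 2, while y = 8 yields no stall and no forwarding.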
Hence, no communication with the individual pipeline stages is required for forwarding. The scoreboard SCB is accessed by stage STG0 only. All communication is kept local, which saves global wiring (such wiring makes processing slow in modern sub-μ silicon technologies). Potentially costly and low-speed logic for communication is also saved.
For example, a SPARC V8 RISC processor can be used to implement the invention, whereby an internal interface for the floating point unit can be redesigned according to the invention in order to achieve better performance. The floating point pipeline can have a length of eight stages, wherein the floating point operations can generate their results in the 6th stage and the load operation can already take place in the 2nd stage. Hence, especially the load instructions require extensive forwarding.
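To make these numbers concrete, the stall rule x − y can be tabulated for the eight-stage floating point pipeline. This is only an illustration under an assumption the text does not state: the stages are taken to be numbered from 0 (STG0 to STG7), so the 6th stage is x = 5 and the 2nd stage is x = 1. The names `N_STAGES`, `RESULT_STAGE` and `stall_table` are hypothetical.

```python
N_STAGES = 8  # pipeline length n from the example above

# Assumption: zero-based stage numbers, so "6th stage" -> 5, "2nd stage" -> 1.
RESULT_STAGE = {"fpop": 5, "load": 1}


def stall_table(instr_type):
    """Stall cycles for every possible scoreboard entry y = 0..n."""
    x = RESULT_STAGE[instr_type]
    return [max(0, x - y) for y in range(N_STAGES + 1)]


if __name__ == "__main__":
    for instr_type in RESULT_STAGE:
        print(instr_type, stall_table(instr_type))
```

Under this numbering a dependent instruction waits at most one cycle on a load, and the load result stays forwardable over many subsequent stages, which is consistent with the remark that loads require extensive forwarding.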
The implementation has been fully verified using VHDL simulations at register transfer level and by rapid prototyping implementations on FPGA boards.
The inventive pipeline processing is preferably performed electronically and/or automatically.
Instead of using hardware, the invention can also be carried out by using corresponding software.
Number | Date | Country | Kind |
---|---|---|---|
03090089.8 | Mar 2003 | EP | regional |