The disclosures herein relate generally to processors, and more particularly, to speeding up the execution of shift/rotate instructions in processors.
Processors execute software programs that include a series of instructions. Typical instructions include an opcode and one or more operands. An opcode tells the processor to perform a particular function such as LOAD, STORE, ADD, PUSH, POP and SHIFT/ROTATE. The operand tells the processor on which object or objects to carry out the function that the opcode specifies.
Shift instructions instruct the processor to shift an operand in a data field by a specified amount either to the left or to the right. For example, a shift right instruction instructs the processor to move a quantity in a data field by a shift amount of 1 bit to the right. Another shift instruction may instruct the processor to move a quantity in a data field by a shift amount of 3 bits to the left. The processor fills with zeros, or other data, those bits within the data field that become empty as a result of a simple shift operation. Rotate instructions are a special type of shift instruction that instructs the processor to shift data within the data field. However, with rotate instructions, the processor performs a wraparound operation such that data that falls off one end of the data field as a result of the shift rotates back to the other end of the data field.
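By way of illustration only, the following Python sketch contrasts a simple shift with a rotate on a fixed-width data field. The 8-bit width and the function names are assumptions made for this example and do not appear in the disclosure.

```python
WIDTH = 8                      # assumed data-field width in bits (illustrative)
MASK = (1 << WIDTH) - 1        # keeps results inside the data field


def shift_left(value, amount):
    """Simple shift left: vacated low-order bits fill with zeros."""
    return (value << amount) & MASK


def shift_right(value, amount):
    """Simple shift right: vacated high-order bits fill with zeros."""
    return (value & MASK) >> amount


def rotate_left(value, amount):
    """Rotate left: bits that fall off the high end wrap to the low end."""
    amount %= WIDTH
    value &= MASK
    return ((value << amount) | (value >> (WIDTH - amount))) & MASK


def rotate_right(value, amount):
    """Rotate right: bits that fall off the low end wrap to the high end."""
    amount %= WIDTH
    value &= MASK
    return ((value >> amount) | (value << (WIDTH - amount))) & MASK


if __name__ == "__main__":
    data = 0b10010110
    print(format(shift_left(data, 3), "08b"))   # low bits filled with zeros
    print(format(rotate_left(data, 3), "08b"))  # high bits wrapped around
```

In this sketch, a shift discards the bits that leave the data field, whereas a rotate returns them at the opposite end.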
Modern processors utilize the technique of pipelining to divide each instruction of a program into a series of smaller steps. By using pipelining, the processor performs the steps in parallel with other steps to increase the effective execution speed of the processor. A typical pipeline for processing a shift/rotate instruction includes the stages shown in TABLE 1 below:
In this conventional pipelining technique, an execution unit in the processor receives a shift/rotate instruction to execute, as TABLE 1 indicates above in the ISS or issue stage. Next, in a register file (RF) stage, the processor reads operands for the shift/rotate instruction from a register file. In the following EX or execute stage, the processor both decodes a shift amount associated with the shift/rotate instruction and actually performs, or executes, the shift operation. Next, in the write back (WB) stage, the processor writes the result of the shift/rotate operation to the register file. Ultimately, the processor may send the result to a main system memory for storage. In this conventional processor pipelining approach, the shift/rotate instruction requires several processor cycles to complete the execution of the instruction. The latency of the longest stage in the pipeline limits the execution speed of the processor. As seen in TABLE 1, since the EX execute stage of the pipeline includes both shift decode and shift execute, the EX execute stage limits the execution speed or frequency of the processor.
What is needed is a method and apparatus that executes shift/rotate instructions more quickly and efficiently.
Accordingly, in one embodiment, a method is disclosed for processing instructions in a processor. The method includes receiving, by an instruction unit, an instruction stream including a plurality of instructions. The method also includes determining, by the instruction unit, if a shift/rotate instruction in the instruction stream is an immediate shift/rotate instruction or a register dependent shift/rotate instruction. The method still further includes immediately executing, by a shift/rotate functional unit, the shift/rotate instruction if the instruction unit determines that the shift/rotate instruction is an immediate shift/rotate instruction. The method also includes substituting, by the instruction unit, first and second substitute instructions in the instruction stream in place of the shift/rotate instruction if the instruction unit determines that the shift/rotate instruction is a register dependent shift/rotate instruction. The first substitute instruction instructs that a shift amount be stored in a shift amount register in the shift/rotate functional unit. The second substitute instruction instructs that the shift/rotate functional unit shift data by the shift amount stored in the shift amount register.
In another embodiment, a processor is disclosed that includes an instruction unit that receives an instruction stream including a plurality of instructions. The instruction unit determines if a shift/rotate instruction in the instruction stream is an immediate shift/rotate instruction or a register dependent shift/rotate instruction. The processor includes a shift/rotate functional unit, coupled to the instruction unit, that immediately executes the shift/rotate instruction if the instruction unit determines that the shift/rotate instruction is an immediate shift/rotate instruction. The instruction unit also includes a substitution apparatus that substitutes first and second substitute instructions in the instruction stream in place of the shift/rotate instruction if the instruction unit determines that the shift/rotate instruction is a register dependent shift/rotate instruction. The first substitute instruction instructs that a shift amount be stored in a shift amount register in the shift/rotate functional unit. The second substitute instruction instructs that the shift/rotate functional unit shift data by the shift amount stored in the shift amount register.
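As a minimal sketch of the substitution summarized above, the following Python fragment rewrites an instruction stream so that each register dependent shift/rotate instruction becomes two substitute instructions, while immediate shift/rotate instructions and other instructions pass through unchanged. The Instr tuple, the MOVE_TO_SAR opcode name, and the convention that an integer operand denotes an immediate shift amount while a string denotes a register are assumptions for illustration only.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    opcode: str
    operands: Tuple


def is_register_dependent(instr):
    """Assumed encoding: operands are (data, amount, dest); a string amount
    names a register, an integer amount is an immediate value."""
    if instr.opcode not in ("SHIFT", "ROTATE"):
        return False
    amount = instr.operands[1]
    # "SAR" marks an already substituted instruction, so leave it alone.
    return isinstance(amount, str) and amount != "SAR"


def substitute(stream: List[Instr]) -> List[Instr]:
    """Replace each register dependent shift/rotate with two instructions."""
    out = []
    for instr in stream:
        if is_register_dependent(instr):
            rdata, ramount, rdest = instr.operands
            # First substitute: store the shift amount in the SAR.
            out.append(Instr("MOVE_TO_SAR", (ramount,)))
            # Second substitute: shift by the amount held in the SAR.
            out.append(Instr(instr.opcode, (rdata, "SAR", rdest)))
        else:
            out.append(instr)  # immediate shifts and other opcodes pass through
    return out


if __name__ == "__main__":
    program = [
        Instr("ADD", ("R1", "R2", "R3")),
        Instr("SHIFT", ("R4", 3, "R5")),        # immediate form
        Instr("SHIFT", ("R6", "R7", "R8")),     # register dependent form
    ]
    for instr in substitute(program):
        print(instr)
```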
The appended drawings illustrate only exemplary embodiments of the invention and therefore do not limit its scope because the inventive concepts lend themselves to other equally effective embodiments.
A ROTATE instruction is a type of SHIFT instruction typically in one of the forms shown in
In one conventional processor 100, the processor decodes a SHIFT/ROT instruction and executes the SHIFT/ROT instruction in the same pipeline stage. More particularly, as seen in TABLE 1 above, the processor reads operands from the register file in the RF stage of the pipeline. Then, in the next processor cycle, the processor both decodes the SHIFT/ROTATE instruction to determine the shift amount and executes the instruction. The decoding and execution occur in the same pipeline stage, namely the EX execute stage. Decoding and execution represent serial tasks in that the processor performs one before the other, thus resulting in a lengthy EX execute pipeline stage that limits processor performance. In other words, in this approach the processor serializes the decoding and shifting.
The disclosed processor 400 of
This pipeline enables immediate SHIFT/ROT instructions to execute more quickly than the conventional pipeline of TABLE 1. In this embodiment, the processor decodes the shift amount of immediate SHIFT/ROT instructions in the pipeline stage before the EX execute stage, namely in the RF register file stage. Thus, the processor is ready to execute the immediate SHIFT/ROT instruction when the processor reaches the EX execute stage without waiting for decoding in that stage. Processor 400 may perform the decoding task in the RF register file pipeline stage in parallel with other tasks.
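The effect of moving the shift amount decode out of the EX execute stage can be sketched numerically. In the following Python fragment every latency value is hypothetical, since the disclosure gives no figures, and the model assumes that the decode runs in parallel with the register file read so that the RF stage takes as long as the slower of the two tasks.

```python
# All latencies are hypothetical picosecond values chosen for illustration.
ISSUE_PS = 200
RF_READ_PS = 250
DECODE_PS = 150
SHIFT_PS = 300
WRITE_BACK_PS = 200


def conventional_pipeline():
    """EX performs shift amount decode and shift execution serially."""
    return {"ISS": ISSUE_PS, "RF": RF_READ_PS,
            "EX": DECODE_PS + SHIFT_PS, "WB": WRITE_BACK_PS}


def early_decode_pipeline():
    """Decode overlaps the register file read; EX only shifts."""
    return {"ISS": ISSUE_PS, "RF": max(RF_READ_PS, DECODE_PS),
            "EX": SHIFT_PS, "WB": WRITE_BACK_PS}


def cycle_time_ps(stages):
    """The slowest stage sets the minimum processor cycle time."""
    return max(stages.values())


def limiting_stage(stages):
    return max(stages, key=stages.get)


if __name__ == "__main__":
    for name, stages in (("conventional", conventional_pipeline()),
                         ("early decode", early_decode_pipeline())):
        print(name, limiting_stage(stages), cycle_time_ps(stages))
```

Under these assumed numbers the limiting stage remains EX in both cases, but its latency drops because the decode no longer precedes the shift within the same stage.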
Processor 400 can also handle register dependent SHIFT/ROTATE instructions depicted in
In more detail, processor 400 includes an instruction cache 420 and a data cache 425. Instruction cache 420 stores instructions from a software program that processor 400 executes. Data cache 425 stores data that processor 400 requires to execute instructions. Processor 400 includes functional units such as an arithmetic logic unit (ALU) 430 that performs arithmetic operations such as ADD and SUBTRACT. Processor 400 also includes a SHIFT/ROTATE functional unit or engine 405 that performs shift and rotate operations. Processor 400 may include other functional units, such as load and store functional units (not shown), for example.
Instruction cache 420 couples to an instruction unit 435 that decodes instructions in an instruction stream that it receives from instruction cache 420. Processor 400 handles register dependent SHIFT/ROT instructions in a different manner than immediate SHIFT/ROT instructions.
A control unit 455 couples instruction unit 435 to SHIFT/ROTATE functional unit 405 and register file 415 as shown. Control unit 455 controls the processes carried out by SHIFT/ROTATE engine 405 and register file 415 in the course of executing SHIFT/ROTATE instructions. Control unit 455 includes an instruction register 460 that provides a decoded SHIFT/ROTATE instruction to functional unit 405.
SHIFT/ROTATE functional unit 405 includes the shift amount register (SAR) 410 that stores the shift amount, namely Ramount, specified by a register dependent SHIFT/ROTATE instruction that processor 400 executes. When processor 400 encounters such a register dependent SHIFT/ROTATE instruction, instruction unit 435 decodes the shift amount as a quantity stored at a location in register file 415, namely the Ramount register 445 therein. In response to a request by control unit 455, register file 415 sends the contents of Ramount register 445 to shift amount register (SAR) 410. Thus, SAR 410 stores the shift amount needed by register dependent SHIFT/ROTATE instructions while instruction register 460 stores the shift amount specified by immediate SHIFT/ROTATE instructions, namely the shift amount contained within the instruction itself. The IMMED signal applied to the IMMED input of multiplexer 465 determines whether multiplexer (MUX) 465 sends the shift amount in SAR 410 to shift amount decoder 470 or the shift amount from instruction register 460 to shift amount decoder 470. Shift amount decoder 470 couples MUX 465 to shifter/rotator 475. Shifter/rotator 475 shifts the data stored in a data field specified by a SHIFT/ROTATE instruction by an amount that shift amount decoder 470 specifies to shifter/rotator 475. Shifter/rotator 475 sends the result of the shift operation to register file 415 for storage at a destination such as destination register Rdest 450.
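The data path just described may be modeled, purely as a sketch, by the following Python fragment. The 32-bit width, the one-hot output of the decoder, the left shift direction, and the convention that a high IMMED signal selects the instruction register are assumptions for illustration; the description states only that a low IMMED signal selects SAR 410.

```python
WIDTH = 32                     # assumed data-field width in bits
MASK = (1 << WIDTH) - 1


def mux_465(immed, instruction_register_amount, sar_amount):
    """Select the shift amount source: IMMED low selects the SAR;
    IMMED high (assumed) selects the instruction register."""
    return instruction_register_amount if immed else sar_amount


def shift_amount_decoder_470(amount):
    """Assumed behavior: convert a binary amount to a one-hot select word."""
    return 1 << (amount % WIDTH)


def shifter_rotator_475(data, select, rotate=False):
    """Shift or rotate left by the position of the single set select bit."""
    amount = select.bit_length() - 1
    data &= MASK
    if rotate:
        return ((data << amount) | (data >> (WIDTH - amount))) & MASK
    return (data << amount) & MASK


def shift_rotate_unit_405(data, immed, instruction_register_amount,
                          sar_amount, rotate=False):
    """Chain the MUX, the decoder and the shifter/rotator."""
    amount = mux_465(immed, instruction_register_amount, sar_amount)
    return shifter_rotator_475(data, shift_amount_decoder_470(amount), rotate)


if __name__ == "__main__":
    # Immediate mode uses the amount carried in the instruction (4 here).
    print(hex(shift_rotate_unit_405(0x1, True, 4, 9)))
    # Register dependent mode uses the amount held in the SAR (9 here).
    print(hex(shift_rotate_unit_405(0x1, False, 4, 9)))
```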
The following describes the operation of processor 400 when processor 400 encounters an immediate SHIFT/ROTATE instruction in the instruction stream provided by instruction cache 420. When instruction unit 435 receives and decodes such an immediate SHIFT/ROTATE instruction, processor 400 enters an immediate mode of operation for that instruction. More particularly, a microcode unit 480 in instruction unit 435 monitors the instructions in the instruction stream to locate any register dependent SHIFT/ROTATE instructions. When microcode unit 480 locates a register dependent SHIFT/ROTATE instruction, processor 400 enters a register dependent mode of operation for that instruction. In actual practice, processor 400 may operate in both immediate mode and register dependent mode concurrently in the sense that pipeline stages of each mode may overlap.
However, when instruction unit 435 receives an immediate SHIFT/ROTATE instruction such as that of
In contrast, the following describes the operation of processor 400 when processor 400 encounters a register dependent SHIFT/ROTATE instruction in the instruction stream provided by instruction cache 420. When instruction unit 435 encounters a register dependent SHIFT/ROTATE instruction, processor 400 enters a register dependent mode. Programming in microcode unit 480 monitors the instruction stream passing through instruction unit 435. When microcode unit 480 encounters a register dependent instruction such as the SHIFT Rdata, Ramount, Rdest instruction depicted in
When microcode unit 480 intercepts a register dependent SHIFT/ROTATE instruction and processor 400 enters register dependent mode, control unit 455 causes the IMMED signal to go low to instruct MUX 465 to send the shift amount, Ramount, stored in SAR 410 to shift amount decoder 470. The second substitute instruction, namely SHIFT (Rdata), SAR, Rdest now executes because all information needed to execute the instruction is known and available. Register file 415 provides the data to be shifted/rotated from Rdata register 440 to shifter/rotator 475. Execution of the first substitute instruction already moved the shift amount, Ramount, to shifter/rotator 475. Register file 415 also provides the destination register, Rdest 450 to shifter/rotator 475 so that shifter/rotator 475 knows the destination in which to store the results of the SHIFT/ROTATE instruction. When the second substitute instruction executes, register file 415 stores the result of the shift operation in the Rdest destination register 450.
Once the first substitute instruction of
If decision block 605 determines that the current instruction is a shift/rotate instruction, then process flow continues to decision block 625. At decision block 625, microcode unit 480 in instruction unit 435 performs a test to determine if the current instruction is a register dependent shift/rotate instruction. In other words, microcode unit 480 performs a test to determine if the current shift/rotate instruction is an instruction that involves a register dependent shift amount. If microcode unit 480 determines that the current shift/rotate instruction does not involve a register dependent shift amount, then that instruction is an immediate shift/rotate instruction. In this event, processor 400 operates in an immediate mode wherein instruction unit 435 issues the immediate shift/rotate instruction, as per block 630, for immediate execution. Shift/rotate engine 405 then executes the instruction and stores the results in register file 415, as per block 635. In a simplified case, the process ends at end block 640. However, in actual practice, process flow may continue back to block 600 that processes the next instruction in the instruction stream.
Microcode unit 480 of instruction unit 435 continues to monitor the instruction stream for register dependent shift/rotate instructions, as per decision block 625. When decision block 625 finds such a register dependent shift/rotate instruction, then processor 400 operates in a register dependent mode wherein microcode unit 480 breaks the register dependent instruction into a first substitute instruction and a second substitute instruction, as per block 645. More particularly, microcode unit 480 breaks the instruction into a first substitute instruction, MOVE Ramount to SAR, that retrieves and moves the shift amount specified in the Ramount register 445 in the register file 415 to the special shift amount register (SAR) 410. Microcode unit 480 also breaks the instruction into a second substitute instruction, SHIFT (Rdata), SAR, Rdest. Then, instruction unit 435 issues the second substitute instruction, SHIFT (Rdata), SAR, Rdest, to SHIFT/ROTATE functional unit 405, as per block 650. In response, SHIFT/ROTATE functional unit 405 executes the second substitute instruction to shift the data in the data field, Rdata, by the amount specified in the shift amount register (SAR) 410, as per block 655. SHIFT/ROTATE functional unit 405 provides the result to destination register Rdest 450 when shifter/rotator 475 executes the second substitute instruction, also as per block 655. In a simplified case, process flow ends at end block 660. However, in actual practice, process flow may continue back to block 600 at which the instruction unit 435 continues processing instructions from the instruction cache 420.
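The process flow described above may be traced with the following Python sketch, which records the block numbers visited for one instruction and updates a small register file model. The tuple instruction format, the dictionary-based register file, the 32-bit left shift, and the treatment of non-shift instructions as outside the sketch are assumptions for illustration only.

```python
WIDTH = 32                     # assumed data-field width in bits
MASK = (1 << WIDTH) - 1


def shift_left(value, amount):
    return (value << amount) & MASK


def process(instr, regs, unit):
    """Process one instruction; return the list of flowchart blocks visited.

    instr is (opcode, rdata, amount, rdest); an integer amount is an
    immediate value and a string amount names a register (assumed encoding).
    """
    opcode, rdata, amount, rdest = instr
    trace = []
    if opcode != "SHIFT":                       # decision block 605
        trace.append("605: not shift/rotate")   # other handling not modeled
        return trace
    if isinstance(amount, int):                 # decision block 625: immediate
        trace.append("630: issue immediate")
        regs[rdest] = shift_left(regs[rdata], amount)
        trace.append("635: execute and store")
    else:                                       # register dependent mode
        trace.append("645: substitute")
        unit["SAR"] = regs[amount]              # MOVE Ramount to SAR
        trace.append("650: issue")
        # SHIFT (Rdata), SAR, Rdest
        regs[rdest] = shift_left(regs[rdata], unit["SAR"])
        trace.append("655: execute and store")
    return trace


if __name__ == "__main__":
    regs = {"R1": 3, "R2": 4, "R3": 0}
    unit = {"SAR": 0}
    print(process(("SHIFT", "R1", 2, "R3"), regs, unit), regs["R3"])
    print(process(("SHIFT", "R1", "R2", "R3"), regs, unit), regs["R3"])
```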
While in the embodiment discussed above, microcode unit 480 monitors the instruction stream for immediate SHIFT/ROTATE instructions and register dependent SHIFT/ROTATE instructions, in another embodiment a portion of the instruction unit 435 external to the microcode unit 480 may monitor the instruction stream for such instructions. However, in that embodiment, once the instruction unit locates such a register dependent SHIFT/ROTATE instruction, then microcode unit 480 performs the function of breaking the register dependent instruction into the first and second substitute instructions depicted in
The foregoing discloses a processor that may provide improved efficiency in processing immediate and register dependent shift/rotate instructions.
Modifications and alternative embodiments of this invention will be apparent to those skilled in the art in view of this description of the invention. Accordingly, this description teaches those skilled in the art the manner of carrying out the invention and is intended to be construed as illustrative only. The forms of the invention shown and described constitute the present embodiments. Persons skilled in the art may make various changes in the shape, size and arrangement of parts. For example, persons skilled in the art may substitute equivalent elements for the elements illustrated and described here. Moreover, persons skilled in the art after having the benefit of this description of the invention may use certain features of the invention independently of the use of other features, without departing from the scope of the invention.