The MPU according to the present invention includes the nop detecting circuit for detecting the nop from the fetched instruction data from the instruction memory, the F/Fs placed between each of the pipelines in order to send the nop signal to each of the pipelines, and the clock control circuits for halting the clock at each of the pipeline stages based on the nop signal.
In the MPU according to the present invention, the clocks of the pipeline registers, the memories, etc. are halted, and at the same time the input data of each of the pipeline stages are held, while the nop is sent to each of the pipelines. This is achieved by a first process for outputting the nop signal of logic level “H” when the nop is detected by the nop detecting circuit, a second process for sending the detected nop signal to each of the pipelines, and a third process for halting the clocks by the clock control circuits placed in each of the pipelines.
The above general configuration diagram shows an example of five-stage pipeline having the five stages of FE/DC/EX/MEM/WB as in the conventional case of
As in the conventional case of
The first embodiment is characterized by including, in addition to the conventional MPU configuration:
a nop detecting circuit 41 for detecting the nop instruction from a fetched data (instruction data) S22 from the instruction memory 22;
clock control circuits 42-45 placed in each of the pipeline stages; and
F/Fs 46-48 placed between each of the pipeline stages so as to send a one-bit nop signal S41 from the nop detecting circuit 41 indicating that the instruction is the nop instruction.
Each of the F/Fs 46-48 outputs one of the one-bit nop signals S46-S48. The instruction memory 22 and the clock control circuits 42-45 operate in synchronization with the clock CK. Each of the clock control circuits 42-45 is provided with one of the one-bit nop signals S41, S46-S48 as an enable signal (activating signal), and generates one of the gated clocks S42-S45 based on the clock CK. Each of the pipeline registers 28-31 is configured to operate in synchronization with the corresponding one of the gated clocks S42-S45, the register set 24 is configured to operate based on the gated clock S42, and the data memory 26 is configured to operate based on the gated clock S44. The above configuration is a characteristic of the first embodiment and is the point of difference from the conventional MPU.
Each of the clock control circuits 42-45 of
As shown in
The above gated clock S42 is inputted to the clock input terminal of the FE/DC pipeline register 28 and the clock input terminal of the register set 24 of the DC stage. Similarly, in the subsequent EX, MEM, WB stages, the nop signals S46, S47, S48 sent from the previous stages are inputted to the clock control circuits 43, 44, 45, and the output signals S43, S44, S45 of the clock control circuits 43, 44, 45 are inputted to the pipeline registers 29, 30, 31 of the next stages and to the data memory 26.
The whole operation of the MPU of
When the address number 2 (A2) generated by the PC 27 is provided to the instruction memory 22, the instruction data S22 (D2) corresponding to the nop instruction is outputted from the instruction memory 22 at the rising edge of the next clock CK, and the nop signal S41 is outputted from the nop detecting circuit 41. Subsequently, the gated clocks S42-S45 are outputted from the clock control circuits 42-45 of each of the FE, DC, MEM, WB stages, respectively, and are sent to the pipeline registers 28-31, the register set 24, and the data memory 26. The timing of generating the nop signal S41 and the signal flow of each of the nop signals S46-S48 of the FE, DC, MEM, WB stages and the gated clocks S42-S45 are shown in
With the circuit configuration of the MPU according to the first embodiment and the signal flows described above, the supply of the gated clocks S42-S45 to the next-stage pipeline registers 28-31, etc. can be halted as the nop is sent through each of the FE, DC, EX, MEM, WB stages.
According to the first embodiment, the power consumption of the pipeline registers 28-31, the register set 24, and the data memory 26, which do not need to operate during the nop operation, can be reduced by halting their gated clocks S42-S45 as the nop is sent through the pipeline. Furthermore, by halting the gated clocks S42-S45 of the pipeline registers 28-31, the input data of each of the FE, DC, EX, MEM, WB stages can be held and the operations of the combinational circuits of each of these stages can be halted; therefore, a further reduction of the power consumption can be expected.
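The way the nop signal travels with the instruction and suppresses one clock pulse per stage can be illustrated with a small cycle-level model. The following Python sketch is not the circuit of the embodiment: it assumes that each clock control circuit simply withholds one pulse when its nop input is “H”, that the F/Fs 46-48 act as a three-bit shift register, and the names `simulate_gated_clocks`, `count_suppressed`, and `NOP` are hypothetical.

```python
NOP = "nop"  # hypothetical mnemonic standing in for the nop encoding


def simulate_gated_clocks(program):
    """Return, per cycle, which of the gated clocks S42-S45 pulse.

    Cycle-level model only (an assumption, not the gate-level circuit):
    S41 is the nop detector output for the instruction being fetched,
    and F/Fs 46-48 delay it by one cycle each (S46, S47, S48).
    """
    s46 = s47 = s48 = False  # F/F outputs, reset to "L"
    trace = []
    for instr in program:
        s41 = (instr == NOP)  # models the nop detecting circuit 41
        nop_inputs = {"S42": s41, "S43": s46, "S44": s47, "S45": s48}
        # A gated clock pulses only when its nop input is "L".
        trace.append({clk: not nop for clk, nop in nop_inputs.items()})
        # Clock edge: shift the nop signal one stage down the pipeline.
        s46, s47, s48 = s41, s46, s47
    return trace


def count_suppressed(trace):
    """Count the clock pulses that were withheld over the whole trace."""
    return sum(1 for cycle in trace for pulses in cycle.values() if not pulses)


if __name__ == "__main__":
    for n, cycle in enumerate(simulate_gated_clocks(["add", NOP, "sub", "ld", "st", "add"])):
        print(n, cycle)
```

In this model a single nop withholds exactly one pulse of each gated clock, one cycle apart, which mirrors the stage-by-stage halting described above.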
The MPU according to the second embodiment is configured to include, in the instruction data S22 outputted from the instruction memory 22, a nop-only bit S22a indicating logic level “H” in the case of the nop instruction, instead of the nop detecting circuit 41 according to the first embodiment, and is configured to input the nop-only bit S22a directly to the clock control circuit 42 and the F/F 46 between the FE/DC stages. Other configurations are the same as in the first embodiment.
In the case where the instruction data S22 fetched from the instruction memory 22 is the nop, the nop-only bit S22a is set to logic level “H”. Therefore, in the FE stage, the gated clock S42 of the FE/DC pipeline register 28 and the register set 24 can be halted by inputting the one-bit nop-only bit S22a directly to the clock control circuit 42. At the same time, the same gated-clock control can be done in the subsequent EX, MEM, WB stages by inputting the nop-only bit S22a to the F/F 46 between the FE/DC stages.
In the case where the clock frequency is high, there is some possibility that the delay time in the path from the instruction memory 22 of the first embodiment to the nop detecting circuit 41, the clock control circuit 42 or the F/F 46 between the FE/DC stages becomes a problem.
In the above mentioned case, the delay time can be eliminated by including a nop-only bit S22a in the instruction data S22 as in the second embodiment and by using the nop-only bit S22a directly as a clock control signal, and therefore a higher frequency operation becomes possible. Furthermore, the power consumed in the nop detecting circuit 41 of the first embodiment is saved.
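The difference between detecting the nop and reading a dedicated bit can be sketched as follows. The instruction width, the nop encoding, and the position of the nop-only bit are all assumptions made for this illustration, as are the function names; the source does not specify any of them.

```python
WORD_BITS = 16                 # assumed instruction width (hypothetical)
NOP_ENCODING = 0x0000          # assumed nop bit pattern (hypothetical)
NOP_ONLY_BIT = 1 << WORD_BITS  # assumed position of the nop-only bit S22a


def assemble(word):
    """Attach the nop-only bit before the program is stored in memory."""
    return (word | NOP_ONLY_BIT) if word == NOP_ENCODING else word


def nop_by_detection(word):
    """First embodiment: compare all instruction bits (models circuit 41)."""
    return (word & (NOP_ONLY_BIT - 1)) == NOP_ENCODING


def nop_by_bit(stored_word):
    """Second embodiment: read the single nop-only bit, no comparison."""
    return bool(stored_word & NOP_ONLY_BIT)
```

Both functions give the same answer for every word, but the second needs no comparison logic in the path between the instruction memory and the clock control circuit, which is the delay the second embodiment removes.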
The MPU according to the third embodiment includes:
an inverter 51 for inverting the clock CK;
an instruction memory 52 for outputting an instruction data S52 assigned by the address from the PC27 based on a gated clock S54;
an instruction memory 53 for outputting a nop-only bit S53 assigned by the address from the PC 27 based on the inverted clock;
a clock control circuit 54 for outputting a gated clock S54 based on the clock CK and the nop-only bit S53; and
a F/F 55 for inputting the nop-only bit S53 and outputting a nop signal S55 to the clock control circuit 42 and the F/F 46.
Other configurations thereof are the same as in the second embodiment.
In other words, the MPU according to the third embodiment includes two instruction memories 52, 53, in addition to the configuration of the MPU according to the second embodiment. The instruction memory 52, one of the above instruction memories, stores the instruction data S52 other than the nop-only bit. The instruction memory 53, the other of the above instruction memories, stores only the nop-only bit S53, and is a one-bit memory. The instruction memories 52, 53 are provided with the same program addresses at the same timing by the PC 27. As described before, the output of the instruction memory 53 represents the nop-only bit S53; this output is inputted to the clock control circuit 54, and the clock CK is halted according to its state. The gated clock S54 outputted from the clock control circuit 54 is used as the clock of the instruction memory 52. Meanwhile, the one-bit nop-only bit S53 from the instruction memory 53 is provided to the F/F 55 placed in the FE stage, and the output from the F/F 55, delayed by one cycle, is inputted to the clock control circuit 42 and the F/F 46 between the FE/DC stages as the nop signal S55.
It is assumed that the address number 2 of the PC 27 represents the nop instruction. The same address outputted from the PC 27 is inputted to the instruction memory 52 and the instruction memory 53; however, since the clock CK is inverted by the inverter 51 before being provided to the instruction memory 53 storing the nop-only bit S53, the instruction memory 53 outputs the nop-only bit S53 at the falling edge of the clock while the address number 2 (A2) is being inputted.
Since the nop-only bit S53 from the instruction memory 53 is inputted to the clock control circuit 54 controlling the clock of the instruction memory 52, in the case where the nop-only bit S53 is at logic level “H”, indicating the nop, the clock control circuit 54 halts the next cycle of the gated clock S54. In the case where the nop-only bit S53 is at logic level “L”, not indicating the nop, the gated clock S54 is supplied. In other words, the nop-only bit S53 is simply outputted a half cycle ahead, and in the case where it indicates the nop, the next cycle of the gated clock S54 is halted; that is, during the nop operation, the instruction data S52 other than the nop-only bit S53 is not fetched.
Meanwhile, the nop-only bit S53 from the instruction memory 53 is inputted to the F/F 55 placed in the FE stage, and is delayed by one cycle of the clock before being provided to the F/F 46 between the FE/DC stages. The output from the F/F 46 is further delayed by one cycle of the clock and is used as the nop signal S46 of the DC stage. The following operations are the same as in the second embodiment of the invention.
According to the third embodiment, the instruction memory 53 for storing the nop-only bit S53 is included, the nop-only bit S53 is read out a half cycle of the clock ahead, and the fetch of the other instruction data S52, which becomes unnecessary, is halted when the nop-only bit S53 read out a half cycle ahead indicates the nop. As a result, the power consumed by the instruction memory 52 during the nop can be reduced. At the same time, the same clock control operations as in the first and the second embodiments become possible, and a larger effect of reducing the power consumption can be expected.
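The effect on the instruction memory 52 can be illustrated by counting its reads. The sketch below abstracts away the half-cycle timing and only assumes that the one-bit memory is consulted before the main memory for each address; the function name and the use of `None` for a skipped fetch are hypothetical choices for this illustration.

```python
def fetch_with_nop_memory(instructions, nop_bits):
    """Count reads of the main instruction memory (models memory 52).

    nop_bits models the one-bit memory 53. It is consulted first (a half
    cycle ahead in the embodiment); when the bit is "H", the clock of the
    main memory is withheld for that address, so no read takes place.
    """
    fetched = []
    reads_of_main_memory = 0
    for pc, nop_bit in enumerate(nop_bits):
        if nop_bit:
            # Nothing is fetched; only the nop signal travels down the pipeline.
            fetched.append(None)
            continue
        reads_of_main_memory += 1
        fetched.append(instructions[pc])
    return fetched, reads_of_main_memory


if __name__ == "__main__":
    program = ["add", "nop", "nop", "sub"]
    bits = [0, 1, 1, 0]
    print(fetch_with_nop_memory(program, bits))
```

Under these assumptions the number of main-memory reads equals the number of non-nop instructions, while the one-bit memory is read on every cycle.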
The fourth embodiment of the invention includes a control signal generating circuit 61, disjunction gates (hereinafter referred to as ‘OR gates’) 62, 65, 67 and F/Fs 63, 64, 66, in addition to the configuration of the third embodiment; the other configurations are the same as in the third embodiment.
In other words, according to the fourth embodiment of the invention, the control signal generating circuit 61 is included in the DC stage, in addition to the configuration of the third embodiment, and a plurality of clock enable signals S61a, S61b, S61c for controlling the clocks after each of the stages are outputted based on the results of the instruction decoder 23. S61a is the clock enable signal for control after the DC stage, S61b is the signal for control after the EX stage, and S61c is the signal for control after the MEM stage.
The clock enable signal S61a for controlling the clock after the DC stage is logically added (ORed) to the nop signal S46 sent from the F/F 46 of the FE stage by the OR gate 62, and the logical operation result is provided to the clock control circuit 43 and is additionally sent by the F/F 47 as the nop signal S47 for after the EX stage. The clock enable signal S61b for controlling the clock after the EX stage is inputted to the F/F 64 placed between the DC/EX stages, and its output is logically added to the nop signal S47 by the OR gate 65 in the EX stage, as in the DC stage; the logical operation result is provided to the clock control circuit 44 and is additionally sent to the MEM stage. The clock enable signal S61c for controlling the clock after the MEM stage is sent to the F/F 63 placed between the DC/EX stages and then to the F/F 66 placed between the EX/MEM stages, and the same operation as described above is performed.
In the case where a branch instruction operating in the DC stage is detected by the instruction decoder 23, for example, the branch instruction is completed in the DC stage and passes through the subsequent EX, MEM, WB stages without any operation, so no problem arises even when the instruction is recognized as the nop. Therefore, the control signal generating circuit 61 sets the clock enable signal S61a to logic level “H” in order to have the branch instruction recognized as the nop in the pipeline stages after the EX stage.
Meanwhile, the nop signal S46 sent from the F/F 46 of the FE stage has logic level “L” because the branch instruction is not the nop; however, this signal is logically added to the clock enable signal S61a generated in the DC stage by the OR gate 62, and the logical operation result inputted to the clock control circuit 43 therefore becomes logic level “H”. Consequently, the clock provided to the pipeline register 29 between the DC/EX stages is halted, and at the same time, the result of the logical addition is sent to the EX stage by the F/F 47 as the nop signal of the next EX stage.
According to the fourth embodiment, in the case where the operation is finished in the middle of the pipeline, as determined from the instruction detected by the instruction decoder 23 in the DC stage (for example, in the cases of a branch instruction, a store instruction, a comparison instruction, etc. having no writing operation to the register set 24 at the end of the instruction cycle), the subsequent operations are changed to the equivalent of the nop by the control signal generating circuit 61, etc. (that is, the pipeline registers 29, 30, 31 are gated as for the nop of the first embodiment). The application can thereby be expanded to many instructions other than the nop (for example, a branch instruction, a store instruction, a comparison instruction, etc. finishing their operations in the middle of the pipeline), and a greater reduction of the power consumption can be expected.
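The OR-gate chain of the fourth embodiment can be followed for a single instruction as it moves down the pipeline. The sketch below is a per-instruction view in which the F/F delays are omitted, because they only keep each signal aligned with its instruction; it assumes that the output of each OR gate is forwarded to the next stage as that stage's nop signal, and the function name is hypothetical.

```python
def stage_halts(nop, s61a, s61b, s61c):
    """Return the halt inputs of clock control circuits 43, 44, 45.

    Per-instruction view (an assumption): once a halt is raised at one
    stage it is forwarded, so every later stage is halted as well.
    """
    halt_43 = nop or s61a      # OR gate 62: nop signal S46 with S61a
    halt_44 = halt_43 or s61b  # OR gate 65: nop signal S47 with delayed S61b
    halt_45 = halt_44 or s61c  # OR gate 67: nop signal S48 with delayed S61c
    return halt_43, halt_44, halt_45


if __name__ == "__main__":
    print("branch:", stage_halts(False, True, False, False))
    print("finishes in MEM:", stage_halts(False, False, False, True))
    print("nop:", stage_halts(True, False, False, False))
```

A branch, for which S61a is “H”, is treated like the nop from the DC/EX pipeline register onward, whereas an instruction that raises only S61c halts just the last clock control circuit.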
The fifth embodiment of the invention includes a control signal generating circuit 71, a F/F set 72, clock control circuits 73, 74 and EX/MEM pipeline registers 75, 76, instead of the control signal generating circuit 61 and the F/F 64.
In other words, according to the fifth embodiment, in addition to the configuration of the fourth embodiment, a plurality of clock halting control signals (for example, clock enable signals) S71b are outputted, besides the clock enable signals S71a, S71c, from the control signal generating circuit 71 for generating the clock enable signals, and are provided to the F/F set 72 placed between the DC/EX stages. The clock control circuits 73, 74 provide the EX/MEM pipeline registers 75, 76 with the clock. The subsequent configurations are the same as in the fourth embodiment.
The reason why the F/F set 72, the plurality of clock control circuits 73, 74, and the EX/MEM pipeline registers 75, 76 need to be included is as follows. The EX/MEM pipeline register 30 and the MEM/WB pipeline register 31 each contain a plurality of registers, and whether each of these registers is activated or not is determined by the instruction. Consequently, the registers to be clock-controlled are selected by the instruction, and therefore the clock control circuits 73, 74 become necessary for each of the above registers.
The characteristic operation of the fifth embodiment will be explained below. The control signal generating circuit 71 outputs the clock enable signal S71b as the control signal for halting the EX/MEM pipeline registers 75, 76, based on the instruction detected by the instruction decoder 23. The F/F set 72 receives the clock enable signal S71b and delays it by one clock cycle to align its timing with the instruction being executed. If the enable signal were not delayed, the instruction currently detected in the DC stage would gate the EX/MEM pipeline register 30 while it is being used by the instruction one cycle ahead. The purpose of delaying the clock enable signal by one clock cycle is to avoid this malfunction.
The clock control circuits 73, 74 receive the signal from the F/F set 72 and halt the clock to the EX/MEM pipeline registers 75, 76.
According to the fifth embodiment, even when the instruction is not the nop, the clock of the inactive registers among the pipeline registers 28-31 (for example, the EX/MEM pipeline registers 75, 76) is halted so that their data do not change.
For example, three pipeline registers, named address, WBV, and BPR, are assumed to be in the EX/MEM stage. The output of the address pipeline register goes to the data memory 26. Consequently, in the case of the operating instruction, the address pipeline register is not activated. Therefore, in the case of the operating instruction, by halting the clock of the address pipeline register and leaving its data unchanged, the output of the pipeline register is configured not to toggle.
By the aforementioned configuration, the power consumption of each of the pipeline registers 28-31 can be reduced, and the power-reduction effect can be obtained over a larger part of the pipeline.
The present invention is not limited to the first to fifth embodiments, and various applications and modifications are possible. The following (a)-(d) are examples of such applications and modifications.
(a) The embodiments show examples of a five-stage pipeline; however, the present invention is applicable regardless of the number of pipeline stages.
(b) The present invention can be broadly applied to all circuits, for example, digital signal processors, etc. having pipeline systems.
(c) According to the embodiments, logic level “H” is used as the control signal indicating the nop, however, the control signal is not limited to the above level.
(d) The third embodiment shows a method of reducing power consumption by controlling the gated clock S54 inputted to the clock input terminal of the instruction memory 52; however, in the case where, for example, the instruction memory 52 includes an enable signal input terminal, etc., inputting the gated clock S54 to that enable signal input terminal, etc. also makes it unnecessary to fetch the nop, and therefore the power consumption can be reduced.
Number | Date | Country | Kind |
---|---|---|---|
2006-129046 | May 2006 | JP | national |