As central processing unit (CPU) chips incorporate more transistors at smaller feature sizes and lower voltage levels, the need for on-chip fault-tolerance features increases. In particular, CPU execution units, such as floating point units (FPUs), are especially susceptible to failure mechanisms because they occupy large areas of the CPU.
Typically, error correction coding (ECC) may be used to detect and correct errors. ECC provides single-bit and multi-bit error detection, and also provides single-bit error correction. However, ECC requires a setting in a computer system's BIOS utility program to be enabled as well as special chipset support. In addition, it is often difficult to implement ECC through CPU execution units such as FPUs.
One conventional solution for providing fault-tolerance in digital processing by CPUs is using a computer system with multiple CPUs. For example, the multiple CPUs may be operated in full lock-step to achieve a level of fault-tolerance in their computations. That is, multiple CPUs each execute the same computation and then the results are compared to determine if an error has occurred. However, such a solution may not only waste hardware from a performance perspective, but is also often expensive in that it typically requires additional hardware and support infrastructure and consumes more power.
Another conventional solution for providing fault-tolerance in digital processing by CPUs is software verification. The software verification is performed by executing an entire program multiple times on the same computer or on different computers, and then comparing the results for errors. However, this solution is often expensive in that it requires a longer run-time or requires multiple computers.
Other solutions address the problem by having a program compiler schedule redundant execution unit operations in the CPU at compile time to compare and test the results from the execution units for errors. However, these solutions often require the use of a special compiler; therefore, code compiled with a different compiler often must be recompiled with the special compiler before the computer can take advantage of the additional fault-tolerance. This not only lengthens run-time, because of the scheduled redundant execution unit operations, and adds the time needed to recompile the code, but it also requires additional resources such as the special compiler itself.
Furthermore, comparison of the outputs of the execution units in the above solutions typically sacrifices performance in all cases, even in those programs that do not require fault-tolerance. This is because the above solutions typically provide fault-tolerance for every instruction of every program that is run on the computer system. As a result, the entire computer system is unnecessarily slowed down because programs that do not require fault-tolerance are being run with fault-tolerance.
An embodiment of the invention provides a microprocessor including a plurality of execution units of a same type, and a first register operable to select between a first and a second mode of operation, wherein the microprocessor utilizes at least one of the execution units as a redundant execution unit during the first mode of operation and utilizes none of the execution units as a redundant execution unit during the second mode of operation.
The components shown in
The instruction cache 34 stores instructions that are frequently being executed by the microprocessor 16. Similarly, a data cache (not shown) may store data that is frequently being accessed by the microprocessor 16 to execute the instructions. In some implementations, the instruction and data caches may also be combined into one memory. There is also typically access (not shown) by the microprocessor 16 to random access memory (RAM), disk drives, and other forms of digital storage.
Addresses of instructions in memory may be generated by the instruction fetch unit 32. For example, the instruction fetch unit 32 may include a program counter that increments from a starting address within the instruction cache 34 serially through successive addresses in order to read out instructions stored at those addresses. The instruction decode/issue 36 receives instructions from the cache 34 and decodes and/or issues the instructions to one or both of the FPUs 40A and 40B for execution. The mode register 38 determines in which mode the microprocessor 16 is operating. The FPUs 40A and 40B may be configured to output the results of the execution to specific registers 42 in the microprocessor 16. In addition, the outputs of the FPUs 40A and 40B are coupled to a comparator 44. The comparator 44 compares the values at its two inputs and then outputs a value to the comparison flag 46, which indicates whether the input values are the same or different. Other circuitry, such as that to supply operands for the instruction execution, is not shown.
In accordance with an embodiment of the invention, the circuitry of
For example, when the mode register 38 is set to a first value (e.g., a logic “0”), the microprocessor 16 operates in the performance mode where all fault-tolerant operations are turned off to maximize the speed of the microprocessor 16. In this mode, the comparator 44 and the comparison flag 46 are deactivated, and the microprocessor 16 utilizes both FPUs 40A and 40B as scheduled by a program compiler (not shown). The instruction decode/issue 36 may issue a first instruction to only the FPU 40A during a clock cycle, or the instruction decode/issue 36 may issue first and second instructions in parallel to both of the FPUs 40A and 40B during a clock cycle. The outputs of the FPUs 40A and 40B may then be retired without having to wait for the comparator 44 or the comparison flag 46.
Alternatively, when the microprocessor 16 is operating in the performance mode, the comparator 44 and the comparison flag 46 may be activated. In this case, the instruction decode/issue 36 still utilizes both FPUs 40A and 40B as scheduled by the compiler. However, the microprocessor 16 simply ignores any results from the comparator 44 and does not perform any type of error comparison before retiring the outputs of the FPUs 40A and 40B. As a result, there is no degradation in the speed of the microprocessor 16.
When the mode register 38 is set to a second value (e.g., a logic “1”), the microprocessor 16 operates in the HA mode where fault-tolerant operations are turned on to increase the fault-tolerance of the microprocessor 16. In this mode, the comparator 44 and the comparison flag 46 are activated, and the FPU 40B now functions as a redundant execution unit parallel to the FPU 40A. As a result, if the compiler schedules a first instruction to be executed by the microprocessor 16, the instruction decode/issue 36 issues the first instruction to the FPU 40A and also to the redundant FPU 40B. That is, both the FPU 40A and the FPU 40B execute the same instruction. The comparator 44 then compares the outputs of the FPUs 40A and 40B. If the outputs match, the comparator 44 provides a signal to the comparison flag 46 indicating that the result is correct, and the outputs of the FPUs are retired. If the outputs do not match, the comparator 44 provides a signal to the comparison flag 46 indicating that there is an error. At this point, the instruction from the instruction decode/issue 36 may be re-executed by the FPUs 40A and 40B until the FPU results match.
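By way of illustration only, the two modes described above may be modeled in software as follows. This is a minimal sketch rather than a description of the actual circuitry: the names make_fpu and issue, the fault-injection scheme, and the max_retries bound are assumptions introduced for the example (the description itself only states that the instruction may be re-executed until the results match).

```python
import operator

PERFORMANCE, HA = 0, 1  # mode register values from the example above


def make_fpu(faulty_calls=()):
    """Model an FPU as a callable. Calls whose index is listed in
    `faulty_calls` return a corrupted result (hypothetical fault injection)."""
    count = 0

    def fpu(op, a, b):
        nonlocal count
        index = count
        count += 1
        result = op(a, b)
        return result + 1.0 if index in faulty_calls else result

    return fpu


def issue(mode, op, a, b, fpu_a, fpu_b, max_retries=3):
    """Execute one instruction and return the retired result."""
    if mode == PERFORMANCE:
        return fpu_a(op, a, b)          # no comparison, retire at once
    for _ in range(max_retries + 1):    # HA mode: duplicate and compare
        result_a = fpu_a(op, a, b)
        result_b = fpu_b(op, a, b)
        if result_a == result_b:        # comparison flag indicates "same"
            return result_a
    # Assumed policy: give up after a bounded number of re-executions.
    raise RuntimeError("FPU outputs still differ after retries")


# The redundant FPU is corrupted on its first call only, so the first
# comparison fails and the re-executed instruction then matches.
retired = issue(HA, operator.add, 1.0, 2.0, make_fpu(), make_fpu({0}))
```

In this model a transient fault costs one extra execution, while a persistent fault is surfaced as an exception instead of looping indefinitely.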
Alternatively, if the compiler schedules first and second instructions to be executed in parallel by the microprocessor 16 in the HA mode, then the instruction decode/issue 36 issues the first instruction to both the FPU 40A and the redundant FPU 40B during a first clock cycle and the comparator 44 compares the outputs of the FPUs. Then immediately afterwards, the instruction decode/issue 36 issues the second instruction to both the FPU 40A and the redundant FPU 40B during a second clock cycle and the comparator 44 compares the outputs of the FPUs.
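The difference in issue behavior between the two modes may be sketched as a per-cycle schedule. The function name schedule and the tuple layout (instruction sent to FPU 40A, instruction sent to FPU 40B) are assumptions for illustration, and the sketch assumes the instructions are independent so that any adjacent pair could issue in parallel in the performance mode.

```python
PERFORMANCE, HA = 0, 1  # mode register values from the example above


def schedule(instructions, mode):
    """Return one (fpu_a_instruction, fpu_b_instruction) tuple per clock cycle."""
    if mode == HA:
        # Each instruction goes to both FPUs, one instruction per cycle.
        return [(instr, instr) for instr in instructions]
    cycles = []
    for k in range(0, len(instructions), 2):
        pair = instructions[k:k + 2]
        # An odd instruction at the end issues to FPU A alone.
        cycles.append((pair[0], pair[1] if len(pair) > 1 else None))
    return cycles


ha_cycles = schedule(["fadd", "fmul"], HA)
perf_cycles = schedule(["fadd", "fmul"], PERFORMANCE)
```

Under these assumptions, two instructions that occupy a single cycle in the performance mode occupy two consecutive cycles in the HA mode.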
Alternatively, the redundant FPU 40C, the comparator 44 and the comparison flag 46 may also be activated when the microprocessor 16′ is operating in the performance mode. In this case, the instruction decode/issue 36 still utilizes the redundant FPU 40C along with the FPUs 40A and 40B. However, the microprocessor 16′ simply ignores any results from the comparator 44 and does not perform any type of error comparison before retiring the outputs of the FPUs 40A and 40B. As a result, there is no degradation in the speed of the microprocessor 16′.
Referring to
Alternatively, the value in the mode register 38 may be set by user control. A user may determine through a user interface that specific programs require the microprocessors 16 and 16′ to run in either the HA mode or the performance mode, and set the value in the mode register 38 accordingly through the user interface. In addition, the user may modify the table described above that specifies the mode register settings for specific programs through the user interface. In this way, the user can manually set the value in the mode register 38 and override the OS so that a program is forced to run in either the HA mode or the performance mode.
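The precedence between the table of per-program settings and a manual user setting may be sketched as a simple lookup. The names select_mode, os_table and user_overrides, the example program names, and the choice of the performance mode as the default for unlisted programs are all assumptions made for this example.

```python
PERFORMANCE, HA = 0, 1  # mode register values from the example above


def select_mode(program, os_table, user_overrides=None, default=PERFORMANCE):
    """Return the value to load into the mode register for `program`.

    A user override, when present, takes precedence over the table entry.
    """
    if user_overrides and program in user_overrides:
        return user_overrides[program]
    return os_table.get(program, default)


# Hypothetical table and override entries.
os_table = {"payroll": HA, "media_player": PERFORMANCE}
user_overrides = {"media_player": HA}

mode_for_player = select_mode("media_player", os_table, user_overrides)
```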
In an alternative embodiment, the microprocessors 16 and 16′ may include other mode registers in addition to the mode register 38 in order to incorporate different levels of HA operation. For example, a second mode register may be used to implement error correction coding (ECC) on all data or on data coming from certain units within the microprocessors 16 and 16′. A third mode register may be used to implement parity checking, again on all data or on data coming from certain units within the microprocessors 16 and 16′. Besides being independently controllable using separate mode registers, these different levels of HA operation may also be designed to be implemented in various combinations or sub-combinations.
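Independently controllable levels of HA operation may be modeled as separate flags that can be combined. In the sketch below the flag names and bit assignments are hypothetical, and parity_bit shows one common parity convention (a bit that is 1 when the word holds an odd number of set bits); the description does not specify which parity scheme is used.

```python
from enum import Flag


class HAFeature(Flag):
    """One flag per mode register (names and values are assumptions)."""
    NONE = 0
    REDUNDANT_FPU = 1   # mode register 38
    ECC = 2             # second mode register
    PARITY = 4          # third mode register


def parity_bit(word):
    """Return 1 if `word` has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


# A sub-combination: redundant execution plus parity, without ECC.
config = HAFeature.REDUNDANT_FPU | HAFeature.PARITY
ecc_enabled = HAFeature.ECC in config
```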
In another embodiment, the computing circuit 12 in
Still referring to
In another embodiment, the microprocessors 16 and 16′ may insert a comparison instruction at an optimal location within the instruction flow. An advantage of this embodiment is that the comparison instruction is not required to immediately follow the actual and redundant FPU instructions. Instead, the microprocessors 16 and 16′ are allowed to pre-fetch a number of instructions to determine the least costly location to insert the compare instruction. The cost of the location within the pre-fetched instruction flow may be determined as a function of resource utilization, performance and coverage. The actual FPU result is not retired until the comparison instruction is completed and no error is signaled.
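Selecting the least costly location may be sketched as a minimization over the pre-fetched window. The description does not give the cost function, so the weighted sum below, the field names utilization, stall_cycles and coverage, and the weights are purely illustrative assumptions.

```python
def best_compare_slot(slots, w_util=1.0, w_perf=1.0, w_cov=1.0):
    """Return the index of the cheapest slot in the pre-fetched window.

    Each slot is a dict with hypothetical fields:
      utilization  - fraction of execution resources already busy
      stall_cycles - cycles the compare would delay later instructions
      coverage     - fraction of faults still caught if compared here
    """
    def cost(slot):
        return (w_util * slot["utilization"]
                + w_perf * slot["stall_cycles"]
                - w_cov * slot["coverage"])

    return min(range(len(slots)), key=lambda i: cost(slots[i]))


window = [
    {"utilization": 0.9, "stall_cycles": 2, "coverage": 1.0},
    {"utilization": 0.2, "stall_cycles": 0, "coverage": 1.0},
    {"utilization": 0.5, "stall_cycles": 1, "coverage": 1.0},
]
chosen = best_compare_slot(window)
```

With these example values the second slot is chosen, because the execution resources are mostly idle there and the compare stalls nothing.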
In another embodiment, the microprocessors 16 and 16′ may retire the actual FPU results before a comparison operation is completed. This increases the processing speed of the microprocessors 16 and 16′ because the results of the FPU instructions are retired immediately upon their completion. If no error is detected when the comparison is completed, then the instruction flow continues as usual. However, if an error is detected, then the system reverts to a known “good” state and resumes processing from there. Assuming the frequency of errors detected from the comparison is low, this embodiment potentially experiences less performance degradation than the two embodiments above.
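Retiring results ahead of the comparison, with a rollback on error, may be sketched as follows. The register file is modeled as a dictionary and the known "good" state as a copy taken before the instruction sequence; the function and variable names, the fault model, and the choice to compare once at the end of the sequence are assumptions for illustration.

```python
import operator


def run_speculative(instructions, fpu_a, fpu_b, registers):
    """Retire FPU A results immediately; compare against FPU B afterwards.

    Returns True if every comparison matched. On a mismatch, `registers`
    is restored to the checkpoint and False is returned, so the caller
    can resume processing from the known good state.
    """
    checkpoint = dict(registers)            # known "good" state
    deferred = []
    for dest, op, a, b in instructions:
        result = fpu_a(op, a, b)
        registers[dest] = result            # retire before comparing
        deferred.append((result, fpu_b(op, a, b)))
    if all(actual == redundant for actual, redundant in deferred):
        return True
    registers.clear()
    registers.update(checkpoint)            # revert to the good state
    return False


def good_fpu(op, a, b):
    return op(a, b)


def bad_fpu(op, a, b):
    return op(a, b) + 1.0                   # hypothetical persistent fault


program = [("r1", operator.add, 1.0, 2.0), ("r2", operator.mul, 3.0, 4.0)]

regs_ok = {"r0": 0.0}
ok = run_speculative(program, good_fpu, good_fpu, regs_ok)

regs_err = {"r0": 0.0}
err = run_speculative(program, good_fpu, bad_fpu, regs_err)
```

In the error-free case no instruction waits on the comparator; the cost of a detected error is the work discarded since the checkpoint.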
Therefore, a standard program does not need to be rewritten or recompiled in order for it to take advantage of the microprocessors 16 and 16′ operating in HA mode. While in the HA mode, the microprocessors 16 and 16′ implement the fault tolerant operations in hardware, and as a result, these operations are transparent to the software program. In addition, because the operation of the microprocessors 16 and 16′ in either HA mode or performance mode is configurable, high performance and increased fault-tolerance may both be maintained in the same computer system with the same microprocessor and the same program.
From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention.