Claims
- 1. A self-correcting computer system, comprising:
three or more processors; a controller adapted to receive signals from the processors and being further adapted to determine a majority value for a signal; and a scrubbing module adapted to resynchronize the processors at a predetermined milestone, the resynchronization being in accordance with the majority value.
- 2. The computer system according to claim 1, wherein the predetermined milestone is a time interval.
- 3. The computer system according to claim 1, wherein the predetermined milestone is a number of clock cycles.
- 4. The computer system according to claim 1, wherein the predetermined milestone is adapted to be reached during an operating system idle time.
- 5. The computer system according to claim 1, wherein the controller is adapted to record an error when signals from one or more processors disagree with the majority value.
- 6. The computer system according to claim 5, wherein the controller is adapted to suspend operation of a processor yielding a signal in disagreement with the majority value.
- 7. The computer system according to claim 5, wherein the controller is adapted to change the predetermined milestone.
- 8. The computer system according to claim 7, wherein the controller is adapted to change the predetermined milestone based on the frequency of recorded errors.
- 9. The computer system according to claim 7, wherein the controller is adapted to change the predetermined milestone based on system requirements.
- 10. The computer system according to claim 1, wherein the scrubbing module is adapted to resynchronize the processors by:
a) flushing selected processor state elements for each processor into a main memory; and b) providing each processor with restoration data, said restoration data corresponding to a majority value of each of said selected processor state elements.
- 11. The computer system according to claim 10, wherein said state elements include registers.
- 12. The computer system according to claim 10, wherein said state elements include cache memory.
- 13. The computer system according to claim 12, wherein at least a portion of said cache memory is configured as write-through.
- 14. The computer system according to claim 1, wherein the controller includes field-programmable gate arrays (FPGAs).
- 15. The computer system according to claim 14, wherein the FPGAs are individually fault-tolerant.
- 16. The computer system according to claim 1, wherein the controller includes application-specific integrated circuits (ASICs).
- 17. The computer system according to claim 1, wherein each processor is provided with a radiation-mitigating shield.
- 18. The computer system according to claim 1, further comprising:
a radiation-mitigating shield adapted to shield substantially all components of the computer system.
- 19. The computer system according to claim 1, further comprising:
a memory module adapted to store data, said memory module being in communication with said processors.
- 20. The computer system according to claim 19, wherein the memory module further comprises:
three or more mirrored memory elements; and a memory scrub module adapted to detect an error in one or more of said memory elements when an entry at a selected address of said one or more memory elements differs from an entry at said selected address of a majority of said memory elements.
- 21. The computer system according to claim 20, wherein said memory scrub module is further adapted to reconfigure said one or more memory elements in which an error is detected to contain an entry at the selected address identical to said entry at the selected address of said majority of said memory elements.
- 22. The computer system according to claim 20, wherein said memory scrub module is adapted to test for errors at each read or write to said selected address.
- 23. The computer system according to claim 20, wherein said memory scrub module is adapted to test for errors at regular intervals and to reconfigure an entry at an address of one or more of said memory elements when said entry differs from an entry at said address of a majority of said memory elements.
- 24. The computer system according to claim 19, wherein the memory module includes error detection and correction logic.
- 25. The computer system according to claim 24, wherein the error detection and correction logic includes Reed-Solomon error correction.
- 26. A fault-tolerant computer system, comprising:
three or more mirrored memory elements; and a memory scrub module adapted to detect an error in one or more of said memory elements when an entry at a selected address of said one or more memory elements differs from an entry at said selected address of a majority of said memory elements, wherein said memory scrub module is adapted to test for errors at regular predetermined intervals.
- 27. The computer system according to claim 26, wherein said memory scrub module is further adapted to reconfigure said one or more memory elements in which an error is detected to contain an entry at the selected address identical to said entry at the selected address of said majority of said memory elements.
- 28. The computer system according to claim 26, wherein said memory scrub module is adapted to test for errors at each read or write to said selected address.
- 29. The computer system according to claim 26, wherein said memory scrub module comprises:
an array of AND gates, each gate in said array having as a first input a signal from one of said memory elements, and as a second input a signal from a different one of said memory elements, each gate in said array yielding a first output; and an OR gate having as inputs the first output from each gate in said array of AND gates, said OR gate having a second output; wherein said memory scrub module is adapted to detect an error based on said second output.
- 30. A method of self-correcting by a computer, comprising:
a) flushing selected processor state elements from three or more processors of said computer when a predetermined milestone is reached; b) storing restoration data on a system memory, said restoration data being indicative of a majority value of each of said selected processor state elements; and c) restoring said selected processor state elements using said restoration data.
- 31. The method according to claim 30, wherein the three or more processors are reset prior to step c).
- 32. The method according to claim 30, wherein the predetermined milestone is a time interval.
- 33. The method according to claim 30, wherein the predetermined milestone is a number of clock cycles.
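By way of illustration only, the majority determination and scrubbing recited in the claims above can be sketched in software. The sketch below is not the claimed hardware and does not limit the claims. It assumes exactly three redundant units, integer-valued words, and at most one faulty unit per vote; the names `majority3`, `scrub_address`, and `resynchronize` are hypothetical. The bitwise vote is formed as an OR over the pairwise ANDs, which is one possible reading of the gate arrangement of claim 29.

```python
from typing import Dict, List


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three words: the OR of the three pairwise ANDs.

    Assumption: at most one input is faulty, so each bit of the result
    matches at least two of the three inputs.
    """
    return (a & b) | (a & c) | (b & c)


def scrub_address(elements: List[List[int]], addr: int) -> List[int]:
    """Vote on one address across three mirrored memory elements.

    Any element whose entry differs from the voted value is rewritten.
    Returns the indices of the elements that were corrected.
    """
    a, b, c = (mem[addr] for mem in elements)
    voted = majority3(a, b, c)
    corrected = []
    for index, mem in enumerate(elements):
        if mem[addr] != voted:
            mem[addr] = voted  # reconfigure the dissenting entry
            corrected.append(index)
    return corrected


def resynchronize(states: List[Dict[str, int]]) -> Dict[str, int]:
    """Model of a scrub at a milestone for three processors.

    Each dict stands in for flushed state elements (for example registers).
    Restoration data is the per-element majority, written back to all three.
    """
    restoration = {
        name: majority3(*(state[name] for state in states))
        for name in states[0]
    }
    for state in states:
        state.update(restoration)
    return restoration
```

For example, with three mirrored memories `[[0x0F, 0xAA], [0x0F, 0xAB], [0x0F, 0xAA]]`, calling `scrub_address` on address 1 reports that element 1 was corrected and leaves all three copies holding `0xAA`. A bitwise vote is used here because it maps directly onto gates; a word-level vote would behave the same under the single-fault assumption.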
Parent Case Info
[0001] This application is related to U.S. Provisional Patent Application No. 60/451,041 (Atty Docket No. 026471-0201, Express Mail No. EV003428368), filed Feb. 28, 2003, from which priority is claimed, and which is hereby incorporated by reference in its entirety, including all tables, figures, and claims.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60451041 | Feb 2003 | US |