Claims
- 1. An improved operating system method comprising the steps of:
storing a primary bootable operating system in storage accessible by a computing system wherein during normal operation the computing system accesses the primary operating system so that the primary operating system is the present operating system; storing one or more alternate operating systems in storage accessible by the computing system; monitoring the operation of the present operating system; and based on a set of boot instructions, accessing the alternate operating system upon sensing an error in the operation of the primary operating system so that the alternate operating system becomes the present operating system.
- 2. The method of claim 1 wherein the alternate operating system is a bootable copy of the primary operating system that is stored when the computing system is first booted, thereby creating a first boot alternate operating system.
- 3. The method of claim 1 wherein the alternate operating system is a bootable copy of the primary operating system that is overwritten with a current version of the primary operating system at periodic intervals to create an updated operating system.
- 4. The method of claim 3 wherein the periodic intervals are determined based on a number of times the computing system has been booted since a prior overwrite.
- 5. The method of claim 3 wherein the updated operating system is stored by imaging the operating system.
- 6. The method of claim 1 wherein the alternate operating system is a bootable copy of the primary operating system that is a real-time copy of the primary operating system.
- 7. The method of claim 6 wherein the real-time copy of the primary operating system is a mirror copy of the primary operating system created using RAID 1.
- 8. The method of claim 1 comprising the step of storing one or more non-bootable shadow copies of the primary operating system and wherein the alternate operating system is a recovery operating system that comprises instructions for converting the shadow copy to a bootable operating system.
- 9. The method of claim 1 wherein the step of monitoring the operation of the present operating system is performed using a watchdog timer that monitors the operation of the present operating system being used by the computing system.
- 10. The method of claim 9 wherein the watchdog timer causes the computing system to boot based on the boot instructions when the present operating system is non-functional for a predetermined time period.
- 11. The method of claim 1 comprising the step of maintaining a list of locations of operating systems that lists the operating systems in an order in which they should be accessed in the event that an error is sensed in the present operating system and wherein the boot instructions direct the computing system to access an operating system by referring to the list of locations of operating systems.
- 12. The method of claim 1 comprising the step of initializing a set of fail-over operating systems by detecting operating systems, storing the locations of the operating systems in a fail-over memory location according to an order in which they are to be accessed, and storing an associated boot counter in the fail-over memory location for each operating system.
- 13. The method of claim 12 wherein the fail-over memory location resides in non-volatile random access memory.
- 14. The method of claim 12 wherein the boot instructions comprise a fail-over algorithm that increments the boot counter of the present operating system each time the present operating system is booted and wherein the algorithm directs the computing system to boot from a next stored operating system when the boot counter associated with the present operating system reaches a predetermined value.
- 15. The method of claim 14 wherein the fail-over algorithm monitors a watchdog timer that resets upon operating system activity and directs the computing system to boot when the watchdog timer reaches a predetermined time value and wherein the algorithm increments the boot counter of the present operating system each time the watchdog timer reaches the predetermined time value.
- 16. The method of claim 15 comprising the steps of, prior to the initializing step, validating currently stored operating system locations if the watchdog timer caused the most recent computing system boot and setting the boot counters of any invalid operating system locations to a predetermined value.
- 17. The method of claim 1 comprising the step of prompting a computing system user to access the alternate operating system upon sensing an error in the operation of the present operating system.
- 18. The method of claim 8 wherein the recovery operating system comprises instructions for prompting a user to take steps to convert the shadow copy to a bootable operating system.
- 19. The method of claim 1 wherein the computing system is a headless server.
- 20. The method of claim 1 comprising the step of alerting a computer user that an alternate operating system has been accessed.
- 21. The method of claim 20 wherein the user is alerted by a visual cue on the computing system.
- 22. The method of claim 20 comprising the step of e-mailing an alert message to the computer user.
- 23. The method of claim 1 wherein the alternate operating system is a recovery operating system for recovering the computing system to a working state.
- 24. The method of claim 23 wherein the step of accessing the alternate operating system upon sensing an error is performed by monitoring a boot bit that is reset at each successful boot.
- 25. The method of claim 24 comprising the step of monitoring a diagnostic bit that is set based on the value of the boot bit.
- 26. The method of claim 25 comprising the step of performing diagnostic tests on the operating system when the diagnostic bit is set.
- 27. The method of claim 24 wherein the recovery operating system is accessed when the boot bit is set.
- 28. The method of claim 25 wherein the recovery operating system is accessed when the boot bit and diagnostic bit are both set.
- 29. The method of claim 23 comprising the step of prompting a user to access the recovery operating system.
- 30. An improved operating system method comprising the steps of:
storing a primary bootable operating system in storage accessible by a computing system wherein during normal operation the computing system accesses the primary operating system as a present operating system so that the primary operating system is the present operating system; storing a plurality of alternate operating systems in storage accessible by the computing system; storing a location and associated boot counter for each alternate operating system in an order in which the alternate operating systems are to be accessed; periodically updating at least one of the alternate operating systems; monitoring the operation of the present operating system; incrementing the boot counter of the present operating system each time an error is sensed in the operation of the present operating system; and accessing a next alternate operating system when the boot counter of the present operating system reaches a threshold value so that the next alternate operating system becomes the present operating system.
- 31. For use with a computing system having a set of boot control instructions, a fail-over recovery system comprising:
a primary operating system stored in a first memory partition; at least one alternate operating system stored in corresponding memory partitions; an operating system table that lists each operating system, its location, and an order in which the alternate operating systems should be accessed; wherein the boot control instructions access the operating system table to determine which operating system should be booted to control the computing system as the present operating system.
- 32. The fail-over recovery system of claim 31 comprising a watchdog timer that monitors the functioning of the present operating system and, when the present operating system becomes unresponsive for a predetermined amount of time, generates an error signal that causes the boot control instructions to boot the computing system.
- 33. The fail-over recovery system of claim 32 wherein the operating system table comprises a boot counter associated with each operating system that is incremented each time the computing system is booted using the operating system.
- 34. The fail-over recovery system of claim 33 wherein the boot control instructions access a next alternate operating system based on the boot counter of the present operating system.
- 35. The fail-over recovery system of claim 31 comprising a mirroring module that creates a mirror of the present operating system for storage as one of the alternate operating systems.
- 36. The fail-over recovery system of claim 31 comprising an imaging module that creates an image copy of the present operating system at predetermined intervals for storage as an alternate operating system.
- 37. The fail-over recovery system of claim 31 comprising a first boot operating system storage module that stores a copy of the present operating system at the first boot of the computing system as an alternate operating system.
- 38. The fail-over recovery system of claim 31 wherein the operating system table is stored in non-volatile random access memory.
- 39. An improved operating system method comprising the steps of:
storing a primary bootable operating system in storage accessible by a computing system wherein during normal operation the computing system accesses the primary operating system as a present operating system so that the primary operating system is the present operating system; storing a recovery operating system in storage accessible by the computing system; storing a boot register for tracking the operation of the present operating system; monitoring the operation of the present operating system; setting a boot bit in the boot register upon computing system start and resetting the boot bit at each successful boot; and accessing the recovery operating system when the boot bit is set at computing system start.
- 40. An improved operating system comprising:
means for storing a primary bootable operating system in storage accessible by a computing system wherein during normal operation the computing system accesses the primary operating system so that the primary operating system is the present operating system; means for storing one or more alternate operating systems in storage accessible by the computing system; means for monitoring the operation of the present operating system; and means for accessing the alternate operating system based on a set of boot instructions upon sensing an error in the operation of the primary operating system so that the alternate operating system becomes the present operating system.
- 41. The improved operating system of claim 40 wherein the means for storing one or more alternate operating systems stores a bootable copy of the primary operating system when the computing system is first booted, thereby creating a first boot alternate operating system.
- 42. The improved operating system of claim 40 wherein the means for storing one or more alternate operating systems stores a bootable copy of the primary operating system that is overwritten with a current version of the primary operating system at periodic intervals to create an updated operating system.
- 43. The improved operating system of claim 40 wherein the means for storing one or more alternate operating systems stores a bootable copy of the primary operating system that is a real-time copy of the primary operating system.
- 44. The improved operating system of claim 40 wherein the means for monitoring the operation of the present operating system is a watchdog timer that monitors the operation of the present operating system being used by the computing system.
- 45. The improved operating system of claim 40 comprising means for maintaining a list of locations of operating systems that lists the operating systems in an order in which they should be accessed in the event that an error is sensed in the present operating system and wherein the boot instructions direct the computing system to access an operating system by referring to the list of locations of operating systems.
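The fail-over table of claims 11 to 16, 30, and 31 to 34 (an ordered list of operating system locations, each with a boot counter, consulted after a watchdog-triggered reboot) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class and function names (`OsEntry`, `FailOverTable`, `WatchdogTimer`, `on_watchdog_expiry`), the default threshold of 3, and the tick-based watchdog standing in for a hardware timer are all hypothetical. It follows the claim 30 variant in which a counter is charged only when an error (here, watchdog expiry) is sensed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OsEntry:
    """One row of a hypothetical operating system table."""
    location: str          # hypothetical partition label, e.g. "primary"
    boot_counter: int = 0  # failures charged against this entry


@dataclass
class FailOverTable:
    entries: list[OsEntry]  # kept in the order the systems should be tried
    threshold: int = 3      # assumed "predetermined value"

    def invalidate(self, is_valid: Callable[[str], bool]) -> None:
        """Saturate the counter of any location that fails validation (cf. claim 16)."""
        for entry in self.entries:
            if not is_valid(entry.location):
                entry.boot_counter = self.threshold

    def select(self) -> Optional[OsEntry]:
        """Return the first entry, in table order, whose counter is below threshold."""
        for entry in self.entries:
            if entry.boot_counter < self.threshold:
                return entry
        return None  # every stored operating system has been exhausted


class WatchdogTimer:
    """Tick-based stand-in for a hardware watchdog timer."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.elapsed = 0

    def kick(self) -> None:
        """Called on operating system activity (or a fresh boot); resets the timer."""
        self.elapsed = 0

    def tick(self) -> bool:
        """Advance one tick; return True once the timeout has been reached."""
        self.elapsed += 1
        return self.elapsed >= self.timeout_ticks


def on_watchdog_expiry(table: FailOverTable, present: OsEntry) -> Optional[OsEntry]:
    """Charge the failure to the present system, then choose what to reboot into."""
    present.boot_counter += 1
    return table.select()


if __name__ == "__main__":
    table = FailOverTable([OsEntry("primary"), OsEntry("first_boot_copy"), OsEntry("image_copy")])
    watchdog = WatchdogTimer(timeout_ticks=5)
    present = table.select()
    for _ in range(3):            # the primary hangs on three consecutive boots
        while not watchdog.tick():
            pass                  # no operating system activity, so no kick
        watchdog.kick()           # the timer restarts with the reboot
        present = on_watchdog_expiry(table, present)
    print(present.location)       # prints "first_boot_copy"
```

Keeping the counters in the table itself, rather than in the operating systems being protected, mirrors the claims' placement of the table in non-volatile memory that the boot instructions can read before any operating system loads.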
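The boot-bit mechanism of claims 24 to 28 and 39 (a bit set at every start and cleared only by a successful boot, with a diagnostic bit derived from it) can likewise be sketched in Python. This is one possible reading, not the claimed implementation: `BootRegister`, `BootTarget`, `on_system_start`, and `on_successful_boot` are hypothetical names, and clearing the diagnostic bit on a successful boot is an assumption the claims do not state. It follows the claim 28 variant, where one incomplete boot triggers diagnostics and a second consecutive one selects the recovery operating system.

```python
from dataclasses import dataclass
from enum import Enum


class BootTarget(Enum):
    PRIMARY = "primary"
    PRIMARY_WITH_DIAGNOSTICS = "primary+diagnostics"
    RECOVERY = "recovery"


@dataclass
class BootRegister:
    """Hypothetical persistent register holding the two flag bits."""
    boot_bit: bool = False
    diagnostic_bit: bool = False


def on_system_start(reg: BootRegister) -> BootTarget:
    """Choose a boot target from the bits left by the previous start."""
    if reg.boot_bit and reg.diagnostic_bit:
        # Two starts in a row never completed: fall back to recovery.
        target = BootTarget.RECOVERY
    elif reg.boot_bit:
        # The previous start never completed: flag it and run diagnostics.
        reg.diagnostic_bit = True
        target = BootTarget.PRIMARY_WITH_DIAGNOSTICS
    else:
        target = BootTarget.PRIMARY
    reg.boot_bit = True  # set at every start; cleared only by a successful boot
    return target


def on_successful_boot(reg: BootRegister) -> None:
    """Reset the bits once the operating system reports a successful boot."""
    reg.boot_bit = False
    reg.diagnostic_bit = False  # assumption: a good boot also clears diagnostics
```

For example, a fresh register yields `PRIMARY`; if `on_successful_boot` is then never called, the next two starts yield `PRIMARY_WITH_DIAGNOSTICS` and then `RECOVERY`.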
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is related to the co-pending application entitled “Operating System Update and Boot Failure Recovery” filed by Karl Denninghoff, Raju Gulabani, Mukesh Karki, Clark Nicholson, and Neel Malik on Sep. 26, 2000, Ser. No. 09/669,349 (corresponding to attorney docket number 15-750), which document is hereby expressly incorporated by reference.