Upon being powered up, or in response to a reset condition, conventional microcomputers, and other devices controlled by a programmable microprocessor or microcontroller, undergo an initialization process known as “booting”. The boot process generally involves initializing peripheral devices such as disk drives (if present), and loading an operating system or kernel into main memory (random access memory or “RAM”) so that program execution may be performed as required for operation of the system. The software that controls the boot process is sometimes referred to as boot code and may be stored in a ROM (read only memory) or a flash memory. One widely used type of boot code is the Basic Input Output System, known as “BIOS”.
Boot code may be subject to corruption under some circumstances, and if the boot code is corrupted, the boot process may fail.
It has been proposed to try to prevent some portion of the boot code from being corrupted by write-protecting the portion of flash memory in which that portion of boot code is stored. The protected portion may then be used to perform a validation process (e.g., a checksum) on all of the boot code to determine whether corruption has occurred. If so, a non-corrupted image of the boot code may be recovered from a storage device such as a floppy disk. A disadvantage of this approach is that user intervention may be required.
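By way of illustration only, the validation step performed by the protected portion might resemble the following minimal sketch. It assumes, purely for the example, that the checksum is a CRC-32 stored little-endian in the last four bytes of the image; the names `seal_image` and `image_is_valid` are hypothetical and are not drawn from any particular boot code.

```python
import zlib

CRC_SIZE = 4  # assumption: the last 4 bytes hold a little-endian CRC-32 of the payload


def seal_image(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload, producing a hypothetical boot code image."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(CRC_SIZE, "little")


def image_is_valid(image: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the stored value."""
    if len(image) <= CRC_SIZE:
        return False  # too short to contain both a payload and a checksum
    payload, stored = image[:-CRC_SIZE], image[-CRC_SIZE:]
    return (zlib.crc32(payload) & 0xFFFFFFFF) == int.from_bytes(stored, "little")
```

A check of this kind detects corruption only in the data it covers; it does not protect the code that performs the check, which is why that code is write-protected in this proposal.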
According to another proposal, two boot code images—a main image and a backup image—may be stored. The backup image may be a known good boot code version and/or may be write-protected. The backup image may be executed initially on power-up or reset to validate the main image, which is then executed to control the balance of the boot process. If the boot process fails while using the main image, the system may automatically switch back to the backup image, which then continues the boot process. The boot process may also continue with the backup image if the routine for validating the main image indicates that the main image may be corrupted.
The latter proposal allows the main boot code image to be updated, while relying on a somewhat different backup version to detect corruption in the main image and to carry on with booting if necessary, without user intervention. However, it cannot be absolutely assured that the backup is or will remain free of corruption. For example, it may be necessary to allow for updating of the backup boot code image in the case of a major boot code revision. The updating process, or even just the mechanism which allows for updating, may permit the backup image to be corrupted, in which case the entire boot process may fail.
According to a third proposal, two, possibly identical, boot code images are stored, but only a limited portion of each is write protected. The main image performs a self-validation using its protected portion (as in the first proposal described above). If the validation fails, then the protected portion of the main image causes the system to proceed with booting with the backup image, without user intervention. Again, however, the need to at least allow for updating of the protected portion of the main image also allows for the possibility that the protected portion may be corrupted. If this occurs, then, as in the second proposal, the entire boot process may fail.
Thus there is a need for a boot process that does not depend on a specific boot code image, either main or backup, to be free of corruption. This need is particularly pressing in the case of processor-controlled communication equipment, for which “high availability” may be desired.
In some embodiments, the processor 12 may include chipset functionality which includes, for example, a memory controller hub (MCH) and/or an I/O (input/output) controller hub (ICH). For example, the processor may connect to RAM (discussed below) via an MCH rather than directly. Similarly, a reset vector interface may connect to the processor either via an ICH (if present) or directly. Also, the control/switching logic which is discussed below may connect either directly to the processor or via an ICH.
The computer system 10 also includes one or more memory devices 14 (e.g., a ROM or ROMs or a flash memory device or devices), in which two or more BIOS images are stored. The BIOS images may be identical or interchangeable. BIOS images are to be considered “interchangeable” if stored from a single source, from identical sources or from sources configured to result in essentially identical operation of the processor 12. It will be appreciated that if only one memory device 14 is provided for BIOS storage, then all of the BIOS images, whether two or more than two, are stored in that one BIOS storage device. If more than one memory device is included in the system 10 for storage of the BIOS images, then at least one of the BIOS images may be stored in a first one of the memory devices and at least one other of the BIOS images may be stored in a second (different) one of the memory devices.
As used in this description of computer system 10 and in the appended claims, “BIOS” refers to the above-mentioned Basic Input Output System and/or to any other bootstrap firmware for a computer system motherboard or for another device that includes a processor or controller.
The computer system 10 further includes control logic 16 that is coupled to the processor 12 and to the memory device(s) 14 in which the BIOS images are stored. The control logic is provided in accordance with some embodiments to allow the computer system to boot notwithstanding that any one (or possibly more than one in some embodiments) of the BIOS images may be corrupted. The control logic 16 may, in some embodiments, be constituted by suitably configured PAL (programmable array logic) or PLD (programmable logic device) or by a suitably programmed IPMI (Intelligent Platform Management Interface) microcontroller. Details of operation of the control logic 16 will be provided below in connection with
Also included in the computer system 10 is a reset circuit 18 which is coupled to the processor 12 and the control logic 16. The reset circuit 18 may operate to initiate a reset condition under certain circumstances such as actuation by a user of a reset button (not shown). The reset condition initiated by the reset circuit 18 may involve assertion of an active signal on a reset pin or pins (not separately shown) that causes the processor to enter into a boot mode. As will be seen, the reset circuit may also operate to initiate a reset condition in response to a signal received by the reset circuit from the control logic.
In addition, the computer system 10 includes main memory (RAM) 20 coupled to the processor 12. Operating system software, device drivers, etc. may be loaded into the RAM 20 as part of the boot process. Also, in some embodiments, the computer system 10 may include one or more disk drives 22 (e.g., one or more floppy disk drives and/or one or more hard disk drives; shown in phantom) that are coupled to the processor 12. The disk drive(s) 22 may, for example, be the source of operating system software and/or other software loaded into the RAM 20 in the boot process.
The computer system 10 further includes a power supply 24 which may be a source of power for all of the above-enumerated electrical or electronic components of the computer system 10. In some embodiments, the power supply 24 may be turned on and/or off by one or more switches or buttons (not separately shown) that are actuatable by a user of the computer system. When the power supply 24 is turned from off to on, the computer system is said to be “powered up”, and a reset condition is entered by the computer system, followed by a boot mode.
In some embodiments, the computer system 10 may include one or more other nonvolatile memory devices, including nonvolatile program storage, in addition to the memory device(s) 14 used to store the BIOS images. Data and/or software other than the BIOS images may be stored in the same memory device(s) with the BIOS images. The computer system may also include one or more input/output devices (not shown) coupled to the processor. Such input/output devices may include a computer monitor, a keyboard, and/or a computer mouse.
Block 33 in
Also shown in
Inputs to the control and switching function 30 and details of operation of the control and switching function 30 will be described below.
At 50 in
As indicated at 52, the control logic 16 (
At 54 in
Also in response to the reset resulting from powering up of the computer system, the control and switching function 30 of the control logic 16 starts the timer 33 (
At 58 in
After sending the startup signature at 60, normal operation of the processor 12 and of the system 10 continues, as indicated at 62. In the case where the startup signature is sent to the control logic 16 upon determining that the currently executing BIOS image is valid and before completion of the boot procedure, the continuing normal operation indicated at 62 may include completion of the boot procedure.
Upon receiving the startup signature, control and switching function 30 (
If at 58 it is determined that the boot procedure failed and/or the currently executing BIOS image was not found to be valid, the startup signature is not sent to the control logic. Accordingly, the timer 33 is not disabled and times out. In response to the timing out of the timer 33, the control and switching function 30 de-selects the previously selected BIOS image and (as indicated at 64 in
In some embodiments, the timing out of the timer 33 may cause an event to be logged (e.g., to an IPMI System Event Log) to indicate that the previous BIOS image failed. In addition or alternatively, a suitable error notice may be displayed to a user.
Also in response to the timing out of the timer 33, the control and switching function 30 controls the reset circuit 18 (
In some embodiments, the control logic 16 may operate such that it completes its selection of another BIOS image and the mapping of that BIOS image to the processor reset vector before the reset condition is released, to assure that the processor executes another BIOS image as a result of the reset condition. Thus, the process of
The loop of functions 56, 58, 64, 66 may be reiterated indefinitely, or until the boot procedure is performed successfully using a currently executing one of the BIOS images 32 (
With this arrangement of stored BIOS images and with control logic operating as described above, even if the first BIOS image executed on power-up or other reset is corrupted, the system is able to switch without user intervention to another BIOS image from which the boot procedure may be successfully performed. More generally, if n (greater than one) BIOS images are stored in the memory device(s) 14, the system will boot properly without user intervention even if all but one of the BIOS images are corrupted. Theoretically, n may be any number (greater than one), limited only by the storage capacity of the memory device(s) 14. Moreover, this arrangement does not rely on a particular BIOS image being non-corrupted.
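The behavior summarized above may be modeled, for purposes of illustration only, by the following simplified simulation. It is a sketch under stated assumptions rather than an implementation of the control logic 16: the class and function names are hypothetical, images are tried in round-robin order (the description requires only that another image be selected), and a corrupted image is modeled simply as one that never sends the startup signature.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControlLogic:
    """Toy model: selects an image, arms a timer on reset, switches images on time-out."""
    image_count: int
    selected: int = 0
    timer_running: bool = False
    event_log: List[str] = field(default_factory=list)

    def on_reset(self) -> None:
        # Every reset starts the timer.
        self.timer_running = True

    def on_startup_signature(self) -> None:
        # A valid image reports in, so the timer is disabled.
        self.timer_running = False

    def on_timeout(self) -> None:
        # No signature arrived in time: log the failure and select another image.
        if not self.timer_running:
            return
        self.event_log.append(f"image {self.selected} failed")
        self.selected = (self.selected + 1) % self.image_count  # round-robin (assumption)


def boot(images_ok: List[bool], max_attempts: int) -> Optional[int]:
    """Return the index of the image that booted, or None if every attempt failed."""
    logic = ControlLogic(image_count=len(images_ok))
    for _ in range(max_attempts):
        logic.on_reset()
        if images_ok[logic.selected]:
            logic.on_startup_signature()
            return logic.selected
        logic.on_timeout()  # followed by a reset initiated by the control logic
    return None
```

For example, with three stored images of which only the third is intact, `boot([False, False, True], max_attempts=5)` reaches the third image on the third attempt without any user intervention.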
The loop of 56-58-64-66 in
In some embodiments, the control logic may store an indication as to which BIOS image was most recently used for a successful boot process, and the indicated BIOS image may then be used for booting upon subsequent resets or power-ons.
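One way such a stored indication might influence the order of boot attempts is sketched below. The function name `attempt_order` and the wrap-around ordering are assumptions made for the example; the description requires only that the indicated image be used on subsequent resets.

```python
from typing import List, Optional


def attempt_order(image_count: int, last_good: Optional[int]) -> List[int]:
    """Order in which images are tried, starting from the last known-good image.

    If no valid indication is stored, fall back to starting at image 0.
    """
    if last_good is None or not 0 <= last_good < image_count:
        start = 0
    else:
        start = last_good
    return [(start + i) % image_count for i in range(image_count)]
```

With three images and image 1 recorded as the last good one, the attempts would proceed in the order 1, 2, 0.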
In some embodiments, if a determination is made at 58 that the currently executing BIOS is not valid (e.g., the checksum failed), then, instead of waiting for the timer 33 (
In some embodiments, the BIOS images may be configured such that a user boot set up option or other user input is delayed until after the startup signature is sent to the control logic. To do otherwise may risk allowing the timer 33 to time out (thereby causing selection of a new BIOS image and initiation of a reset) even though the currently executing BIOS image is executing normally.
The sequence of functions starting at 56 in
In some embodiments, when a BIOS image fails to be validated and/or fails to result in successful booting, and another BIOS image is selected and successfully boots, the “bad” BIOS image may be replaced (re-programmed) in the memory device 14 by a BIOS image that is believed to be “good”. For example, a suitable flag or flags may be set in the control logic 16 to identify a BIOS image or images that failed at one or another iteration (including the first iteration) of 58 in
In some embodiments, the control logic 16 is made aware of every reset, whether hard or soft or upon power-up, and in the case of every reset, the control logic sets the timer 33 (
In some embodiments the BIOS may cause several resets to occur. It may be desirable in such cases for the control logic to reset the timer on each occurrence of a reset even though the timer is already running, to give the BIOS adequate time to self-validate and/or to complete the boot procedure before the timer times out.
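The effect of restarting the timer on each reset can be seen in the following minimal sketch, which models the timer as a tick counter. The class `BootTimer` and its tick-based interface are hypothetical and are used only to illustrate the point; an actual timer would run from a hardware clock.

```python
from typing import Optional


class BootTimer:
    """Countdown timer that is restarted from the full timeout whenever it is armed."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining: Optional[int] = None  # None means the timer is disabled

    def arm(self) -> None:
        """Start the countdown, or restart it if it is already running."""
        self.remaining = self.timeout_ticks

    def disable(self) -> None:
        """Stop the countdown, as on receipt of the startup signature."""
        self.remaining = None

    def tick(self) -> bool:
        """Advance one tick; return True only on the tick at which the timer times out."""
        if self.remaining is None:
            return False
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = None
            return True
        return False
```

If the BIOS issues a further reset two ticks into a three-tick timeout and the timer is re-armed, the BIOS again has the full three ticks available; without re-arming, the timer would time out on the very next tick.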
In some embodiments, the BIOS execution flow may bypass some BIOS code in the case of certain resets. In such embodiments, it may be desirable not to bypass the code which performs the functions of self-validation and sending the startup signature, as referred to above. In other embodiments, if the portion of the BIOS code which sends the startup signature is bypassed in response to some resets, and if the control logic is made aware of such resets, the control logic may be configured to mask off the BIOS switching capability in the case of such resets.
In some embodiments, one or more other BIOS images may be accessible to the software executed by the processor 12, in addition to the BIOS image currently mapped to the memory address range that covers the processor reset vector. This may be done to facilitate updating of a BIOS image that is not currently selected for use in the boot procedure. Assume for example that the processor reset vector is 0xFFFFFFF0 and that two 1-megabyte BIOS images are implemented. Then, in some embodiments, the first BIOS image may occupy the memory address range 0xFFF00000 to 0xFFFFFFFF, covering the processor reset vector, and the second BIOS image may occupy the memory address range 0xFFE00000 to 0xFFEFFFFF. Upon a reset, the system will try to boot using the first BIOS image. If the boot fails and the control logic switches to the second BIOS image, the second BIOS image may be mapped to the memory address range 0xFFF00000 to 0xFFFFFFFF and the first BIOS image may be mapped to the memory address range 0xFFE00000 to 0xFFEFFFFF. Upon the reset initiated by the control logic, the system will attempt to boot using the second BIOS image. If the boot procedure is now successful, the software which controls the processor has access to both BIOS images.
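The address arithmetic of this example can be checked with the short sketch below. The function name `address_map` and the convention of stacking non-selected images immediately below the selected one are assumptions for illustration; the constants are those given in the example above.

```python
from typing import Dict, Tuple

RESET_VECTOR = 0xFFFFFFF0
IMAGE_SIZE = 0x100000        # 1 megabyte per BIOS image
TOP_OF_MEMORY = 0x100000000  # one past the highest 32-bit address


def address_map(selected: int, image_count: int = 2) -> Dict[int, Tuple[int, int]]:
    """Map each image index to its (start, end) range, selected image on top."""
    order = [selected] + [i for i in range(image_count) if i != selected]
    result = {}
    for slot, image in enumerate(order):
        end = TOP_OF_MEMORY - slot * IMAGE_SIZE - 1
        start = end - IMAGE_SIZE + 1
        result[image] = (start, end)
    return result
```

Calling `address_map(0)` places the first image at 0xFFF00000 to 0xFFFFFFFF and the second at 0xFFE00000 to 0xFFEFFFFF; calling `address_map(1)` exchanges the two ranges, so that the second image covers the reset vector while the first remains accessible for updating.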
In some embodiments, the control logic 16 may be configured to switch between BIOS images in response to a software command (i.e., in response to a command issued by the processor 12 under the control of software which programs the processor), without a reset. In such cases the control logic may not initiate a reset upon switching between the BIOS images. In some embodiments, only one BIOS image may be visible in system memory at a given time, but an invisible BIOS image may become accessible by being mapped into the memory address range that covers the processor reset vector in response to a software command.
In some embodiments, the control logic may be configured to enable and disable write protection on one or more of the BIOS images on an individual basis.
In some embodiments, the system may be configured to permit the user to use a manual method (e.g., a jumper) to enable and disable write protection for one or more of the BIOS images on an individual basis.
In some embodiments, the system may be configured to permit the user to use a manual method (e.g., a jumper) to cause the control logic to switch between BIOS images.
In some embodiments, the control logic may examine the stored BIOS images to attempt to determine which one or ones of the BIOS images are “good”. The control logic may then select a “good” BIOS image in preference to other BIOS images. For example, if there are three or more stored BIOS images, the control logic may compare the BIOS images to each other, and if one of the stored BIOS images does not match the others, that one of the stored BIOS images may be the last one selected for attempted booting.
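A comparison of this kind might be sketched as follows. The use of SHA-256 digests to compare images, and the function name `selection_order`, are assumptions made for the example; the description calls only for comparing the images to each other, which could equally be done byte by byte.

```python
import hashlib
from collections import Counter
from typing import List


def selection_order(images: List[bytes]) -> List[int]:
    """Order image indices so that images matching the most peers are tried first."""
    digests = [hashlib.sha256(image).digest() for image in images]
    counts = Counter(digests)
    # sorted() is stable, so images with equal match counts keep their original order.
    return sorted(range(len(images)), key=lambda i: -counts[digests[i]])
```

With three stored images of which the second differs from the other two, the resulting order is first, third, second, so the non-matching image is the last one selected for attempted booting.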
In some embodiments, in the case of a BIOS upgrade, if booting is unsuccessful with a new BIOS image, the control logic may select an older BIOS image for the next boot attempt.
As used herein and in the appended claims, “computer system” refers to any device that includes a microprocessor, including servers, personal computers, laptop computers, and communication devices such as network controllers and routers.
As used herein and in the appended claims, “reset condition” includes one or more of (a) powering-on of a computer system, (b) a reset asserted by an active signal on a reset pin, and (c) a reset initiated by a software routine without assertion of an active signal on a reset pin.
The several embodiments described herein are solely for the purpose of illustration. The various features described herein need not all be used together, and any one or more of those features may be incorporated in a single embodiment. Therefore, persons skilled in the art will recognize from this description that other embodiments may be practiced with various modifications and alterations.