1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for applying firmware updates to servers in a data center.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
One of the areas in which advances have been made is firmware upgrades. Current firmware update methods often involve rebooting a machine in order to run software to update the machine. Often several reboots must occur in order to update multiple firmware components. This problem occurs because the multiple vendors of the various devices on a server have independent solutions for updating their firmware. A potential solution is to get all the vendors to agree to a standard firmware update mechanism. This solution is extremely difficult to achieve and does not solve the problem of legacy machines that need updating. What is needed is a mechanism that works with the existing firmware update solutions to limit the number of reboots required of a server during a firmware upgrade.
Methods, apparatus, and products are disclosed for applying firmware updates to servers in a data center, where the servers include one or more active servers and a standby server with each server mapped to separate remote computer boot storage, including applying by a system management server the firmware updates to the standby server; selecting by the system management server an active server for firmware updating; powering off the selected active server by the system management server; remapping by the system management server the standby server to the remote computer boot storage for the selected active server; rebooting by the system management server the standby server from the remote computer boot storage for the selected active server, designating the standby server as an active server; remapping by the system management server the selected active server to the remote computer boot storage formerly mapped to the standby server; and rebooting by the system management server the selected active server from the remote boot storage formerly mapped to the standby server, designating the selected active server as a standby server.
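The sequence of steps summarized above can be illustrated with a minimal sketch of one rotation, assuming a simple in-memory model. The names `Server`, `reboot`, `rotate_once`, and the LUN labels are hypothetical illustrations, not part of any management product's API; power control and SAN remapping are reduced to attribute assignments, and selection of the active server is left to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Server:
    """Hypothetical model of a server and the boot storage mapped to it."""
    name: str
    boot_lun: str                  # remote boot storage currently mapped to this server
    role: str                      # "active" or "standby"
    powered_on: bool = True
    firmware: Set[str] = field(default_factory=set)


def reboot(server: Server, log: List[str]) -> None:
    # Model a reboot as powering on and booting from whatever storage is mapped now.
    server.powered_on = True
    log.append(f"{server.name} boots from {server.boot_lun}")


def rotate_once(standby: Server, active: Server, updates: Set[str], log: List[str]) -> None:
    """One update cycle, in the order the summary lists the steps (illustrative only)."""
    standby.firmware |= updates            # apply the firmware updates to the standby
    active.powered_on = False              # power off the selected active server
    active_lun, standby_lun = active.boot_lun, standby.boot_lun
    standby.boot_lun = active_lun          # remap standby to the active server's storage
    reboot(standby, log)
    standby.role = "active"                # designate the standby as an active server
    active.boot_lun = standby_lun          # remap old active to the former standby storage
    reboot(active, log)
    active.role = "standby"                # designate the old active as the standby


log: List[str] = []
spare = Server("blade-2", "LUN-B", "standby")
prod = Server("blade-1", "LUN-A", "active")
rotate_once(spare, prod, {"bios-1.2"}, log)
```

After the call, the updated server runs the production workload from `LUN-A`, while the not-yet-updated server sits idle on `LUN-B` as the new standby.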
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts of exemplary embodiments of the invention.
Exemplary methods, apparatus, and products for applying firmware updates to servers in a data center in accordance with the present invention are described with reference to the accompanying drawings, beginning with
A ‘server,’ as the term is used in this specification, refers generally to a multi-user computer that provides a service (e.g., database access, file transfer, remote access) or resources (e.g., file space) over a network connection. The term ‘server,’ as context requires, refers inclusively to the server's computer hardware as well as any server application software or operating system software running on the server. A server application is an application program that accepts connections in order to service requests from users by sending back responses. A server application can run on the same computer as the client application using it, or a server application can accept connections through a computer network. Examples of server applications include file servers, database servers, backup servers, print servers, mail servers, web servers, FTP servers, application servers, VPN servers, DHCP servers, DNS servers, WINS servers, logon servers, security servers, domain controllers, backup domain controllers, proxy servers, firewalls, and so on.
The example system of
The SAN (112) is a network architecture that attaches remote computer storage devices, such as disk arrays, to servers in such a way that, to the server operating system, the remote storage devices appear as locally attached disk drives. That is, the separate remote boot storage (110) mapped to each server in this example is exposed by the SAN (112) to each server (104, 106) as a separate virtual drive. Such virtual drives are often referred to or referenced by a so-called logical unit number or ‘LUN.’ A LUN is an address for an individual disk drive and, by extension, the disk device itself. A LUN, or the remote storage identified by a LUN, is normally not an entire disk drive but rather a virtual partition (or volume) of a RAID set; in this example, it is a virtual disk drive that organizes a portion of RAID (Redundant Array of Inexpensive Disks) storage and presents it to an operating system on a server as an actual disk drive. Most SANs use the SCSI protocol for communication between servers and disk drive devices, though they do not use its low-level physical interface, instead using a mapping layer. The mapping layer may be implemented, for example, with Fibre Channel (Fibre Channel Protocol or ‘FCP’ is Fibre Channel's SCSI interface), iSCSI (mapping SCSI over TCP/IP), HyperSCSI (mapping SCSI over Ethernet), Advanced Technology Attachment (‘ATA’) over Ethernet, and InfiniBand (supports mapping SCSI over InfiniBand and/or mapping TCP/IP over InfiniBand).
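The requirement that each server be mapped to separate remote boot storage can be illustrated with a small bookkeeping table. This is a minimal sketch under stated assumptions: `BootLunMap` and its methods are hypothetical, LUNs are plain integers, and a real SAN would enforce the mapping in the storage controller (for example through LUN masking or zoning), which is not shown here.

```python
from typing import Dict, Optional


class BootLunMap:
    """Hypothetical boot-LUN table: each server sees exactly one boot LUN,
    and no LUN is exposed to two servers at the same time."""

    def __init__(self) -> None:
        self._lun_by_server: Dict[str, int] = {}

    def owner(self, lun: int) -> Optional[str]:
        # Return the server currently mapped to this LUN, if any.
        for server, mapped in self._lun_by_server.items():
            if mapped == lun:
                return server
        return None

    def lun_of(self, server: str) -> Optional[int]:
        return self._lun_by_server.get(server)

    def assign(self, server: str, lun: int) -> None:
        # Refuse to expose one boot LUN to two different servers.
        owner = self.owner(lun)
        if owner is not None and owner != server:
            raise ValueError(f"LUN {lun} is already mapped to {owner}")
        self._lun_by_server[server] = lun

    def unmap(self, server: str) -> int:
        return self._lun_by_server.pop(server)

    def swap(self, a: str, b: str) -> None:
        # Unmap both first so the exclusivity check never sees a shared LUN.
        lun_a, lun_b = self.unmap(a), self.unmap(b)
        self.assign(a, lun_b)
        self.assign(b, lun_a)
```

The `swap` method corresponds to the two remapping steps of the update method; unmapping both servers before reassigning keeps the one-server-per-LUN rule intact throughout.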
‘Firmware’ is a computer program that is embedded in a hardware device, for example a microcontroller or a read-only memory (‘ROM’) in a server. Firmware is viewed as a computer resource somewhere between hardware and software. Like software, it is a computer program which is executed by a computer, but it is also a piece of hardware. In practical terms, firmware updates can improve the performance and reliability, indeed even the basic available functionality, of a server, and many servers benefit from regular firmware updates. Firmware has evolved to mean the programmable content of a hardware device, which can consist of machine language instructions for a server's processor or server configuration settings, for example. A common feature of firmware is that it can be updated post-manufacturing, including electronic updates under program control. Firmware has traditionally been stored in ROM; however, cost and performance requirements have driven server manufacturers to adopt various replacements, including non-volatile memory such as EEPROM or ‘Flash memory.’ Examples of firmware include:
All of the servers (104, 106, 152) in the example of
The system of
Stored in RAM (168) is a system management server application program (126), a set of computer program instructions that operate the system management server so as to automatically under program control carry out processes required to manage servers in the data center, including capacity planning, asset tracking, preventive maintenance, diagnostic monitoring, troubleshooting, firmware updates, and so on. An example of a system management server application program (126) that can be improved to apply firmware updates to servers in a data center according to embodiments of the present invention is IBM's ‘Director.’
Also stored in RAM (168) is a server failover module (130), a module of computer program instructions for automatic administration of server failover. The transfer of operation from a failing active server to an available standby server so as to ensure uninterrupted data flow, operability, and data processing services for users of the data center is referred to in this specification as ‘failover.’ Failover is the automated substitution of a functionally equivalent standby server for a failing active server. Failures that lead to failover can include a loss of power to an active server, a memory fault in an active server, a processor defect in an active server, loss of network connectivity for an active server, and so on. The data center (120) in this example provides automated failover from a failing active server to a standby server through the server failover module (130) of the system management server (152). An example of a server failover module that can be improved for applying firmware updates to servers in a data center according to embodiments of the present invention is IBM's ‘Boot From SAN Blade Failover Extension for IBM Director.’
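Failover as defined above, the substitution of a functionally equivalent standby server for a failing active server, might be sketched as follows. The sketch assumes that "functionally equivalent" means "same machine type" and that failover consists of taking the failing server out of service and handing its boot LUN to the standby; `Server` and `fail_over` are hypothetical names and do not describe the behavior of any IBM product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    machine_type: str
    boot_lun: Optional[int]        # None when no boot storage is mapped
    role: str                      # "active", "standby", or "failed"


def fail_over(failing: Server, pool: List[Server]) -> Server:
    """Substitute a standby of the same machine type for a failing active server."""
    for candidate in pool:
        if candidate.role == "standby" and candidate.machine_type == failing.machine_type:
            break
    else:
        raise LookupError(f"no standby server of type {failing.machine_type}")
    lun = failing.boot_lun
    failing.role, failing.boot_lun = "failed", None     # isolate the failing server
    candidate.boot_lun, candidate.role = lun, "active"  # standby boots the failed server's storage
    pool.remove(candidate)                              # it is no longer available as a standby
    return candidate
```

Because the workload lives on the remote boot LUN rather than on the failing machine, the replacement server resumes the same operating system image when it boots.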
Also stored in RAM (168) is an update module (132), a module of computer program instructions that automates the process of applying firmware updates. The update module may be programmed, for example, to organize firmware updates by machine type and operating system type, track which firmware updates have already been installed on each server in the data center, and apply firmware updates newly received from vendors or manufacturers to servers that have not yet received such new firmware updates. An example of an update module that can be improved for applying firmware updates to servers in a data center according to embodiments of the present invention is IBM's ‘Director Update Manager.’ Also stored in RAM (168) are firmware updates (128), modules of computer program instructions for installation as firmware in servers of the data center.
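The bookkeeping attributed to the update module (organizing updates by machine type and operating system type, and tracking what each server already has) could look like the following sketch. `UpdateCatalog`, its methods, and the update identifiers are hypothetical; a real update manager would also track versions, dependencies, and install order, which are omitted here.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class UpdateCatalog:
    """Hypothetical catalog of firmware updates keyed by (machine type, OS type)."""

    def __init__(self) -> None:
        self._updates: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._installed: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_update(self, machine_type: str, os_type: str, update_id: str) -> None:
        # Record an update newly received from a vendor or manufacturer.
        self._updates[(machine_type, os_type)].add(update_id)

    def record_install(self, server: str, update_id: str) -> None:
        # Remember that this server has already received this update.
        self._installed[server].add(update_id)

    def pending(self, server: str, machine_type: str, os_type: str) -> List[str]:
        # Updates that apply to this server's type but are not yet installed on it.
        return sorted(self._updates[(machine_type, os_type)] - self._installed[server])
```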
Under control of the system management server application program (126), the system management server (152) in this example, operates generally to apply firmware updates to servers in the data center (120) by:
The active servers (106) can include a group of active servers (108) of a same type, where the standby server (102) is of the same type as the group of active servers of a same type, and the system management server can repeat the steps of:
once for each active server in the group of active servers of a same type until firmware updates are applied to all the active servers in the group of active servers of a same type. The pool (104) of available standby servers can include standby servers of more than one type, according to machine type and operating system type, for example. The system management server can then begin a firmware update process by first selecting the standby server (102) from the pool (104) of available standby servers in dependence upon the type of active servers in the group of active servers of a same type.
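Grouping active servers by type and choosing a matching standby from a mixed pool can be sketched as below. The sketch assumes a server's "type" is the pair (machine type, operating system type), as the text suggests by example, and that the first matching standby in name order is acceptable; `group_by_type`, `select_standby`, and `plan` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ServerType = Tuple[str, str]   # assumption: (machine type, operating system type)


def group_by_type(actives: Dict[str, ServerType]) -> Dict[ServerType, List[str]]:
    """Partition active servers into groups of a same type."""
    groups: Dict[ServerType, List[str]] = defaultdict(list)
    for name, stype in sorted(actives.items()):
        groups[stype].append(name)
    return dict(groups)


def select_standby(pool: Dict[str, ServerType], group_type: ServerType) -> str:
    """Pick a standby from the pool whose type matches the group's type."""
    for name, stype in sorted(pool.items()):
        if stype == group_type:
            return name
    raise LookupError(f"no standby server of type {group_type}")


def plan(actives: Dict[str, ServerType],
         pool: Dict[str, ServerType]) -> List[Tuple[ServerType, str, List[str]]]:
    # For each group: its type, the standby chosen for it, and the actives to rotate through.
    return [(stype, select_standby(pool, stype), names)
            for stype, names in group_by_type(actives).items()]
```

Each entry of the plan can then be processed by repeating the update-and-swap steps once per active server in that group.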
The system management server's applying the firmware updates to the standby server (102) can include booting the standby server (102) through a Preboot Execution Environment (‘PXE’) to an update service of the system management server, for example, an update service of the update management module (132). Booting to the update service means that the first application sought by the standby server (102) for execution after the boot is the update service of the update management module. PXE is an environment for network booting: it boots servers through a network interface card, independently of available data storage devices (such as hard disks) or installed operating systems. PXE is described in a specification (v2.1) published by Intel and Systemsoft on Sep. 20, 1999. It makes use of several network protocols, such as IP, UDP, DHCP, and TFTP, and of concepts such as GUID/UUID and the Universal Network Device Interface, and it extends the firmware of the PXE client (the server to be bootstrapped via PXE) with a set of predefined APIs.
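One common way to direct a single PXE client to an update image is a per-client configuration file for a network bootloader. The sketch below assumes PXELINUX (from the Syslinux project), which this specification does not name: PXELINUX looks in `pxelinux.cfg/` for a file named `01-` (the Ethernet ARP type) followed by the client's MAC address in lowercase hexadecimal separated by dashes. The function names, the TFTP root, and the kernel and `APPEND` arguments are hypothetical placeholders for whatever the update service actually requires.

```python
import string
from pathlib import Path


def pxelinux_config_name(mac: str) -> str:
    """Per-client PXELINUX file name: '01-' plus the MAC in lowercase, dash-separated."""
    octets = mac.replace(":", "-").lower().split("-")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    if not all(c in string.hexdigits for o in octets for c in o):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def write_update_boot_entry(tftp_root: Path, mac: str, kernel: str, append: str) -> Path:
    """Point one server's next network boot at an update image (paths are hypothetical)."""
    cfg_dir = tftp_root / "pxelinux.cfg"
    cfg_dir.mkdir(parents=True, exist_ok=True)
    path = cfg_dir / pxelinux_config_name(mac)
    path.write_text(
        "DEFAULT update\n"
        "LABEL update\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND {append}\n"
    )
    return path
```

Under this assumption, a management server would write such an entry before powering on the standby and remove it once the updates are applied, so that the following boot comes from the SAN boot storage instead of the update service.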
Also stored in RAM (168) is an operating system (154). Operating systems useful for applying firmware updates to servers in a data center according to embodiments of the present invention include UNIX™, Linux™, Microsoft XP™, AIX™, IBM's i5/OS™, and others as will occur to those of skill in the art. The operating system (154), the firmware updates (128), the system management server application program (126), the update management module (132), and the server failover module (130) in the example of
The system management server (152) of
The example system management server (152) of
The exemplary system management server (152) of
The arrangement of servers and other devices making up the exemplary system illustrated in
For further explanation,
The method of
In the method of
The method of
The method of
The method of
At this point, the newly designated standby server has not yet had its firmware updated. The entire process can be repeated (216), however, applying firmware updates to the newly designated standby server, failing over a new selected active server to the standby server, and replacing the now-updated standby server with the new selected active server, continuing in a loop until all the active servers in the data center have received firmware upgrades. In the method of
In the method of
may optionally be carried out by a server failover module (130) of a system management server (152). An update management module (132) and a server failover module (130) may cooperate, so that the update management module applies (202) firmware updates to a standby server, selects (204) an active server for further updating, and calls the server failover module (130), identifying the selected active server. The server failover module then fails the selected active server over to the standby server, remaps the boot storage, reboots the standby server and the selected active server, and notifies the update management module of the completion of one upgrade. If there are more active servers to receive firmware upgrades, the process loops until all active servers needing firmware upgrades have received them.
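The division of labor just described (the update module applies updates and calls the failover module, and the failover module swaps the servers and notifies the update module on completion) can be sketched with a completion callback. Both classes are hypothetical stand-ins for modules (132) and (130); power-off, remapping, and both reboots are collapsed into one swap of boot-LUN assignments, and each update session is assumed to apply all pending updates at once.

```python
from typing import Callable, Dict, List, Set


class FailoverModule:
    """Hypothetical stand-in for the server failover module (130)."""

    def __init__(self, boot_map: Dict[str, str]) -> None:
        self.boot_map = boot_map          # server name -> boot LUN

    def swap(self, active: str, standby: str,
             on_done: Callable[[str, str], None]) -> None:
        # Power-off, remapping, and both reboots are collapsed into one swap here.
        self.boot_map[active], self.boot_map[standby] = (
            self.boot_map[standby], self.boot_map[active])
        on_done(standby, active)          # notify: (new active, new standby)


class UpdateManagementModule:
    """Hypothetical stand-in for the update management module (132)."""

    def __init__(self, failover: FailoverModule, updates: Set[str]) -> None:
        self.failover = failover
        self.updates = updates
        self.installed: Dict[str, Set[str]] = {}
        self.completed: List[str] = []    # servers that came back up as updated actives
        self.standby = ""

    def apply(self, server: str) -> None:
        self.installed.setdefault(server, set()).update(self.updates)

    def on_swap_complete(self, new_active: str, new_standby: str) -> None:
        # Completion notice from the failover module for one upgrade.
        self.completed.append(new_active)
        self.standby = new_standby

    def run(self, standby: str, actives: List[str]) -> str:
        self.standby = standby
        self.apply(self.standby)
        for selected in actives:
            self.failover.swap(selected, self.standby, self.on_swap_complete)
            self.apply(self.standby)      # update the newly designated standby
        return self.standby


boot_map = {"s": "LUN-S", "a1": "LUN-1", "a2": "LUN-2"}
updater = UpdateManagementModule(FailoverModule(boot_map), {"fw-2"})
final_standby = updater.run("s", ["a1", "a2"])
```

With two active servers and one standby, this sketch ends with the workloads on `LUN-1` and `LUN-2` running on updated hardware, every machine holding `fw-2`, and the last original active server (`a2`) serving as the updated standby on `LUN-S`.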
In view of the explanations set forth above, readers will recognize that the benefits of applying firmware updates to servers in a data center according to embodiments of the present invention include limiting the number of server reboots required to apply a firmware upgrade. For example, when N active servers of a type are updated according to embodiments of the present invention, beginning with one standby server of the same type, the number of required reboots is N+1.
Exemplary embodiments of the present invention are described largely in the context of a fully functional computer system for applying firmware updates to servers in a data center. Readers of skill in the art will recognize, however, that the present invention also may be embodied in a computer program product disposed on signal bearing media for use with any suitable data processing system. Such signal bearing media may be transmission media or recordable media for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of recordable media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Examples of transmission media include telephone networks for voice communications and digital data communications networks such as, for example, Ethernets™ and networks that communicate with the Internet Protocol and the World Wide Web as well as wireless transmission media such as, for example, networks implemented according to the IEEE 802.11 family of specifications. Persons skilled in the art will immediately recognize that any computer system having suitable programming means will be capable of executing the steps of the method of the invention as embodied in a program product. Persons skilled in the art will recognize immediately that, although some of the exemplary embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present invention.
It will be understood from the foregoing description that modifications and changes may be made in various embodiments of the present invention without departing from its true spirit. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present invention is limited only by the language of the following claims.