1. Field of the Invention
The present invention relates generally to an improved data processing system, and in particular to a computer implemented method, data processing system, and computer program product for dynamically replacing a network adapter with minimal or no communications downtime.
2. Description of the Related Art
A network adapter is a piece of computer hardware which provides an interface between a computer and a network, such as a local area network (LAN). The network adapter allows computers to communicate over a network by controlling the transmission and reception of data between the physical level (layer 1 of the Open Systems Interconnection (OSI) model) and the data link level (layer 2 of the OSI model). A media access control (MAC) address is assigned to each network adapter and serves as a unique identifier for each adapter.
Network adapters may be replaced when the adapters fail in some way. For example, an adapter may fail completely, wherein no data is transmitted or received by the device, or the adapter may still be operating, but only partially. In these circumstances, the network adapter may be replaced to reestablish or improve the connectivity. One existing technology, Etherchannel, is a network aggregation technology which combines the bandwidth of multiple Ethernet adapters into a single logical link to increase the link speed beyond the limits of any one single cable or port and to load-balance traffic across those links. Etherchannel (IEEE 802.3ad) provides the capability of dynamically allocating and removing network adapters assigned to an Etherchannel link aggregation group. Implementing Etherchannel technology offers other advantages when removing adapters, including not having to modify the interface presented to the stack, as well as having no interruption in connectivity. However, in order to dynamically allocate and remove network adapters in a live system, Etherchannel and similar methods require planning and configuring multiple adapters under an Etherchannel link aggregation group at system deployment time. Thus, while Etherchannel allows for removing an adapter without losing connectivity, Etherchannel requires having a link aggregation group set up prior to removal or de-allocation of an adapter, and consequently requires that the user prepare ahead of time for an adapter replacement by preconfiguring the adapter at system setup time. In addition, Etherchannel does not provide for dynamic replacement of an adapter (i.e., switching a failed adapter with a new adapter), but only provides a failover mechanism that applies the load of the failed adapter to the other adapters in the aggregation group.
In the current art, the process of replacing network adapters is limited to performing multiple combinations of manual steps to remove the network adapter and then add the replacement adapter. The advantages of using these existing manual adapter replacement methods include needing only one network adapter in the system at setup time, implementing standard manual device removal and allocation, and having no requirements on the Ethernet switch. However, these manual adapter replacement methods are prone to operator error, and can exponentially increase connectivity downtime while an adapter is being replaced. For instance, in the manual adapter replacement methods, the operator or administrator will need to stop any critical traffic running over the interface before removing the interface. In addition, interface unconfiguration and adapter/driver state machine closure are required before a HotPlug adapter operation can occur.
The illustrative embodiments provide a computer implemented method, data processing system, and computer program product for dynamically replacing a network adapter with minimal or no communications downtime. When a notification to replace a first network adapter is received, the process in the illustrative embodiments detects a replacement network adapter and a network interface corresponding to the replacement network adapter. The process configures a replacement network adapter, and pauses all communications to the first network adapter by dropping all incoming data packets to the first network adapter. The network interface corresponding to the first network adapter is redirected to point to the replacement network adapter. The first network adapter and the network interface corresponding to the replacement network adapter are then removed. Communication flow to the replacement network adapter is restored.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures and in particular with reference to
In the depicted example, server 104 and server 106 connect to network 102 along with storage unit 108. In addition, clients 110, 112, and 114 connect to network 102. Clients 110, 112, and 114 may be, for example, personal computers or network computers. In the depicted example, server 104 provides data, such as boot files, operating system images, and applications to clients 110, 112, and 114. Clients 110, 112, and 114 are clients to server 104 in this example. Network data processing system 100 may include additional servers, clients, and other devices not shown.
In the depicted example, network data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other computer systems that route data and messages. Of course, network data processing system 100 also may be implemented as a number of different types of networks, such as for example, an intranet, a local area network (LAN), or a wide area network (WAN).
With reference now to
In the depicted example, data processing system 200 employs a hub architecture including a north bridge and memory controller hub (NB/MCH) 202 and a south bridge and input/output (I/O) controller hub (SB/ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are coupled to north bridge and memory controller hub 202. Processing unit 206 may contain one or more processors and even may be implemented using one or more heterogeneous processor systems. Graphics processor 210 may be coupled to the NB/MCH through an accelerated graphics port (AGP), for example.
In the depicted example, local area network (LAN) adapter 212 is coupled to south bridge and I/O controller hub 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, universal serial bus (USB) and other ports 232, and PCI/PCIe devices 234 are coupled to south bridge and I/O controller hub 204 through bus 238, and hard disk drive (HDD) 226 and CD-ROM 230 are coupled to south bridge and I/O controller hub 204 through bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash basic input/output system (BIOS). Hard disk drive 226 and CD-ROM 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. A super I/O (SIO) device 236 may be coupled to south bridge and I/O controller hub 204.
An operating system runs on processing unit 206 and coordinates and provides control of various components within data processing system 200 in
Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as hard disk drive 226, and may be loaded into main memory 208 for execution by processing unit 206. The processes of the illustrative embodiments may be performed by processing unit 206 using computer implemented instructions, which may be located in a memory such as, for example, main memory 208, read only memory 224, or in one or more peripheral devices.
The hardware in
In some illustrative examples, data processing system 200 may be a personal digital assistant (PDA), which is generally configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data. A bus system may be comprised of one or more buses, such as a system bus, an I/O bus and a PCI bus. Of course the bus system may be implemented using any type of communications fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture. A communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. A memory may be, for example, main memory 208 or a cache such as found in north bridge and memory controller hub 202. A processing unit may include one or more processors or CPUs. The depicted examples in
Turning now to
Network adapter 300 also includes electrically erasable programmable read-only memory (EEPROM) interface 308, register/configure/status/control unit 310, oscillator 312, and control unit 314. EEPROM interface 308 provides an interface to an EEPROM chip, which may contain instructions and other configuration information for network adapter 300. Different parameters and settings may be stored on an EEPROM chip through EEPROM interface 308.
Register/configure/status/control unit 310 provides a place to store information used to configure and run processes on network adapter 300. For example, a timer value for a timer may be stored within these registers. Additionally, status information for different processes also may be stored within this unit. Oscillator 312 provides a clock signal for executing processes on network adapter 300.
Control unit 314 controls the different processes and functions performed by network adapter 300. Control unit 314 may take various forms. For example, control unit 314 may be a processor or an application-specific integrated circuit (ASIC). In these examples, the processes of the present invention used to manage flow control of data are executed by control unit 314. If implemented as a processor, the instructions for these processes may be stored in a chip accessed through EEPROM interface 308.
Data is received in receive operations through Ethernet interface 302. This data is stored in data buffer 304 for transfer onto the data processing system across PCI bus interface 306. For example, the data may be transferred onto a bus using a PCI local bus or via ICH 210 in
The illustrative embodiments provide a mechanism for replacing a network adapter in a data processing system with minimal or no communications downtime. In particular, the mechanism described in the illustrative embodiments allows for unplanned and also uninterrupted replacement of network adapters using the Common Data Link Interface (CDLI). Common Data Link Interface is used by AIX® as the interface between network adapters and the AIX® stack and serves as a layer through which all adapter communications and interfaces to the host operating system occur. AIX® (Advanced Interactive eXecutive) is a UNIX operating system and is a product of IBM® Corporation. The illustrative embodiments exploit the relationship between the Common Data Link Interface and the network adapters and AIX® stack to create the necessary functionality to dynamically replace an adapter in a live system. This unplanned adapter replacement feature does not require any prior planning or setup, such as that required by Etherchannel. As previously mentioned, Etherchannel requires planning and configuring multiple adapters under an Etherchannel Link Aggregation group at system deployment time in order to dynamically allocate and remove network adapters in a live system. Additionally, in contrast with existing manual adapter replacement methods, the unplanned adapter replacement feature in the illustrative embodiments effectively provides a means to replace network adapters with no user intervention and a reduced error window. In other words, the process steps in the illustrative embodiments may be implemented independently of user interaction, thereby eliminating operator (user) errors often encountered in the manual adapter replacement methods. It is also possible to encapsulate an upgrade to an improved network adapter using the unplanned adapter replacement feature without interfering with connectivity.
In
Common Data Link Interface 404 is an interface layer between stack 402 and adapter 0 410 and serves to encapsulate network adapter 0 410 from stack 402. All adapter communications and interfaces to the host operating system pass through Common Data Link Interface 404. Common Data Link Interface 404 is modified to enable the interface to be notified when the system administrator reallocates adapter resources in the data processing system.
Network interface (en0) 406 provides an interface layer between Common Data Link Interface 404 and adapter driver (ent0) 408. When a network adapter is physically installed on the data processing system, the operating system automatically adds the appropriate network interface for the adapter. In this particular instance, when adapter 0 410 was installed, the operating system assigned adapter 0 410 the name (ent0), and also added network interface (en0) 406 to the adapter. Consequently, stack 402 may only access adapter 0 410 via network interface (en0) 406.
Driver (ent0) 408 is a program which controls the communications between the processor and an I/O device adapter, here adapter 0 410. Driver (ent0) 408 enables stack 402 to interact transparently with adapter 0 410 by providing commands to and/or receiving data from adapter 0 410. Driver (ent0) 408 translates function calls from the stack into device-specific calls.
Adapter 0 410 is a network adapter, such as network adapter 300 in
Device description entry 412 specifies configuration information of network adapter 0 410, and in particular, the physical dependencies of the adapter. In one embodiment, device description entry 412 may be a Custom Description (CuDv) object class file which contains customized system objects describing the network adapter's current configuration. In this example, device description entry 412 indicates that the physical dependencies of adapter 0 410 include en0 and ent0. Device description entry 412 may be managed by an Object Data Manager (ODM), which is a set of utilities employed by AIX® to manage configuration information. ODM may create, edit, and remove CuDv objects in a file.
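The layering described above may be easier to follow as a small data model. The following Python sketch is illustrative only and is not AIX® code: the class names, the `transmit` and `send` methods, and the dictionary form of the device description entry are all hypothetical names chosen for this example. It shows the stack addressing only the interface name, with the Common Data Link Interface layer resolving that name to a driver and adapter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Adapter:
    """Physical network adapter, e.g. adapter 0 (hypothetical model)."""
    label: str
    mac: str


@dataclass
class Driver:
    """Adapter driver (ent0): turns stack calls into device-specific calls."""
    name: str
    adapter: Adapter

    def transmit(self, payload: bytes) -> str:
        # Stand-in for a device-specific call; a real driver would touch hardware.
        return f"{self.name} -> {self.adapter.label}: {len(payload)} bytes"


@dataclass
class NetworkInterface:
    """Network interface (en0) between the CDLI layer and the driver."""
    name: str
    driver: Driver


@dataclass
class CommonDataLinkInterface:
    """Layer through which the stack reaches every adapter."""
    interfaces: Dict[str, NetworkInterface] = field(default_factory=dict)

    def register(self, iface: NetworkInterface) -> None:
        self.interfaces[iface.name] = iface

    def send(self, iface_name: str, payload: bytes) -> str:
        # The stack names only the interface; it never sees the adapter.
        return self.interfaces[iface_name].driver.transmit(payload)


def device_description(iface: NetworkInterface) -> Dict[str, List[str]]:
    """Hypothetical dictionary form of a device description entry."""
    return {"depends_on": [iface.name, iface.driver.name]}


# Example values (the MAC address is made up for illustration).
adapter0 = Adapter(label="adapter 0", mac="00:11:22:33:44:55")
en0 = NetworkInterface(name="en0", driver=Driver(name="ent0", adapter=adapter0))
cdli = CommonDataLinkInterface()
cdli.register(en0)
```

Because the stack holds only the name en0, pointing that interface at a different driver changes the adapter in use without changing anything the stack sees, which is the property the replacement operation relies on.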
As shown in
In
In
In
Removal of adapter 0 410 in
Once the device description entry has been modified, the Media Access Control (MAC) address on the adapter is updated to reflect the adapter replacement. In one embodiment, the MAC address may be updated by overriding the unique MAC address assigned to the new adapter with the MAC address of the replaced adapter, thereby allowing the MAC address to remain the same for both adapters. Alternatively, the MAC address may be updated by broadcasting the new adapter's MAC address to the other devices in the network to reflect the adapter replacement. An Address Resolution Protocol (ARP) packet is transmitted to update the network switch. ARP broadcasts the MAC address assigned to the new adapter to other devices in the network. When the network switch is updated with the MAC address, the communication flow from the operating system or applications over network interface (en0) 406 to replacement network adapter 414 is restored and will continue in a normal manner.
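As a rough illustration of the broadcast alternative, the following Python sketch builds the bytes of an ARP announcement frame that advertises an adapter's MAC address to the local network. It is a sketch under stated assumptions, not the embodiment's implementation: the function names, the example MAC and IP addresses, and the choice of an ARP request whose sender and target protocol addresses are equal (commonly called a gratuitous ARP) are assumptions made here. The code only constructs the frame and does not transmit it.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6   # Ethernet broadcast destination
ETHERTYPE_ARP = 0x0806        # EtherType value for ARP


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def build_arp_announcement(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP announcement (assumed format)."""
    sender_mac = mac_to_bytes(mac)
    sender_ip = ipaddress.IPv4Address(ip).packed
    # Ethernet header: destination, source, EtherType.
    ethernet_header = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: Ethernet hardware type (1), IPv4 protocol type (0x0800),
    # hardware address length 6, protocol address length 4, operation 1 (request).
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then a zeroed target MAC and the same IP as the target.
    arp_body = sender_mac + sender_ip + b"\x00" * 6 + sender_ip
    return ethernet_header + arp_header + arp_body


# Hypothetical replacement-adapter MAC and a documentation-range IP address.
frame = build_arp_announcement("00:11:22:33:44:66", "192.0.2.10")
```

When the MAC address of the replaced adapter is instead carried over to the new adapter, the same frame layout applies with the original MAC address, and the switch learns which port that address now sits behind.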
While this particular example illustrates a replacement operation where there is an empty slot available to receive the replacement adapter, the illustrative embodiments also allow for replacing a network adapter when no empty slots are available. In this single adapter/slot replacement scenario, the existing adapter is first removed from its slot, and then the replacement adapter is inserted into that slot. Although this process results in a loss of connectivity for a short period of time, the replacement operation in the illustrative embodiments is still advantageous over existing adapter replacement methods based on reduced replacement time. The connectivity down time from when the adapter is physically removed in the single adapter/slot replacement scenario to the time the adapter is replaced may be consistent with the connectivity down time in the manual adapter replacement methods. However, the administrative time for removal and replacement may be decreased significantly using the illustrative embodiments. With the illustrative embodiments, the operator or administrator is not required to stop critical traffic, remove the interface, or close the adapter device driver. With existing adapter replacement methods, all of these steps are required to be performed manually, and use up a varying amount of time. Furthermore, the additional steps and user interaction required by the existing adapter replacement methods introduce the possibility of operator errors, in which case the replacement time would be increased exponentially. By implementing the process described in the illustrative embodiments, a user is only required to enter one or very few commands (implementation specific) to execute the same operations performed by the existing replacement methods, without having to stop critical traffic, remove the interface, or close the adapter device driver.
The process begins when a new adapter is placed in the data processing system and connected to a network switch (step 502). The replacement operation is then initiated by providing an instruction via the command line, the System Management Interface Tool (SMIT) or Web-based System Manager (WebSM), or a Hardware Management Console (HMC) to the data processing system to remove a particular adapter and replace it with the new adapter (step 504).
The Common Data Link Interface executes a configuration command which detects the new adapter (adapter 1) and the new adapter's corresponding network interface (en1) (step 506). Upon detecting the new adapter, the Common Data Link Interface and the network interface (en0) drop all incoming traffic directed to the original adapter interface (e.g., adapter 0) (step 508). The network interface (en0) of the original adapter is redirected to point to the new adapter 1 via driver interface (ent1) (step 510).
The original adapter (adapter 0), its driver interface (ent0), and the new adapter's network interface (en1) are then removed (step 512). Removal of the adapter also removes the adapter device description entry on the object data manager. A change to the object data manager is performed to force the new driver interface (ent1) to occupy the device description entry of the previous driver interface ent0 (step 514).
The MAC address on the adapter is updated (step 516), and an ARP packet is transmitted to update the network switch (step 518). At this point, traffic over the network interface (en0) may be restored (step 520). The communication flow from the operating system or applications will continue in a normal manner (step 522), with the process terminating thereafter.
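The sequence of steps 506 through 520 can be sketched as a short simulation. The Python below is a minimal in-memory model, not operating system code: the dictionary layout, the function names `make_system`, `hot_add`, and `replace_adapter`, the `keep_mac` flag, and the MAC addresses are all hypothetical. In particular, step 514 is modeled here as rewriting the new driver's entry so that its dependencies are en0 and ent1, which is one reading of the description.

```python
def make_system():
    """Hypothetical in-memory model of the state before replacement."""
    return {
        "adapters": {"ent0": {"mac": "00:11:22:33:44:55"}},
        "interfaces": {"en0": {"driver": "ent0", "paused": False}},
        "odm": {"ent0": {"depends_on": ["en0", "ent0"]}},
        "events": [],
    }


def hot_add(system, driver, iface, mac):
    """Step 502: the new adapter appears with its own driver and interface."""
    system["adapters"][driver] = {"mac": mac}
    system["interfaces"][iface] = {"driver": driver, "paused": False}
    system["odm"][driver] = {"depends_on": [iface, driver]}


def replace_adapter(system, old_if, old_drv, new_if, new_drv, keep_mac=True):
    """Walk through steps 506 to 520 on the model."""
    events = system["events"]
    # Step 506: detect the new adapter and its network interface.
    if new_drv not in system["adapters"] or new_if not in system["interfaces"]:
        raise LookupError("replacement adapter or its interface not detected")
    # Step 508: drop incoming traffic directed to the original adapter.
    system["interfaces"][old_if]["paused"] = True
    events.append("paused " + old_if)
    # Step 510: redirect the original interface to the new driver.
    system["interfaces"][old_if]["driver"] = new_drv
    # Step 512: remove the old adapter/driver, the new adapter's interface,
    # and the old device description entry.
    old_mac = system["adapters"].pop(old_drv)["mac"]
    del system["interfaces"][new_if]
    system["odm"].pop(old_drv, None)
    # Step 514: the new driver takes over the old dependency on old_if.
    system["odm"][new_drv] = {"depends_on": [old_if, new_drv]}
    # Step 516: update the MAC address (here, optionally carry over the old one).
    if keep_mac:
        system["adapters"][new_drv]["mac"] = old_mac
    # Step 518: announce the MAC address so the network switch is updated.
    events.append("arp " + system["adapters"][new_drv]["mac"])
    # Step 520: restore traffic over the original interface.
    system["interfaces"][old_if]["paused"] = False
    events.append("resumed " + old_if)
    return system


system = make_system()
hot_add(system, "ent1", "en1", "00:11:22:33:44:66")
replace_adapter(system, "en0", "ent0", "en1", "ent1")
```

After the call, the model holds a single interface en0 that points at ent1, so anything addressing en0 continues unchanged. Passing `keep_mac=False` models the alternative in which the new adapter keeps its own MAC address and announces it to the network.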
The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.