The present disclosure relates generally to computer networks, and, more specifically, to a system and method for managing virtual machines in a computer network.
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to these users is an information handling system. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may vary with respect to the type of information handled; the methods for handling the information; the methods for processing, storing or communicating the information; the amount of information processed, stored, or communicated; and the speed and efficiency with which the information is processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include or comprise a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Computer systems, including servers and workstations, are often grouped in clusters to perform specific tasks. A server cluster is a group of independent servers that is managed as a single system and is characterized by high availability, manageability, and scalability, as compared with groupings of unmanaged servers. At a minimum, a server cluster includes two servers, which are sometimes referred to as nodes.
In server clusters designed for high availability applications, each node of the server cluster is associated with a standby node. When the primary node fails, the application or applications of the node are restarted on the standby node. Each of the primary node and the standby node may include one or more virtual machines. Each virtual machine typically includes an application, an operating system, and all necessary drivers. The virtual machines run on virtualization software that executes on the host operating system of the node. In operation, each virtual machine resembles an encapsulated file. A single node may include multiple virtual machines, and each virtual machine could be dedicated to the handling of a single task. As an example, one virtual machine on a node could be a mail server, while another virtual machine present on the same physical server could be a file server. Virtual machines may be organized such that one virtual machine is an active virtual machine and a second virtual machine is the standby virtual machine. The active virtual machine and the standby virtual machine may reside on the same physical node, or they may reside on separate physical nodes.
When a node of the cluster fails, the applications of the failed node must be restarted on the surviving or standby node. Often, the reinstantiation of applications of the failed node on the standby node requires that the restarted applications be provided access to resources that were present on the failed node. Often the process of restarting, or failing over, an application from a failed node to a standby node results in the loss of current state of the application. As an example, some or all of the current transactions of the application may be lost during the failover process. In the case of a failed node that includes one or more virtual machines, the current state of one or more of the virtual machines could be lost during the failover process.
In accordance with the present disclosure, a system and method is disclosed for the management of virtual machines in the nodes of a cluster network. An active virtual machine and a standby virtual machine are provided. In operation, delta files are periodically created in the active node. Each delta file includes an indication of the changes made to the active virtual machine between a preceding point in time and the present. The delta files are transmitted to the standby virtual machine, where the files are applied to the standby virtual machine to synchronize the content of the active virtual machine and the standby virtual machine. The active virtual machine may reside in an active node, and the standby virtual machine may reside in a standby node. In the event of a failure in the active node, the standby virtual machine of the standby node is converted to an active virtual machine.
The system and method disclosed herein is technically advantageous because it enhances failover performance and minimizes downtime in the operation of virtual machines in high availability cluster server environments. Because an identical or near identical copy of the virtual machine of the active node also exists in the standby node, the standby node can serve as a failover node in the event of a failure of the active node. In the event of such a failure, downtime is minimized or eliminated entirely, as both nodes include an identical or a near identical copy of the entire virtual machine. In the event of a failure, the standby node can be used very quickly, as applications of the virtual machine do not need to be restarted in the standby node, and resources do not need to be reallocated in the standby node. In addition, IP addresses used by the virtual machine do not need to be rebound, and clients of the virtual machine do not have to reissue requests to the virtual machine.
Another technical advantage of the system and method disclosed herein is that it is transparent to clients or users of the server nodes, including clients or users of the virtual machines of the server nodes. In operation, the user or client is not aware that incremental changes to a virtual machine are being logged and applied to a virtual machine in a standby node. Because an identical or near identical version of the virtual machine is present on the standby node, and because the virtual machine of a failed node can be made available quickly at the standby node with the same content as existed in the failed node, the user may also not be aware that a failure has occurred in the active node. Other technical advantages will be apparent to those of ordinary skill in the art in view of the following specification, claims, and drawings.
A more complete understanding of the present embodiments and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
For purposes of this disclosure, an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer, a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communication with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, and a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components. An information handling system may comprise one or more nodes of a cluster network.
The system and method disclosed herein provides a method for managing the virtual machines of a node in preparation for a potential failure of the node. A standby virtual machine is maintained on the standby node. As incremental changes are made to the virtual machine of the active node, those incremental changes are logged and periodically applied to the standby node. In the event of a failure of the active node, the current state or the near current state of the virtual machine is present on the standby node.
Each of the server nodes includes a virtual machine 24. Virtual machine 24 includes application software and an emulated version of a computer system, including an emulated version of the hardware and operating system of a computer system. From the perspective of a user of the server node, the presence of a virtual machine permits a user to execute the application within an emulated computing environment. From the perspective of the virtualization layer or the physical server node, the virtual machine resembles a single file or data structure. In operation, active virtual machine 24A and standby virtual machine 24B are identical. Virtual machine 24B can be created by making a clone of virtual machine 24A. The process of creating clones of virtual machines is described in U.S. application Ser. No. 10/984,397, which is titled “System and Method for Hot Cloning in a Distributed Network,” and which is incorporated herein by reference in its entirety. At the time that the clone is made of the active virtual machine, the active virtual machine and the standby virtual machine are in sync, as the content of each is identical.
Log generator 28 is a software utility that takes incremental snapshots of the differential content of the data structure or file comprising the active virtual machine 24A. A differential snapshot is a log file that identifies the difference between the virtual machine at a first point in time and the virtual machine at an immediately preceding point in time. A representation of a log file is shown at 26. The differential snapshot is defined as the difference in the file image of the active virtual machine at time t+x and the file image of the active virtual machine at time t. The differential snapshot is sometimes referred to as a delta file because the file represents the difference between the active virtual machine at two points in time. Log generator 28 may produce differential snapshots of the active virtual machine at regular timed intervals. Log generator 28 could also be configured to generate a differential snapshot of the active virtual machine each time that the active virtual machine is modified. The creation of log files is accomplished such that each modification to the active virtual machine is recorded in a log file. The delta files are received on the active node by a log transport module 30. The log transport module collects the delta files and periodically transmits the files to the standby node. The transmission of the delta files between the active node and the standby node can occur through a communication link between the two nodes. One example of a suitable communications link is communications link 38 between the network interface cards 36 of each node.
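As an illustration only, the creation of a differential snapshot can be sketched as a block-by-block comparison of the file image at time t and the file image at time t+x. The sketch below is not the implementation of log generator 28; the names DeltaFile and make_delta, the fixed block size, and the sequence number carried in each delta file are all assumptions introduced for the example and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict

BLOCK_SIZE = 4096  # assumed comparison granularity; the disclosure does not specify one


@dataclass
class DeltaFile:
    """Hypothetical delta file: what changed between the image at t and at t+x."""
    sequence: int                  # assumed ordering hint for the receiving node
    new_length: int                # length of the image at time t+x
    blocks: Dict[int, bytes] = field(default_factory=dict)  # block index -> new content


def make_delta(old_image: bytes, new_image: bytes, sequence: int,
               block_size: int = BLOCK_SIZE) -> DeltaFile:
    """Record every block of new_image that differs from the same block of old_image."""
    delta = DeltaFile(sequence=sequence, new_length=len(new_image))
    for offset in range(0, len(new_image), block_size):
        new_block = new_image[offset:offset + block_size]
        # A block past the end of old_image compares unequal and is therefore recorded.
        if new_block != old_image[offset:offset + block_size]:
            delta.blocks[offset // block_size] = new_block
    return delta
```

With this representation an unchanged image yields a delta file with no blocks, so the amount of data handed to the log transport module tracks the amount of change rather than the size of the virtual machine.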
In standby node B, the delta files are received at log receiver module 34. Log receiver module 34 transmits the log files 26 to a log applicator module 32. The function of the log applicator module 32 is to periodically apply the log files to the content of the standby virtual machine 24B so that the content or file image of the standby virtual machine is a duplicate or near duplicate of the content or file image of the active virtual machine. The process of creating a log file of the active virtual machine at the active node, transmitting the log file to the standby node, and updating the content of the standby virtual machine at the standby node is repeated every few seconds to ensure that the content of the active virtual machine and the content of the standby virtual machine are the same or nearly the same.
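The application of log files at the standby node can be sketched in the same hedged way. The example below assumes that each delta is a tuple of a sequence number, the new image length, and a mapping of block index to block content, and that deltas must be applied in sequence order with no gaps. The tuple layout, the sequence numbering, and the function names apply_delta and apply_pending are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Dict, Iterable, Tuple

# Assumed delta layout: (sequence, new_length, {block_index: block_bytes}).
Delta = Tuple[int, int, Dict[int, bytes]]


def apply_delta(image: bytearray, delta: Delta, block_size: int = 4096) -> None:
    """Bring the standby image in line with one delta, modifying it in place."""
    _, new_length, blocks = delta
    if new_length < len(image):
        del image[new_length:]                             # image shrank
    else:
        image.extend(bytes(new_length - len(image)))       # image grew (or is unchanged)
    for index, block in blocks.items():
        start = index * block_size
        image[start:start + len(block)] = block


def apply_pending(image: bytearray, pending: Iterable[Delta],
                  last_applied: int, block_size: int = 4096) -> int:
    """Apply received deltas in order; return the last sequence number applied."""
    for delta in sorted(pending, key=lambda d: d[0]):
        sequence = delta[0]
        if sequence <= last_applied:
            continue          # already applied, for example a retransmission
        if sequence != last_applied + 1:
            break             # a delta is missing; wait rather than apply out of order
        apply_delta(image, delta, block_size)
        last_applied = sequence
    return last_applied
```

Stopping at a gap is a design choice of this sketch: applying a later delta before an earlier one could leave the standby image in a state that the active virtual machine never had.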
The status of the active node is monitored by a failover or heartbeat utility that operates on each of the nodes and communicates through a communications link between the two nodes. As one example, the failover or heartbeat utility may communicate between the nodes through the communications link 38, which is coupled between the network interface cards 36 of each node. If the failover utility determines that the active node has failed and is not responding to the failover utility, the standby virtual machine 24B replaces the active virtual machine 24A of the active node and receives all requests and communications from the clients of the failed active node. From the perspective of the user, the transition from the active virtual machine to the standby virtual machine is seamless and transparent. The client is not aware that a transition has occurred, and the client, in most instances, is not required to reissue any requests to the standby virtual machine.
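One possible form of the failure determination is a timeout on heartbeat messages received from the active node. The sketch below is a minimal example under that assumption. The class name HeartbeatMonitor, the timeout parameter, and the role strings are hypothetical; the disclosure does not specify how the failover utility decides that the active node is not responding.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Runs on the standby node and promotes it when heartbeats stop arriving."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s      # assumed tolerance for silence from the active node
        self.clock = clock              # injectable so the logic can be tested without waiting
        self.last_beat = clock()
        self.role = "standby"

    def record_heartbeat(self) -> None:
        """Call whenever a heartbeat arrives over the link between the nodes."""
        self.last_beat = self.clock()

    def check(self) -> str:
        """Promote the standby virtual machine if the active node has been silent too long."""
        if self.role == "standby" and self.clock() - self.last_beat > self.timeout_s:
            self.role = "active"        # from here on, client requests are served by this node
        return self.role
```

In use, check would be called on a short timer. Once the role becomes "active" it does not revert in this sketch, so a late heartbeat from a recovering node does not cause a second transition.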
Because the failover process described herein involves the instantaneous and seamless transition between virtual machines, the system and method described herein may be used in the case of high availability virtual machines. In addition, the system and method described herein may be used with virtual machines that are not cluster aware. The virtual machines need not be aware that differential files are being created for the purpose of creating and maintaining an identical standby virtual machine in a standby node. The system and method disclosed herein may also be used in disaster recovery applications in which it is desirable to have a standby version of an active virtual machine. It is expected that, in some situations, an additional software license may not be needed for the standby virtual machine. Until the standby virtual machine is activated, a license may not be necessary for the standby virtual machine.
The system and method disclosed herein is not limited in its application to the computer network architecture disclosed herein. The system and method described herein may be used in computer networks having multiple servers and in computer networks in which one or more of the servers includes multiple virtual machines. It should also be recognized that the system and method disclosed herein may be employed in an environment in which the active virtual machine and the standby virtual machine are employed on the same physical node. The failover and synchronization steps of the present disclosure can be implemented in an architecture in which the virtual machines are implemented on a single physical node. Although the present disclosure has been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and the scope of the invention as defined by the appended claims.