The present disclosure relates generally to the field of networks, and, more particularly, to a system and method for failure recovery and load balancing in a cluster network.
As the value and use of information continues to increase, individuals and businesses continually seek additional ways to process and store information. One option available to users of information is an information handling system. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary with regard to the kind of information that is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use, including such uses as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Computers, including servers and workstations, are often grouped in clusters to perform specific tasks. A server cluster is a group of independent servers that is managed as a single system and is characterized by higher availability, manageability, and scalability, as compared with groupings of unmanaged servers. A server cluster typically involves the configuration of a group of servers such that the servers appear in the network as a single machine or unit. Server clusters often share a common namespace on the network and are designed specifically to tolerate component failures and to support the transparent addition or subtraction of components in the cluster. At a minimum, a server cluster includes two servers, which are sometimes referred to as nodes, that are connected to one another by a network or other communication links.
In a high availability cluster, when a node fails, the applications running on the failed node are restarted on another node in the cluster. The node that is assigned the task of hosting a restarted application from a failed node is often identified from a static list or table of preferred nodes. The node that is assigned the task of hosting the restarted application from a failed node is sometimes referred to as the failover node. The identification of a failover node for each hosted application in the cluster is typically determined by a system administrator and the assignment of failover nodes to applications may be made well in advance of an actual failure of a node. In clusters with more than two nodes, identifying a suitable failover node for each hosted application is a complex task, as it is often difficult to predict the future utilization and capacity of each node and application of the network. It is sometimes the case that, at the time of a failure of a node, the assigned failover node for a given application of the failed node will be at or near its processing capacity, and the task of hosting an additional application by the identified failover node will necessarily reduce the performance of other applications hosted by the failover node.
In accordance with the present disclosure, a system and method for failure recovery in a cluster network is disclosed in which each application of each node of the cluster network is assigned a preferred failover node. The dynamic selection of a preferred failover node for each application is made on the basis of the processor and memory requirements of the application and the processor and memory usage of each node of the cluster network.
The system and method disclosed herein is advantageous because it provides for load balancing in multi-node cluster networks for applications that must be restarted in a node of the network following the failure of another node in the network. Because of the load balancing feature of the system and method disclosed herein, an application from a failed node can be restarted in a node that has the processing capacity to support the application. Conversely, the application is not restarted in a node that is operating near its maximum capacity at a time when other nodes are available to handle the application from the failed node. The system and method disclosed herein is advantageous because it evaluates the load or processing capacity that is present on a potential failover node before assigning to that node the responsibility for hosting an application from a failed node.
Another technical advantage of the present invention is that the load balancing technique disclosed herein can select a failover node according to optimized search criteria. As an alternative to assigning the application to the first node that is identified as having the processing capacity to host the application, the system and method disclosed herein is operable to search for the node among the nodes of the cluster network that has the most available processing capacity. Another technical advantage of the system and method disclosed herein is that the load balancing technique disclosed herein can be automated. Another advantage of the system and method disclosed herein is that the load balancing technique can be applied in a node in advance of the failure of the node, at a time when the processor usage in the node meets or exceeds a defined threshold value. Other technical advantages will be apparent to those of ordinary skill in the art in view of the following specification, claims, and drawings.
A more complete understanding of the present embodiments and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
For purposes of this disclosure, an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer, a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communication with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, and a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components. An information handling system may comprise one or more nodes of a cluster network.
Disclosed herein is a dynamic and self-healing failure recovery technique for a cluster environment. The system and method disclosed herein provides for the intelligent selection of failover nodes for applications hosted by a failed node of a cluster network. In the event of a node failure, the applications hosted by the failed node of the cluster network are assigned or failed over to the selected failover node. A failover node is dynamically preassigned for each application of each node of the cluster network. The failover nodes are selected on the basis of the processing capacity of the operating nodes of the network and the processing requirements of the applications of the failed node. Upon the failure of a node of the cluster network, each application of the failed node is restarted on its dynamically preassigned failover node.
Shown in
The resource manager 20 of each node measures the current processor and memory usage of each of the applications hosted by the node. Resource manager 20 also measures the collective processor and memory usage of all applications and processes on the node, and it identifies and maintains a record of the processor and memory utilization requirements of each application hosted by the node. Each application failover manager 18 of each node receives from resource manager 20 (and via an application failover manager decision table on shared storage) information concerning the processor and memory usage of each node; information concerning the processor and memory usage of each application on the node; and information concerning the processor and memory utilization requirements of each application on the node. With this information, the application failover manager is able to identify on a dynamic basis for service module 16 a failover node for each application hosted at the node. For each application of the node, failover manager 18 is able to identify, as a failover node, the node of the cluster network that has the maximum amount of available processor and memory resources.
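The identification of the node having the maximum available resources can be illustrated with a minimal sketch. The record layout, the field names, the percentage units, and the rule of ranking by free processor first and free memory second are all assumptions made for illustration; the disclosure does not specify how processor and memory availability are combined.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical per-node record of the kind a resource manager might report."""
    name: str
    cpu_used: float  # percent of processor in use (0-100), assumed unit
    mem_used: float  # percent of memory in use (0-100), assumed unit

    @property
    def cpu_free(self) -> float:
        return 100.0 - self.cpu_used

    @property
    def mem_free(self) -> float:
        return 100.0 - self.mem_used


def most_available_node(table, current):
    """Return the node other than `current` with the most free resources.

    Assumption: nodes are ranked by free processor, with free memory
    used only to break ties.
    """
    candidates = [n for n in table if n.name != current]
    if not candidates:
        return None  # a single-node table has no possible failover target
    return max(candidates, key=lambda n: (n.cpu_free, n.mem_free))
```

A caller would pass the list of node records together with the name of the node whose applications are being assigned, so that the node never selects itself.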
Each server node 14 is coupled to shared storage 22. Shared storage 22 includes an application failover manager decision table 24. Application failover manager decision table 24 is a data structure stored in shared storage 22 that includes data reflecting the processor and memory usage of each node and the processor and memory utilization requirements of each application of each server node of the cluster network. Shown in
The content of the application failover manager decision table 24 is provided by the resource manager 20 of each server node 14. On a periodic basis, resource manager 20 of each node writes to the application failover manager decision table to update the processor and memory usage of the node and the processor and memory requirements of each application in the node. Because of the periodic writes to the application failover manager decision table by each node, the application failover manager decision table includes an accurate and recent snapshot of the processor and memory usage and requirements of each node (and the applications in the node) in the cluster network. Application failover manager decision table 24 can also be read by each application failover manager 18. As an alternative to storing the application failover manager (AFM) decision table 24 in shared storage 22, a copy of the AFM decision table could be stored in each of the server nodes. In this arrangement, an identical copy of the AFM decision table is placed in each of the server nodes. Any modification to the AFM decision table in one of the server nodes is propagated through a network interconnection to the other server nodes. The flow of data between the modules of the system and method disclosed herein is shown in
Shown in
At step 34 of
At step 40, it is determined if the number of suitable nodes is zero. If the number of suitable nodes is greater than zero, i.e., the number of suitable nodes is one or more, the flow diagram continues with the selection at step 42 of the suitable node that has the most processor availability. At step 44, the selected node is identified as the preferred failover node for the application. The identification of the preferred failover node may be recorded in a data structure maintained at or by application failover manager 18. The identification of the preferred failover node may also be sent to service module 16 of the node, as the service module of the failed node generally assumes the responsibility of restarting each application of the failed node on the respective failover nodes. If it is determined at step 40 that the number of suitable nodes is zero, processing continues with step 41, where a selection is made of the node (not including the current node) that has the most processor availability. At step 44, the node selected at step 41 is identified as the preferred failover node for the application.
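The branch at steps 40 through 44 can be sketched as follows. This is an illustrative sketch only: the `Node` and `App` types and their field names are hypothetical, and the suitability test (a node is suitable when its free processor and free memory both cover the requirements of the application) is an assumption, since the comparison step is not fully reproduced in this text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float  # available processor capacity, assumed percent
    mem_free: float  # available memory capacity, assumed percent


@dataclass
class App:
    name: str
    cpu_req: float  # processor utilization requirement
    mem_req: float  # memory utilization requirement


def preferred_failover_node(app, nodes, current):
    """Pick a preferred failover node for `app`, excluding the current node."""
    others = [n for n in nodes if n.name != current]
    if not others:
        return None
    # Assumed suitability test: the node can cover both requirements.
    suitable = [
        n for n in others
        if n.cpu_free >= app.cpu_req and n.mem_free >= app.mem_req
    ]
    # Step 40: if no node is suitable, fall back to all other nodes (step 41).
    pool = suitable if suitable else others
    # Steps 42 and 41 both select on the most processor availability.
    return max(pool, key=lambda n: n.cpu_free)
```

The fallback keeps every application assigned to some failover node even when no node can fully accommodate it, which matches the behavior described for step 41.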
Following the selection of the preferred failover node for the application, the local copy of the application failover manager decision table must be updated to reflect that an application of the current node has been assigned a preferred failover node. Following step 44, a portion of the processor and memory availability of a preferred failover node has been pledged to an application of the current node. The reservation of these resources for this application should be considered when assigning preferred failover nodes for the remainder of the applications of the current node. Each previous assignment of a preferred failover node for an application of the current node is therefore considered when assigning a preferred failover node to any of the remainder of the applications of the current node. If the local copy of the decision table is not updated to reflect previous assignments of preferred failover nodes to applications of the current node, each application of the current node will be considered in isolation, with the possible result that one or more nodes of the cluster network could become oversubscribed as the preferred failover node for multiple applications of the current node. At step 46, the local copy of the application failover manager decision table is updated to reflect the addition of the current processor usage of the assigned application to the processor usage of the preferred failover node. At step 48, the local copy of the decision table is updated to reflect the addition of the current memory usage of the assigned application to the memory usage of the preferred failover node. In sum, the local copy of the decision table is updated with the then current usage of the assigned application. Following steps 46 and 48, the decision table reflects the usage that would likely exist on the preferred failover node following the restarting on that node of those applications that have been assigned to restart or fail over to that node.
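The bookkeeping of steps 46 and 48 can be sketched as below. The dictionary layout and the key names `cpu_used` and `mem_used` are assumptions chosen for illustration; the point of the sketch is only that the reservation is applied to a local copy, leaving the shared table untouched.

```python
import copy


def make_local_copy(shared_table):
    """Take a private working copy of the decision table (hypothetical layout:
    node name -> {"cpu_used": ..., "mem_used": ...})."""
    return copy.deepcopy(shared_table)


def reserve(local_table, node_name, app_cpu_usage, app_mem_usage):
    """Pledge part of a node's capacity to an application in the local copy."""
    entry = local_table[node_name]
    entry["cpu_used"] += app_cpu_usage  # step 46: add current processor usage
    entry["mem_used"] += app_mem_usage  # step 48: add current memory usage
```

Because each later assignment reads the updated local copy, a node that has already been pledged to one application appears correspondingly busier when the next application of the current node is considered.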
At step 50, it is determined if the present node includes additional applications that have not yet been assigned a preferred failover node. If the current node includes applications that have not yet been assigned a preferred failover node since the initiation of the assignment process at step 30, the next application is selected at step 51, and the flow diagram continues with the comparison step of step 38. The step of selecting an application of the current node for assignment of a preferred failover node may be accomplished according to a priority scheme in which the applications are ordered for selection and assignment of a preferred failover node according to their processor utilization requirements; the application that has the highest processor utilization requirement is selected first for the assignment of a preferred failover node, and the application that has the lowest processor utilization requirement is selected last for assignment. Assigning a priority to those applications that have a higher processor utilization requirement may assist in identifying an application failover node for all applications, as such a selection scheme may avoid the circumstance in which failover assignments for a number of applications having lower utilization requirements are made to various nodes of the cluster network. As a result of these previous assignments, some or all nodes of the cluster network may be unavailable for the assignment of an application of a node having a higher utilization requirement. Placing an assignment priority on those applications having the highest resource utilization manages the allocation of preferred failover nodes in a way that attempts to ensure that each application will be assigned to a failover node that is able to accommodate the utilization requirements of the application.
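The ordering described above amounts to a descending sort on the processor utilization requirement. A minimal sketch, assuming each application is represented as a dictionary with a hypothetical `cpu_req` key:

```python
def assignment_order(apps):
    """Order applications so the highest processor requirement is assigned first.

    `apps` is assumed to be a list of dicts with "name" and "cpu_req" keys.
    sorted() is stable, so applications with equal requirements keep their
    original relative order.
    """
    return sorted(apps, key=lambda a: a["cpu_req"], reverse=True)
```

Placing the most demanding applications first gives them the widest choice of nodes, before capacity has been pledged to smaller applications.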
As an alternative to a priority scheme in which the application having the highest processor utilization requirement is selected first for assignment, the applications of a node could be selected for assignment according to a priority scheme that recognizes the business importance of the applications or the risk associated with shutting down or reinitiating the application. The selection of a prioritization scheme for assigning failover nodes to applications of the node may be left to a system administrator. If it is determined at step 50 that all applications of the current node have been assigned a preferred failover node, the process of
Shown in
Following the setting of a failover flag at step 66, an application is selected at step 68. The application that is selected at step 68 is an application with a low level of processor usage or memory usage. The selection step may involve the selection of the application that has the lowest processor usage or the lowest memory usage. As an alternative to selecting the application that has the lowest processor usage or the lowest memory usage, an application could be selected according to a priority scheme in which the application having the lowest priority is selected. The selection of an application for migration to another node will result in the application being down, at least for a brief period. As such, applications that, for business or technical reasons, are required to be up are assigned the highest priority, and applications that are best able to be down for a period are assigned the lowest priority. Once an application is identified, a preferred failover node for the selected application is determined at step 70. The identification of a preferred failover node at step 70 can be performed by the selection process set out in the steps of
The system and method described herein may be used with clusters having multiple nodes, regardless of their number. Although the present disclosure has been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and the scope of the invention as defined by the appended claims.