1. Technical Field
This application relates generally to computer system management, and more particularly to a distributed high availability system and method.
2. Description of the Related Art
A cluster is a group of servers and other resources that act like a single system. Clusters currently function to provide high availability to applications and services. When applications or services are defined as part of a cluster, they become highly available because the cluster software continuously monitors their status and lets the applications fail over between nodes if there are problems. High availability minimizes down time for applications such as databases and web servers. Briefly, nodes refer to addressable devices attached to a computer network, typically computer platforms or hardware running operating systems and various application services. A clustering service may define the association of nodes with the cluster.
Clusters typically require that all systems within the cluster be within a tightly confined area, often within the same room, so that all systems may utilize relatively high-speed communication and data transfer hardware. Clusters can thus become susceptible to single point failures such as power or network failures within a facility, building, or the general area in which the systems in the cluster are located. Although the cluster may be aware of what has happened, it cannot react or reduce downtime because the cause of the failure is in the sustaining systems, not in the hardware or software involved in the clusters. A sustaining system, for example, is an infrastructure or any entity that may be required to ensure that the hardware and software that provide a given service can function properly. Examples of a sustaining system may include, but are not limited to, national electrical power grids, high-speed network infrastructures, and communications infrastructures.
Further, while clusters may provide better service than traditional servers and may be capable of handling bottlenecks when there is a requirement to be able to distribute or transport large amounts of data to and from client users, even in clustered systems, data may still need to be transported over long distances.
A network of computer resources includes a plurality of heterogeneous nodes. Each node meets predetermined minimum standards. The nodes are interconnected, either directly or indirectly, to one another over a high-speed data network. A distributed service layer circulates status data pertaining to the plurality of heterogeneous nodes throughout the interconnection of nodes.
A method for utilizing a network of computer resources includes interconnecting a plurality of heterogeneous nodes, each node meeting predetermined minimum standards. The nodes are interconnected, either directly or indirectly, to one another over a high-speed data network. Status data pertaining to the plurality of heterogeneous nodes is circulated throughout the interconnection of nodes.
A more complete appreciation of the present disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
In describing the preferred embodiments of the present disclosure illustrated in the drawings, specific terminology is employed for sake of clarity. However, the present disclosure is not intended to be limited to the specific terminology so selected, and it is to be understood that each specific element includes all technical equivalents which operate in a similar manner.
A distributed high availability system (DHAS) according to an embodiment of the present disclosure distributes a plurality of elements of a scattered persistent availability network (SPAN) to various geographic areas. Accordingly, DHAS may avoid or reduce system failures occurring as a result of geographically centered outages.
DHAS, according to one embodiment of the present disclosure, provides cluster-like functionality and high availability across heterogeneous nodes from multiple vendors, with the added ability to span geographically dispersed locations. This approach allows business critical applications to remain highly available without dependency on a specific vendor cluster solution. DHAS works to minimize downtime and makes optimal use of an underlying network to ensure that the target application is continuously functional and available. DHAS enables applications to become fault tolerant and thus highly available, for example, without the necessity of being in a cluster environment. Applications further benefit from the ability to be geographically separated because the whole of the SPAN may be shielded from local failures such as network outages and/or power outages.
DHAS, according to an embodiment of the present disclosure, also provides a grid-like computing environment by distributing the application load across the target network. In addition to the speed gained due to parallel processing, SPANs enhance the ability to distribute information more quickly due to the possibility of having nodes in the SPAN closer to a user than a traditional server or cluster.
No specific hardware requirements may be necessary for DHAS. Rather, these requirements may be dictated by the service being provided. For example, if a given service requires large amounts of memory, hardware with large memory capacity may be used. DHAS, according to an embodiment of the present disclosure, may act as a thin service providing fault-tolerance and high availability to any enterprise.
According to an embodiment of the present disclosure, all nodes within the DHAS are able to communicate with all other nodes, either directly or indirectly. For example, there may be multiple paths of communication between nodes. Each of the multiple paths may be linked to a different node in the SPAN. In another example, a node may be linked to another node in the SPAN indirectly by a link to an intermediary node in the SPAN. A consistent SPAN may therefore be maintained and any single node may be prevented from becoming a point of failure. All nodes may use a high-speed communications network and all nodes may be able to access the same shared storage, whether a large distributed Storage Area Network (SAN) or a newly developed technology. The nodes in the SPAN running DHAS may be heterogeneous, for example, able to run under different platforms.
Having high-speed inter-node communication may provide fast responses to failures and bottlenecks. A fast method of communication may ensure, for example, that within seconds of a SPAN-wide event another node becomes aware of the event and responds appropriately. An example of a high-speed communications system includes, but is not limited to, 100 Mbps Ethernet.
DHAS, according to an embodiment of the present disclosure, may include high-speed shared and/or distributed storage as a way to access data needed for a service running under the DHAS as part of the service's functionality. Examples of high-speed shared and/or distributed storage include, but are not limited to, high-speed storage networked to a large-area SAN. The system for accessing the high-speed network storage, for example, may be provided by an operating system that the nodes are running.
For example, according to an embodiment of the present disclosure, node information may be maintained using a SPAN-wide heartbeat. The heartbeat may be a set of data that is circulated among all the nodes in the SPAN.
Once the nodes are identified, a cycle among the nodes may be established, for example, as a path from one node to another in the SPAN 114-126 (
The node may then wait for incoming connections (Step S314). The node can receive a heartbeat. The new node may be contacted by the heartbeat as the heartbeat runs through the cycle of the nodes in the SPAN. When the heartbeat, during its circulation through the nodes, reaches this new node, the heartbeat may be updated with the information about the node. For instance, the DSSL on that node may update the heartbeat with the needed information. This information may include, but is not limited to, the number of nodes, the IP address of the nodes, and the statuses of the nodes within the SPAN.
For example, a join request may be received from another node that subsequently gets started. In one embodiment, a join request may be made from and to any node within the SPAN. If there are no nodes running the service, a node that is started initially may make up the SPAN. When another node joins the SPAN, a heartbeat may be circulated and updated.
A node can receive either a heartbeat or a join request. If a heartbeat is received (Yes, Step S316) the information about the nodes that are making the contact is received from the previous node that sent the heartbeat, and is updated with current node information. Current node information may include, for example, the nodes currently known to the contacted node and/or information about nodes that have joined. The heartbeat may then be sent to the next node (Step S318). If no heartbeat is received (No, Step S316), it may be determined whether a join request has been received (Step S320). If a join request is received (Yes, Step S320) the request may be granted and a new node added to the SPAN, for example, by collecting information about the new node (Step S322) and informing other nodes in the SPAN about the new node (Step S324). If no heartbeat has been received (No, Step S316) and no join request has been received (No, Step S320) then the node may continue to wait for incoming connections (Step S314).
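The decision flow of steps S314 through S324 can be pictured with a minimal Python sketch. It is illustrative only: the Heartbeat and Node classes, the tuple message format, and the returned action names are assumptions made for this example and are not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    # Maps node address -> last known status (hypothetical layout).
    statuses: dict = field(default_factory=dict)


@dataclass
class Node:
    address: str
    known_nodes: dict = field(default_factory=dict)  # local view of the SPAN

    def handle(self, message):
        """Dispatch one incoming connection, mirroring steps S316-S324."""
        kind, payload = message
        if kind == "heartbeat":
            # S316 yes: merge what the previous node sent, add our own status.
            self.known_nodes.update(payload.statuses)
            self.known_nodes[self.address] = "up"
            payload.statuses.update(self.known_nodes)
            return ("forward", payload)      # S318: send to the next node
        if kind == "join":
            # S320 yes: collect information about the new node (S322) ...
            self.known_nodes[payload] = "up"
            return ("announce", payload)     # ... and inform the others (S324)
        return ("wait", None)                # neither: keep waiting (S314)
```

In this sketch the caller is expected to act on the returned action, for example by transmitting the updated heartbeat to the next node in the cycle.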
In trying to establish a connection to that node again, if a connection to the receiving node (
The above-described method may ensure that a single heartbeat is circulated throughout the SPAN without failure. Each node that is transmitting may be responsible to its previous node, thereby forming a circular dependency. There may be provisions to ensure that deadlocks do not occur. For example, a predetermined amount of time to wait for a response may be set, and if the response is not received within that time, the transmitting node may move on to the next node.
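One way to picture the timeout provision is the following Python sketch. The function name pass_heartbeat, the list-based ring, and the send callable are hypothetical and stand in for whatever transport an implementation would use. The loop is bounded by the size of the ring, so the sender cannot wait indefinitely on an unresponsive node.

```python
def pass_heartbeat(ring, start, send, timeout=2.0):
    """Try successive nodes after ring[start] until one acknowledges.

    `send(node, timeout)` is a hypothetical transport callable that returns
    True on acknowledgement and False if no response arrives in `timeout`
    seconds. Returns (receiver, skipped); receiver is None if nobody answered.
    """
    skipped = []
    n = len(ring)
    for step in range(1, n):
        candidate = ring[(start + step) % n]
        if send(candidate, timeout):
            return candidate, skipped
        skipped.append(candidate)  # no response in time: move to the next node
    return None, skipped
```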
DCSL component (214
DHAS notification service module 512 may allow clients to request real-time notification of events within the SPAN. These events can include a notice when a node has joined the SPAN and a notice of changed status of a node within a SPAN. These events or notifications may be retrieved directly from the DHAS notification module 512, for example, via the API 510, or via a forwarded event notification system, a part of data transport module 520, that operates in a manner similar to the heartbeat.
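As a rough illustration of how a client might request such notifications, the Python sketch below models the notification module as an in-process publish/subscribe registry. The class name, method names, and event names are assumptions for this example; the disclosure does not define the API at this level.

```python
from collections import defaultdict


class NotificationService:
    """Hypothetical in-process stand-in for notification module 512."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        # A client registers interest in one kind of SPAN event.
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, detail):
        # Deliver the event to every client that asked for this type and
        # report how many clients were notified.
        callbacks = self._subscribers[event_type]
        for callback in callbacks:
            callback(event_type, detail)
        return len(callbacks)
```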
According to an embodiment of the present disclosure, every node may maintain a set of resource groups 514 that are active within the SPAN. A resource group is a logical entity defined for a particular application or service, which contains within it the resources needed in order to provide that service to clients 518. An example of such a service is a database for a product order system. The database server and the resources the database server needs for its operations may be within the resource group 514. An example of a resource may be a shared disk or IP address. When a resource group 514 is started, the resource group may start all of its required resources and the associated service through user-defined means, for example, as defined in the resource group. Then the resource group may notify the rest of the nodes in the SPAN that it has been started. If the resource group fails, for example, because the node loses power unexpectedly, another node in the SPAN may restart the resource group locally to ensure that the service is still provided.
According to an embodiment of the present disclosure, there are two types of resource groups. A single-instance-single-location (SISL) resource group may run a single service on one node in the SPAN at a time. If the single service fails, it may be restarted on another node in the SPAN either based on the SPAN service's determination of what and where it should be started or based on rules that the user has defined.
A single-instance-multi-location (SIML) resource group may run a single service on every node in the SPAN in parallel. This allows for faster performance that may be required such as in a web server, and if any of the nodes should fail the other nodes will seamlessly continue to work as before.
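The difference between the two resource group types can be sketched as a placement decision. The following Python example is a simplified illustration; the function place and the preferred ordering (standing in for user-defined rules) are hypothetical.

```python
def place(group_type, nodes_up, preferred=None):
    """Return the nodes that should run the group's service.

    group_type is "SISL" or "SIML"; `preferred` is an optional user-defined
    ordering of nodes (a hypothetical rule format).
    """
    if group_type == "SIML":
        return sorted(nodes_up)          # every live node runs the service
    if group_type == "SISL":
        order = preferred or sorted(nodes_up)
        for node in order:
            if node in nodes_up:
                return [node]            # exactly one node at a time
        return []
    raise ValueError(f"unknown resource group type: {group_type}")
```

Calling place again after a node drops out of nodes_up models a restart of an SISL group on another node, while an SIML group simply continues on the remaining nodes.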
In addition, because the SPAN has all the nodes interconnected with a very high-speed network, it is capable of resolving data flow path problems. For example, if it is determined that a resource or particular system is unreachable from one node in the SPAN, but not from another, then the data is routed through the node capable of accessing the data as a proxy. While this is happening, an alert may be raised to notify the administrator of the SPAN that there is an error which has been resolved but may need intervention.
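The proxy behavior can be illustrated with a short Python sketch. The function choose_route and the reachable mapping are assumptions for this example; how reachability is actually determined is not specified here.

```python
def choose_route(local, resource, reachable):
    """Pick the node through which `resource` should be accessed.

    reachable: dict mapping node -> set of resources that node can reach.
    Returns (node, alert); alert is True when a proxy had to be used or
    when no node can reach the resource at all.
    """
    if resource in reachable.get(local, set()):
        return local, False              # direct access, nothing to report
    for node in sorted(reachable):
        if resource in reachable[node]:
            return node, True            # route via proxy, alert the admin
    return None, True                    # unreachable from every node
```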
According to an embodiment of the present disclosure, if a node failure occurs, it is detected upon failure of heartbeat transmission, and the node sending the heartbeat may modify the information transmitted by the heartbeat to allow the other nodes in the SPAN to become aware of this failure. Further, the monitoring module 524 on the transmitting node may determine what services have failed as a result of the node's failure and cause a fail-over to begin. If the fail-over starts the service on the transmitting node, all remaining nodes may be notified through the data-transport module 520, for example, by contacting the management module 526. The management module 526 may set correct statuses internally and use the notification module 512 to notify any client applications within the SPAN connected to that particular SPAN node that a change has occurred, if appropriate.
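A minimal Python sketch of this fail-over sequence follows. The function handle_send_failure and the two dictionaries are hypothetical simplifications: the heartbeat is reduced to a status map, and restarting a service is reduced to reassigning its owner.

```python
def handle_send_failure(heartbeat, owners, failed, local):
    """Record a failed node and take over its resource groups.

    heartbeat: dict node -> status, circulated around the SPAN
    owners:    dict resource group name -> node currently running it
    Returns the sorted names of the groups moved to `local`.
    """
    heartbeat[failed] = "down"           # lets the other nodes learn of it
    moved = [g for g, node in owners.items() if node == failed]
    for group in moved:
        owners[group] = local            # restart on the transmitting node
    return sorted(moved)
```

The returned list is what a real system would pass on to the remaining nodes so that they can update their own view of where each service runs.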
According to an embodiment of the present disclosure, high availability service (HAS) disclosed in U.S. patent application Ser. No. 10/418,459, entitled METHOD AND SYSTEM FOR MAKING AN APPLICATION HIGHLY AVAILABLE, assigned to the same assignee, may be used as the API 510 for retrieving information and notifications of events within the SPAN. This may allow any component (such as agent technology) integrated with HAS to detect and operate properly within the SPAN environment. U.S. patent application Ser. No. 10/418,459 is incorporated herein by reference in its entirety.
A common communication standard such as the DIA (distributed information architecture) may be used to transport data in the SPAN, for example, by a data transport module 520. DIA includes an ability not only to transfer data quickly, but also the ability to work through firewalls and other obstructions that would normally hinder an application or service from communicating.
DHAS according to an embodiment of the present disclosure may include three models of data access. A share-all model allows every node in the SPAN to access the required shared data simultaneously through high-speed shared storage such as a SAN. In a share-nothing model, data is not shared via high-speed shared storage. Rather it may be replicated through DIA or some other high-speed transport. A hybrid model may include a combination of the share-all model and the share-nothing model as necessary.
According to an embodiment of the present disclosure, the DHAS may include parent and child processes. The parent process is responsible for ensuring that the DHAS child process is running. If the child is not running, the parent process may restart it. Likewise, the child process may restart the parent process if it determines that the parent process is not running. This two-process mechanism ensures that DHAS will always be running.
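The mutual-restart arrangement can be modeled in a few lines of Python. This is a simulation under stated assumptions: the Watchdog class and its running flag stand in for real operating system processes, which an actual implementation would monitor and restart through platform-specific means.

```python
class Watchdog:
    """One half of a mutual-restart pair; names are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.peer = None
        self.restarts = 0

    def check_peer(self):
        # Restart the peer if it has stopped; return True if a restart occurred.
        if self.peer is not None and not self.peer.running:
            self.peer.running = True
            self.restarts += 1
            return True
        return False


def make_pair():
    # Wire a parent and a child so that each watches the other.
    parent, child = Watchdog("parent"), Watchdog("child")
    parent.peer, child.peer = child, parent
    return parent, child
```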
As described above, DHAS may maintain the status of its nodes within the SPAN with a heartbeat that circulates throughout the entire SPAN. According to an embodiment of the present disclosure, to ensure that the heartbeat is small, the heartbeat may contain only information about the current status of the nodes within the SPAN. All nodes within the SPAN may maintain information about all the other nodes locally.
Referring back to
If software is required to be installed across the entire SPAN, it may not be necessary to install the software on every node. Software which is DHAS enabled may be installed on a single node and the installation may be made available to all the other nodes via the shared storage. For example, a software delivery option (IDM/SDO) may install the component in each node without any further interaction from the user.
According to an embodiment of the present disclosure, the DHAS API may allow a client application in each node to create resource groups and resources. A resource group may be a logical coupling of resources that are needed to run a particular application or service. A resource may be anything that is required by the service or application to run properly, for example, IP address or shared storage. The DHAS API may also allow a client to receive notifications based on resource changes via the DCSL API and get information about resources, resource groups, and the SPAN.
According to an embodiment of the present disclosure, the SPAN services may include one or more modules that perform the operations of the SPAN. For example, one module 524 may be responsible for monitoring resources defined to the SPAN and running on the current node. Another module 512 may be responsible for sending out notifications when resources, resource groups, and/or any other components change states. Yet another module 526 may be responsible for remediation of failed resources, for example, restarting them or performing functions necessary for failover in a multi-node SPAN. Still yet another module 528 may be responsible for taking care of resource registration and other overhead relating to defining a resource or group across a SPAN. For example, registration module 528 may facilitate the automatic creation of the proper content to be replicated by the data replication module 530. Another module 530 may be responsible for replicating data across SPAN nodes. The replicated data may include, for example, internal databases and/or applications and component data.
The above-described functionalities, of course, are not limited to the individual modules described above. Thus, one module may perform all the functions described above, or different modules can perform different functions.
Generic SPAN information collector (GSIC) 532 may be responsible for gathering and distributing all the information about the SPAN, for example node status, resource and/or resource group status across the SPAN. The GSIC 532 may include a heartbeat to all nodes in the SPAN to make sure all nodes are running.
The DHAS, according to an embodiment of the present disclosure, may be enabled to handle load balancing. Load balancing is the ability to take multiple identical services within a SPAN and have them running across multiple nodes within that SPAN simultaneously. User requests may be dynamically routed through a lead node or lead server to nodes that are less utilized within the entire group that is load balancing. This may be accomplished, for example, by having the lead server that reroutes the requests ask the processing servers for their status. When the values are returned, the lead server may determine which processing server is least used and give the request to that server. This, for example, may be performed for all server data requests so that minimum response time is achieved by ensuring that servers are never critically overutilized.
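The lead server's selection step can be sketched in Python as follows. The function names pick_server and dispatch, the poll callable, and the use of a utilization fraction as the status value are assumptions for illustration; the disclosure does not specify which status values the processing servers return.

```python
def pick_server(loads):
    """Return the least-utilized server.

    loads: dict server -> utilization in [0, 1], as reported on request.
    Ties are broken by server name so the choice is deterministic.
    """
    if not loads:
        raise ValueError("no processing servers available")
    return min(sorted(loads), key=loads.get)


def dispatch(requests, poll):
    """Assign each request to the least-used server at the time it arrives.

    `poll()` is a hypothetical callable that asks every processing server
    for its current status and returns the resulting loads dict.
    """
    assignments = []
    for request in requests:
        assignments.append((request, pick_server(poll())))
    return assignments
```

Polling before every request keeps the decision current at the cost of extra status traffic; an implementation could instead cache the values for a short interval.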
The computer system referred to generally as system 1000 may include, for example, a central processing unit (CPU) 1001, random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse etc. As shown, the system 1000 may be connected to a data storage device, for example, a hard disk, 1008 via a link 1007.
The above specific embodiments are illustrative, and many variations can be introduced on these embodiments without departing from the spirit of the disclosure or from the scope of the appended claims. For example, elements and/or features of different illustrative embodiments may be combined with each other and/or substituted for each other within the scope of this disclosure and appended claims.
The present application is based on and claims the benefit of provisional application Ser. No. 60/572,518, filed May 19, 2004, the entire contents of which are herein incorporated by reference.
Number | Date | Country
---|---|---
60572518 | May 2004 | US