Server clusters are often employed in high-availability computing environments to provide active or passive redundancy in the case of a server failure. This is typically implemented by configuring multiple servers within a cluster with common applications, so that when one server running a particular application fails, failover may be performed by having another server within the same cluster stand in for the failed server by running the same application. Where servers within a cluster run applications that provide HyperText Transfer Protocol (HTTP) services to HTTP-based clients, failover is relatively easy to perform, since multiple HTTP requests from the same HTTP-based client are server-indifferent, allowing each HTTP request to be routed to a different server within the server cluster for processing. However, where session-based protocols such as the Session Initiation Protocol (SIP) are to be supported, failover is more complex, as the SIP messages of a given session are always sent to the same SIP container on the same SIP server. Furthermore, since a single SIP container might support tens of thousands of SIP sessions simultaneously, a failover that would entail a corresponding number of messages notifying SIP proxies which backup servers are taking over for which SIP sessions would be cumbersome and impractical.
The present invention discloses a system and method for session failover management in a high-availability server cluster environment.
In one aspect of the present invention a system is provided for session failover management in a server cluster environment, the system including one or more clusters, each cluster having one or more servers, each server having one or more partitions, each partition identified by a partition ID and grouping one or more sessions, and a failover manager configured to detect the failure of any of the servers and effect the assignment of any of the partitions on the failed server to another of the servers within the failed server's cluster.
In another aspect of the present invention any of the servers to which a failed server partition is assigned is configured to activate any of the sessions within the failed server partition.
In another aspect of the present invention the system further includes a server-partition mapper configured to maintain a mapping of each of the partitions to their servers.
In another aspect of the present invention any of the servers to which a failed server partition is assigned is configured to inform the server-partition mapper that it has taken over the failed server partition.
In another aspect of the present invention the system further includes a proxy configured to receive an incoming session-based protocol message, identify to which of the partitions the message belongs, consult the server-partition mapper to determine to which server the identified partition is mapped, and forward the message to the mapped server.
In another aspect of the present invention the system further includes a replication manager configured to replicate session objects, associated with any of the sessions on any of the servers within any of the clusters, to any other of the servers within the cluster.
In another aspect of the present invention the session is a SIP session.
In another aspect of the present invention a method is provided for session failover management in a server cluster environment, the method including defining one or more clusters, each cluster having one or more servers, each server having one or more partitions, each partition identified by a partition ID and grouping one or more sessions, detecting the failure of any of the servers, and effecting the assignment of any of the partitions on the failed server to another of the servers within the failed server's cluster.
In another aspect of the present invention the method further includes activating any of the sessions within the failed server partition on the server to which a failed server partition is assigned.
In another aspect of the present invention the method further includes maintaining a mapping of each of the partitions to their servers.
In another aspect of the present invention the method further includes updating the mapping to indicate the server to which a failed server partition is assigned.
In another aspect of the present invention the method further includes receiving an incoming session-based protocol message, identifying to which of the partitions the message belongs, determining to which server the identified partition is mapped, and forwarding the message to the mapped server.
In another aspect of the present invention the method further includes replicating session objects, associated with any of the sessions on any of the servers within any of the clusters, to any other of the servers within the cluster.
In another aspect of the present invention a computer-implemented program is provided embodied on a computer-readable medium, the computer program including a first code segment operative to define one or more clusters, each cluster having one or more servers, each server having one or more partitions, each partition identified by a partition ID and grouping one or more sessions, a second code segment operative to detect the failure of any of the servers, and a third code segment operative to effect the assignment of any of the partitions on the failed server to another of the servers within the failed server's cluster.
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the appended drawings in which:
Reference is now made to
Reference is now made to
An incoming session-based protocol message is received at a network dispatcher 214, which may be any IP sprayer, which forwards the message to any of one or more proxies, such as SIP proxies 216 and 218. Each proxy 216, 218 preferably sees each of clusters 200 and 206, and is able to forward session-based protocol messages to any of servers 202, 204, 208, and 210. Upon receipt of an incoming session-based protocol message from network dispatcher 214, if the message is part of a new session, such as may be effected via a SIP dialog, the proxy routes the message to any of servers 202, 204, 208, and 210, preferably deciding which server by using any known load balancing technique. The incoming message is received by the chosen server's session host, which creates the session and its related objects, and assigns the session to one of its partitions, also preferably deciding which partition by using any known load balancing technique. The session objects are preferably replicated to each of the servers in the cluster by a replication manager 220 to support failover.
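By way of illustration only, the session creation, partition assignment, and replication just described may be sketched as follows. The sketch assumes a "fewest sessions" rule as the load balancing technique and models replication as a simple copy to each peer; the names SessionHost and create_session, and the dictionary layout of a session object, are hypothetical and do not appear in the description above.

```python
import copy
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SessionHost:
    """Hypothetical stand-in for the session host of one server."""
    address: str
    partition_ids: list
    # partition ID -> {session ID: session object} for sessions hosted here
    partitions: dict = field(default_factory=dict)
    # copies of session objects replicated from other servers in the cluster
    replicas: dict = field(default_factory=dict)

    def __post_init__(self):
        for pid in self.partition_ids:
            self.partitions.setdefault(pid, {})


_session_numbers = count(1)  # assumed source of unique session IDs


def create_session(host, peers, state):
    """Create a session on `host`, assign it to a partition, replicate it."""
    # Assumed load balancing rule: pick the partition holding the fewest sessions.
    pid = min(host.partition_ids, key=lambda p: len(host.partitions[p]))
    sid = f"session-{next(_session_numbers)}"
    session = {"session_id": sid, "partition_id": pid, "state": state}
    host.partitions[pid][sid] = session
    # Replicate the session object to every other server in the cluster.
    for peer in peers:
        peer.replicas.setdefault(pid, {})[sid] = copy.deepcopy(session)
    return session
```

In this sketch each peer keeps its replicas keyed by partition ID, so that a peer later taking over a whole partition would already hold the session objects of that partition.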
Once the session has been created, all outgoing messages sent by the session host include both the session ID and the partition ID of the partition to which the session belongs. Thereafter, upon receipt of an incoming message from network dispatcher 214, if the message is part of an existing session and includes a partition ID, the receiving proxy consults server-partition mapper 212 to determine to which server the partition belongs, and forwards the message to the indicated server.
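The routing decision made by a proxy may be illustrated by the following minimal sketch. It assumes that a message is represented as a dictionary carrying an optional "partition_id" field; how the partition ID is actually encoded in a session-based protocol message is not specified here. The names ServerPartitionMapper, route, and choose are hypothetical, and random selection merely stands in for any known load balancing technique.

```python
import random


class ServerPartitionMapper:
    """Hypothetical mapper holding the partition ID -> server address mapping."""

    def __init__(self):
        self._owner = {}

    def assign(self, partition_id, server_address):
        # Record (or overwrite) which server currently owns the partition.
        self._owner[partition_id] = server_address

    def lookup(self, partition_id):
        # Returns None when the partition is unknown to the mapper.
        return self._owner.get(partition_id)


def route(message, mapper, servers, choose=random.choice):
    """Return the address of the server to which `message` should be forwarded."""
    partition_id = message.get("partition_id")
    if partition_id is None:
        # New session: any server may be chosen by a load balancing rule.
        return choose(servers)
    owner = mapper.lookup(partition_id)
    if owner is None:
        raise LookupError(f"no server mapped for partition {partition_id!r}")
    return owner
```

Because the proxy resolves the server by partition ID on every message of an existing session, a change of ownership recorded in the mapper takes effect for subsequent messages without any per-session notification to the proxy.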
Should a server fail, such as may be detected by a failover manager 222, each of the failed server's partitions is preferably assigned to one of the other servers in the cluster, preferably using known load balancing techniques such that the number of sessions managed by each of the servers after they have taken over the partitions of the failed server falls within load balancing thresholds. The assignment of a failed server's partitions is preferably managed by failover manager 222 and/or by a coordinating server designated by failover manager 222 from among the servers in the cluster. Each server that takes over a partition of the failed server activates the sessions assigned to the partition and informs server-partition mapper 212 of its own identity, such as its network address, as well as the partition ID of each partition it has taken over. Thereafter, upon receipt of an incoming message that belongs to a partition of a failed server, the receiving proxy consults server-partition mapper 212 to determine to which server the partition now belongs, and forwards the message to the indicated server.
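One possible way of distributing a failed server's partitions among the surviving servers is sketched below, for illustration only. The sketch assumes a greedy rule in which the largest orphaned partition is assigned first, each time to the surviving server currently managing the fewest sessions; it does not enforce explicit load balancing thresholds. The function name reassign_partitions and its parameters are hypothetical, and the partition_owner dictionary stands in for the mapping maintained by the server-partition mapper.

```python
def reassign_partitions(failed, cluster_servers, partition_owner, partition_sessions):
    """Move every partition owned by `failed` to a surviving server.

    partition_owner:    partition ID -> server address (updated in place)
    partition_sessions: partition ID -> number of sessions in the partition
    Returns a dict of the moved partitions: partition ID -> new server address.
    """
    survivors = [s for s in cluster_servers if s != failed]
    if not survivors:
        raise RuntimeError("no surviving server in the cluster")

    # Number of sessions each surviving server manages before the takeover.
    load = {s: 0 for s in survivors}
    for pid, owner in partition_owner.items():
        if owner in load:
            load[owner] += partition_sessions[pid]

    # Partitions left without a server, largest first (assumed greedy order).
    orphaned = sorted(
        (pid for pid, owner in partition_owner.items() if owner == failed),
        key=lambda pid: partition_sessions[pid],
        reverse=True,
    )

    moved = {}
    for pid in orphaned:
        target = min(survivors, key=lambda s: load[s])
        partition_owner[pid] = target  # the new owner is recorded in the mapping
        load[target] += partition_sessions[pid]
        moved[pid] = target
    return moved
```

For example, with servers A, B, and C in a cluster, partitions p1 (30 sessions) and p2 (10 sessions) on A, p3 (20 sessions) on B, and p4 (5 sessions) on C, a failure of A results in p1 being assigned to C and p2 to B under this rule, leaving C with 35 sessions and B with 30. Only two mapping entries change, regardless of how many sessions the two partitions contain.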
It is appreciated that one or more of the steps of any of the methods described herein may be omitted or carried out in a different order than that shown, without departing from the true spirit and scope of the invention.
While the methods and apparatus disclosed herein may or may not have been described with reference to specific computer hardware or software, it is appreciated that the methods and apparatus described herein may be readily implemented in computer hardware or software using conventional techniques.
While the present invention has been described with reference to one or more specific embodiments, the description is intended to be illustrative of the invention as a whole and is not to be construed as limiting the invention to the embodiments shown. It is appreciated that various modifications may occur to those skilled in the art that, while not specifically shown herein, are nevertheless within the true spirit and scope of the invention.