Application Cluster In Security Gateway For High Availability And Load Sharing

Abstract
A method for load sharing and high availability in a cluster of computers. The cluster includes a first computer and a second computer which perform a task. An active application runs in the first computer and a standby application is installed in the second computer. The active application and the standby application are included in an application group. A first plurality of applications is installed in the first computer; the first plurality includes the running active application. The active application performs the task and stores, in memory of the first computer, state parameters and a policy pertaining to the task. A synchronized copy of the state parameters and the policy is maintained in memory of the second computer. Preferably, the cluster is in a security gateway between data networks and performs a task related to security of one or more of the networks.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:



FIG. 1 is a simplified system block diagram of a prior art virtual system extension (VSX) cluster;



FIG. 2 (prior art) illustrates a computer, for instance, cluster member 101.



FIG. 3 is a simplified system drawing, according to an embodiment of the present invention which employs “per virtual system failover”;



FIG. 4 illustrates a system and failure modes, according to another embodiment of the present invention;



FIG. 5 is a simplified flow diagram showing operation of virtual system high availability, according to an embodiment of the present invention; and



FIG. 6 is a simplified flow diagram showing operation of virtual system load sharing, according to an embodiment of the present invention.





DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of a system and method of failover and load sharing in a cluster. Specifically, the system and method include failover and load sharing between virtual systems or applications shared between multiple cluster members 101.


Prior art clusters which provide redundancy, high capacity and failover are “connection based” (e.g. based on source/destination IP addresses and port numbers). Load is shared based on connections. When a cluster member, for instance 101a, fails, connections handled by 101a are re-routed to other cluster members, for instance 101b and/or 101c. In embodiments of the present invention, as opposed to prior art clusters, high availability, redundancy and failover are not based on connections. Functions such as high availability, load sharing and failover are achieved without having to manage connections.


The principles and operation of a system and method of high availability and load sharing between virtual systems in a cluster of computers, according to the present invention, may be better understood with reference to the drawings and the accompanying description.


It should be noted that, although the discussion herein relates primarily to virtual systems which perform as firewalls in a network, e.g. a LAN or other sub-network, the present invention may, by non-limiting example, alternatively be configured using virtual systems which perform other security applications such as encryption, intrusion detection, malicious code scanning, filtering (e.g. parental control filtering), authentication, auditing, virus detection, worm detection, quality of service and/or routing. The present invention in some embodiments can be configured as an application gateway to perform secure sockets layer (SSL) termination, including encryption and link translation. The present invention may alternatively be configured using virtual systems which perform functions unrelated to computer security, e.g. searching in a database. Further, a function, such as mathematical processing, may be performed, according to an embodiment of the present invention, in a cluster of computers not attached to an external network.


Computer or cluster member 101, in different embodiments of the present invention, may use dedicated hardware, e.g. additional interfaces 204 for transferring data individually to virtual systems, portions of memory 209 specifically allocated to individual virtual systems, or a dedicated processor 201 in case there are multiple processors 201. In some cases, previously existing cluster members 101 may be reprogrammed to achieve a cluster with virtual system load sharing and high availability, according to embodiments of the present invention.


Before explaining embodiments of the invention in detail, it is to be understood that the invention is not limited in its application to the details of design and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.


By way of introduction, principal intentions of the present invention are to:


(1) provide increased availability and/or redundant load sharing within a cluster;


(2) provide configuration simplicity with a preferably identical configuration within all cluster members;


(3) provide system scalability, with each cluster member increasing capacity and redundancy in a similar way; and


(4) reduce system overhead by performing synchronization by unicast data transfer only between specific cluster members, and not by broadcast of data between all cluster members.


Referring now to the drawings, FIG. 3 illustrates system 30, according to an embodiment of the present invention which employs “per virtual system failover”. In system 30, a virtual system group VS includes one active virtual system 203A and one standby virtual system 203S. Active virtual system 203A and standby virtual system 203S are each installed in different cluster members 101. Active virtual system 203A and standby virtual system 203S are synchronized both in state parameters and in policy, so that standby copy 203S becomes active if virtual system 203A, stored in cluster member 101a, experiences a failure. The policy is updated occasionally, such as once or twice per day, whereas the state parameters, or connection table, are synchronized typically with every transaction performed by the active application, and typically not more than about ten thousand times per second, preferably by unicast data transfer from active virtual system 203A to standby virtual system 203S. Upon recovery, virtual system 203A is restored to the original active state and virtual system 203S is restored to a standby state. System 30 illustrates cluster members attached using layer 2 switch 105; however, another networking device, preferably a layer 2 device such as a hub, may be used to connect cluster members 101.
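By way of non-limiting illustration only, the following Python sketch models a virtual system group of the kind shown in FIG. 3. It is not the claimed implementation; all class, attribute and method names (VirtualSystemMember, VirtualSystemGroup, sync_state, sync_policy, failover) are assumptions introduced here to show frequent state synchronization and occasional policy synchronization from the active copy to the standby copy.

# Illustrative sketch only; names are assumptions, not the patented implementation.

class VirtualSystemMember:
    """One copy of a virtual system (e.g. a firewall) hosted on a cluster member."""
    def __init__(self, cluster_member, role):
        self.cluster_member = cluster_member   # e.g. "101a"
        self.role = role                       # "active", "standby" or "backup"
        self.policy = {}                       # configuration, updated occasionally
        self.state = {}                        # state parameters / connection table


class VirtualSystemGroup:
    """Group VS: an active copy and a standby copy on different cluster members."""
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def sync_state(self):
        # Per-transaction synchronization of state parameters, sent by unicast
        # from the active copy to the standby copy only (no broadcast).
        self.standby.state = dict(self.active.state)

    def sync_policy(self):
        # Policy is synchronized far less often, e.g. once or twice per day.
        self.standby.policy = dict(self.active.policy)

    def failover(self):
        # The standby copy becomes active when the active copy fails.
        self.active, self.standby = self.standby, self.active
        self.active.role, self.standby.role = "active", "standby"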


Further, cluster members 101 in different embodiments may be distributed in different external networks and attached over network connections, provided a mechanism ensures that each cluster member 101 receives its required traffic. An example of a distributed cluster includes cluster members 101 as virtual private network (VPN) gateways running VPNs as virtual systems 203.


Cluster members 101 may be interconnected by one or more additional synchronization networks, not shown, through which the synchronization (e.g. of state parameters, policy) and/or management can be performed.


Cluster members 101 can be connected to a number of layer 2 devices 105 and each may be connected to any number of networks 111.



FIG. 4 illustrates system 40, according to another embodiment of the present invention, in which virtual system group VS includes an additional virtual system in a “backup” state 203B, in addition to standby state 203S and active state 203A of the virtual system. Backup state virtual system 203B contains updated configuration settings, e.g. the firewall policy of virtual systems 203, but does not receive state parameter or connection table synchronizations. Hence, the use of backup state 203B saves resources of cluster member 101, particularly processor time, and saves bandwidth on the synchronization network.
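A minimal extension of the earlier sketch (again, purely illustrative; it reuses the assumed VirtualSystemGroup class and is not the actual control code) shows the backup state described above: the backup copy tracks policy updates but is excluded from state parameter synchronization, which is what saves processor time and synchronization-network bandwidth.

class VirtualSystemGroupWithBackup(VirtualSystemGroup):
    """Group VS of FIG. 4: active, standby and backup copies on three cluster members."""
    def __init__(self, active, standby, backup):
        super().__init__(active, standby)
        self.backup = backup

    def sync_policy(self):
        # The backup copy receives configuration/policy updates...
        super().sync_policy()
        self.backup.policy = dict(self.active.policy)

    def sync_state(self):
        # ...but is deliberately left out of state parameter / connection table
        # synchronization; only the standby copy is kept current.
        super().sync_state()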


As in system 30, active virtual system 203A and standby virtual system 203S are synchronized so that standby copy 203S becomes active if the cluster member 101 storing active virtual system 203A experiences a failure. Furthermore, when the failure occurs in system 40, backup virtual system 203B is upgraded to become a standby virtual system 203S and begins to synchronize with the newly active virtual system 203A. Upon recovery, virtual system 203A is restored to the original active state, virtual system 203S is restored to a standby state and virtual system 203B is restored to a backup state. An example of backup state failover is illustrated in FIG. 4a. In the example, active virtual system 203A installed in cluster member 101a, which is synchronized with standby virtual system 203S in cluster member 101c, undergoes a failure, denoted by “X” in FIG. 4a. Standby virtual system 203S in cluster member 101c becomes active (now virtual system 203A), and backup virtual system 203B installed in cluster member 101b becomes standby virtual system 203S, which begins to synchronize with newly active virtual system 203A installed in cluster member 101c.


Another failure mode is illustrated in FIG. 4b, in which cluster member 101a fails entirely, for instance due to a connection failure to power or to network interface 204. As in the example of FIG. 4a, the standby virtual system in cluster member 101c becomes active, now virtual system 203A, and backup virtual system 203B installed in cluster member 101b becomes standby virtual system 203S and synchronizes with the newly active virtual system 203A installed in cluster member 101c. Similarly, backup virtual system 203B of cluster member 101c now becomes standby virtual system 203S and begins to synchronize with its active copy, virtual system 203A, installed in cluster member 101b. On recovery from either failure mode, of FIG. 4a or FIG. 4b, the system is restored to the original state of system 40 in FIG. 4. Alternatively, the original states of virtual systems 203 are not restored on recovery, and a manual re-configuration is used to restore the original configuration if desired.
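The failure mode of FIG. 4b, in which one cluster member fails entirely, can be sketched as follows using the assumed classes above (a hedged illustration only; the helper name handle_member_failure is hypothetical): every group whose active copy resided on the failed member promotes its standby copy, and the group's backup copy, if present, is promoted to standby and begins synchronizing with the new active copy.

def handle_member_failure(groups, failed_member):
    # failed_member is an identifier such as "101a".
    for group in groups:
        if group.active.cluster_member == failed_member:
            group.failover()                          # standby -> active
            backup = getattr(group, "backup", None)
            if backup is not None:
                backup.role = "standby"
                group.standby = backup                # backup -> standby
                group.backup = None                   # no backup copy until recovery
                group.sync_policy()
                group.sync_state()                    # new standby starts synchronizing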


Reference is now made to FIG. 5, a simplified flow diagram according to the embodiment 40 (of FIG. 4) of the present invention. Cluster 10 is physically connected and configured (step 501), preferably with virtual system groups VS each having an active virtual system 203A, a standby virtual system 203S and a backup virtual system 203B, each in different cluster members 101. After configuration (step 501), cluster 10 operates (step 503), and during operation active virtual systems 203A are periodically synchronized (step 503) with standby virtual systems 203S, preferably by unicast data transfer. If a failure occurs (decision block 505), then for each virtual system 203 involved in the failure, standby virtual system 203S is upgraded (i.e. failover) to active virtual system 203A, and similarly backup virtual system 203B is upgraded (i.e. failover) to standby virtual system 203S. Operation and synchronization (step 503b) between the new active virtual system 203A and standby virtual systems 203S proceed in cluster 10, albeit with limited resources due to the failure. Upon automatic monitoring and detection of recovery (decision block 509), the original configuration is preferably restored (step 511) and operation and synchronization (step 503) proceed as prior to the failure. Otherwise, if there is no automatic monitoring and detection of recovery, and the failure is detected manually, the original cluster configuration may be restored manually.
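The flow of FIG. 5 may be summarized, purely as a sketch under the same assumed names (the cluster object, its helper methods and the poll interval are all hypothetical and not part of the claimed method), as a loop of operation and synchronization with failover on failure and restoration on recovery:

import time

def run_high_availability(cluster, groups, poll_interval=1.0):
    cluster.configure(groups)                      # step 501: connect and configure
    while True:
        for group in groups:
            group.sync_state()                     # step 503: operate and synchronize
        if cluster.failure_detected():             # decision block 505
            for group in cluster.affected_groups():
                group.failover()                   # standby -> active, backup -> standby
        if cluster.recovery_detected():            # decision block 509
            cluster.restore_original_roles()       # step 511: restore original configuration
        time.sleep(poll_interval)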


Reference is now made to FIG. 6, a simplified flow diagram of load sharing in a virtual system cluster 40, according to embodiments of the present invention. Cluster 40 is configured (step 801) for load sharing. During configuration (step 801), parameters regarding load sharing between virtual systems 203 are set, including priorities and/or weights which determine load sharing between virtual systems 203. Preferably, the weights are chosen so that load is balanced between cluster members 101. In step 803, the load of each virtual system 203 is monitored. If there is a need to redistribute load (decision block 805), then cluster 40 is reconfigured (step 807); otherwise monitoring of load (step 803) continues.
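A corresponding sketch of the load-sharing flow of FIG. 6 follows; the weights, imbalance threshold and cluster helper methods are illustrative assumptions, not the claimed mechanism. Load of each virtual system is monitored and the cluster is reconfigured when an imbalance is detected.

import time

def run_load_sharing(cluster, weights, imbalance_threshold=0.2, poll_interval=5.0):
    cluster.configure_load_sharing(weights)                        # step 801
    while True:
        loads = {vs: vs.measured_load()
                 for vs in cluster.virtual_systems()}              # step 803: monitor load
        average = sum(loads.values()) / max(len(loads), 1)
        imbalance = any(abs(load - average) > imbalance_threshold * average
                        for load in loads.values())                # decision block 805
        if imbalance:
            cluster.redistribute(loads, weights)                   # step 807: reconfigure
        time.sleep(poll_interval)                                  # continue monitoring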


The control mechanism in cluster 40 may be implemented in a number of ways known in the art. Preferably, code in the kernel driver of one or more cluster members 101 periodically monitors (e.g. by polling or a “watchdog”) the state of all cluster members 101 and the interfaces between cluster members 101 and virtual systems 203. In the event of a failure in a virtual system 203, the kernel driver changes the roles of virtual systems 203 of the cluster as described above.
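For illustration only, such polling might look like the following user-space Python stand-in (the preferred implementation is in-kernel; the member and virtual system query methods, and the reuse of handle_member_failure from the earlier sketch, are assumptions):

import time

def watchdog(cluster, groups, interval=1.0):
    while True:
        for member in cluster.members():
            if not member.responds_to_poll():           # whole cluster member down
                handle_member_failure(groups, member.name)
                continue
            for vs in member.virtual_systems():
                if vs.role == "active" and not vs.interfaces_up():
                    vs.group.failover()                  # per virtual system failover
        time.sleep(interval)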


As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for the designing of other structures, methods and systems for carrying out the several purposes of the present invention. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the present invention.


While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made.

Claims
  • 1. A method comprising the steps of: (a) providing a cluster of computers, said cluster including a first computer and a second computer; (b) performing a task by said cluster; (c) running an active application in said first computer, wherein a standby application is installed in said second computer; wherein said active application and said standby application are included in an application group; wherein a first plurality of applications are installed in said first computer, wherein said first plurality includes said active application, wherein all applications of said first plurality and said standby application have identical functionality for performing said task; wherein said running said active application includes storing in memory of said first computer, a plurality of state parameters and a policy pertaining to said task; and (d) storing in memory of said standby application in said second computer, a synchronized copy of said plurality of state parameters and said policy.
  • 2. The method, according to claim 1, wherein said cluster is in a security gateway transferring data traffic between a plurality of networks, and wherein said task provides security to at least one of said networks.
  • 3. The method, according to claim 2, wherein said task is selected from the group of tasks consisting of: filtering, malicious code scanning, authentication, auditing, encryption, intrusion detection, virus detection, worm detection, quality of service, secure sockets layer termination, link translation and routing.
  • 4. The method, according to claim 1, wherein said first plurality further includes a second standby application and wherein a second plurality of applications are installed in said second computer, wherein said second plurality further includes said standby application and a second active application; wherein said second standby application and said second active application are included in a second application group.
  • 5. The method, according to claim 1, wherein said cluster further includes a third computer, wherein a third plurality of applications runs in said third computer, wherein said third plurality includes a backup application, wherein said backup application is further included in said application group.
  • 6. The method, according to claim 5 wherein said backup application maintains, stored in memory of said third computer, a synchronized copy of said policy without synchronizing said state parameters.
  • 7. The method, according to claim 1, further comprising the step of: (c) monitoring for failure within said application group and upon detecting a failure in said active application, transferring load of said active application to said standby application thereby upgrading state of said standby application to active, and producing a new active application.
  • 8. The method according to claim 7, further comprising the step of: (d) upon recovering from said failure, restoring said state of said standby application.
  • 9. The method, according to claim 7, wherein said cluster includes a third computer and wherein said application group includes a backup application running in said third computer, further comprising the steps of: (d) upon said monitoring for failure within said application group and upon detecting failure in said active application, synchronizing said backup application with said new active application thereby upgrading said backup application to a standby state.
  • 10. The method, according to claim 9, further comprising the step of: (e) upon recovering from said failure, restoring said state of said backup application.
  • 11. The method, according to claim 1, further comprising the step of: (c) monitoring for load balance between said computers and upon detecting a load imbalance between said computers, redistributing load between said active application and said standby application.
  • 12. The method, according to claim 1, further comprising the step of: (c) configuring said active application and said standby application for at least one function selected from the group consisting of high availability and load sharing.
  • 13. The method, according to claim 1, wherein said running includes synchronizing said active application with said standby application by performing a unicast data transfer between said active and said standby applications.
  • 14. A cluster of computers, comprising: (a) a first computer; and (b) a second computer;
  • 15. The cluster, according to claim 14, wherein the cluster is located at a security gateway between a plurality of networks and wherein said task provides security to at least one of said networks.
  • 16. The cluster, according to claim 14, wherein said synchronization mechanism synchronizes by unicast data transfer between said active application and said standby application.
  • 17. The cluster, according to claim 14, further comprising: (d) a mechanism which monitors for failure within said application group; and (e) a mechanism which upgrades said standby application to an active state when a failure is detected in said active application, and produces thereby a new active application.
  • 18. The cluster, according to claim 14, further comprising: (d) a mechanism which monitors for load balance between said computers and, upon detecting a load imbalance between said computers, redistributes load between said active application and said standby application.
  • 19. The cluster, according to claim 14, wherein said application group includes a backup application, the cluster further comprising: (d) a third computer which maintains said backup application; (e) a mechanism which upgrades said backup application to a standby state and initiates synchronization with a new active application when a failure is detected in said active application.
  • 20. A program storage device readable by a computer in a cluster including a first computer and a second computer, the program storage device tangibly embodying a program of instructions executable by the computer to perform a method comprising the steps of: (a) performing a task by said cluster; (b) configuring and running an active application in said first computer and configuring a standby application in said second computer; wherein said active application and said standby application are included in an application group; wherein a first plurality of applications are running in said first computer, wherein said first plurality includes said active application; wherein said running said active application includes performing said task and storing in memory of said first computer a plurality of state parameters and a policy pertaining to said task; and (c) storing in memory of said standby application in said second computer, a synchronized copy of said plurality of state parameters and said policy.
  • 21. The program storage device, according to claim 20, wherein the same program of instructions stored on the program storage device is used to program all the applications of the first plurality in the first computer and the standby application in the second computer.