The present invention relates generally to network systems having redundant default gateways and/or routers for receiving traffic from multiple hosts on a LAN. More particularly, the present invention relates to methods and apparatuses for deterministic load balancing across redundant gateway services for a common IP subnet.
Local area networks (LANs) are commonly connected with one another through one or more routers so that a host (a PC or other arbitrary LAN entity) on one LAN can communicate with other hosts on different LANs. Typically, the host is able to communicate directly only with the entities on its local LAN segment. When it receives a request to send a data packet to an address that it does not recognize as being local, it communicates through a router (or other layer-3 device) which determines how to direct the packet between the host and the destination address. Unfortunately, a router may, for a variety of reasons, become inoperative (e.g., a power failure, rebooting, scheduled maintenance, etc.). Such potential router failure has led to the development and use of redundant systems, systems having more than one router to provide a back up in the event of primary router failure. When a router fails, the host communicating through the inoperative router may still remain connected to other LANs if it can send packets to another router connected to its LAN.
Various protocols have been devised to allow a host to choose a router from among a group of routers in a network. Two of these, Routing Information Protocol (or RIP) and ICMP Router Discovery Protocol (IRDP) are examples of protocols that involve dynamic participation by the host. However, because both RIP and IRDP require that the host be dynamically involved in the router selection, performance may be reduced and special host modifications and management may be required.
In a widely used and somewhat simpler approach, the host recognizes only a single “default” router. In this approach, the host is configured to send data packets to the default router when it needs to send packets to addresses outside its own LAN. It does not keep track of available routers or make decisions to switch to different routers. This requires very little effort on the host's part, but has a serious danger. If the default router fails, the host can not send packets outside of its LAN. This will be true even though there may be a redundant router able to take over because the host does not know about the backup. Unfortunately, such systems have been used in mission critical applications such as stock trading. The shortcomings of these early systems led to the development and implementation of a hot standby router protocol (HSRP) by Cisco Systems, Inc. of San Jose, Calif. A more detailed discussion of the earlier systems and of an HSRP type of system can be found in U.S. Pat. No. 5,473,599 (referred to herein as “the '599 patent”), entitled STANDBY ROUTER PROTOCOL, issued Dec. 5, 1995 to Cisco Systems, Inc., which is incorporated herein by reference in its entirety for all purposes. Also, HSRP is described in detail in RFC 2281, entitled “Cisco Hot Standby Router Protocol (HSRP)”, by T. Li, B. Cole, P. Morton and D. Li, which is incorporated herein by reference in its entirety for all purposes.
HSRP forwards data packets from a host on a LAN through a virtual router. The host is configured so that the packets it sends to destinations outside of its LAN are always addressed to the virtual router. The virtual router may be any physical router elected from among a group of routers connected to the LAN. The router from the group that is currently emulating the virtual router is referred to as the “active” router. Thus, packets addressed to the virtual router are handled by the active router. A “standby” router, also from the group of routers, backs up the active router so that if the active router becomes inoperative, the standby router automatically begins emulating the virtual router. This allows the host to always direct data packets to an operational router without monitoring the routers of the network.
A Cisco HSRP system is shown in
Multiple redundancy group versions of the system of
This multiple RG configuration provides a load balancing function, but it is through the use of multiple default gateway IP addresses in a common subnet. This requires dividing users/hosts into multiple user groups and configuring the routers as multiple RGs. The administrative task of dividing hosts among multiple default gateways can be tedious and requires customization of the system which many users would prefer to avoid.
Cisco Systems offers multiple methods for load balancing across redundant uplinks from an access layer switch in an enterprise campus or a service provider data center. One method, shown in
As shown in
In both cases shown in
Other Cisco features such as protocol filtering, CGMP/IGMP-snooping, and broadcast filtering allow a user to scale a single VLAN to increasingly larger size, often alleviating the need for multiple access VLANs form broadcast domain segmentation. As a result, there is a tendency to have fewer VLANs (and, often, preferably one VLAN) while the need for load balancing among redundant gateways persists. Current methods requiring multiple VLANs at the access layer to achieve load balancing may not suffice or work within various users' systems.
In view of the foregoing, it would be desirable to provide redundant gateway services similar to Cisco's HSRP while providing load balancing of host traffic from a common VLAN-subnet among multiple gateway devices. Such services would leverage resiliency mechanisms of systems like HSRP while adding load balancing capabilities.
The present invention relates to gateway load balancing systems using more than one gateway device in a gateway device group for communications directed outside of a LAN. Generally, the invention includes load balancing and failover services. In the load balancing arrangement, hosts that send ARP messages to a shared virtual IP address receive replies from one of the gateway devices in the gateway device group, directing the host to address outgoing communications to a virtual MAC address assigned to one of the gateway devices. Hosts are assigned virtual MAC addresses for the gateway devices according to a prescribed algorithm or methodology. In the event that one member of the gateway device group fails, the outgoing communications that would have been handled by the failed gateway device are re-assigned to another member of the gateway device group. A master gateway device controls address assignment and failover features. In the event that the master fails, additional steps are taken to appoint or elect a new master and ensure continuity in the load balancing function.
In one embodiment, a method of load balancing gateway services to hosts on a network segment includes receiving an ARP message from a host addressed to an address shared by a group of gateway devices available for serving the hosts on the network segment. In response to receiving the ARP message, and based on load balancing considerations, one of the gateway devices is selected to act as the addressee gateway device for the host that sent the ARP message. A reply to the ARP message is sent to the host identifying the selected addressee gateway device. The gateway devices can be layer 3 devices such as routers. The address shared by members of the gateway device group can be a virtual IP address for the group. When the reply message is sent, the reply message can contain a layer 2 address for the addressee gateway device, such as a virtual (secondary) MAC address.
According to another aspect of the present invention, a method providing gateway load balancing services is implemented on a single gateway device. When an ARP message is received from a host, a gateway device is selected from a group of available gateway devices available for servicing hosts on the network segment, utilizing a preselected algorithm or methodology. The selected gateway device then acts as the addressee gateway device for the host that sent the ARP message. The host is notified with a reply message identifying the addressee gateway device. When one of the gateway devices is determined to have failed, responsibility for gateway services of the failed gateway device are re-assigned to another gateway device in the group. The gateway devices again can be layer 3 devices.
A different aspect of the present invention includes a network segment that has multiple gateway devices sharing a virtual IP address. A group of hosts includes a host configured to send an ARP message to the shared address of the gateway devices. One of the gateway devices is configured to respond to the ARP message by sending the host a reply message identifying an addressee gateway device in the gateway device group. The network segment's group of gateway devices also can be configured to assume responsibility for any addressee gateway device that fails.
Another aspect of the invention pertains to computer program products including machine-readable media on which is stored program instructions for implementing at least some portion of the methods described above. Any of the methods of this invention may be represented, in whole or in part, as program instructions that can be provided on such computer readable media. In addition, the invention pertains to various combinations of data and data structures generated and/or used as described herein.
These and other advantages of the present invention will become apparent upon reading the following detailed descriptions and studying the various figures of the drawings.
The present invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
a is a diagram showing a layer 2 load balancing scheme using a spanning tree feature.
b is a diagram showing a layer 3 load balancing scheme.
a is a schematic diagram of one embodiment of the present invention showing a load balancing implementation.
b is a block diagram of a network device that may be configured to act as a gateway device of this invention.
a is a schematic diagram of the embodiment of
b is a flowchart showing the methodology employed in one embodiment of the present invention showing failure of a slave gateway device.
a is a schematic diagram of the embodiment of
b is a flowchart showing the methodology employed in one embodiment of the present invention showing failure of a master gateway device.
Reference will now be made in detail to the preferred embodiment of the invention. An example of the preferred embodiment utilizing products, protocols, methods, systems and other technology developed, sold and/or used by Cisco Systems is illustrated in the accompanying drawings. While the invention will be described in conjunction with that preferred embodiment, it will be understood that it is not intended to limit the invention to one preferred embodiment or to its implementation solely in connection with Cisco products and systems. On the contrary, the following description is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The following terms are used in the instant specification. Their definitions are provided to assist in understanding the preferred embodiments described herein, but do not necessarily limit the scope of the invention.
A “host” is a PC or other arbitrary network entity residing on a LAN that periodically communicates with network entities outside of its own LAN through a router or bridge. Other terms that may be used equivalently in this disclosure include user, host and host.
A “router” is a piece of hardware which operates at the network layer to direct packets between various LANs or WANs of a network. The network layer generally allows pairs of entities in a network to communicate with each other by finding a path through a series of connected nodes. Other terms that may be used equivalently in this disclosure include layer 3 switch, layer 3 device and gateway.
An “IP (internes protocol) address” is a network layer address for a device operating in the IP suite of protocols. The IP address is typically a 32 bit field, at least a portion of which contains information corresponding to its particular network segment. Thus, the IP address of a router may change depending upon its location in a network.
A “MAC address” is an address of a device at the data link layer, defined by the IEEE 802 committee that deals with issues specific to a particular type of LAN. The types of LANs for which MAC addresses are available include token ring, FDDI, and ethernet. A MAC address is generally intended to apply to a specific physical device no matter where it is plugged into the network. Thus, a MAC address is generally hardcoded into the device—on a router's ROM, for example. This should be distinguished from the case of a network layer address, described above, which changes depending upon where it is plugged into the network. MAC is an acronym for Media Access Control.
A “virtual address” is an address shared by a group of real network entities and corresponding to a virtual entity. In the context of this invention, one router from among a group of routers emulates a virtual router by adopting one or more virtual addresses, and another entity (usually a host) is configured to send data packets to such virtual address(es), regardless of which router is currently emulating the virtual router. In the preferred embodiments, the virtual addresses encompass both MAC layer and network layer (IP) addresses. Usually, various members of the group each have the capability of adopting the virtual address to emulate a virtual entity.
A “packet” is a collection of data and control information including source and destination node addresses, formatted for transmission from one node to another. In the context of this invention, it is important to note that hosts on one LAN send packets to hosts on another LAN through a router or bridge connecting the LANs.
Hosts (for example, workstations, users and/or data center servers) using the IP protocol must utilize a default gateway to exit a local network and access remote networks. Therefore, each host must have prior knowledge of the gateway's IP address which typically is a router or layer-3 switch IP address. Hosts are either statically configured with the IP address of the default gateway or are assigned the address through a configuration protocol (such as Cisco's DHCP) upon boot-up. In either case, the host uses the same default gateway IP address for all network traffic destined to exit the local network.
To forward traffic to the default gateway, the host must perform an IP-ARP resolution to learn the data-link MAC address of the default gateway. The host sends an ARP inquiry to the IP address of the gateway, requesting the gateway's MAC address. The default gateway will respond to the host's ARP by notifying the host of the gateway's MAC address. The host needs the default gateway's MAC address to forward network traffic to the gateway via a data-link layer transfer. In networks that promote resiliency through redundant access layer uplinks and layer-3 gateways, at least two possible paths and devices exist for IP gateway services.
The present invention allows load-sharing of traffic from access layer hosts within a common subnet through redundant default gateways, while also providing a fail-over capability. One way to direct traffic from access layer hosts to a particular default gateway is to have the gateway respond to a host's ARP request with the gateway's own unique MAC address. Once a particular gateway responds to an ARP request from a host with the gateway's unique MAC address, the host caches the response and will continue to use the cached gateway MAC address for all transmissions leaving the local subnet. Therefore, if the MAC addresses of all of the redundant layer-3 gateways in an RG are sent in a distributed manner in response to ARP queries from various hosts, future transmissions from the hosts can be divided among available gateways to exit the local network, achieving load balancing of the outgoing packets.
Resiliency (fail-over) services also can be provided so that if one gateway should fail, a remaining gateway will assume the failed gateway's host load in addition to the remaining gateway's own load. The failure and load assumption can remain transparent to the local hosts. In addition, the load balancing feature must ignore the fact that more than one device can accept traffic destined to the same IP address, namely redundant default gateway devices sharing a common IP address.
Therefore, the present invention directs local hosts to a variety of possible default gateways in a load balancing arrangement using IP-ARP mechanisms, while using a single virtual IP address for the balanced gateway service. One embodiment of the present invention is shown in
Many of the concepts and mechanisms and much of the terminology used in the disclosure of the present invention are similar to and/or based on Cisco's HSRP system. However, the invention is in no way limited to such an implementation and those skilled in the art will recognize and appreciate that various embodiments of the present invention are achievable without using the HSRP system, per se.
One embodiment of the present invention provides three primary functions once it is implemented and configured. Initially, the system creates an addressing scheme. Once the addressing step is completed, the system provides its three primary functions:
As seen in
Each RG member 510 also maintains its own virtual MAC address (VMAC address) 518 to be used by local IP hosts 520 to forward traffic destined to a specific RG member acting as a default gateway. The VMAC address 518 will be used in the ARP response to an IP host's request for the default gateway to assign a particular RG member to a particular host. A VMAC address 518 can be assigned to each RG member 510 by assigning each RG member 510 a unique HSRP-like group referred to as a “GLBP Group” or “VMAC Group” in which the RG device is the only member of its GLBP group. For example, in
The VMAC address adds some overhead to the system in the form of assigning RG members one additional MAC address. However, this cost is outweighed by the benefits of greater portability of the members of the group. For example, if one RG member was to be removed from service, the VMAC address could be assigned to another gateway device and the prior device removed without interruption in the service or the inconvenience of having to use an unchangeable address (for example, the BIA of a given RG member).
In the embodiment shown in
Even though multiple devices 510 will answer to the same VIP address 516, they will be associated with different VMAC addresses 518. Duplication of the VIP address 516 among RG member responding to the subnet will not be a problem if:
All RG members 510 belong to an HSRP-type group, designated the GLBP or VMAC Group, for the purpose of electing a master. To facilitate load balancing within an RG 508, a “master” is elected from among all eligible RG members 510 participating in the load balancing service (that is, from within the VMAC Group). Based on the individual priorities set for the VMAC Group on the RG members 510, the member with the highest priority is designated the master. As will be appreciated by those skilled in the art, any suitable criterion or criteria can be used to determine the master of the group. RG members not elected as master are “slaves” in the system. Similar to an “active” member of an HSRP group, the master in the system of the present invention is responsible for responding to all ARP requests directed to the shared \TIP address 516 within the RG 508.
Therefore, to implement a load balancing service according to one embodiment of the present invention, each RG member 510 will belong to the “VMAC Group” or “GLBP Group.” A GLBP Group comprises 1 virtual IP address and “n” virtual MAC addresses. The GLBP master gateway assigns VMAC addresses to other members of the group. The GLBP Group is used for master election within the RG 508. The Group configuration includes a VIP address which is used as the shared default gateway by the hosts 520.
Generally, the gateway load balancing techniques (including use of virtual addresses) of the present invention may be implemented on software and/or hardware. For example, these techniques can be implemented in an operating system kernel, in a separate user process, in a library package bound into network applications, on a specially constructed machine, or on a network interface card. In a specific embodiment of this invention, the technique of the present invention is implemented in software such as an operating system or in an application running on an operating system.
A software or software/hardware hybrid load balancing service of this invention is preferably implemented on a general-purpose programmable machine selectively activated or reconfigured by a computer program stored in memory. Such programmable machine may be a network gateway device designed to handle network traffic. Such network devices typically have multiple network interfaces including frame relay and ISDN interfaces, for example. Specific examples of such network devices include routers and switches. For example, the gateway load balancing service of this invention may be specially configured routers such as specially configured router models 1600, 2500, 2600, 3600, 4500, 4700, 7200, 7500, and 12000 available from Cisco Systems, Inc. of San Jose, Calif. A general architecture for some of these machines will appear from the description given below. In an alternative embodiment, the gateway load balancing service may be implemented on a general-purpose network host machine such as a personal computer or workstation. Further, the invention may be at least partially implemented on a card (e.g., an interface card) for a network device or a general-purpose computing device.
Referring now to
The interfaces 568 are typically provided as interface cards (sometimes referred to as “line cards”). Generally, they control the sending and receiving of data packets over the network and sometimes support other peripherals used with the router 510. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided such as fast Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like. Generally, these interfaces may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile RAM. The independent processors may control such communications intensive tasks as packet switching, media control and management. By providing separate processors for the communications intensive tasks, these interfaces allow the master microprocessor 562 to efficiently perform routing computations, network diagnostics, security functions, etc.
Although the system shown in
Regardless of network device's configuration, it may employ one or more memories or memory modules (such as, for example, memory block 565) configured to store data, program instructions for the general-purpose network operations and/or other operations described herein. The program instructions may control the operation of an operating system and/or one or more applications, for example. The memory or memories may also be configured to store addresses, timer limits, etc.
Because such information and program instructions may be employed to implement the systems/methods described herein, the present invention relates to machine readable media that include program instructions, state information, etc. for performing various operations described herein. Examples of machine-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The invention may also be embodied in a carrier wave traveling over an appropriate medium such as airwaves, optical lines, electric lines, etc. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
b. Peering and Keep-Alive Mechanisms
In an HSRP group, the primary and standby RG members transmit hellos while other RG members (if any) remain idle. Similar keep-alive practices are implemented in the present invention (the timing of the messages can be structured in a manner similar to the HSRP system as discussed in more detail in RFC 2281 and the '599 patent).
As seen in
The source MAC address of these multicast hello packets is the VMAC address of the particular group if the transmitting RG member is the primary of the group (akin to standard HSRP practice). In
Initial hellos configure the peering arrangement between the RG group members. Hellos also help to elect the master of the load balancing service group. However, upon receiving a valid hello from an RG member's VMAC Group, all other RG members will create a state for the discovered RG member and its associated VMAC address. As described in further detail below, the master adjusts the load balancing procedure to include a newly discovered RG member. In the preferred embodiment, host relationships are not moved to another RG member when a new member appears. However, the gateway assignments can be adjusted so that when a host re-ARPs for the default gateway, it may be directed to a new RG member.
c. Conditional ARP Response Mechanism
Once a load balancing scheme is established using multiple RG members in accordance with the present invention, members can begin sharing responsibility for default gateway services to the hosts within a local subnet. The distribution function used to assign hosts to gateways determines the host load-sharing effectiveness. As discussed above, a master is elected from within the Group. The elected master is the only member of the RG which responds to host ARP requests for a gateway address, in this case, the shared gateway VIP address.
As illustrated by the example in
Depending on the system and the needs of the hosts, various load-sharing algorithms can be implemented to divide the hosts 520 on the LAN or subnet 530 among the participating RG members 510 for default gateway services. A simple or weighted round-robin method can be used in which each successive host 520 is assigned to the next RG member in a sequence. A second method involves hashing the source MAC address of the host to generate the ARP response. The hash yields one of the available default gateways out of all eligible RG members. The distribution function can be implemented so that a hash is performed on the source MAC address of the ARP request itself, yielding one of N answers (where N is the number of active RG members). As will be appreciated, the load balancing function under this distribution algorithm is as effective as the randomness of the source MAC addresses of the ARP requests. In another embodiment of the present invention, the load balancing allocation function can take into account weighting of the responses to direct a specific percentage of the hosts to specific RG members, for example based on uplink interface states.
d. Fail-Over Mechanism
Failure of an RG member is detected by listening to the keep-alives generated by the RG members in their respective VMAC Groups. If a particular RG member (master or slave) is not heard from for a specific amount of time, the RG member is then determined to be down and appropriate action is taken. In the context of the present invention, therefore, two different failure scenarios are considered and addressed.
i. Slave Failure
a illustrates schematically what happens when a slave RG member 510c fails in the preferred embodiment of the present invention. In the example presented in
Two possible failures of a slave member are considered—the slave may be temporarily down, or the slave member may have been permanently removed. To accommodate various failure conditions and scenarios in the present invention, two additional, different timers are used in this embodiment of the present invention, as shown in
Checks 842 are performed periodically to determine whether slave hellos are received during the specified time TH. If the hello from a slave is received, then the system remains in its normal condition 840. However, when a slave fails to send a hello in timely manner, as seen at 844 in
The AIT allows the failed member's VMAC address to remain included in the load balancing distribution algorithm of the master for a defined period of time. This timer thus provides for the possibility that the slave failure may only be temporary and that a portion of new connections (that is, responses to ARPs from hosts) under the load balancing distribution algorithm should still be directed to the failed member's VMAC address which is then being tended by the master. Therefore, a check 848 is performed to see if the slave member is still failed. If not, then the master relinquishes the VMAC address to the resumed slave at 850 and the members stop the AIT timer at 852, after which the system returns to its normal condition 840.
If the slave member is still failed, then another check 854 is performed to determine whether the time TAIT has reached the threshold limit. If the limit has not been reached, then the counter is incremented at 856 and the master continues at 858 to assign clients to the failed slave's VMAC address. If the AIT expires before the failed slave member resumes service, the master then at 860 stops sending the failed slave's VMAC address in response to ARP inquiries as part of the distribution algorithm. At the same time, the master and all other RG members start an algorithm removal timer, the ART timer.
While tending another RG member's VMAC address, it does not matter which VMAC address the master RG member uses as the source MAC address when responding to hosts. Hosts do not rebuild their ARP cache when a frame arrives from a known IP address having a different MAC address. The only time a host will notice a different source MAC address is if the host previously using the failed default gateway re-ARPs for its default gateway. In this case, the host would receive a response with a new VMAC address associated with a remaining RG member. If the failed slave RG member resumes operation and forms a peer relationship with the remaining RG members, the master RG member will stop responding to the resumed RG member's VMAC address. Although the resumed RG member will have to re-ARP for its hosts' MAC addresses, this will not disrupt the present load balancing distribution state.
If the slave has been permanently removed from service, it eventually must be removed altogether from any remaining RG member's state table since it is not to be used for load balancing and gateway services. Therefore, a check 862 is made to determine whether the slave member is still failed. If the slave has resumed service, then at 864 the resumed slave builds peering relationships and state tables and at 866 the other RG members stop their ART timers. At 850 the master relinquishes the VMAC address to the resumed slave and at 852 the members stop the AIT timers, if they are still counting.
The ART determines when it is safe for all remaining active RG members to totally remove any reference to the failed RG member's VMAC address. The ART must be set to be longer than the AIT and must ensure that no remaining hosts will point to the failed RG member's VMAC address after expiration of the ART. A safe minimum timer value for the ART is 4 hours. If the slave is still failed at 862, then a check 868 is made to determine whether the ART limit has been reached. If not, the ART timer TART is incremented at 870 and the master continues responding to the clients sending the failed slave's VMAC address at 872. The check 862 is again performed.
If the limit for the ART timer is reached, then at 874 the master stops responding to clients' using the failed slave's VMAC address and all clients erase the state table entry for the failed slave's VMAC address. As noted above, if a failed RG member resumes service after expiration of the AIT, but before expiration of the ART, the members will retain the resumed member's VMAC address in their caches, but the resumed member will have to begin anew in assignments of hosts to the newly resumed member in the distribution algorithm. However, as will be apparent to one of ordinary skill in the art, any appropriate process for re-integrating resuming members may be used.
ii. Master Failure
a illustrates schematically the second fail-over scenario when the master RG member fails. Upon failure 901 of the master RG member 510a, the slave RG member with the second highest priority in the Group (RG member 510b in the example illustrated in
As seen in
Whether or not the new master is the only RG member remaining active, the new master (and any remaining members) at 946 start the algorithm inclusion timer (AIT). (Again, because any surviving RG member might become the master, all of the RG members start the AIT; the acting master is the only RG member that is responsive to the AIT's status.)
The AIT allows the failed master's VMAC address to remain included in the load balancing distribution algorithm of the new master for a defined period of time. This timer thus provides for the possibility that the former master failure may only be temporary and that a portion of new connections (that is, responses to ARPs from hosts) under the load balancing distribution algorithm should still be directed to the failed master's VMAC address which is then being tended by the new master. Therefore, a check 948 is performed to see if the former master is still failed. If not, then the new master relinquishes the VMAC address to the resumed member as a new slave at 950 and the members stop the AIT timer at 952, after which the system returns to its normal condition 940.
If the former master is still failed, then another check 954 is performed to determine whether the time TAIT has reached the threshold limit. If the limit has not been reached, then the counter is incremented at 956 and the new master continues at 958 to assign clients to the failed master's VMAC address. If the AIT expires before the failed former master resumes service, the new master then at 960 stops sending the failed master's VMAC address in response to ARP inquiries as part of the distribution algorithm. At the same time, the new master and all other remaining RG members start an algorithm removal timer, the ART timer.
As noted above, while tending another RG member's VMAC address, it does not matter which VMAC address the new master RG member uses as the source MAC address when responding to hosts. Hosts do not rebuild their ARP cache when a frame arrives from a known IP address having a different MAC address. The only time a host will notice a different source MAC address is if the host previously using the failed default gateway re-ARPs for its default gateway. In this case, the host would receive a response with a new VMAC address associated with a remaining RG member. If the failed master RG member resumes operation and forms a peer relationship with the remaining RG members, the new master will stop responding to the resumed master RG member's VMAC address. Although the resumed former master, which re-enters the group as a slave, will have to re-ARP for its hosts' MAC addresses, this will not disrupt the present load balancing distribution state.
If the former master has been permanently removed from service, it eventually must be removed altogether from any remaining RG member's state table since it is not to be used for load balancing and gateway services. Therefore, a check 962 is made to determine whether the former master is still failed. If the former master has resumed service as a slave, then at 964 the resumed member builds peering relationships as a slave and state tables and at 966 the other RG members stop the ART timers. At 950 the new master relinquishes the VMAC address to the new slave and at 952 the members stop the AIT timers, if they are still counting.
As noted above, the ART determines when it is safe for all remaining active RG members to totally remove any reference to the failed RG member's VMAC address. The ART must be set to be longer than the AIT and must ensure that no remaining hosts will point to the failed RG member's VMAC address after expiration of the ART. A safe minimum timer value for the ART is 4 hours. If the former master is still failed at 962, then a check 968 is made to determine whether the ART limit has been reached. If not, the ART timer TART is incremented at 970 and the new master continues responding to the clients sending the failed master's VMAC address at 972. The check 962 is again performed.
If the limit for the ART timer is reached, then at 974 the new master stops responding to clients' using the failed master's VMAC address and all clients erase the state table entry for the failed master's VMAC address.
iii. RG Member Resumption Mechanism
As discussed briefly above, a failed RG member that resumes service can be re-inserted into the load balancing service. This RG member may be considered a new member (if it is, in fact, new or if it suffered what the system considers permanent failure) or a recovered member, if the failure was only temporary. Two possible scenarios exist for the insertion of the new or recovered member.
The first scenario involves a member resuming service within the AIT measured by the master (actually, the AIT is run by all RG members since any one of them might become master; it is only the master RG member that acts on the expiration of the AIT). In this scenario, the master has continued sending out the failed RG member's VMAC address to new hosts in a load balancing manner. As the new member comes up and announces itself through its VMAC Group hellos, the master will relinquish the resumed member's VMAC address and the resumed member will re-assume its duties as a slave RG member and addressee device for those hosts for which the resumed member serves as the gateway device. In addition, all active RG members will terminate their ART timers as the failed member is now back in service.
The second scenario involves a member who is considered a new member or is being re-inserted after the ART timers have expired and the active RG members no longer have prior knowledge of the newly inserted member. In this case, the new member announces itself through its VMAC Group hellos and the other active RG members add the new member and its associated VMAC address to their state tables. In addition, the master adds the new VMAC into the load balancing algorithm.
In the preferred embodiment of the present invention, the master will never relinquish its status as master to a newly inserted member, regardless of the newly inserted member's configured priority. Such action could cause problems if the newly inserted member has not yet learned the state of the other RG members before becoming the master. For this reason, no HSRP-type preempt function is used when running the preferred load balancing embodiment of the present invention. The newly inserted member will not become the master unless the current master fails.
One particular failure/recovery case warrants special attention. The case involves multiple failures where the master fails first and the newly elected master subsequently fails and resumes service having lost the state of the newly acquired VMAC address from the original master RG member. In this scenario, hosts addressing packets to the original master could be stranded without a gateway to use. Two possible outcomes exist for this scenario. First, if there are more than two RG members, this is a non-issue as there would be a third slave RG member to assume the master role after the second elected master failed. Because no preempt function is present, there is no threat that a resumed former master would assert itself as the new master.
The troublesome outcome exists when only two RG members exist in the group and the first master fails, subsequently the second master fails, and then either of the masters resumes service. In this scenario, the resumed master, regardless of which RG member resumes service, will have no “knowledge” of the other RG member's former existence. This could lead to the possible stranding of hosts for default gateway services.
To address this case, a gratuitous-ARP mechanism is used. This is the only place where gratuitous-ARP is used in the preferred embodiment of the present invention. If a newly resumed RG member finds itself master, it knows that it is the only active RG member in the gateway load balancing service. This is guaranteed because no preempt function is enabled. As a result, the only way this new member could become master is if it is the only member. The member election happens in the Group and follows normal election procedures. Since the newly resumed RG member is the only existing RG member, it also must be the sole gateway on the local network, thereby making it the master. Therefore, regardless of the address to which hosts are sending gateway traffic from a MAC address perspective (ARP cache), this newly resumed member is the only way out of the LAN or subnet. For that reason, the newly resumed and elected member will generate a gratuitous-ARP message listing its own VMAC address to ensure that all hosts utilize the newly resumed RG member it for default gateway services. This will ensure no hosts are left stranded unable to reach a default gateway.
While one embodiment of a gateway load balancing system has been described, it should be appreciated that other systems, methods and apparatus can implement and/or use the present invention without departing from the spirit or the scope of the present invention.
Number | Name | Date | Kind |
---|---|---|---|
4692918 | Elliott et al. | Sep 1987 | A |
4827411 | Arrowood et al. | May 1989 | A |
4965772 | Daniel et al. | Oct 1990 | A |
5016244 | Massey, Jr. et al. | May 1991 | A |
5018133 | Tsukakoshi et al. | May 1991 | A |
5218600 | Schenkyr et al. | Jun 1993 | A |
5315592 | Conant et al. | May 1994 | A |
5371852 | Attanasio et al. | Dec 1994 | A |
5414704 | Spinney | May 1995 | A |
5423002 | Hart | Jun 1995 | A |
5452447 | Nelson et al. | Sep 1995 | A |
5473599 | Li et al. | Dec 1995 | A |
5488412 | Majeti et al. | Jan 1996 | A |
5506987 | Abramson et al. | Apr 1996 | A |
5511208 | Boyles et al. | Apr 1996 | A |
5515429 | Kawada et al. | May 1996 | A |
5541911 | Nilakantan et al. | Jul 1996 | A |
5544320 | Konrad | Aug 1996 | A |
5555244 | Gupta et al. | Sep 1996 | A |
5572528 | Shuen | Nov 1996 | A |
5586121 | Moura et al. | Dec 1996 | A |
5611049 | Pitts et al. | Mar 1997 | A |
5619552 | Karppanen et al. | Apr 1997 | A |
5644713 | Makishima | Jul 1997 | A |
5644720 | Boll et al. | Jul 1997 | A |
5673265 | Gupta et al. | Sep 1997 | A |
5729537 | Billstrom | Mar 1998 | A |
RE35774 | Moura et al. | Apr 1998 | E |
5740375 | Dunne et al. | Apr 1998 | A |
5751971 | Dobbins et al. | May 1998 | A |
5774660 | Brendel et al. | Jun 1998 | A |
5787470 | DiSimone et al. | Jul 1998 | A |
5793763 | Mayes et al. | Aug 1998 | A |
5818845 | Moura et al. | Oct 1998 | A |
5825759 | Liu | Oct 1998 | A |
5828655 | Moura et al. | Oct 1998 | A |
5835696 | Hess | Nov 1998 | A |
5848241 | Misinai et al. | Dec 1998 | A |
5852717 | Bhide et al. | Dec 1998 | A |
5859852 | Moura et al. | Jan 1999 | A |
5862345 | Okanoue et al. | Jan 1999 | A |
5862451 | Grau et al. | Jan 1999 | A |
5872773 | Katzela et al. | Feb 1999 | A |
5892903 | Klaus | Apr 1999 | A |
5940596 | Rajan et al. | Aug 1999 | A |
5943604 | Chen et al. | Aug 1999 | A |
5946047 | Levan | Aug 1999 | A |
5946048 | Levan | Aug 1999 | A |
5949753 | Alexander et al. | Sep 1999 | A |
5950205 | Aviani, Jr. | Sep 1999 | A |
5953335 | Erimli et al. | Sep 1999 | A |
5956346 | Levan | Sep 1999 | A |
5959660 | Levan | Sep 1999 | A |
5959968 | Chin et al. | Sep 1999 | A |
5959997 | Moura et al. | Sep 1999 | A |
6227785 | Kanekar et al. | Sep 1999 | B1 |
5963540 | Bhaskaran | Oct 1999 | A |
5982745 | Wolff et al. | Nov 1999 | A |
5989060 | Coile et al. | Nov 1999 | A |
5996016 | Thalheimer et al. | Nov 1999 | A |
5996021 | Civaniar et al. | Nov 1999 | A |
5999536 | Kawafuji et al. | Dec 1999 | A |
6006264 | Colby et al. | Dec 1999 | A |
6006266 | Murphy et al. | Dec 1999 | A |
6016388 | Dillon | Jan 2000 | A |
6052718 | Gifford | Apr 2000 | A |
6058425 | White | May 2000 | A |
6061650 | Malkin et al. | May 2000 | A |
6065061 | Blahut et al. | May 2000 | A |
6073168 | Mighdoll et al. | Jun 2000 | A |
6078575 | Dommety et al. | Jun 2000 | A |
6088721 | Lin et al. | Jul 2000 | A |
6092178 | Jindal et al. | Jul 2000 | A |
6097882 | Mogul | Aug 2000 | A |
6108300 | Coile et al. | Aug 2000 | A |
6111675 | Mao et al. | Aug 2000 | A |
6128666 | Muller et al. | Oct 2000 | A |
6131121 | Mattaway et al. | Oct 2000 | A |
6148000 | Feldman et al. | Nov 2000 | A |
6148410 | Baskey et al. | Nov 2000 | A |
6157965 | Mohammed et al. | Dec 2000 | A |
6167438 | Yates et al. | Dec 2000 | A |
6167446 | Lister et al. | Dec 2000 | A |
6173312 | Atarashi et al. | Jan 2001 | B1 |
6178455 | Schutte et al. | Jan 2001 | B1 |
6182139 | Brendel | Jan 2001 | B1 |
6189043 | Buyukkoc et al. | Feb 2001 | B1 |
6189102 | Beser | Feb 2001 | B1 |
6195689 | Bahlmann | Feb 2001 | B1 |
6195705 | Leung | Feb 2001 | B1 |
6202169 | Razzaghe-Ashrafi | Mar 2001 | B1 |
6205481 | Heddaya et al. | Mar 2001 | B1 |
6230196 | Guenthner et al. | May 2001 | B1 |
6230326 | Unger et al. | May 2001 | B1 |
6236678 | Horton et al. | May 2001 | B1 |
6240089 | Okanoue et al. | May 2001 | B1 |
6240461 | Cieslak et al. | May 2001 | B1 |
6243360 | Basilico | Jun 2001 | B1 |
6243760 | Armbruster et al. | Jun 2001 | B1 |
6247054 | Malkin | Jun 2001 | B1 |
6249820 | Dobbins et al. | Jun 2001 | B1 |
6253234 | Hunt et al. | Jun 2001 | B1 |
6260070 | Shah et al. | Jul 2001 | B1 |
6286038 | Reichmeyer et al. | Sep 2001 | B1 |
6286084 | Wexler et al. | Sep 2001 | B1 |
6292832 | Shah et al. | Sep 2001 | B1 |
6295276 | Datta et al. | Sep 2001 | B1 |
6298063 | Coile et al. | Oct 2001 | B1 |
6298381 | Shah et al. | Oct 2001 | B1 |
6301223 | Hrastar et al. | Oct 2001 | B1 |
6324564 | Thieke et al. | Nov 2001 | B1 |
6327622 | Jindal et al. | Dec 2001 | B1 |
6330602 | Law et al. | Dec 2001 | B1 |
6330606 | Logue et al. | Dec 2001 | B1 |
6339830 | See et al. | Jan 2002 | B1 |
6345294 | O'Toole et al. | Feb 2002 | B1 |
6370147 | Beser | Apr 2002 | B1 |
6377972 | Guo et al. | Apr 2002 | B1 |
6385204 | Hoefelmeyer et al. | May 2002 | B1 |
6389027 | Lee et al. | May 2002 | B1 |
6389462 | Cohen et al. | May 2002 | B1 |
6397260 | Wils et al. | May 2002 | B1 |
6415323 | McCanne et al. | Jul 2002 | B1 |
6434608 | Desai | Aug 2002 | B1 |
6438123 | Chapman | Aug 2002 | B1 |
6438652 | Jordan et al. | Aug 2002 | B1 |
6442558 | Brown et al. | Aug 2002 | B1 |
6446121 | Shah et al. | Sep 2002 | B1 |
6449249 | Cloonan et al. | Sep 2002 | B1 |
6449250 | Otani et al. | Sep 2002 | B1 |
6463475 | Calhoun | Oct 2002 | B1 |
6466964 | Leung et al. | Oct 2002 | B1 |
6470027 | Birrell, Jr. | Oct 2002 | B1 |
6470389 | Chung et al. | Oct 2002 | B1 |
6473802 | Masters | Oct 2002 | B2 |
6477197 | Unger | Nov 2002 | B1 |
6477522 | Young | Nov 2002 | B1 |
6480469 | Moore et al. | Nov 2002 | B1 |
6487591 | Budhraja et al. | Nov 2002 | B1 |
6487605 | Leung | Nov 2002 | B1 |
6493318 | Bare | Dec 2002 | B1 |
6499088 | Wexler et al. | Dec 2002 | B1 |
6510162 | Fijolek et al. | Jan 2003 | B1 |
6510469 | Starnes et al. | Jan 2003 | B1 |
6512774 | Vepa et al. | Jan 2003 | B1 |
6519646 | Gupta et al. | Feb 2003 | B1 |
6532493 | Aviani et al. | Mar 2003 | B1 |
6553568 | Fijolek et al. | Apr 2003 | B1 |
6556591 | Bernath et al. | Apr 2003 | B2 |
6577642 | Fijolek et al. | Jun 2003 | B1 |
6578066 | Logan et al. | Jun 2003 | B1 |
6578077 | Rakoshitz et al. | Jun 2003 | B1 |
6588016 | Chen et al. | Jul 2003 | B1 |
6594305 | Roeck et al. | Jul 2003 | B1 |
6598232 | McAlear | Jul 2003 | B1 |
6606315 | Albert et al. | Aug 2003 | B1 |
6606352 | Horton et al. | Aug 2003 | B2 |
6611868 | Arutyunov | Aug 2003 | B1 |
6621810 | Leung | Sep 2003 | B1 |
6633585 | Ghanwani et al. | Oct 2003 | B1 |
6636498 | Leung | Oct 2003 | B1 |
6636971 | Loukianov | Oct 2003 | B1 |
6650624 | Quigley et al. | Nov 2003 | B1 |
6650641 | Albert | Nov 2003 | B1 |
6665304 | Beck et al. | Dec 2003 | B2 |
6671259 | He et al. | Dec 2003 | B1 |
6674713 | Berg et al. | Jan 2004 | B1 |
6683873 | Kwok et al. | Jan 2004 | B1 |
6687217 | Chow et al. | Feb 2004 | B1 |
6691165 | Bruck et al. | Feb 2004 | B1 |
6724724 | Swenson et al. | Apr 2004 | B1 |
6728748 | Mangipudi et al. | Apr 2004 | B1 |
6735631 | Oehrke et al. | May 2004 | B1 |
6742044 | Aviani et al. | May 2004 | B1 |
6760302 | Ellinas et al. | Jul 2004 | B1 |
6760316 | Hebsgaard et al. | Jul 2004 | B1 |
6765892 | Leung et al. | Jul 2004 | B1 |
6768743 | Borella et al. | Jul 2004 | B1 |
6779039 | Bommareddy et al. | Aug 2004 | B1 |
6789125 | Aviani et al. | Sep 2004 | B1 |
6795857 | Leung et al. | Sep 2004 | B1 |
6795860 | Shah et al. | Sep 2004 | B1 |
6801949 | Bruck et al. | Oct 2004 | B1 |
6804221 | Magret et al. | Oct 2004 | B1 |
6826599 | Shaffer et al. | Nov 2004 | B1 |
6832252 | Cieslak et al. | Dec 2004 | B1 |
6835696 | Saito et al. | Dec 2004 | B2 |
6839809 | Forster et al. | Jan 2005 | B1 |
6857026 | Cain | Feb 2005 | B1 |
6920498 | Gourlay et al. | Jul 2005 | B1 |
6952401 | Kadambi et al. | Oct 2005 | B1 |
6981056 | Aviani et al. | Dec 2005 | B1 |
7006431 | Kanekar et al. | Feb 2006 | B1 |
7039720 | Alfieri et al. | May 2006 | B2 |
7062570 | Hong et al. | Jun 2006 | B2 |
7065043 | Wu et al. | Jun 2006 | B2 |
7072979 | Aviani et al. | Jul 2006 | B1 |
7111061 | Leighton et al. | Sep 2006 | B2 |
7117273 | O'Toole et al. | Oct 2006 | B1 |
7136383 | Wilson | Nov 2006 | B1 |
7143195 | Vange et al. | Nov 2006 | B2 |
7149217 | Alexander et al. | Dec 2006 | B2 |
7151758 | Kumaki et al. | Dec 2006 | B2 |
7162540 | Jasen et al. | Jan 2007 | B2 |
7260102 | Mehrvar et al. | Aug 2007 | B2 |
7349979 | Cieslak et al. | Mar 2008 | B1 |
7395348 | Cieslak et al. | Jul 2008 | B1 |
7401159 | Aviani et al. | Jul 2008 | B1 |
7593346 | McLaggan et al. | Sep 2009 | B2 |
7734816 | Cieslak et al. | Jun 2010 | B2 |
20010014097 | Beck et al. | Aug 2001 | A1 |
20010042105 | Koehler et al. | Nov 2001 | A1 |
20010049741 | Skene et al. | Dec 2001 | A1 |
20020010783 | Primak et al. | Jan 2002 | A1 |
20020012327 | Okada | Jan 2002 | A1 |
20020019879 | Jasen et al. | Feb 2002 | A1 |
20020023159 | Vange et al. | Feb 2002 | A1 |
20020040400 | Masters | Apr 2002 | A1 |
20020062372 | Hong et al. | May 2002 | A1 |
20020099849 | Alfieri et al. | Jul 2002 | A1 |
20020120697 | Generous et al. | Aug 2002 | A1 |
20020129134 | Leighton et al. | Sep 2002 | A1 |
20020191562 | Kumaki et al. | Dec 2002 | A1 |
20030014540 | Sultan et al. | Jan 2003 | A1 |
20030031178 | Haeri et al. | Feb 2003 | A1 |
20030037165 | Shinomiya et al. | Feb 2003 | A1 |
20030133420 | Haddad | Jul 2003 | A1 |
20030161303 | Mehvar et al. | Aug 2003 | A1 |
20040071141 | Dhara et al. | Apr 2004 | A1 |
20050111352 | Ho et al. | May 2005 | A1 |
20050213832 | Schofield et al. | Sep 2005 | A1 |
Number | Date | Country |
---|---|---|
9529544 | Nov 1995 | WO |
WO9831107 | Jul 1998 | WO |
2005018176 | Feb 2005 | WO |