A high availability system is a system that is resilient to failures of the system's components. Typically, this is achieved by providing redundant components so that if one component fails, a redundant component can take over the tasks of the component that failed. In software defined networking (“SDN”), high availability (“HA”) may be implemented in service routers (“SRs”) that are configured in an active-standby mode.
A SR may be implemented as a logical router that is configured and deployed, for example, on an edge services gateway (“ESG”). A logical router is a router that routes data packets between logical overlay networks or subnets. Each logical overlay network is logically decoupled from a physical underlay network such as a physical network infrastructure.
An ESG can route data packets between a logical overlay network and outside networks. The ESG may be implemented as a collection of software components that can provide different edge services, including data packet routing, firewall services, NAT, DHCP, VPN, load balancing, and the like. ESGs may be installed on virtual machines (“VMs”) in a datacenter. As such, the ESG may be referred to as a virtual appliance, a virtual network function, or a service VM.
In a typical HA scenario, a pair of SRs forms a SR cluster, and each of the SRs in the cluster is realized on a different ESG node. One of the SRs in the cluster operates in an active mode, while another SR operates in a standby mode. The active SR may be configured to forward data traffic and apply the services that are configured on the active SR's ESG. The standby SR may remain in the standby mode until it needs to become active.
Switching a SR from operating in an active mode to operating in a standby mode is usually triggered upon receiving an indication that the active SR has failed or that an edge gateway node on which the active SR executes has failed. That may happen upon detecting that, for example, an interface line status has changed, a data path process has failed, a bidirectional forwarding detection (“BFD”) link has failed, or a border gateway protocol (“BGP”) routing status has changed. None of those triggers, however, takes into consideration the health status of the services configured on the SRs. Furthermore, none of those triggers takes into consideration the relative health of the SRs in the cluster.
Techniques are presented herein for providing HA support by a SR cluster to perform services switchover without affecting other clusters configured on the same ESG. The SR cluster includes a pair of SRs, one of which operates in an active mode and another in a standby mode. The SR cluster may be implemented on an edge gateway. The services switchover may occur when at least one service configured on the active SR of the SR cluster fails.
In addition, techniques are described herein for performing a cooperative active-standby failover between logical SRs based on health of services configured on the SRs in an active-standby SR cluster. The techniques include determining the health status of the services configured on the SRs in the cluster, exchanging the health information between the SRs, and determining, based, at least in part, on the relative health of the two SRs, whether to switch all services configured on an active SR to a standby SR.
A service switchover described herein is a high-level granularity failover process. That means that the switchover pertains to switching either all services from one SR onto another SR, or switching no services from one SR onto another SR. For example, if it is determined that some services configured on an active SR have failed, and that a standby SR has become healthier than the active SR, then all, not just some, services configured on the active SR may be switched over to the standby SR, and the standby SR may become active.
In the drawings:
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the method described herein. It will be apparent, however, that the present approach may be practiced without these specific details. In some instances, well-known structures and devices are shown in a block diagram form to avoid unnecessarily obscuring the present approach.
1. Example Physical Implementations
Hosts 110A and 110B may be configured to implement VMs, ESGs, logical routers, logical switches, and the like. Hosts 110A and 110B are also referred to as computing devices, host computers, host devices, physical servers, server systems, or physical machines.
In the example depicted in
Virtual machines VM1A-VM2B executed on hosts 110A-110B, respectively, are examples of virtualized computing instances or workloads. A virtualized computing instance may represent an addressable data compute node or an isolated user space instance.
An ESG is a virtualized network component that may be configured to provide edge security and gateway services to VMs and hosts. It may be implemented as a logical router or as a service gateway, and may provide dynamic routing services, firewall services, NAT services, DHCP services, site-to-site VPN services, L2 VPN services, load balancing services, and the like.
An ESG can be either a VM or a physical machine. If an ESG is implemented in a VM, then the gateway runs on a hypervisor. If an ESG is implemented in a physical machine, then the gateway executes on a physical server.
In the depicted example, ESGs 130A and 130B may implement, for example, gateways configured to process and forward data traffic to, and from, VMs and hosts.
In an embodiment, ESGs 130A and 130B implement one or more SRs (depicted in
In an embodiment, hosts 110A and 110B are configured to support execution of hypervisors (not depicted in
Managed forwarding elements 120A and 120B may be software components configured to perform forwarding of data packets that are received from, or sent to, VMs and/or ESGs. For example, if managed forwarding element 120A executing on host 110A receives a data packet from VM1A 101A, then managed forwarding element 120A may perform the processing of a logical switch, which is logically coupled to VM1A 101A, to direct the packet to, for example, ESG 130A.
Each of hosts 110A and 110B includes one or more hardware components. Hardware components 115A/115B may include one or more processors 116A/116B, one or more memory units 117A/117B, one or more network interface controllers 118A/118B, one or more physical network controllers 119A/119B, and/or one or more storage devices 121A/121B. Hardware components 115A/115B may include additional elements not depicted in
2. Preemptive and Non-Preemptive Modes
A cooperative active-standby failover between logical SRs may be implemented in either a non-preemptive mode or a preemptive mode. The difference between the preemptive mode and the non-preemptive mode is mainly in the failback behavior. In a preemptive mode, the system will “failback” as soon as the service is restored on the original active node, whereas in a non-preemptive system, the standby node that became active may remain active until it fails.
SRs configured in an active/standby cluster may have assigned ranks. When a SR recovers and has a higher rank than a peer SR, then in a preemptive mode the recovered SR becomes active, whereas in a non-preemptive mode the recovered SR checks the peer's status to decide whether the SR should go active.
In a preemptive mode, there is also a possibility that a higher-ranked, recovered SR remains in a standby mode as it notifies a peer that it has recovered and as it waits for the peer to relinquish its own active role. Once the peer relinquishes the active role, the recovered SR goes active.
In a non-preemptive mode, all services, regardless of their types, configured in an active-standby SR cluster participate in the service-triggered switchover. In the non-preemptive mode, a SR that is the healthiest in the cluster remains or becomes active.
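The recovery behavior described in this section may be illustrated with a minimal sketch. The function name, the `Mode` enumeration, and the convention that a larger number denotes a higher rank are assumptions made only for illustration. The sketch further assumes that a recovered SR goes active in either mode when no peer currently holds the active role, which the description does not state explicitly.

```python
from enum import Enum


class Mode(Enum):
    PREEMPTIVE = "preemptive"
    NON_PREEMPTIVE = "non_preemptive"


def should_go_active_on_recovery(mode, own_rank, peer_rank, peer_is_active):
    """Decide whether a recovered SR should take the active role.

    Assumption: a larger rank value means a higher rank.
    Assumption: if the peer is not active, the recovered SR goes active.
    """
    if not peer_is_active:
        return True
    if mode is Mode.PREEMPTIVE:
        # Preemptive: a higher-ranked recovered SR reclaims the active role.
        return own_rank > peer_rank
    # Non-preemptive: the peer that took over remains active until it fails.
    return False
```

In this sketch a higher-ranked SR that recovers while its peer is active returns `True` only in the preemptive mode. In the non-preemptive mode it stays in standby.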
3. Example Cooperative Active-Standby Failover Between Two Service Routers
In an embodiment, all services configured on SRs of an active-standby cluster report their health status to HA engines, also referred to as HA logic. The HA logic may be implemented inside the SRs. Each SR may implement its own HA engine, and the HA engines implemented in the SRs may communicate with each other.
In
SRs 150A-150B may communicate with each other via a bidirectional forwarding detection (“BFD”) channel 170A. SRs 150A-150B may use BFD channel 170A to exchange various messages and information. Communications exchanged along BFD channel 170A may be transmitted in compliance with a common protocol that provides mechanisms for exchanging messages and scores between the SRs.
Services configured on SRs may report their own health status in many ways. For example, an operational service may generate a message that includes a statement “ready” and transmit that message to HA engines implemented on the SRs, while a nonoperational service—if it can—may generate a message “not_ready,” and transmit that message to the HA engines. The SRs may also implement daemons that monitor non-operational services, as some of the non-operational services may be unable to generate and/or send “not_ready” messages. In that situation, the daemons step in and report the “not_ready” messages to the HA engines.
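One possible way to picture the daemon's role is sketched below. The function name and the service names are hypothetical, and the sketch makes the simplifying assumption that a configured service from which no report was received is treated as nonoperational, so that a “not_ready” status is recorded on its behalf.

```python
def collect_health(configured_services, self_reports):
    """Return a status for every configured service.

    self_reports maps a service name to "ready" or "not_ready" for the
    services that managed to report on their own. For any service that
    sent nothing, the daemon (simplified here) reports "not_ready".
    """
    return {
        service: self_reports.get(service, "not_ready")
        for service in configured_services
    }


# Hypothetical example: the VPN service failed without sending a message.
statuses = collect_health(
    ["firewall", "nat", "vpn"],
    {"firewall": "ready", "nat": "not_ready"},
)
```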
Based on “ready” messages and “not_ready” messages received from services configured on SRs, the HA engines may compute aggregate scores for the SRs. An aggregate score computed for a SR may be a sum of the scores awarded for the “ready” messages received from the services configured on the SR. The HA engines may communicate the aggregate scores to all SRs in a cluster to allow the SRs to work in concert to determine whether a switchover should occur. The aggregate scores may be communicated to SRs using a common protocol that provides mechanisms for sending messages and scores to the SRs.
An aggregate score computed for a SR represents a health measure of the SR. Each SR cluster may have its own criteria for determining health measures of SRs. In an embodiment, the higher the aggregate score of a SR is, the healthier the SR is. If, for example, a “ready” message from a SR's healthy service is awarded a score of one, and a “not_ready” message from an unhealthy service is given a score of zero, then an aggregate score computed for the SR represents a count of the healthy services configured on the SR. Thus, if a SR has four services, out of which three services are healthy and one is unhealthy, then an aggregate score computed for the SR is three.
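The scoring in the example above can be sketched in a few lines. The function name is hypothetical, and the one/zero weighting is only the example weighting given above; a cluster may use other criteria.

```python
def aggregate_score(messages):
    """Sum the scores awarded for the health messages of one SR.

    A "ready" message is awarded one and a "not_ready" message zero, so
    the result is the count of healthy services configured on the SR.
    """
    score_per_message = {"ready": 1, "not_ready": 0}
    return sum(score_per_message[message] for message in messages)


# Four services, three healthy and one unhealthy.
print(aggregate_score(["ready", "ready", "not_ready", "ready"]))  # prints 3
```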
Based on aggregate scores, each SR of the two SRs in a cluster determines the SR that has the highest aggregate score, and thus is the healthiest SR in the cluster. In a non-preemptive mode, the SR with the highest relative health usually becomes or remains an active SR.
A SR may determine its own relative health by comparing its own aggregate score with an aggregate score of a peer SR in a cluster. When an active SR determines that its own aggregate score exceeds an aggregate score computed for a standby SR, then the active SR may determine that it is healthier than the standby SR, and thus it may remain active. However, if an active SR determines that its own aggregate score is lower than the score computed for a standby SR, then the active SR may determine that the standby SR is healthier, and thus a failover should be triggered. Some exceptions to this rule are described later.
Suppose that both SRs operate in a non-preemptive mode, and that SR 150A is active, SR 150B is standby, and an aggregate score computed for SR 150A exceeds an aggregate score computed for SR 150B. In this situation, active SR 150A may determine that SR 150A is healthier than SR 150B, and therefore, SR 150A may remain active.
However, if an aggregate score computed for SR 150A is lower than an aggregate score computed for SR 150B, then SR 150A may determine that SR 150B is healthier than 150A, and therefore a switchover of services 201A-204A from SR 150A to SR 150B should be triggered. Once the switchover is triggered, SR 150B starts providing services 201B-204B, as depicted in
In
4. Example Cooperative Active-Standby Failover Between Two Non-Preemptive Service Routers
In one possible implementation, the steps depicted in
In step 302, an LCP process implemented on an SR receives messages indicating health status of services configured on the SRs. The LCP process may receive the messages via a channel that is separate from a BFD channel. The health status of a service may indicate whether the service is operational or not. A message from an operational service may include a statement “ready,” while a message from a nonoperational service may include a statement “not_ready.” If, for example, four services are configured on each of the SRs, then the LCP process implemented on a SR1 may receive four messages from the four services configured on the SR1 and four messages from the four services configured on a SR2.
In step 304, the LCP process determines, based on the received health status messages, an aggregate score for each SR in the SR cluster. An aggregate score computed for a SR represents a health measure of the SR. A healthy service may be awarded a score of one, while an unhealthy service may be given a score of zero. If all four services configured on a SR sent “ready” messages, then an aggregate score for the SR is four. However, if three services configured on a SR sent “ready” messages, but one service sent a “not_ready” message, then an aggregate score for the SR is three.
Although not depicted in
In step 310, based on the aggregate scores, the LCP process of the first SR determines whether its own aggregate score is higher than an aggregate score of a peer SR.
If, in step 312, the LCP process of the first SR determines that its own score is the highest score, then the first SR proceeds to performing step 314; otherwise, the first SR proceeds to performing step 316.
In step 314, the first SR determines that it is healthier than the second SR, and therefore, the first SR continues operating in an active mode.
In step 316, the first SR determines whether its own aggregate score is the lowest score, i.e., whether its own aggregate score is lower than an aggregate score of the second SR. If there are more than two SRs in the cluster, then the first SR determines whether its own aggregate score is lower than the next-highest aggregate score. If it is not, i.e., they have the same score, then the first SR proceeds to performing step 318; otherwise, the first SR proceeds to performing step 322.
In step 318, which the first SR reaches if the first SR determines that both SRs are equally healthy, the first SR determines whether its own rank is greater than a rank of the second SR. Ranks may be assigned to the SR by users, and/or may be communicated to the SRs by HA engines.
In the preemptive behavior, if the first SR determines that its own rank is greater than the rank of the second SR, then the first SR proceeds to performing step 314; otherwise, the first SR proceeds to performing step 322.
In step 322, the first SR transfers the services from the first SR to the second SR, transitions from an active state to a standby state, and causes the second SR to transition from a standby state to an active state.
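The decision made by the active (first) SR in steps 310 through 322 may be summarized by the following sketch. The function name and the returned labels are hypothetical, and the sketch assumes a two-SR cluster in which a larger number denotes a greater rank.

```python
def active_sr_decision(own_score, peer_score, own_rank, peer_rank):
    """Return "remain_active" or "switchover" for the active (first) SR.

    Assumption: a larger rank value means a greater rank.
    """
    if own_score > peer_score:
        # Steps 310-314: the first SR is healthier and stays active.
        return "remain_active"
    if own_score < peer_score:
        # Step 316 to step 322: the peer is healthier.
        return "switchover"
    # Step 318: equal scores, so the ranks break the tie.
    if own_rank > peer_rank:
        return "remain_active"
    return "switchover"
```

For instance, with scores of three on both SRs, the call `active_sr_decision(3, 3, 2, 1)` returns `"remain_active"`, because only the rank comparison distinguishes the two SRs.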
Although not depicted in
5. Example Cooperative Active-Standby Failover Between Two-Tier Service Routers
A SR cluster that has been configured with mandatory services supports HA in both a preemptive mode and a non-preemptive mode. In a preemptive mode, for the purpose of providing HA, services configured on the SRs are grouped into two tiers. A critical tier may include mandatory services, while an optional tier may include nonmandatory services. The determination of which services are mandatory and which services are nonmandatory may be made by users.
In step 402, an active SR, or an LCP implemented on an active SR, receives, from the services configured on the active SR, one or more associations between scores of services configured on the active SR and tier indicators of the services configured on the active SR. An example of an association for an operational, mandatory service, configured on a SR1, and included in a critical_tier, may be [1, critical_tier], or [1, critical_tier, SR1]. An example of an association for a nonoperational, mandatory service, configured on a SR1, and included in a critical_tier, may be [0, critical_tier], or [0, critical_tier, SR1]. An example of an association for a nonoperational, nonmandatory service, configured on a SR1, and included in an optional_tier, may be [0, optional_tier], or [0, optional_tier, SR1].
In step 404, the active SR determines, based on the received associations, whether any of the critical_tier services configured on the active SR is nonoperational.
If, in step 406, the active SR determines that one or more mandatory services configured on the active SR became nonoperational, then the active SR proceeds to performing step 408; otherwise, the active SR proceeds to performing step 410.
In step 408, which the active SR reaches if the active SR determines that at least one of the mandatory services configured on the active SR became unhealthy, the active SR transfers the services from the active SR to a standby SR, transitions from an active state to a standby state, and causes the standby SR to transition from a standby state to an active state. In one embodiment, a check is performed to ensure that the mandatory services are healthy on the standby SR prior to transitioning the standby SR to active.
In step 410, which the active SR reaches if the active SR determines that none of the mandatory services configured on the active SR is unhealthy, the active SR proceeds to performing steps 308-322 of
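The two-tier check of steps 402 through 410 may be sketched as follows. The function name and the returned labels are hypothetical. The associations are modeled as (score, tier) pairs that mirror the [1, critical_tier] and [0, optional_tier] examples given above, and the optional check of the standby SR's mandatory services is omitted for brevity.

```python
def two_tier_decision(associations):
    """Decide the next action from the active SR's (score, tier) pairs.

    Returns "switchover" if any critical_tier service is nonoperational
    (step 408); otherwise returns "compare_scores" to indicate that the
    aggregate-score comparison should be performed next (step 410).
    """
    for score, tier in associations:
        if tier == "critical_tier" and score == 0:
            return "switchover"
    return "compare_scores"


# A failed optional service alone does not force a switchover.
print(two_tier_decision([(1, "critical_tier"), (0, "optional_tier")]))
# prints compare_scores
```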
6. Example Tie Breaking Process When Two Service Routers are Ready to Become Active
In
The approach described in
Initially, SR 150A detects that, although SR 150A is “down,” SR 150A is ready to become active. Therefore, SR 150A generates, and transmits to SR 150B, a message 520A to inquire about the state of SR 150B.
Meanwhile, SR 150B detects that, although SR 150B is “down,” SR 150B is ready to become active. Therefore, SR 150B generates, and transmits to SR 150A, a message 520B to inquire about the state of SR 150A.
Upon receiving the state inquiry, SR 150A generates, and transmits to SR 150B, a message 522A that indicates that SR 150A is “down.”
Upon receiving the state inquiry, SR 150B generates, and transmits to SR 150A, a message 522B that indicates that SR 150B is “down.”
Upon receiving the message indicating that SR 150B is “down,” SR 150A invokes a finite-state machine (“FSM”) application to generate an ACTIVE_REQUESTED message 524A. Subsequently, SR 150A transmits to SR 150B the ACTIVE_REQUESTED message 524A to indicate that SR 150A is requesting an active state.
Upon receiving the message indicating that SR 150A is “down,” SR 150B invokes a FSM application to generate an ACTIVE_REQUESTED message 524B. Subsequently, SR 150B transmits to SR 150A the ACTIVE_REQUESTED message 524B to indicate that SR 150B is requesting an active state.
Upon receiving message 524B indicating that SR 150B requested an active state, SR 150A invokes the FSM application to compare a rank of SR 150A with a rank of SR 150B. Since SR 150A has a higher rank than a rank of SR 150B, the FSM determines that SR 150A should become active. Subsequently, SR 150A transitions to the active state.
Upon receiving message 524A indicating that SR 150A requested an active state, SR 150B invokes the FSM application to compare a rank of SR 150B with a rank of SR 150A. Since SR 150A has a higher rank than a rank of SR 150B, the FSM determines that SR 150B should become standby. Subsequently, SR 150B transitions to the standby state. This concludes the transitioning of SR 150A from a down state to an active state.
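The tie-breaking exchange can be sketched with a small state holder for each SR. The class and method names are hypothetical, and the sketch assumes that the ACTIVE_REQUESTED message carries the sender's rank and that a larger number denotes a higher rank; the description above states only that the FSM compares the two ranks.

```python
class ServiceRouter:
    """Minimal stand-in for the FSM of one SR in the cluster."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
        self.state = "down"

    def on_peer_state(self, peer_state):
        # Both SRs are "down" and ready, so request the active role.
        if self.state == "down" and peer_state == "down":
            return {"type": "ACTIVE_REQUESTED", "rank": self.rank}
        return None

    def on_active_requested(self, message):
        # Both SRs requested the active role; the higher rank wins.
        self.state = "active" if self.rank > message["rank"] else "standby"


sr_a = ServiceRouter("SR 150A", rank=2)  # assumed higher rank
sr_b = ServiceRouter("SR 150B", rank=1)

request_a = sr_a.on_peer_state("down")  # corresponds to message 524A
request_b = sr_b.on_peer_state("down")  # corresponds to message 524B
sr_a.on_active_requested(request_b)
sr_b.on_active_requested(request_a)
print(sr_a.state, sr_b.state)  # prints active standby
```

Because each SR applies the same rank comparison to the same pair of ranks, the two SRs reach complementary states without a further round of messages, provided the ranks are distinct.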
7. Example Cooperative Active-Standby Failover Between Two Service Routers
It is assumed that SR 150A is in an active state 4A, and a score of SR 150A is a non-negative value “A.” It is also assumed that SR 150B is in a standby state 4B, and a score of SR 150B is a non-negative value “Y.”
Suppose that an event 5A occurs, and SR 150A determines that its own score has been lowered and changed from “A” to “X,” where A is greater than X.
Subsequently, SR 150A sends (620A) a packet to SR 150B. The packet includes information indicating that the state of SR 150A is active, and that the score of SR 150A is X.
Suppose that, upon receiving the packet from SR 150A, SR 150B determines that its own score Y is greater than the score X of SR 150A. If that happens, then SR 150B sends (622B) a packet to SR 150A to indicate that a state 5B of SR 150B is active, and that the score of SR 150B is Y. Upon receiving that packet, SR 150A determines that SR 150B became active. Subsequently, SR 150A transitions to a standby state 6A since SR 150B acts in the active mode now.
However, suppose that, upon receiving the packet from SR 150A, SR 150B determines that its own score Y is less than the score X of SR 150A. This situation is not depicted in
In a situation where the score X is equal to the score Y, the ranks of SR 150A and SR 150B may be considered, or it may be assumed that, for example, SR 150A remains active as long as its own score is not lower than the score of SR 150B.
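The packet exchange of this section may be sketched with two handlers, one for each role. The function names and the numeric scores are hypothetical. For the case of equal scores, the sketch adopts the second option mentioned above, in which the active SR remains active as long as its score is not lower than the score of the standby SR.

```python
def standby_on_peer_packet(own_score, peer_state, peer_score):
    """New state of the standby SR after a packet from the active SR."""
    if peer_state == "active" and own_score > peer_score:
        return "active"
    # Equal or lower score: the standby SR stays in standby.
    return "standby"


def active_on_peer_packet(peer_state):
    """New state of the active SR after a packet from its peer."""
    return "standby" if peer_state == "active" else "active"


# SR 150A's score drops to X = 2 while SR 150B's score is Y = 3.
state_b = standby_on_peer_packet(own_score=3, peer_state="active", peer_score=2)
state_a = active_on_peer_packet(peer_state=state_b)
print(state_a, state_b)  # prints standby active
```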
8. Improvements Provided by Certain Embodiments
In an embodiment, an approach provides mechanisms for implementing a cooperative active-standby failover between logical routers based on health of attached services. The cooperative active-standby failover is triggered upon determining operational states of services configured on the SRs. For example, the cooperative active-standby failover may be triggered based on results of a comparison of aggregate scores computed for the SRs and indicating health status of services configured on the SRs.
In an embodiment, all services configured on an active SR may be switched over to a standby SR when one or more services configured on the active SR fail. Since HA of the services, especially mandatory services, configured on a SR may be paramount to some users, performing a timely switchover of the services from one SR to another SR may be critical to maintain the HA of the logical routers.
9. Implementation Mechanisms
The present approach may be implemented using a computing system comprising one or more processors and memory. The one or more processors and memory may be provided by one or more hardware machines. A hardware machine includes a communications bus or other communication mechanisms for addressing main memory and for transferring data between and among the various components of the hardware machine. The hardware machine also includes one or more processors coupled with the bus for processing information. The processor may be a microprocessor, a system on a chip (SoC), or other type of hardware processor.
Main memory may be a random-access memory (RAM) or other dynamic storage device. It may be coupled to a communications bus, and used for storing information and software instructions to be executed by a processor. Main memory may also be used for storing temporary variables or other intermediate information during execution of software instructions to be executed by one or more processors.
10. General Considerations
Although some of the various drawings may illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings may be specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
The foregoing description, for purpose of explanation, has been described regarding specific embodiments. However, the illustrative embodiments above are not intended to be exhaustive or to limit the scope of the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen to best explain the principles underlying the claims and their practical applications, to thereby enable others skilled in the art to best use the embodiments with various modifications as are suited to the uses contemplated.
Any definitions set forth herein for terms contained in the claims may govern the meaning of such terms as used in the claims. No limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of the claim in any way. The specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
As used herein the terms “include” and “comprise” (and variations of those terms, such as “including,” “includes,” “comprising,” “comprises,” “comprised” and the like) are intended to be inclusive and are not intended to exclude further features, components, integers or steps.
References in this document to “an embodiment” indicate that the embodiment described or illustrated may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described or illustrated in connection with an embodiment, it is believed to be within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly indicated.
Various features of the disclosure have been described using process steps. The functionality/processing of a given process step could potentially be performed in different ways and by different systems or system modules. Furthermore, a given process step could be divided into multiple steps and/or multiple steps could be combined into a single step. Furthermore, the order of the steps can be changed without departing from the scope of the present disclosure.
It will be understood that the embodiments disclosed and defined in this specification extend to alternative combinations of the individual features and components mentioned or evident from the text or drawings. These different combinations constitute various alternative aspects of the embodiments.
Number | Date | Country | |
---|---|---|---|
20200028731 A1 | Jan 2020 | US |