The present invention provides a more highly available network switch. A network switch is a multi-port network interconnect device that forwards received packets to their intended destinations. Unlike a network hub, which broadcasts a received packet out all ports, a switch inspects a packet to determine its destination and then forwards it out only the port or ports that lead to its intended destination.
Modular switches allow for repair or expansion by inserting additional modules, e.g., into a chassis, or replacing defective or outdated modules; often the switches provide for “hot swapping”, i.e., removal or insertion while the switch is in operation, to minimize downtime. Heterogeneous modular switches employ different types of modules. For example, a switch can employ 1) connectivity modules that provide the ports for connecting to an external network or networks, 2) fabric modules that provide internal connections among the ports, and 3) management modules. Most data packets are received at a port of a connectivity module and forwarded to a fabric module. The fabric module processes the data packet and forwards it to the appropriate port of a connectivity module for transmission to the packet's destination. A fabric module can route packets destined for the switch itself and packets needing special handling to the management module. For example, communications used to set up communication protocols are sent to management modules.
To avoid catastrophic network failures, several levels of redundancy are applied to the switching function. Multiple switches can be used to provide alternate network paths to bypass a failed switch. In addition, several types of redundancy can be applied to a switch to minimize the likelihood of it failing. At the connectivity level, multiple connections between a switch and a network node can remove the dependency on any single port or connectivity module. At the fabric level, redundant fabric modules can be used, and redundancy can be built into each fabric module.
At the connectivity and fabric levels, additional modules can be used not only to increase performance, but also to provide backup in the event a module fails. At the management level, typically only one module can be active at a time; however, redundancy can be implemented in the form of a “tracking” standby module, i.e., a module that does not interact with external devices other than to track the state of an active module. In the event of a failure of the active module, the tracking standby module provides service continuity as it assumes management activities. However, there are situations in which the standby management module has not completely tracked the state of the active module at the time of failover.
Herein, related art is described to facilitate understanding of the invention. Related art labeled “prior art” is admitted prior art; related art not labeled “prior art” is not admitted prior art.
The figures depict implementations/embodiments of the invention and not the invention itself.
In the course of the present invention, it was recognized that a failure of an active management (device-manager) module can leave a standby management module in a problematic state. In response to an event (e.g., receipt of a data packet in the process of establishing a communications protocol), an active management module can run several processes (herein, including threads and tasks), i.e., sequences of instructions being executed. To achieve high performance, these processes run concurrently and use respective local copies of common data. Once response processing is complete and all processes have reached steady state, a single consistent state is established for the active management module.
However, while a management module is responding to an event, the copies of data owned by different processes can become inconsistent. Likewise, the state of the standby management module, which is tracking the active management module, can be temporarily inconsistent. If the active management module fails when the standby management module is in an inconsistent state, the latter may fail to function properly when it assumes the role of the active management module.
To address this problem, the present invention provides for establishing authority relationships among management processes. During the transition from a standby mode to an active mode, the data associated with some “authoritative” processes is deemed “authoritative”. The data associated with non-authoritative processes or less authoritative processes is reconciled with more authoritative data to ensure a consistent state when active mode is assumed.
In
Network switch API, as illustrated in
Management module 20 provides for concurrent software processes 21, 23, 25, and 27. Authority relations are established among processes 23, 25, and 27. Authoritative process 23 has top-level authority in that its data is authoritative and is not conformed to the data of any other process of management module 20. (Herein, process A “conforms” to process B when first data associated with process A and inconsistent with second data associated with process B is replaced by third data consistent with the second data.) The data can include layer 2 and layer 3 addresses as well as configuration data for processes and modules comprising the network switch.
Hybrid process 25 is both an authoritative process (with respect to process 27) and a conforming process (with respect to process 23). Process 25 is subordinate to authoritative process 23 in that the data of hybrid process 25 must conform or be conformed to the data of authoritative process 23. Conforming process 27 is at the receiving end of authority relations: its data must conform to that of hybrid process 25. Supervisor process 21 sequences activation of processes 23-27 so that authoritative data is available to conforming processes as the latter are activated.
Management module 30 is essentially identical to management module 20; management modules 20 and 30 are two instances of the same program. Authority relations are established among processes 33, 35, and 37. Authoritative process 33 has top-level authority in that its data is authoritative and is not conformed to any other data of module 30. Hybrid process 35 is subordinate to authoritative process 33 but is authoritative with respect to conforming process 37. Hybrid process 35 is subordinate in that its data must conform or be conformed to the data of authoritative process 33. Conforming process 37 is at the receiving end of authority relations: its data must conform to that of hybrid process 35; process 37 does not serve as an authoritative process for any other process. Supervisor process 31 sequences activation of processes 33-37 so that authoritative data is available to conforming processes as the latter are activated.
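By way of illustration only, the authority relations and the activation sequencing described above can be sketched as follows. The sketch is not the implementation of the illustrated embodiment: the mapping `CONFORMS_TO`, the process names, and the function `activation_order` are hypothetical, and a topological sort is merely one assumed way in which a supervisor process could order activation so that authoritative data is available before conforming processes are activated.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical encoding of the authority relations of management module 30:
# each process maps to the set of processes whose data it must conform to.
CONFORMS_TO = {
    "authoritative_33": set(),                 # top-level authority
    "hybrid_35": {"authoritative_33"},         # conforms to 33, authoritative for 37
    "conforming_37": {"hybrid_35"},            # receiving end only
}


def activation_order(conforms_to):
    """Return an order in which every process follows the processes it conforms to."""
    return list(TopologicalSorter(conforms_to).static_order())


order = activation_order(CONFORMS_TO)
# order == ["authoritative_33", "hybrid_35", "conforming_37"]
```

Because the relations in this sketch form a simple chain, the order is unique. Mutual authority relations between a pair of processes would form a cycle, which this particular ordering technique rejects by raising `graphlib.CycleError`; such relations would need a different sequencing rule.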
The invention provides for a variety of authority relationships other than and including those in the illustrated embodiments. One authoritative process can provide data to many conforming processes. One conforming process can receive data from many authoritative processes. One process can both provide and receive authoritative data. A pair of processes can both provide and receive authoritative data from each other.
Authority relationships can be fixed or variable in response to information obtained at runtime. Different rationales are available for determining which processes are authoritative. For example, because packets often flow ‘up’ the network stack, the system designer may know that a particular process receives a packet before other processes, and may infer that it is likely to send its state information to the standby module sooner; its state therefore has a higher probability of getting across the link before state from another process. Such a process is a good candidate for an authoritative process. Likewise, in some cases, a process may be operating at a higher priority than other processes, and is assured of finishing its activities, which include sending the state information to its peer, before other processes get to run.
In the illustrated embodiment, hybrid processes 25 and 35 are both sources and destinations for authoritative data. In alternative embodiments, each process is either a source or destination (or neither) of authoritative data; no process is both. In some embodiments, the authoritative data provided by an intermediate-level process to a lower-authority process is a subset of the data it obtained from a higher-authority process. On the other hand, the illustrated embodiments provide for a hybrid process that provides authoritative data that it did not receive as authoritative data.
As indicated very schematically in
During normal device operation, standby authoritative process 33 locally stores data elements E1-E3, standby hybrid process 35 locally stores data elements E4-E6, and standby conforming process 37 locally stores data elements E7-E9. During steady-state conditions, data elements E1-E7 would equal data elements D1-D7 respectively. However, during transient conditions, some of data elements E1-E7 equal their active counterparts, while others may equal predecessor values for those counterparts.
Upon failure of management module 20, there is no reliable way to determine which of data elements E1-E7 represents the most recent value for the corresponding data element D1-D7. Even if all data elements E1-E7 equaled their counterparts, the resulting state might be inconsistent. Thus some effort must be taken to ensure that management module 30 assumes a consistent state when it enters active mode.
Method ME1, flow charted in
In the simplest instance, a single authoritative process is selected as the source of data for all other processes. For example, if one process stores data that impacts all or most other processes, it could be a good candidate for an authoritative process. Also, a process that needs to be activated first would be a good candidate for an authoritative process. More complex authoritative hierarchies can be established, such as that shown in
Once the management modules are set up, they operate at method segment M2. The active module responds to events, e.g., protocol-establishing messages, and forwards information required for the standby module to track the state of the active module. The forwarding can be of events from which data is generated or of the data itself. Since there is some latency involved in the standby module tracking the active module, the state of the standby module can be inconsistent with that of the active module if an event is still being processed when a failure is detected at method segment M3.
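The effect of tracking latency can be illustrated with a minimal sketch. The class `TrackingLink`, its methods, and the integer element indices are hypothetical and are used only for illustration; the sketch assumes that the data itself (rather than the originating events) is forwarded, and that updates are delivered in order over a single queue.

```python
from collections import deque


class TrackingLink:
    """Hypothetical model of an active module whose updates reach the standby late."""

    def __init__(self, initial):
        self.active = dict(initial)    # data elements D<n> of the active module
        self.standby = dict(initial)   # tracked copies E<n> of the standby module
        self.in_flight = deque()       # updates sent but not yet applied

    def active_update(self, index, value):
        """The active module changes an element and forwards the new value."""
        self.active[index] = value
        self.in_flight.append((index, value))

    def deliver(self, count=1):
        """The standby module applies up to `count` forwarded updates, in order."""
        for _ in range(min(count, len(self.in_flight))):
            index, value = self.in_flight.popleft()
            self.standby[index] = value

    def stale_indices(self):
        """Indices whose standby copy still holds a predecessor value."""
        return sorted(i for i in self.active if self.standby[i] != self.active[i])


link = TrackingLink({1: "old", 4: "old"})
link.active_update(1, "new")   # one event changes D1 ...
link.active_update(4, "new")   # ... and D4
link.deliver(1)                # only the first update arrives before the failure
# link.stale_indices() == [4]: E1 is current, E4 holds a predecessor value
```

If the active module failed at this point, the standby module would hold a mixture of current and predecessor values, which is the condition the conforming step is intended to resolve.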
In response to detection of a failure of the active module, the supervisor process (e.g., 31) for the standby module (e.g., 30) begins to transition its module to active mode. Typically, processes are activated sequentially at method segment M4. At overlapping method segment M5, conforming processes, which tend to be activated later than authoritative processes, conform their data to that of authoritative processes.
Thus, as module 30 is transitioned to active mode, supervisor process 31 activates authoritative process 33 first. Then, as hybrid process 35 is activated, data E4 is conformed to data E1, and data E5 is conformed to data E2. Conforming can be implemented in a variety of ways: 1) the conforming process can actively retrieve the conforming data; 2) an authoritative process can impose its data on a conforming process; and 3) a third process, e.g., supervisor process 31, can be responsible for conforming. When conforming process 37 is activated, data E8 is conformed to data E2, and data E9 is conformed to data E6. Note that, because of the hierarchical authority relations, data element E9 can be conformed to data not represented in top-level authoritative process 33.
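The sequence just described can be sketched as follows, with the conforming performed by a third party in the manner of option 3 above. The names `ACTIVATION_PLAN` and `conform_on_failover`, the sample values, and the default rules are hypothetical. In particular, the sketch assumes for simplicity that two elements are consistent only when equal and that conforming copies the authoritative value; actual consistency rules would depend on the data involved.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, str]                 # (conforming element, authoritative element)
Plan = List[Tuple[str, List[Rule]]]    # processes in activation order, with rules

# Element pairs taken from the description above; process names are hypothetical.
ACTIVATION_PLAN: Plan = [
    ("authoritative_33", []),                          # activated first, never conformed
    ("hybrid_35", [("E4", "E1"), ("E5", "E2")]),
    ("conforming_37", [("E8", "E2"), ("E9", "E6")]),
]


def conform_on_failover(
    data: Dict[str, str],
    plan: Plan,
    consistent: Callable[[str, str], bool] = lambda own, auth: own == auth,
    derive: Callable[[str], str] = lambda auth: auth,
) -> Tuple[Dict[str, str], List[str]]:
    """Activate processes in plan order, replacing inconsistent data as each starts."""
    result = dict(data)                # leave the caller's snapshot untouched
    log: List[str] = []
    for process, rules in plan:
        for target, source in rules:
            if not consistent(result[target], result[source]):
                result[target] = derive(result[source])
                log.append(f"{process}: {target} <- {source}")
    return result, log


# Illustrative standby snapshot at the moment of failure (values are made up).
snapshot = {"E1": "a2", "E2": "b2", "E4": "a1", "E5": "b2",
            "E6": "c1", "E8": "b1", "E9": "c1"}
conformed, log = conform_on_failover(snapshot, ACTIVATION_PLAN)
# log == ["hybrid_35: E4 <- E1", "conforming_37: E8 <- E2"]
```

Only elements that are actually inconsistent with their authoritative counterparts are replaced; E5 and E9 already satisfy the assumed rule and are left unchanged.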
Once the activation of processes and data conforming is complete, former standby management module 30 becomes the active management module at method segment M6. As the active management module, it can begin reacting to external events and generating outgoing events at method segment M7. In the meantime, the failed management module can be addressed at method segment M8. Often, a failed module only requires rebooting. In other cases, the failed module may need a software update. In still other cases, the module may need to be physically replaced. In all of these cases, there is a reboot to a standby state. At that point, method ME1 returns to method segment M2; in this iteration, the active and standby roles are reversed.
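The role reversal across iterations of method ME1 can be sketched as a minimal state transition. The class `ManagementModule` and the function `run_failover` are hypothetical; activation and conforming (method segments M4 and M5) are elided, and recovery of the failed module is reduced to a single assignment.

```python
from dataclasses import dataclass


@dataclass
class ManagementModule:
    name: str
    mode: str  # "active", "standby", or "failed"


def run_failover(active: ManagementModule, standby: ManagementModule):
    """Swap roles after the active module fails; return (new_active, new_standby)."""
    if active.mode != "active" or standby.mode != "standby":
        raise ValueError("expected one active and one standby module")
    active.mode = "failed"     # M3: failure of the active module is detected
    standby.mode = "active"    # M4-M6: activation and conforming elided here
    active.mode = "standby"    # M8: failed module is rebooted to a standby state
    return standby, active


m20 = ManagementModule("module_20", "active")
m30 = ManagementModule("module_30", "standby")
new_active, new_standby = run_failover(m20, m30)
# new_active is module_30 and new_standby is module_20; a later failure of
# module_30 would pass the pair through the same function with roles reversed.
```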
While the illustrated device is a network switch, the invention can apply to other devices with state-tracking standby modules provided there are steady-state internal consistency rules for conforming data. In some cases, these rules can provide for reconstructing a state of the formerly active module; in other cases, the invention provides a useful albeit inexact copy of the state of the formerly active module. In some cases the inaccuracies may be unimportant—e.g., in view of the resilience of network protocols to communications errors. These and other variations upon and modifications to the illustrated embodiments are provided for by the present invention, the scope of which is defined by the following claims.