The present disclosure relates generally to detecting failed links in a network and, in response, rerouting network traffic using a data plane protocol.
A data network routes network traffic in the form of data packets to and from destination devices. The data network typically includes many network devices or nodes connected to each other over network links. A network controller controls the nodes over a control plane using control plane Operations, Administration, and Maintenance (OAM) messages in order to reconfigure the network links in the event of link failures/breakages. If connectivity with the network controller is lost, the capability to respond to such link failures is also lost. Even if connectivity is not lost, control plane OAM messages used to reconfigure the nodes and network links often take an inordinate amount of time to travel between the network controller and any target node because the target node may be located deep inside a topology of the data network. This causes unacceptably long reconfiguration delays in the event of link failures.
Overview
Techniques are provided herein to reroute network traffic in a network responsive to one or more failures in the network using a data plane protocol. The network includes multiple routing arcs for loop-free routing of network traffic to a destination. Each routing arc comprises nodes connected in sequence by reversible links oriented away from a node initially holding a cursor toward one of first and second edge nodes and their corresponding edges at opposite ends of the arc and through which the network traffic exits the arc. Each node includes a network device. According to a first technique to handle a single failure in the arc, a first failure is detected in the arc. Responsive to the first failure detection, first management frames are exchanged between nodes within the arc over a data plane in order to transfer the cursor from the node initially holding the cursor to a first node proximate the first failure and to reverse links in the arc as appropriate so that the network traffic in the arc is directed away from the first failure toward the first edge node of the arc through which the network traffic is able to exit the arc.
According to a second technique to handle a second failure in the arc, a second failure is detected in the arc after the first failure was detected. Responsive to the second failure detection, second management frames are exchanged between nodes in the arc to: freeze incoming edges incident to the arc to prevent network traffic in corresponding parent arcs from entering the arc; reverse each incoming edge so that network traffic originating in the arc and transiting the corresponding incoming node in the arc (that receives the corresponding incoming edge) will be directed toward the corresponding parent arc; and reverse links in the arc as appropriate so that all network traffic originating in the arc is directed to incoming edges.
According to a third technique, the first and second techniques are applied recursively across one or more parent arcs, if those arcs include failures therein, until a reconfigured path is established for network traffic to exit the failed arcs on its way to a destination.
Example Embodiments
Referring first to
Arc Topology
With reference to
Once established, arc topology 200 guarantees that any network device 214 at any location on any routing arc 210 has at least two non-congruent paths for reaching the destination, guaranteeing reachability to the destination even if a link failure is encountered in the arc topology. In the ensuing description, a network device (e.g., network device 214) is also referred to herein as a “network node” or simply a “node” that is assigned an identifiable position within arc topology 200. Thus, arc topology 200 is also referred to more broadly as a network of network devices or nodes.
Nodes 214 receive and forward network traffic (e.g., data packets) via their data links in their respective arcs. Because nodes “forward” network traffic to each other toward the destination, they are said to operate in a “forwarding plane” of arc topology 200. The “forwarding plane” and the “data plane” are synonymous and may be used interchangeably. Each routing arc 210 also includes a cursor (labeled “CURSOR” in
Arc Configuration
With reference to
In the example of
Network traffic in routing arc 300 may exit the right end of the routing arc through edge node E and respective edge 302E or the left end of the arc through edge node A and respective edge 302A. One end of edge 302E is connected to edge node E, while the other end of edge 302E is connected to a node “a” that is normally a part of another arc in topology 200, such that edge 302E is said to be “incoming” to that other routing arc, and arc 300 is said to be a parent arc to the other routing arc. Similarly, one end of edge 302A is connected to edge node A, while the other end of edge 302A is connected to a node “f” that is normally a part of another routing arc in topology 200, such that edge 302A is said to be “incoming” to that other routing arc.
Safe nodes B and D respectively receive edge links 306B and 306D from parent routing arcs (not shown in
Arc 300 also includes a moveable or transferrable cursor 320 that is held by node C referred to as a cursor node and that provides exclusive control of directing the network traffic along the arc. One node of arc 300 has possession of arc cursor 320 at any given time. The node having possession of the arc cursor 320, e.g., node C, can control the network traffic along the arc based on possession of the arc cursor. In particular, the node holding the cursor directs network traffic away from itself along either of its outwardly-oriented (i.e., left pointing and right pointing) links toward the opposite ends of the arc (i.e., toward edge nodes A and E and respective edges 302A and 302E). In the example of
Arc Failures
Returning again to
Accordingly, techniques presented herein perform link repair and reroute operations in arc topology 200 using a strictly data plane protocol that operates within broken/failed arcs and arcs connected thereto, which is simpler and more efficient than use of the control plane protocol. Based on the data plane protocol, nodes within a broken arc exchange management frames in the data plane with each other in order to reconfigure links and reroute network traffic in the broken arc to compensate for the breakage and thus provide continued routing of the network traffic to the destination. The use of such data plane management messages to perform repair and reroute operations in arc topology 200 instead of control plane management messages obviates the need for nodes 214 to communicate with network controller 220 in the event of link failures. Thus, the use of the data plane management messages is an efficient, autonomous approach to link repair and reroute operations that avoids the inefficiencies and delays associated with the use of the control plane protocol.
In an embodiment, the data plane management frames exchanged between nodes to detect and repair SRLG failures in arc topology 200 are referred to as Joint Operation and Automation of the Network (JOAN) frames. A JOAN frame is a data plane OAM frame that stays within the arcs 210 of arc topology 200. A JOAN frame may be emitted from any node in a given arc at any time, depending on a triggering event. The JOAN frame is sent from the node that emits the JOAN frame toward one of the edges of the arc, as seen in the example of
JOAN Frames
With reference to
There are several different types of JOAN frames exchanged between nodes in an arc each suited to a particular scenario, as briefly described below with reference to
Each type of JOAN frame is now described briefly, after which several example arc failure scenarios in which the different JOAN frames are used will be described.
JOAN1 frame—Cursor-Request JOAN frame: A node that detects a failure proximate itself, e.g., in one of its links, sends the JOAN1 frame toward an edge node to request possession of the cursor (i.e., to have the cursor transferred from the node initially holding the cursor in the arc to the requesting node). If the requesting node gains possession of the cursor, all network traffic will be directed away from that node and, thus, away from the failure location.
JOAN2 frame—Cursor-Grant JOAN frame: In response to receiving a JOAN1 Cursor-Request frame, an edge node through which network traffic can exit the arc (because the edge node and its connected edge are not broken) sends a JOAN2 Cursor-Grant frame back to the requesting node to indicate that the cursor may be transferred to that node.
JOAN3 frame—Cursor-Reject JOAN frame: This is used in a scenario where an arc experiences multiple spaced apart failures (breakages), such as a double failure scenario. In the double failure scenario, a first node proximate a first failure has already captured the cursor using a JOAN1-JOAN2 frame exchange. A second node proximate a second, subsequent failure sends another JOAN1 (Cursor-Request) frame to gain possession of the cursor. In response to the JOAN1 Cursor-Request frame, the first node sends a JOAN3 (Cursor-Reject) frame back to the second node because the cursor has already been transferred once due to the first failure and cannot be transferred again. On the way back to the second node, the JOAN3 (Cursor-Reject) frame freezes all incoming edges from parent arcs. Once frozen, the incoming edges cannot pass network traffic.
JOAN4 frame—Reverse-Incoming JOAN frame: This is also used in the double failure scenario. The JOAN4 (Reverse-Incoming) frame is sent from a node that both requested the cursor (using JOAN1) and was rejected (using JOAN3). The JOAN4 (Reverse-Incoming) frame reverses each incoming edge (link) from a parent arc, and reverses links for each node in the arc transited by the JOAN4 (Reverse-Incoming) frame beginning with the sending node up to a first of the incoming nodes (i.e., nodes that receive incoming edges from parent arcs) transited by the JOAN4 frame, so that network traffic originating in the arc will be directed to the incoming node (and be directed into the parent arc).
JOAN5 frame—Ping JOAN frame (Optional): The JOAN5 (Ping) frame is a failure-detect frame that is sent back and forth between the edge nodes at opposite ends of an arc. A failure in the arc is detected when one of the JOAN5 (Ping) frames sent from one of the edge nodes to the other of the edge nodes is not received at the other of the edge nodes. The JOAN5 (Ping) frame is optional because other mechanisms for detecting failure in an arc may be used.
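The five frame types listed above can be sketched as a small data model. This is only an illustration under stated assumptions: the names `JoanType`, `JoanFrame`, `source`, `toward`, and `reply_to_cursor_request` are hypothetical and do not reflect an actual frame layout; only the frame types, the cursor token, and the grant/reject conditions are taken from the description.

```python
from dataclasses import dataclass
from enum import Enum


class JoanType(Enum):
    CURSOR_REQUEST = 1    # JOAN1
    CURSOR_GRANT = 2      # JOAN2
    CURSOR_REJECT = 3     # JOAN3
    REVERSE_INCOMING = 4  # JOAN4
    PING = 5              # JOAN5 (optional)


@dataclass
class JoanFrame:
    kind: JoanType
    source: str          # hypothetical field: node that emitted the frame
    toward: str          # hypothetical field: node the frame travels toward
    token: bool = False  # cursor token, asserted only in a Cursor-Grant


def reply_to_cursor_request(request, responder, can_exit, cursor_already_moved):
    """Choose the reply to a Cursor-Request (hypothetical helper).

    A grant is returned only if traffic can exit the arc through the
    responder and the cursor has not already been transferred once.
    """
    if request.kind is not JoanType.CURSOR_REQUEST:
        raise ValueError("not a Cursor-Request frame")
    if cursor_already_moved or not can_exit:
        kind = JoanType.CURSOR_REJECT
    else:
        kind = JoanType.CURSOR_GRANT
    return JoanFrame(kind, source=responder, toward=request.source)
```

In this sketch the token is left unasserted in the reply, since the description has the node initially holding the cursor assert it as the Cursor-Grant frame transits that node.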
Network Device—Node
With reference to
The memory 608 may comprise read only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible (e.g., non-transitory) memory storage devices. Thus, in general, the memory 608 may comprise one or more computer readable storage media (e.g., a memory device) encoded with software comprising computer executable instructions and when the software is executed (by the processor 604) it is operable to perform the operations described herein. For example, the memory 608 stores or is encoded with instructions for Data Plane Repair Protocol logic 610 to perform failure detect and repair operations based on a data plane protocol (i.e., an exchange of data plane management messages) as described herein. In addition, memory 608 stores data plane management frame formats 612, e.g., JOAN frame formats, used by the Data Plane Repair Protocol logic 610 to construct and interpret data plane management frames to be sent from system 600 and received by system 600 over links 602a and 602b.
Single Failure in Arc
Turning now to
At 705, a first node proximate a first failure in the arc detects the first failure.
At 710, responsive to the failure detection, the nodes in the arc exchange first management frames, e.g., JOAN frames, over a data forwarding plane between nodes within the arc. The exchange of JOAN frames transfers the cursor from the node initially holding the cursor to the first node proximate the first failure and reverses links in the arc as appropriate so that the network traffic in the arc is directed away from the first failure toward the first edge node of the arc through which the network traffic is able to exit the arc. The exchange of first JOAN frames in the data plane in 710 results in repair/rerouting in the arc without any interaction with a network controller of the network, i.e., no OAM management frames are exchanged with the network controller (e.g., network controller 220).
At 805, first and second edge nodes of the arc periodically send Ping (i.e., failure-detect) JOAN frames back and forth between themselves.
At 810, a failure (e.g., the first failure) is detected in the arc when one of the failure-detect JOAN frames sent from one of the first and second edge nodes to the other of the first and second edge nodes is not received at the other of the first and second edge nodes. The failure is manifested as a break in the arc that prevents the failure-detect JOAN frame from reaching the designated edge node.
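Operations 805 and 810 amount to a missed-ping check at each edge node. The following is a minimal sketch, assuming a fixed ping period and caller-supplied timestamps; the class name `PingMonitor` and the parameters `interval` and `missed_limit` are hypothetical and are not specified in the description.

```python
class PingMonitor:
    """Tracks Ping JOAN frames expected from the edge node at the far end."""

    def __init__(self, interval, missed_limit=1):
        self.interval = interval          # assumed ping period, in seconds
        self.missed_limit = missed_limit  # assumed number of periods tolerated
        self.last_seen = 0.0              # time the last Ping frame arrived

    def on_ping(self, now):
        """Record the arrival time of a Ping frame."""
        self.last_seen = now

    def failure_detected(self, now):
        """Report a failure once no Ping has arrived within the allowed time."""
        return (now - self.last_seen) > self.interval * self.missed_limit
```

With `missed_limit=1`, a single overdue Ping frame is treated as a break in the arc, which matches the detection condition in operation 810. A larger value trades detection speed for tolerance of an occasional lost frame.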
At 905, the first node proximate the detected first failure (as referenced above in connection with operation 705) sends a Cursor-Request JOAN frame toward a first edge node in a direction that causes that JOAN frame to transit the node initially holding the cursor. In the example of
At 910, responsive to the Cursor-Request JOAN frame, the first edge node sends a Cursor-Grant JOAN frame back to the first node if the first edge node and its corresponding edge are able to pass network traffic, i.e., if network traffic is able to exit the arc through the first edge node and its edge. In the example of
Responsive to the Cursor-Grant JOAN frame transiting the node initially holding the cursor and next nodes thereafter (including the first node), the following operations 915, 920, and 925 occur, as described below.
At 915, the node initially holding the cursor transfers a token representative of the cursor to the Cursor-Grant JOAN frame (i.e., asserts the token in field 510 of the Cursor-Grant JOAN frame) as the Cursor-Grant JOAN frame transits the node initially holding the cursor. In the example of
At 920, as the Cursor-Grant JOAN frame transits each of the next nodes after the node initially holding the cursor, the frame causes each of those next nodes to reverse its links so that network traffic in the arc is thereafter directed away from the first node toward the first edge node. Each next node reverses its links responsive to the presence of the token in the Cursor-Grant JOAN frame. In the example of
At 925, the first node accepts the token in the Cursor-Grant JOAN frame as the cursor when the Cursor-Grant JOAN frame is received at the first node. In the example of
As a result of the above operations, all links are reoriented away from the location of the failure. In the example of
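Operations 910 through 925 can be illustrated with a short simulation of a Cursor-Grant frame travelling from the exit edge node back to the requesting node. This is a sketch under stated assumptions, not the described implementation: the arc is modeled as a list of node names, each link as a string (`"left"`, `"right"`, or `"failed"`) naming the direction in which it forwards traffic, and the function name `grant_cursor` is hypothetical.

```python
def grant_cursor(nodes, links, cursor, requester, exit_edge):
    """Walk a Cursor-Grant frame from exit_edge back to requester.

    links[i] is the orientation of the link between nodes[i] and nodes[i + 1].
    Returns the new cursor holder and the new link orientations.
    """
    i_cur = nodes.index(cursor)
    i_req = nodes.index(requester)
    pos = nodes.index(exit_edge)
    step = 1 if i_req > pos else -1           # direction the frame travels
    toward_exit = "left" if step == 1 else "right"
    new_links = list(links)
    token = False
    while pos != i_req:
        if pos == i_cur:
            token = True                      # 915: cursor node asserts the token
        nxt = pos + step
        if token:
            # 920: links crossed after the token is set now point to the exit
            new_links[min(pos, nxt)] = toward_exit
        pos = nxt
    if not token and i_req != i_cur:
        raise ValueError("grant did not transit the cursor node")
    return requester, new_links               # 925: requester accepts the token


# Assumed scenario: arc A-B-C-D-E, cursor at C, link A-B failed, B requests.
holder, oriented = grant_cursor(
    ["A", "B", "C", "D", "E"],
    ["failed", "left", "right", "right"],
    cursor="C", requester="B", exit_edge="E",
)
```

In the assumed scenario, node B ends up holding the cursor and every working link is oriented toward edge node E.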
Double Failure in Arc
Operations of method 1600 are described below also with reference to
It is assumed that the arc that experiences the second failure may include one or more incoming nodes each connected with a corresponding incoming edge of a corresponding parent arc. Under normal operation, i.e., in the absence of failures in the arc, the parent arc directs network traffic in that parent arc toward the corresponding incoming node.
At 1605, a second failure is detected proximate a second node in the arc that is spaced apart from the first failure proximate the first node. In the example of
Responsive to the second failure being detected, second JOAN frames (as opposed to the “first JOAN frames” exchanged responsive to the first failure in method 700) are exchanged between the nodes in the arc. In summary, the exchange of second JOAN frames will: freeze each incoming edge to prevent network traffic in the corresponding parent arc from entering the arc through the frozen incoming edge, reverse each incoming edge so that network traffic originating in the arc and transiting the corresponding incoming node will be directed toward the corresponding parent arc, and reverse links in the arc as appropriate so that all network traffic originating in the arc is directed to incoming edges. These and further effects will become apparent from this detailed description of method 1600.
At 1610, responsive to the second failure being detected, the second node sends a Cursor-Request JOAN frame toward the second edge node. In the example of
At 1615, responsive to the Cursor-Request JOAN frame, the first node proximate the first failure sends a Cursor-Reject JOAN frame toward the second node. The Cursor-Reject JOAN frame indicates that the cursor cannot be transferred (because of the first failure proximate the first node) and freezes each incoming edge at the corresponding incoming node transited by the Cursor-Reject JOAN frame. In the example of
If the Cursor-Reject JOAN frame emitted in 1615 results in a frozen incoming edge from a parent arc, the parent arc treats (designates) that frozen edge as a detected failure/breakage in the parent arc, which initiates a series of operations 1622, some of which are performed in the parent arc, that are executed concurrently with a next operation 1620 (following from operation 1615) that is performed in the arc in which the second failure was detected. In the example of
At 1620, responsive to the Cursor-Reject JOAN frame, the second node sends a Reverse-Incoming JOAN frame toward the first node to:
In the example of
As mentioned above, a frozen incoming edge is treated (designated) as a failure in the corresponding parent arc and initiates concurrent operations 1622 that are performed at least in part in the parent arc. Operations 1622 comprise operations 1625, 1630, and 1635, now described in detail.
At 1625, each reversed incoming edge (from a corresponding parent arc) is treated as a detected failure in the corresponding parent arc. Therefore, nodes in the parent arc exchange a second instance of the first JOAN frames in accordance with operations 710 described above to attempt to transfer the cursor to an edge node of the parent arc corresponding to the frozen, reversed incoming edge. Returning to the example of
At 1630, if the attempt to transfer the cursor in the parent arc is successful using the exchange of the second instance of the first JOAN frames, the heretofore frozen, reversed incoming edge is unfrozen. In the example of
As a result of the above operations, all network traffic originating in the arc with the two failures (e.g., arc 1000) is directed toward and into the parent arc (e.g., parent arc 1810) through the unfrozen, reversed incoming edge (e.g., incoming edge 1805). Once in the parent arc, the network traffic follows a path to the destination.
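The life cycle of an incoming edge during the double failure handling can be summarized as a small state machine. The sketch below is illustrative only; the state names, event names, and helper functions are hypothetical labels for the effects described above (freeze on Cursor-Reject, reverse on Reverse-Incoming, unfreeze once the cursor transfer in the parent arc succeeds).

```python
from enum import Enum


class EdgeState(Enum):
    NORMAL = "normal"                    # parent arc feeds traffic into the arc
    FROZEN = "frozen"                    # frozen by a Cursor-Reject frame
    FROZEN_REVERSED = "frozen_reversed"  # reversed by a Reverse-Incoming frame
    REVERSED = "reversed"                # unfrozen; arc traffic exits to parent


# Hypothetical event names keyed to the frames and operations in the text.
_TRANSITIONS = {
    (EdgeState.NORMAL, "cursor_reject"): EdgeState.FROZEN,
    (EdgeState.FROZEN, "reverse_incoming"): EdgeState.FROZEN_REVERSED,
    (EdgeState.FROZEN_REVERSED, "parent_cursor_moved"): EdgeState.REVERSED,
}


def next_state(state, event):
    """Return the state after an event; unknown events leave it unchanged."""
    return _TRANSITIONS.get((state, event), state)


def carries_traffic(state):
    """A frozen edge cannot pass network traffic in either orientation."""
    return state in (EdgeState.NORMAL, EdgeState.REVERSED)
```

Only the final `REVERSED` state lets traffic that originates in the doubly failed arc leave through the incoming edge into the parent arc.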
In one embodiment, unfreeze operation 1630 triggers additional JOAN frames that cause the first and second nodes proximate the first and second failures in the arc to gain possession of split or half-cursors to also achieve the redirection of network traffic away from the failures toward the parent arc incoming edge. In the example of
Returning to 1630 in
The exchange of second JOAN frames in the data plane in method 1600 and the recursive exchange of first and second JOAN frames in parent arcs, also in the data plane, result in repair/rerouting in the arcs of the network without any interaction with a network controller of the network, i.e., no OAM management frames are exchanged with the network controller.
Concurrent Single Failures in Multiple Arcs
The multiple failure locations become sinks that are unable to pass network traffic and emit JOAN frames, as described above.
As can be seen from the discussion above in connection with
In other words, the techniques provided herein may include limiting to a maximum time a time between successive instances of a given incoming edge of a given parent arc being frozen and reversed as a result of recursively performing the methods 700 and 1600 in parent arcs. Similarly, techniques provided herein may also include limiting to a maximum number the number of times that a given incoming edge of a given arc is permitted to be repeatedly frozen and reversed as a result of recursively performing the methods 700 and 1600.
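The second of these limits, a cap on how many times a given incoming edge may be frozen and reversed, can be sketched as a per-edge counter. This is a minimal illustration; the class name `FreezeReverseLimiter`, the parameter `max_times`, and the use of a string edge identifier are assumptions, and the time-based limit is not modeled here.

```python
from collections import defaultdict


class FreezeReverseLimiter:
    """Caps how often each incoming edge may be frozen and reversed."""

    def __init__(self, max_times):
        self.max_times = max_times       # assumed configurable cap
        self._counts = defaultdict(int)  # freeze/reverse count per edge

    def permit(self, edge_id):
        """Return True and count the event if the edge is under its cap."""
        if self._counts[edge_id] >= self.max_times:
            return False
        self._counts[edge_id] += 1
        return True
```

A node would consult `permit` before honoring another freeze-and-reverse of the same edge, so that recursion through parent arcs cannot repeat on that edge indefinitely.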
In summary, in one form, a method is provided, comprising: in a routing arc of a network including multiple routing arcs for loop-free routing of network traffic to a destination, each routing arc comprising nodes connected in sequence by reversible links oriented away from a node initially holding a cursor toward one of first and second edge nodes and their corresponding edges at opposite ends of the arc and through which the network traffic exits the arc, each node including a network device: detecting a first failure in the arc; and responsive to the detecting the first failure, exchanging first management frames over a data plane in the network between nodes within the arc in order to transfer the cursor from the node initially holding the cursor to a first node proximate the first failure and reverse links in the arc as appropriate so that the network traffic in the arc is directed away from the first failure toward the first edge node of the arc through which the network traffic is able to exit the arc.
In another form, a system is provided, comprising: nodes in a routing arc of a network including multiple routing arcs for loop-free routing of network traffic to a destination network, the nodes connected in sequence by reversible links oriented away from a node initially holding a cursor toward one of first and second edge nodes and their corresponding edges at opposite ends of the arc and through which the network traffic exits the arc, wherein each node includes: an interface unit configured to implement corresponding reversible links; and a processor coupled to the network interface unit; and wherein the nodes are configured to: detect a first failure in the arc; and responsive to the first failure detection, exchange first management frames over a data forwarding plane in the network between nodes in the arc in order to transfer the cursor from the node initially holding the cursor to a first node proximate the first failure and reverse links in the arc as appropriate so that the network traffic in the arc is directed away from the first failure toward the first edge node of the arc through which the network traffic is able to exit the arc.
In still another form, an apparatus is provided, comprising: a node in a routing arc of a network including multiple routing arcs for routing of network traffic to a destination, each routing arc comprising nodes connected in sequence by reversible links oriented away from a node initially holding a cursor toward one of first and second edge nodes and their corresponding edges at opposite ends of the arc and through which the network traffic exits the arc, the node including an interface unit configured to implement corresponding links to communicate with adjacent nodes, including one or more reversible links, and a processor coupled to the network interface unit, wherein the processor is configured to: detect a failure in one of the links and responsive thereto send a Cursor-Request management frame over a data plane of the network to an end node within the arc to request possession of the cursor; receive a data plane Cursor-Grant management frame including a token representative of the cursor and, responsive thereto, accept the token as the cursor; and receive a data plane Cursor-Reject management frame indicating the cursor request is rejected and that freezes one or more incoming edges of parent arcs that are incoming to the arc as the Cursor-Reject management frame transits the arc.
In still another form, a tangible processor readable medium is provided for storing instructions that, when executed by a processor, cause the processor to: in a node in a routing arc of a network including multiple routing arcs for routing of network traffic to a destination, each routing arc comprising nodes connected in sequence by reversible links oriented away from a node initially holding a cursor toward one of first and second edge nodes and their corresponding edges at opposite ends of the arc and through which the network traffic exits the arc: detect a failure in one of the links and responsive thereto send a Cursor-Request management frame over a data plane of the network to an end node within the arc to request possession of the cursor; receive a data plane Cursor-Grant management frame including a token representative of the cursor and, responsive thereto, accept the token as the cursor; and receive a data plane Cursor-Reject management frame indicating the cursor request is rejected and that freezes one or more incoming edges of parent arcs that are incoming to the arc as the Cursor-Reject management frame transits the arc.
In still another form, a method comprises: in a node in a routing arc of a network including multiple routing arcs for routing of network traffic to a destination, each routing arc comprising nodes connected in sequence by reversible links oriented away from a node initially holding a cursor toward one of first and second edge nodes and their corresponding edges at opposite ends of the arc and through which the network traffic exits the arc: detecting a failure in one of the links and responsive thereto sending a Cursor-Request management frame over a data plane of the network to an end node within the arc to request possession of the cursor; receiving a data plane Cursor-Grant management frame including a token representative of the cursor and, responsive thereto, accepting the token as the cursor; and receiving a data plane Cursor-Reject management frame indicating the cursor request is rejected and that freezes one or more incoming edges of parent arcs that are incoming to the arc as the Cursor-Reject management frame transits the arc.
The method may further comprise, responsive to receipt of the Cursor-Request management frame: sending a data plane Cursor-Grant frame if network traffic is able to exit the arc through the node.
The method may further comprise, responsive to receipt of the Cursor-Request management frame: sending a data plane Cursor-Reject frame if network traffic is not able to exit the arc through the node.
The method may further comprise, responsive to receipt of the Cursor-Reject management frame: sending a data plane Reverse-Incoming JOAN frame to an end node within the arc so as to reverse a direction of one or more incoming edges incident to the arc so that traffic in the arc is able to exit the arc through the reversed incoming edges.
The method may further comprise: detecting the failure as a broken one of the reversible links.
The method may further comprise: detecting the failure as a frozen one of the links that is an incoming edge.
Although the apparatus, system, and method are illustrated and described herein as embodied in one or more specific examples, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made therein without departing from the scope of the apparatus, system, and method and within the scope and range of equivalents of the claims. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the apparatus, system, and method, as set forth in the following claims.
This application claims the benefit of U.S. provisional application No. 61/913,555, filed Dec. 9, 2013, incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
20120300668 | Thubert | Nov 2012 | A1 |
20130208594 | Thubert et al. | Aug 2013 | A1 |
20130301470 | Thubert et al. | Nov 2013 | A1 |
20140029416 | Ceccarellli | Jan 2014 | A1 |
Entry |
---|
International Search Report and Written Opinion in counterpart International Application No. PCT/US2014/068316, mailed Feb. 20, 2015, 13 pages. |
Thubert et al., “Available Routing Constructs—draft-thubert-rtgwg-arc-00”, Internet Engineering Task Force, Internet Draft, Standards Track, Oct. 2, 2012, 19 pages. |
Number | Date | Country | |
---|---|---|---|
20150163091 A1 | Jun 2015 | US |
Number | Date | Country | |
---|---|---|---|
61913555 | Dec 2013 | US |