The disclosure relates generally to packet routing in Interior Gateway Protocol (IGP) networks and, more particularly, to delay-based prevention of micro-loops during routing updates.
In a typical packet-switched network, such as an Internet Protocol (IP) network or Multi-Protocol Label Switching (MPLS)/Label Distribution Protocol (LDP) network, each network node, e.g., a router or switch, computes a shortest path to each destination and uses the shortest path for routing in non-failure conditions. The routes or paths to each destination in the network are stored in a routing table. When a link or another node attached to an IP router fails, a new routing table is computed based on the new network topology. Computation of the new routing table can take from hundreds of milliseconds to a few seconds. If the previous routing table is used until the new routing table is computed, packets going through the failed link/node would be lost, potentially causing service disruption.
Fast Rerouting (FRR) solutions provide protection against link or node failures by computing in advance alternate routes, also called backup routes, that can be activated instantly when a link or node failure occurs. Initially, the source router is using a primary path (PATH-1) to a destination. When a link/node fails, the source router switches to a backup path (PATH-B) while it re-computes its routing tables. Traffic is routed over the backup path while the new routing table is computed, thereby avoiding traffic loss. When the new table has been computed, the router switches to a new primary path (PATH-2).
One requirement for a viable backup path is that it should be “loop-free”, i.e., packets must be able to reach the destination without looping between nodes. Loop-free alternates (LFAs) avoid traffic loss by sending traffic to a neighboring node that can forward the traffic to the destination without looping back to the source. Remote LFAs (RLFAs) can be used if an LFA does not exist. RLFAs use LDP tunnels to tunnel traffic to a non-neighbor “release node” from which the traffic can be forwarded to the destination without looping back to the source. The Internet Engineering Task Force (IETF) describes high-level algorithms to compute Loop-Free Alternates (LFA, RFC 5286) and Remote Loop-Free Alternates (RLFA, RFC 7490, RFC 8102). Another technique called Topology-Independent LFA (TI-LFA, proposed IETF standard) is used in segment routing networks to provide loop-free backup paths. TI-LFA differs from RLFA by constraining the backup-path computation problem to require that, topologically, the backup path (PATH-B) must be the same as the primary path (PATH-2) after the router re-converges following a link/node failure.
When a router detects a failure of an incident link, it announces the failure by sending a link-state packet. The announcement propagates through the network and each router recomputes its routing tables based on the new network topology. It takes some time for the announcement of the link failure to propagate through the network and for the routers to converge on a common view of the new network topology. Different routers also need different amounts of time to update their routing tables. Thus, during the routing transition period, different routers may be using inconsistent routing tables and micro-loops may occur as a result. For example, a micro-loop may occur if the source router updates its routing table and begins forwarding to a neighbor before the neighbor has had time to update its own routing tables. In this example, the neighbor will forward packets based on the previous routing table, which can cause a micro-loop. Although micro-loops may be transitory, they can nevertheless cause network congestion or loss of packets.
Routers can avoid micro-loops in some circumstances by delaying copying of the routing updates from the control plane to the forwarding plane to provide time for the network to converge on a common view of the new network topology. Version 9 of an Internet Draft published by the IETF, titled “Micro-loop prevention by introducing a local convergence delay”, describes the use of a convergence delay to prevent local transient micro-loops in case of a link failure. When the failure of an incident link is detected, copying of affected routes to the forwarding plane is delayed for a configured period of time. During this delay period, traffic is forwarded over a backup path that is guaranteed to be loop-free. The delay gives the rest of the network time to “catch up” with the link failure, so when the delay period expires and the source router's forwarding tables are updated, the packets travel over new primary paths without looping.
The solution outlined by the IETF handles failure of one incident link of the source router. The solution requires that a link failure be detected by looking at the link-state packet that announces the failure of the link. The detailed logic for detection is not specified and is left to the network implementation. The solution also mandates that the delay be aborted (and the risk of micro-loops re-introduced) if a subsequent event/convergence occurs before the delay period expires, even if that event is not a second link failure.
The present disclosure relates generally to micro-loop prevention following a link failure or node failure. A router generates graphs of the network topology before and after a link failure or node failure, and compares the graphs to identify changes in the network topology. Based on an analysis of the changes, the router determines whether to implement a delay, or to abort a delay already in force.
One aspect of the disclosure comprises methods implemented by a router in a communication network of making routing updates. The router determines an initial topology of a communication network prior to a failure of a network node or link in the communication network. The router detects the failure of the network node or link, determines a new topology of the communication network resulting from the failure, and computes a new routing table based on the new topology. The router further determines whether to delay activation of the new routing table based on a comparison of the new topology with the initial topology determined before the failure.
Another aspect of the disclosure comprises a router in a packet-switched communication network including a packet forwarding unit and a routing unit. The packet forwarding unit is configured to receive packets over the packet-switched communication network and to forward the received packets toward a respective destination. The routing unit is configured to determine an initial topology of a communication network prior to a failure of a network node or link in the communication network. The routing unit is further configured to detect the failure of the network node or link, determine a new topology of the communication network resulting from the failure, and compute a new routing table based on the new topology. The routing unit is further configured to determine whether to delay activation of the new routing table based on a comparison of the new topology with the initial topology determined before the failure.
Another aspect of the disclosure comprises a computer program product for a router in a packet-switched communication network. The computer program product comprises program instructions that when executed cause the router to determine an initial topology of a communication network prior to a failure of a network node or link in the communication network. The computer program further includes instructions that when executed cause the router to detect the failure of the network node or link, determine a new topology of the communication network resulting from the failure, and compute a new routing table based on the new topology. The computer program further includes instructions that when executed cause the router to determine whether to delay activation of the new routing table based on a comparison of the new topology with the initial topology determined before the failure. The computer program product may be in a carrier, such as a non-transitory computer readable medium.
Embodiments of the present disclosure provide greater flexibility to network operators in handling events and allow network operators to implement their own policies in deciding whether to implement a delay or to abort a delay already in progress. Generally, aborting a delay is not desirable because it may lead to micro-loops. The techniques herein described enable network operators to apply a delay even though current practices would mandate that the delay be aborted.
Referring now to the drawings,
A problem arises when S updates its routing tables while C is still using a routing table based on the “old” network topology. In this case, C will forward packets addressed to P3 back to S, resulting in a micro-loop. To avoid such micro-loops, S can implement a pre-configured delay before installing the routing updates to the forwarding plane to give C time to update its own routing tables. During the delay period, S can forward packets for P3 using a loop-free backup path. In this case, a loop-free backup path exists, so S forwards packets to A, from which packets will be forwarded to P3 without looping.
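The micro-loop described above can be illustrated with a minimal Python sketch. The next-hop tables are assumptions made up for illustration (the real values depend on link costs that are not stated here), the hops beyond A are collapsed into a single entry, and the helper name `trace` is hypothetical.

```python
def trace(next_hop, start, dest):
    """Follow next hops from start toward dest; return (path, looped)."""
    path = [start]
    while path[-1] != dest:
        nxt = next_hop[path[-1]]
        if nxt in path:              # revisiting a router means a forwarding loop
            return path + [nxt], True
        path.append(nxt)
    return path, False


# Assumed next hops toward P3 (illustrative only).
stale_c = {"C": "S"}                 # C still forwards based on the old topology
updated_s = {"S": "C"}               # S has already installed its new primary path
backup_s = {"S": "A", "A": "P3"}     # S keeps using the loop-free backup path via A

# S updates immediately while C is stale: packets bounce between S and C.
early_update = trace({**stale_c, **updated_s}, "S", "P3")

# S delays its update and uses the backup path: packets reach P3.
delayed_update = trace({**stale_c, **backup_s}, "S", "P3")
```

In this sketch `early_update` records the path S, C, S with the loop flag set, while `delayed_update` reaches P3 through A without revisiting any router.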
Embodiments of the present disclosure provide a more flexible approach by allowing the router to consider the impact of other events in determining whether to abort the delay. In the example shown in
According to an aspect of the present disclosure, the router 15 generates graphs of the network topology before and after the failure of a link or node and compares the graphs to identify changes in the network topology. Based on an analysis of the changes in network topology, the router 15 determines whether to implement and/or abort a delay for making routing updates.
In one embodiment, the implementation of a delay is handled as follows by a router 15 implementing an Interior Gateway Protocol (IGP) such as the Intermediate System-to-Intermediate System (ISIS) protocol or Open Shortest Path First (OSPF) protocol. The router 15 parses link-state packets in the link-state database, and constructs an initial network topology in the form of a graph data structure. Pseudo-nodes are replaced with the actual connectivity relationships they imply. The vertices of the graph represent routers 15, and are stored as a dynamically allocated array denoted herein as Netnodes. Each element of Netnodes embeds another array, which represents the incident links of a router 15 in the form of the neighbors at the far end of those links. This array is denoted herein as Nodenbrs. For example, Netnodes[i].Nodenbrs[j] contains information about the j-th neighbor of the i-th router 15. The Netnodes array and all of the Nodenbrs arrays are sorted by router-id, effectively indexing the arrays by router-id. The graph of the initial network topology is stored in a variable called CurrentTopo.
In the example shown in
CurrentTopo.Netnodes={S, A, B, C, D}
CurrentTopo.Nodenbrs={{A, B, C}, {S, D}, {S, D}, {S, D}, {A, B, C}}
Note that the members of Nodenbrs correspond to the members of Netnodes in the same order. Thus, CurrentTopo.Nodenbrs[D]={A, B, C}.
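The graph layout described above can be sketched in Python as two parallel sorted arrays. This is only an illustrative sketch: the numeric router-ids in `ROUTER_ID` are hypothetical values chosen so that the sort order matches the example, and the helpers `build_topo` and `neighbors_of` are names introduced here rather than taken from the disclosure.

```python
from bisect import bisect_left

# Hypothetical router-ids; chosen so that sorting yields S, A, B, C, D.
ROUTER_ID = {"S": 1, "A": 2, "B": 3, "C": 4, "D": 5}


def build_topo(adjacency):
    """Turn {router: iterable of neighbors} into parallel arrays sorted by router-id."""
    netnodes = sorted(adjacency, key=ROUTER_ID.__getitem__)
    nodenbrs = [sorted(adjacency[n], key=ROUTER_ID.__getitem__) for n in netnodes]
    return {"Netnodes": netnodes, "Nodenbrs": nodenbrs}


def neighbors_of(topo, name):
    """Look up a router's neighbor array by binary search on the sorted Netnodes."""
    ids = [ROUTER_ID[n] for n in topo["Netnodes"]]
    i = bisect_left(ids, ROUTER_ID[name])
    if i == len(ids) or ids[i] != ROUTER_ID[name]:
        raise KeyError(name)
    return topo["Nodenbrs"][i]


# Adjacency as it might be parsed from link-state packets, in arbitrary order.
current_topo = build_topo({
    "D": ["C", "B", "A"],
    "S": ["C", "A", "B"],
    "A": ["D", "S"],
    "B": ["S", "D"],
    "C": ["D", "S"],
})
```

Because both arrays are kept in router-id order, a lookup such as `neighbors_of(current_topo, "D")` needs only a binary search, and two topologies can later be compared by walking the arrays in step.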
When an SPF calculation is triggered (e.g., responsive to a link or node failure), the router 15 generates, in the manner described above, a new topology graph describing the new network topology. This new network topology is stored in a graph called NewTopo.
In this example shown in
NewTopo.Netnodes={S, A, B, C, D}
NewTopo.Nodenbrs={{A, C}, {S, D}, {S, D}, {S, D}, {A, B, C}}
The router 15 compares the graphs stored by CurrentTopo and NewTopo and makes a list of the differences. The differences are computed in terms of the events that have transformed CurrentTopo to NewTopo. These events are link-down events and link-up events. Node-down events and Node-up events are also reduced to link-down and link-up events. The router enumerates the differences between CurrentTopo and NewTopo and stores them in a variable called TopoDiffs. TopoDiffs contains two arrays referred to as the DownLinks array and UpLinks array. The DownLinks array comprises a list of “downlinks” that are present in the initial network topology but not present in the new network topology. The UpLinks array comprises a list of “uplinks” that are not present in the initial network topology but are present in the new network topology. DownLinks and UpLinks are initially empty.
The router 15 compares the Netnodes arrays in CurrentTopo and NewTopo. If a node is present in CurrentTopo but not in NewTopo, all of its links are added to TopoDiffs.DownLinks. If a node is present in NewTopo but not in CurrentTopo, all of its links are added to TopoDiffs.UpLinks. If a node is present in both, the router 15 compares the Nodenbrs arrays associated with the node in CurrentTopo and NewTopo. If a link is present in CurrentTopo.Nodenbrs but not in NewTopo.Nodenbrs, the router 15 adds it to TopoDiffs.DownLinks. If the link is instead present in NewTopo.Nodenbrs but not in CurrentTopo.Nodenbrs, the router 15 adds it to TopoDiffs.UpLinks.
In the example shown in
TopoDiffs.DownLinks={S→B}
TopoDiffs.UpLinks={ }
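The comparison described above can be sketched as follows. As a simplification, this sketch swaps the sorted arrays for Python dictionaries of sets, so that set difference does the per-node comparison; the function name `topo_diffs` and the tuple representation of a link are assumptions made for illustration.

```python
def topo_diffs(current, new):
    """Compare two {node: set of neighbors} graphs.

    Returns DownLinks (present only in current) and UpLinks (present only
    in new) as sets of (from, to) tuples.
    """
    down, up = set(), set()
    for node in current.keys() | new.keys():
        # A node missing from one graph contributes all of its links.
        old_nbrs = current.get(node, set())
        new_nbrs = new.get(node, set())
        down |= {(node, nbr) for nbr in old_nbrs - new_nbrs}
        up |= {(node, nbr) for nbr in new_nbrs - old_nbrs}
    return {"DownLinks": down, "UpLinks": up}


current_topo = {
    "S": {"A", "B", "C"}, "A": {"S", "D"}, "B": {"S", "D"},
    "C": {"S", "D"}, "D": {"A", "B", "C"},
}
# Same graph except that S no longer lists B as a neighbor.
new_topo = dict(current_topo, S={"A", "C"})

diffs = topo_diffs(current_topo, new_topo)
```

With these inputs `diffs["DownLinks"]` holds the single entry `("S", "B")` and `diffs["UpLinks"]` is empty, matching the example. Node-down and node-up events need no special case because a missing node is treated as having an empty neighbor set.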
In some embodiments, the process of enumerating events may be “short-circuited” for performance reasons to terminate as soon as a certain event (i.e., a second link failure) is detected, or it may be allowed to run to completion to collect a full set of events.
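A short-circuited enumeration might look like the following sketch. It assumes graphs are given as dictionaries of neighbor sets and that the two directions of a link (for example S→B and B→S) count as one failed link; both the function name `second_failure_detected` and that counting rule are assumptions made for illustration.

```python
def second_failure_detected(current, new):
    """Return True as soon as two distinct failed links have been seen."""
    failed = set()
    for node, old_nbrs in current.items():
        for nbr in old_nbrs - new.get(node, set()):
            failed.add(frozenset((node, nbr)))   # S->B and B->S are one link
            if len(failed) == 2:
                return True                      # stop without scanning the rest
    return False


before = {"S": {"A", "B", "C"}, "B": {"S", "D"}, "D": {"A", "B", "C"}}
one_failure = {"S": {"A", "C"}, "B": {"D"}, "D": {"A", "B", "C"}}
two_failures = {"S": {"A", "C"}, "B": set(), "D": {"A", "C"}}
```

Here `second_failure_detected(before, one_failure)` scans everything and returns False, while the call with `two_failures` returns True as soon as the second distinct link is found. Running to completion instead yields the full event list at the cost of visiting every node.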
After the changes in network topology are enumerated and stored in TopoDiffs, the router sets CurrentTopo equal to NewTopo. At this point, the router 15 determines whether to implement a pre-configured delay and/or abort a delay already in progress. To make this decision, the router 15 looks at the events in TopoDiffs, which includes a complete list of all events that have occurred. Depending on the enumerated events, the router 15 decides whether to implement a delay (e.g., start a delay timer), or to abort an ongoing delay (e.g., stop a delay timer).
In this example, the length of TopoDiffs.DownLinks is 1, and its single entry points to a link incident to router 15 that is protected by a backup path. In this case, the router 15 may decide to apply the delay timer to the update of P3's routing entry. Optionally, the router 15 may also consider TopoDiffs.UpLinks and implement the delay only if:
1) TopoDiffs.UpLinks is Null; or
2) no link incident to router 15 is in TopoDiffs.UpLinks.
As another example, the router 15 may decide to abort a delay timer currently in force if:
1) TopoDiffs.UpLinks is not Null; or
2) any link incident to router 15 is in TopoDiffs.UpLinks.
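The example policies above could be expressed as in the sketch below. It is one possible reading, not a definitive implementation: the stricter start condition is taken to require an empty UpLinks (the complement of the first abort condition), links are represented as (from, to) tuples, and the names `should_start_delay`, `should_abort_delay`, `protected_links` and `strict` are hypothetical.

```python
def is_incident(link, router):
    """True if the router is one endpoint of the (from, to) link."""
    return router in link


def should_start_delay(diffs, router, protected_links, strict=True):
    """Start the delay for a single failed, protected, incident link."""
    down, up = diffs["DownLinks"], diffs["UpLinks"]
    if len(down) != 1:
        return False
    (link,) = down
    if not is_incident(link, router) or link not in protected_links:
        return False
    if strict:
        return not up                            # assumed option 1: no up-link events at all
    return not any(is_incident(l, router) for l in up)   # option 2


def should_abort_delay(diffs, router, strict=True):
    """Abort a running delay based on up-link events."""
    up = diffs["UpLinks"]
    if strict:
        return bool(up)                          # option 1: any up-link event
    return any(is_incident(l, router) for l in up)       # option 2


single_failure = {"DownLinks": {("S", "B")}, "UpLinks": set()}
remote_uplink = {"DownLinks": {("S", "B")}, "UpLinks": {("C", "E")}}
```

With `protected_links={("S", "B")}`, router S starts the delay for `single_failure` under either option. For `remote_uplink`, the strict option aborts a running delay, while the relaxed option keeps it because the link C→E is not incident to S. An operator's policy would choose between the options.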
To further explain the advantages of using topology changes, assume that the topology changes shown in
NewTopo.Netnodes={S, A, B, C, D, E}
NewTopo.Nodenbrs={{A, C}, {S, D}, {S, D}, {S, D, E}, {A, C}, {C}}
TopoDiffs.DownLinks={S→B, B→D}
TopoDiffs.UpLinks={C→E}
Noting that two links have failed, the router S may decide to abort the delay, preferring to take the risk of micro-loops. Alternatively, the router 15 may opt to perform a deeper analysis of the topology changes and decide that the topology changes do not cause any additional complications. In this case, the additional downlink on the path from S to D does not cause any complications because B is not on the path to P3 for any of the routers 15. Also, the link C-E, not being incident to S, does not create a new complication. Thus, the router 15 may choose to implement a delay or continue a delay currently in force to avoid micro-loops caused by the failure of the link S-B.
Memory 930 stores routing tables 940 used by the packet forwarding unit 910 to receive and forward packets and computer programs 950 to configure the packet forwarding unit 910 and routing unit 920. The computer programs 950 may comprise a packet forwarding program executed by the packet forwarding unit 910 to receive and forward packets, and a routing program executed by the routing unit 920 to compute the backup paths as herein described.
Embodiments of the present disclosure provide greater flexibility to network operators in handling events and allow network operators to implement their own policies in deciding whether to implement a delay or to abort a delay already in progress. Generally, aborting a delay is not desirable because it may lead to micro-loops. The techniques herein described enable network operators to apply a delay even though current practices would mandate that the delay be aborted.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IN2018/050735 | 11/13/2018 | WO | 00 |