The present disclosure is generally related to network communications, and specifically to various systems and methods for Border Gateway Protocol (BGP)-Shortest Path First (SPF) Flooding Reduction.
The Border Gateway Protocol (BGP) is an inter-Autonomous System (AS) routing protocol that facilitates the exchange of routing information between Autonomous Systems (ASes) for enabling the routing of data packets between ASes. An AS is a set of routers or a connected group of Internet Protocol (IP) networks managed by a single administrative entity such as an Internet service provider (ISP), a large enterprise technology company, a university, or a government agency. In particular, an AS is a collection of one or more associated IP prefixes, referred to as an IP address space of the AS, with a clearly defined routing policy that governs how the AS exchanges routing information with other ASes.
Each AS uses BGP to announce which IP addresses they are responsible for and which other ASes they connect to. BGP routers take all this information from ASes around the world and put it into databases called routing tables to determine the fastest paths from AS to AS. Data packets cross the Internet by hopping from AS to AS until they reach the AS that contains the destination IP address specified by the data packets. For instance, when a data packet arrives at an AS, BGP routers refer to their routing tables to determine which AS the packet should go to next. Routers within the AS that contains the destination IP address send the data packet to a network device corresponding to the destination IP address.
With so many ASes in the world, BGP routers are constantly updating their routing tables as networks go offline, new networks come online, and ASes expand or contract their IP address space. All of this updated information has to be announced via BGP so that BGP routers can adjust their routing tables.
A first aspect relates to a method of reducing flooding in a Border Gateway Protocol-Shortest Path First (BGP-SPF) domain implemented by a network node in the BGP-SPF domain. The method includes establishing an external BGP (EBGP) session with a set of route-reflectors (RRs) of the BGP-SPF domain for exchanging routing information; determining a link change corresponding to a link of the network node; and sending a BGP Link-State SPF (BGP-LS-SPF) Link Network Layer Reachability Information (NLRI) indicating the link change in a BGP update message over the EBGP session to a subset of the set of RRs according to a flooding behavior that determines which RRs are in the subset of the set of RRs.
Optionally, in a first implementation according to any of the first aspect, the method includes encoding the information indicating the link change in the BGP-LS-SPF Link NLRI; encoding the BGP update message comprising the BGP-LS-SPF Link NLRI; and communicating the BGP update message to the subset of the set of RRs.
Optionally, in a second implementation according to any of the first aspect or any implementation thereof, the method further includes receiving flooding behavior instructions indicating a flooding behavior that determines which RRs are in the subset of the set of RRs; and configuring the flooding behavior on the network node.
Optionally, in a third implementation according to any of the first aspect or any implementation thereof, the method further includes receiving the flooding behavior instructions from a RR in the set of RRs, wherein the RR is a leader RR in the BGP-SPF domain.
Optionally, in a fourth implementation according to any of the first aspect or any implementation thereof, the method further includes receiving the flooding behavior instructions encoded in a Node Flood Type-Length-Value (TLV); and decoding the Node Flood TLV to determine the flooding behavior.
Optionally, in a fifth implementation according to any of the first aspect or any implementation thereof, the method further includes assigning the network node to a group of network nodes in the BGP-SPF domain based on the flooding behavior instructions; and communicating the information indicating the link change to the subset of the set of RRs designated for the group.
A second aspect relates to a method of reducing flooding in a BGP-SPF domain implemented by a RR in the BGP-SPF domain. The method includes establishing an external BGP (EBGP) session with network nodes of the BGP-SPF domain for exchanging routing information; configuring a flooding behavior for the network nodes; and sending a Node Flood Type-Length-Value (TLV) indicating the flooding behavior in a BGP update message to the network nodes.
Optionally, in a first implementation according to any of the second aspect, the method includes encoding the flooding behavior in the Node Flood TLV; encoding the BGP update message comprising the Node Flood TLV; and communicating the BGP update message to the network nodes.
Optionally, in a second implementation according to any of the second aspect or any implementation thereof, the method further includes communicating a priority of the RR to become a leader of the BGP-SPF routing domain.
Optionally, in a third implementation according to any of the second aspect or any implementation thereof, the method further includes encoding a priority of the RR to become a leader of the BGP-SPF routing domain in a Leader Priority TLV; encoding a BGP update message comprising the Leader Priority TLV; and communicating the BGP update message to the network nodes and other RRs of the BGP-SPF routing domain.
Optionally, in a fourth implementation according to any of the second aspect or any implementation thereof, the method further includes receiving priorities of the other RRs of the BGP-SPF routing domain to become a leader; determining that the priority of the RR is a highest priority to become a leader of the BGP-SPF routing domain based on the priorities of the other RRs; and configuring the RR as the leader of the BGP-SPF routing domain.
Optionally, in a fifth implementation according to any of the second aspect or any implementation thereof, wherein the flooding behavior instructs the network nodes to send information indicating a link change to only particular RRs of the BGP-SPF routing domain.
A third aspect relates to a method of reducing flooding in a BGP-SPF domain implemented by a network node in the BGP-SPF domain. The method includes obtaining a flooding topology (FT) of the BGP-SPF domain, wherein the FT is a sub-network topology that connects all nodes of a real network topology (RT) of the BGP-SPF domain; determining a link change corresponding to a link of the network node; and sending Network Layer Reachability Information (NLRI) in a BGP update message indicating the link change to network nodes that are directly connected to the network node on the FT.
Optionally, in a first implementation according to any of the third aspect, the method further includes obtaining the FT from a leader node of the BGP-SPF domain.
Optionally, in a second implementation according to any of the third aspect or any implementation thereof, the method further includes receiving a node index mapping from the leader node; and decoding an encoding of the FT using the node index mapping to obtain the FT.
Optionally, in a third implementation according to any of the third aspect or any implementation thereof, the method further includes receiving updates to the FT from the leader node; and modifying the FT based on the updates.
Optionally, in a fourth implementation according to any of the third aspect or any implementation thereof, wherein the updates comprise new connections encoded in a Paths TLV that is encoded in a Multiprotocol Reachable Link Network Layer Reachability Information (MP_REACH_NLRI) path attribute in a BGP update message.
Optionally, in a fifth implementation according to any of the third aspect or any implementation thereof, wherein the updates comprise removed connections encoded in a Paths TLV that is encoded in a Multiprotocol Unreachable Link Network Layer Reachability Information (MP_UNREACH_NLRI) path attribute in a BGP update message.
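By way of nonlimiting illustration, the FT-based flooding of the third aspect (sending a link-change NLRI only to nodes directly connected on the FT, with each receiver forwarding it onward while excluding the peer it arrived from) may be sketched as follows; the function name and the set-of-links encoding of the FT are illustrative assumptions, not part of the protocol:

```python
def flood_on_ft(ft_links, origin):
    """Flood one Link NLRI over a flooding topology (FT).

    ft_links is a set of frozenset({a, b}) FT adjacencies. Each node
    forwards the NLRI to its FT neighbors, excluding the neighbor the
    NLRI arrived from. Returns how many copies each node receives.
    """
    def neighbors(n):
        return {next(iter(link - {n})) for link in ft_links if n in link}

    received = {}                      # node -> copies received
    queue = [(origin, None)]           # (holder, node it came from)
    forwarded = {origin}               # nodes that have already forwarded
    while queue:
        node, came_from = queue.pop(0)
        for peer in neighbors(node):
            if peer == came_from:
                continue               # never echo back to the sender
            received[peer] = received.get(peer, 0) + 1
            if peer not in forwarded:
                forwarded.add(peer)
                queue.append((peer, node))
    return received
```

Note that on an FT containing a cycle, some nodes may still receive a second, redundant copy of the NLRI; flooding on a tree-shaped FT delivers exactly one copy per node.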
A fourth aspect relates to a method for computing a flooding topology (FT) implemented by a network device. The method includes selecting a node R0 from a network; initializing the FT with a node element for the node R0, wherein the node element comprises a node, a number of node connections (D), and a previous hops (PHs) list; initializing a candidate queue (Cq) comprising node elements for each node directly connected to the node R0 on the network; implementing a first loop of the method comprising: removing the node element of a first node from the Cq and appending the node element to the FT, wherein the first node has a D less than a maximum number of connections (MaxD); determining whether the FT includes all nodes in the network; identifying a set of nodes connected to the first node in the network that are not in the FT when the FT does not include all the nodes in the network, and appending nodes in the set of nodes that are not in Cq to the Cq, and appending the first node to a previous hops (PHs) list of the node element of nodes in the set of nodes that are in Cq, and repeating the first loop; and terminating the first loop when the FT includes all the nodes in the network; and adding a link to any node in the FT that has the D equal to one (1).
Optionally, in a first implementation according to any of the fourth aspect, wherein the first loop of the method determines that FT does not include all nodes in the network when the Cq is not empty, and determines that the FT includes all nodes in the network when the Cq is empty.
Optionally, in a second implementation according to any of the fourth aspect or any implementation thereof, wherein adding the link to any node in the FT that has the D equal to one (1) in the FT comprises implementing a second loop, wherein the second loop includes: identifying a single link node in the FT, wherein the single link node has the D equal to one (1) in the FT; terminating the second loop when there is no single link node in the FT; otherwise, identifying a set of links connected to the single link node in the network, wherein the set of links excludes an existing link of the single link node on the FT; identifying a set of remote nodes connected to the set of links; identifying a set of transit capable remote nodes in the set of remote nodes that can support transit; identifying a second link in the set of links connected to a transit capable remote node in the set of transit capable remote nodes that has a minimum D and a minimum node identifier (ID); identifying the second link in the set of links connected to a remote node in the set of remote nodes that has a minimum D and a minimum node ID when there is no transit capable remote node in the set of transit capable remote nodes; adding the second link into the FT; increasing the D of the single link node in the FT by one; increasing the D of the transit capable remote node or the remote node, when there is no transit capable remote node, in the FT by one; and repeating the second loop.
Optionally, in a third implementation according to any of the fourth aspect or any implementation thereof, wherein the node R0 has a lowest node identifier (ID) in the network.
Optionally, in a fourth implementation according to any of the fourth aspect or any implementation thereof, wherein the Cq is initialized with nodes ordered from lowest node ID to highest node ID.
Optionally, in a fifth implementation according to any of the fourth aspect or any implementation thereof, wherein nodes appended to the Cq are ordered from lowest node ID to highest node ID.
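By way of nonlimiting illustration, the FT computation of the fourth aspect and its implementations may be sketched as follows; the data-structure encodings, the function name, and the restart-by-exception handling for MaxD are illustrative assumptions, not part of any aspect:

```python
def compute_ft(rt, max_d=3, transit=None):
    """Sketch of the fourth-aspect FT computation.

    rt maps each node ID to the set of its neighbors in the real
    topology (RT).  Returns the FT as a set of frozenset links.
    transit optionally names transit-capable nodes; by default every
    node is assumed transit-capable.
    """
    transit = set(rt) if transit is None else set(transit)
    r0 = min(rt)                                  # lowest node ID as root
    deg = {r0: 0}                                 # D (degree) of FT nodes
    links = set()
    cq = [(n, [r0]) for n in sorted(rt[r0])]      # candidates: (node, PHs)
    while cq:
        # remove the first candidate with a previous hop whose D < max_d
        for i, (n, phs) in enumerate(cq):
            ph = next((p for p in phs if deg[p] < max_d), None)
            if ph is not None:
                cq.pop(i)
                break
        else:
            # no FT possible at this max_d; caller may retry with max_d + 1
            raise ValueError("increase max_d and rebuild the FT")
        deg[n] = 1                                # attach n via previous hop
        deg[ph] += 1
        links.add(frozenset((n, ph)))
        in_cq = {c for c, _ in cq}
        for x in sorted(rt[n]):
            if x in deg:
                continue                          # already on the FT
            if x in in_cq:                        # record n as another PH
                next(p for c, p in cq if c == x).append(n)
            else:
                cq.append((x, [n]))
    # second loop: add a redundant link to every degree-1 node
    for b in sorted(deg):
        if deg[b] != 1:
            continue
        choices = [r for r in rt[b] if frozenset((b, r)) not in links]
        if not choices:
            continue
        cap = [r for r in choices if r in transit] or choices
        r = min(cap, key=lambda x: (deg[x], x))   # min D, then min node ID
        links.add(frozenset((b, r)))
        deg[b] += 1
        deg[r] += 1
    return links
```

For example, on a four-node ring the first loop produces a spanning tree and the second loop re-adds one link so that no node is left with a single FT connection.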
A fifth aspect relates to a network node that includes at least a processor and a memory storing instructions, wherein the instructions, when executed by the processor, cause the network node to perform the method according to any of the preceding aspects or implementations thereof.
A sixth aspect relates to a computer program product comprising computer-executable instructions that are stored on a non-transitory computer-readable medium and that, when executed by a processor of a device, cause the device to perform the method according to any of the preceding aspects or implementations thereof.
For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
These and other features, and the advantages thereof, will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems, computer program products, and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
Disclosed herein are various embodiments for reducing flooding in a BGP-SPF routing domain. Flooding refers to the process of communicating, also commonly referred to as advertising, link state or routing information within a domain to ensure that all routers within the domain converge on the same network topological information within a finite period of time. A network topology is the physical and logical arrangement of nodes and connections or links in a network. A link is a communication channel that connects two devices for the purpose of data transmission. A BGP-SPF routing domain, or BGP-SPF domain, is a set of BGP-SPF nodes that are under a single administrative domain and exchange link-state information (also referred to as routing information) using a BGP Link-State SPF (BGP-LS-SPF) Link Network Layer Reachability Information (NLRI). A BGP-LS-SPF Link NLRI is a Link NLRI that uses the Subsequent Address Family Identifiers (SAFI) assigned to BGP-LS-SPF. A Link NLRI is an encoding format for providing network layer reachability information such as information that describes links, nodes, and prefixes comprising link-state information. The BGP-LS-SPF Link NLRI is used for determining routing paths in a BGP-SPF routing domain.
A BGP-SPF node is a network device (e.g., a router) that implements BGP-SPF. BGP-SPF is an extension to BGP that leverages BGP Link-State (BGP-LS). BGP-LS is an extension to BGP defined in Internet Engineering Task Force (IETF) document Request for Comment (RFC) 7752 entitled “North-Bound Distribution of Link-State and Traffic Engineering (TE) Information Using BGP” by H. Gredler, et al., published March 2016. BGP-LS is used to share interior gateway protocol (IGP) link-state network topology information of an AS with external components (e.g., external servers) using BGP. For example, as previously stated, BGP-SPF uses BGP-LS-SPF Link NLRI to exchange routing information. The BGP-LS-SPF Link NLRI reuses the LS NLRI format and an Address Family Identifier (AFI) of BGP-LS (AFI 16388). However, instead of using the SAFI assigned to BGP-LS (SAFI 71), BGP-SPF uses SAFI 80 assigned to BGP-LS-SPF to indicate that the Link NLRI is a BGP-LS-SPF Link NLRI used for BGP-SPF route computation. All the TLVs used in BGP-LS AFI are applicable and used for the BGP-LS-SPF SAFI. In addition, in contrast to BGP, BGP-SPF replaces the existing BGP route selection decision process based on a Path Vector algorithm as described in IETF document RFC 4271 entitled “A Border Gateway Protocol 4 (BGP-4)” by Y. Rekhter, et al., published January 2006 with an SPF-based route selection process based on an SPF algorithm such as the Dijkstra algorithm that computes the shortest path from a given node in a graph to every other node in the graph. Additional information on BGP-SPF routing is described in the IETF document entitled “BGP Link-State Shortest Path First (SPF) Routing” by K. Patel, et al., published Feb. 15, 2022 (draft-ietf-lsvr-bgp-spf-16).
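By way of nonlimiting illustration, the AFI/SAFI reuse described above may be sketched as follows; the constant and function names are illustrative, while the numeric code points are those quoted in the text:

```python
# AFI/SAFI code points quoted above: BGP-LS and BGP-LS-SPF share the
# Link-State AFI and are distinguished only by their SAFI.
AFI_BGP_LS = 16388      # Link-State Address Family Identifier (shared)
SAFI_BGP_LS = 71        # SAFI assigned to BGP-LS (RFC 7752)
SAFI_BGP_LS_SPF = 80    # SAFI assigned to BGP-LS-SPF

def is_bgp_ls_spf_nlri(afi: int, safi: int) -> bool:
    """True when a Link NLRI is a BGP-LS-SPF Link NLRI used for
    BGP-SPF route computation, as opposed to a plain BGP-LS NLRI."""
    return afi == AFI_BGP_LS and safi == SAFI_BGP_LS_SPF
```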
An RR is a designated router configured to distribute/reflect routing information among BGP peers. BGP peers are BGP speakers configured to exchange routing information. For instance, a RR may receive routing information from a first BGP peer and reflect/distribute the routing information to other BGP peers. By using RRs, the RR peering model allows for fewer BGP peer sessions and, consequently, fewer instances of the same NLRI being received from multiple BGP peers. A BGP-SPF routing domain may include any number of BGP peers and RRs. For instance, a BGP-SPF routing domain may have multiple RRs in case one of the RRs malfunctions.
As a nonlimiting example, in
In the depicted embodiment, when there is a link change, a BGP-LS-SPF speaker sends/advertises a BGP-LS-SPF Link NLRI in a BGP update message to indicate the link change to the RR 110 and the RR 112. BGP update messages are messages that carry routing information and are used for exchanging routing information between BGP neighbors. A link change is any change that affects routing information. For example, a link change occurs when a new link is established/up or when a previously established link is withdrawn/down. After receiving the BGP-LS-SPF Link NLRI, the RR 110 and the RR 112 send the BGP-LS-SPF Link NLRI in a BGP update message to the other nodes that peer with the RR 110 and the RR 112. For example, assume that node 102 detects that the link 122 to node 106 is down. Node 102 sends a BGP-LS-SPF Link NLRI indicating the link change in a BGP update message to both the RR 110 and the RR 112 as indicated by the arrows. After receiving the BGP-LS-SPF Link NLRI from node 102, both RR 110 and RR 112 send the BGP-LS-SPF Link NLRI to node 104 and node 106. Thus, both node 104 and node 106 receive two copies of the same BGP-LS-SPF Link NLRI, one from RR 110 and the other from RR 112. The second copy of the BGP-LS-SPF Link NLRI is not needed. Therefore, the existing flooding process for a BGP-SPF routing domain under the RR peering model is inefficient.
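By way of nonlimiting illustration, the redundancy under the existing RR peering model may be quantified with the following sketch; the helper function and node/RR labels are illustrative assumptions:

```python
def reflect_copies(origin, nodes, rrs):
    """Count copies of one Link NLRI each node receives when the origin
    floods it to every RR and every RR reflects it to its other peers.
    Illustrative model of the RR peering behavior described above."""
    copies = {n: 0 for n in nodes}
    for _rr in rrs:                    # origin advertises to each RR ...
        for n in nodes:
            if n != origin:            # ... which reflects to all other peers
                copies[n] += 1
    return copies
```

With two RRs, every node other than the originator receives two copies of the same NLRI, one of which is unnecessary.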
As a nonlimiting example, in
Similarly, when the EBGP session over the link 216 is established and the BGP-LS-SPF AFI/SAFI capability is exchanged for the corresponding session, node 208 considers link 216 up and sends the BGP-LS-SPF Link NLRI for the link 216 through the EBGP sessions of node 208 (i.e., the session between node 208 and node 206 over the link 220, and the session between node 208 and node 202 over the link 216) to node 206 and node 202. For simplicity reasons, arrows for messages initiated by node 208 are not shown in
As shown above, in the BGP-SPF routing domain 200, each of the nodes receives several redundant BGP-LS-SPF Link NLRIs when there is a link change. Therefore, the existing flooding process for a node connections BGP-SPF routing domain is inefficient.
Similarly, in a BGP-SPF routing domain that implements a directly-connected nodes model (referred to herein as a directly-connected nodes BGP-SPF routing domain), where each node is directly-connected to all the other nodes (not illustrated), the sessions may be between loopback addresses (i.e., two-hop sessions). As a result, there will be a single EBGP session even when there are multiple direct connections between nodes. BGP-LS-SPF Link NLRI is advertised as long as an EBGP session has been established and the BGP-LS-SPF AFI/SAFI capability has been exchanged. In comparison to the node connections model, because there are EBGP sessions between every directly-connected node in the directly-connected nodes BGP-SPF routing domain, there is only a reduction in EBGP sessions when there are parallel links between nodes. Thus, the existing flooding process for a directly-connected nodes BGP-SPF routing domain is also inefficient.
In accordance with an embodiment, when a node determines that there is a link change, the node is configured to send a BGP-LS-SPF Link NLRI indicating the link change in a BGP update message to a subset (i.e., some, but not all) of the RRs that the node peers with. After receiving the BGP-LS-SPF Link NLRI, the RR or controller sends the BGP-LS-SPF Link NLRI to the other nodes/BGP-LS-SPF speakers that peer with the RR or controller.
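By way of nonlimiting illustration, the selection of a subset of RRs may be sketched as follows; the function name, the fraction parameter, and the random selection policy are illustrative assumptions corresponding to the configurable behaviors described herein:

```python
import random

def pick_rr_subset(rrs, fraction=0.5, seed=None):
    """Choose the subset of peered RRs a node floods a Link NLRI to.

    Floods to a configurable fraction of the RRs (at least one),
    chosen at random, mirroring the behaviors described above."""
    rng = random.Random(seed)
    k = max(1, int(len(rrs) * fraction))
    return sorted(rng.sample(rrs, k))
```

For instance, a node peering with four RRs and configured with a fraction of one half floods each link change to two of the four RRs.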
In some embodiments, the flooding behavior for each node is configured on the node. For example, in some embodiments, the number of RRs that the node sends the BGP-LS-SPF Link NLRI to may be user configurable (e.g., configured by a network administrator). Alternatively, in some embodiments, the node is configured to send the BGP-LS-SPF Link NLRI to half or some other percentage of the RRs that node peers with, or to just one RR that the node peers with. In some embodiments, the particular RRs that the node sends the BGP-LS-SPF Link NLRI to may be user configurable or may be selected at random. For example, the node may be configured to randomly select half of the RRs that the node peers with to send a BGP-LS-SPF Link NLRI indicating a link change. Still, in various embodiments, including embodiments described in
As a nonlimiting example, in
In this embodiment, every node in the routing domain 400 is configured to send a BGP-LS-SPF Link NLRI indicating a link change in a BGP update message to the same two RRs in the routing domain 400 for providing redundancy in case one of the RRs fails. For example, node 402, node 404, and node 406 each are configured to send a BGP-LS-SPF Link NLRI indicating a link change in a BGP update message to RR 410 and RR 412. For instance, assume that node 402 discovers that the link 422 to node 406 is down. Node 402 sends a BGP-LS-SPF Link NLRI indicating the link change in a BGP update message to RR 410 and RR 412 as indicated by the arrow. After receiving the BGP-LS-SPF Link NLRI from node 402, RR 410 and RR 412 send the BGP-LS-SPF Link NLRI to node 404 and node 406. Thus, both node 404 and node 406 receive two copies of the BGP-LS-SPF Link NLRI, one as a redundancy backup. Even though the nodes receive a redundant copy of the BGP-LS-SPF Link NLRI in this embodiment, when compared to the existing flooding process described in
In this embodiment, the nodes in the routing domain 500 may be evenly divided, or as close to being evenly divided as possible, into a number of groups. The number of groups may be equal to the number of RRs in a routing domain (i.e., providing an optimum load-balancing approach where every group of nodes has about the same number of nodes as each of the other groups, so that the workload is balanced among the RRs in the routing domain). Alternatively, the number of groups may be less than the number of RRs in a routing domain (i.e., not every RR in the routing domain will be used to reflect the BGP-LS-SPF Link NLRI). The selection of which nodes belong to which group may be random or may be based on one or more factors (e.g., distance between a node and a RR, node identifiers (IDs), or other factors). For example, in one implementation, the nodes (supposing there are M nodes in total) are divided into N groups by ordering the nodes by node ID in ascending order and then grouping them. Each of the N groups has M/N nodes. The first M/N nodes in the ordered list are in the first group; the next M/N nodes by node ID are in the second group; the M/N nodes after those are in the third group; and so on. The nodes following the second-to-last group are in the Nth group (i.e., the last group). The nodes in each group are then configured to send a BGP-LS-SPF Link NLRI indicating a link change in a BGP update message to a particular RR assigned to the group when a link change occurs. Thus, a first group of nodes sends their BGP-LS-SPF Link NLRIs to a first RR; a second group of nodes sends their BGP-LS-SPF Link NLRIs to a second RR; and so on. After receiving a BGP-LS-SPF Link NLRI from a node, a RR sends the BGP-LS-SPF Link NLRI to the other nodes in the BGP-SPF routing domain.
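By way of nonlimiting illustration, the division of M nodes into N groups by ascending node ID may be sketched as follows; the function name and list encoding are illustrative assumptions:

```python
def group_nodes(node_ids, n_groups):
    """Divide nodes into n_groups contiguous groups by ascending node
    ID, as evenly as possible; one group is assigned per RR."""
    ordered = sorted(node_ids)
    m, n = len(ordered), n_groups
    groups, start = [], 0
    for g in range(n):
        size = m // n + (1 if g < m % n else 0)   # spread the remainder
        groups.append(ordered[start:start + size])
        start += size
    return groups
```

When M is not a multiple of N, the remainder is spread over the first groups so that group sizes differ by at most one node.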
For example, assume node 502, node 504, and node 506 are divided into two groups, with node 502 in a first group, and node 504 and node 506 in a second group. Node 502 in the first group is configured to send BGP-LS-SPF Link NLRIs to RR 510 when a link change occurs, and node 504 and node 506 in the second group are configured to send BGP-LS-SPF Link NLRIs to RR 512 when a link change occurs (as shown by the arrow in
In the event that a leader node fails (i.e., is not working correctly), a new leader is elected based on a priority of the node for becoming a leader as further described below. In an embodiment, in centralized mode, the new leader computes a new FT for the BGP-SPF routing domain 600 and advertises the new FT to every node in the BGP-SPF routing domain 600. Because this is a new FT, as opposed to an updated FT, the new leader also advertises the mappings between all the nodes and their indexes (as described in
Once each of the nodes in the BGP-SPF routing domain 600 obtains an FT of the BGP-SPF routing domain 600, either by computing the FT or by receiving the FT from another node, when a node detects a link change (e.g., a link is up or down), the node sends a BGP-LS-SPF Link NLRI in a BGP update message indicating the link change to peer nodes on the FT (i.e., a peer node is a node or nodes that are directly connected to the node by a link on the FT). A peer node that receives the BGP-LS-SPF Link NLRI in the BGP update message from the node is configured to send the BGP-LS-SPF Link NLRI in a BGP update message to other peer nodes of the peer node on the FT (i.e., exclude peer node(s) on FT that sent the BGP-LS-SPF Link NLRI to the node). For example, assume that node 602 determines that link 616 is up. In response to the link change, node 602 determines, according to the FT in
When node 608 receives the BGP-LS-SPF Link NLRI in the BGP update message from node 602, node 608 determines, according to the FT in
Similarly, when node 604 receives the BGP-LS-SPF Link NLRI in the BGP update message from node 602, node 604 determines, according to the FT in
Thus, node 606 receives two copies of the BGP-LS-SPF Link NLRI indicating the link change for link 616. One copy is redundant. Further, when node 606 receives the BGP-LS-SPF Link NLRI indicating the link change for link 616 from nodes 604 and 608, node 606 determines, according to the FT in
As is the case with the existing flooding process described in
As shown in
In the depicted embodiment, at step 702, the FT algorithm 700 includes instructions to initialize a maximum degree (MaxD) variable, and data structures for an FT and a candidate queue (CQ). A degree is a connection/link to a node on the FT. Thus, MaxD indicates the maximum number of connections that a node on the FT can have to other nodes on the FT. A reason for limiting the maximum number of connections that a node on the FT can have to other nodes on the FT is that the number of links on the FT is a key factor for reducing the amount of LS flooding. In general, the smaller the number of links, the less the amount of LS flooding. In an embodiment, MaxD is set to an initial value of 3. In an embodiment, if during execution, the FT algorithm 700 determines that an FT cannot be constructed with all the nodes of a RT based on the value of MaxD, the FT algorithm 700 includes instructions to increase the value of MaxD by 1 and restart the FT algorithm 700 to rebuild the FT from scratch using the increased value of MaxD.
In an embodiment, the FT and CQ data structures comprise node elements of form (N, D, PHs), where N represents a Node, D is the degree of node N, and PHs contains the Previous Hops of node N. The FT data structure stores the above information for nodes on the FT. The CQ data structure contains potential nodes (i.e., candidate nodes) that can be added to the FT. In an embodiment, the FT algorithm 700 begins with an initial empty FT data structure FT={ } and a CQ={(R0, D=0, PHs={ })}. In an embodiment, R0 is a node in the RT with the smallest node ID. Alternatively, R0 may be selected based on some other criteria. In an alternative embodiment, the FT algorithm 700 can simply start with R0 as a root node (i.e., starting node) of the FT (e.g., FT={(R0, D=0, PHs={ })}) and an initial candidate queue CQ containing the node elements of the nodes directly connected to R0 (e.g., CQ={(R1, D=0, PHs={R0}), (R2, D=0, PHs={R0}), . . . , (Rm, D=0, PHs={R0})}, where each of the nodes R1 to Rm has degree D=0 and is directly connected to R0 as indicated by Previous Hops PHs={R0}). In an embodiment, R1 to Rm are in increasing order by node IDs. Alternatively, the order of R1 to Rm may be based on a different factor.
In an embodiment, the FT algorithm 700 includes instructions to implement a first loop comprising steps 704 to step 710 to add all nodes of the RT to the FT. At step 704, the FT algorithm 700 includes instructions to identify and remove the first node element with node N in CQ that is not on FT and one Previous Hop (PH) in PHs has a D less than MaxD. For instance, if FT={(R0, D=0, PHs={ })} and CQ={(R1, D=0, PHs={R0}), (R2, D=0, PHs={R0}), . . . , (Rm, D=0, PHs={R0})}, then (R1, D=0, PHs={R0}) is removed from CQ because R1 is not on FT and R0's D=0, which is less than 3 (value of MaxD). As stated above, if no node element in CQ satisfies the above conditions of step 704, this means that an FT cannot be constructed with all the nodes of a RT using the value of MaxD, then the FT algorithm 700 includes instructions to increase the value of MaxD by 1 and restart the FT algorithm 700 to rebuild the FT from scratch using the increased value of MaxD (e.g., if MaxD is initially set to 3, then the first time no node element in CQ satisfies the above conditions, MaxD is increased to 4, and the FT algorithm 700 restarts construction of the FT from scratch to see if an FT can be constructed using the MaxD of 4, and if not, the FT algorithm 700 increases MaxD to 5, and restarts, and so on).
At step 706, the FT algorithm 700 includes instructions to add the first node element with node N into the FT, set D of node N=1, and increase D of PH of node N by 1. For example, adding the above node element (R1, D=0, PHs={R0}) that was removed from CQ to FT={(R0, D=0, PHs={ })} results in FT={(R0, 0, { }), (R1, 0, {R0})}. The D of node N, which is node element with R1 (or R1 node element for short) in this example, is set to 1, which results in FT={(R0, 0, { }), (R1, 1, {R0})}. Additionally, the D of PH of node N is increased by 1. The PH of R1 is R0 in this example. The D of R0 as shown above is 0. Therefore, the D of R0 is increased to 1, which results in FT={(R0, 1, { }), (R1, 1, {R0})}. Thus, the FT currently has two nodes R0 and R1, and there is a link between R0 and R1.
At step 708, the FT algorithm 700 includes instructions to determine whether all the nodes of the RT are on the FT. In an embodiment, the FT algorithm 700 determines that all the nodes of the RT are on the FT when CQ is empty and that all the nodes of the RT are not on the FT when the CQ is not empty.
When the FT algorithm 700 determines that all the nodes of the RT are not on the FT, at step 710, the FT algorithm 700 includes instructions to identify any nodes that are connected to node N and not on the FT (represented as node Xi, where i=1, 2, . . . , n) in
If at step 708, the FT algorithm 700 determines that all nodes of a RT are on the FT (i.e., CQ is empty), the FT algorithm 700 terminates the first loop, and implements a second loop at step 712 to add a link to any node that has a D equal to one (1) in the FT. For example, in an embodiment, at step 712, the FT algorithm 700 includes instructions to perform a for-loop for each node on FT whose D is 1 (referred to as a node B in
With reference now to
The Leader Priority TLV 800 includes a Type field 802, a Length field 804, a Reserved field 806, and a Priority field 808. The Type field 802 is a 2-byte field that specifies a type value indicating that the TLV is a Leader Priority TLV. The type value is to be decided (TBD) and assigned by the Internet Assigned Numbers Authority (IANA). The Length field 804 is a 2-byte field that specifies a length value of 4 indicating the number of bytes that the Leader Priority TLV 800 uses after the Length field 804 (i.e., the size of the VALUE part of the Leader Priority TLV 800, which comprises the Reserved field 806 and the Priority field 808). The Reserved field 806 is a 3-byte field that is currently not used and is set to zero in transmission. The Reserved field 806 should be ignored on reception. The Priority field 808 is a 1-byte field that specifies a priority value that indicates a priority of the RR to become a leader RR of a BGP-SPF routing domain. In an embodiment, the priority value is an unsigned integer from 0 to 255 in one octet indicating the priority of the RR to become a leader RR. The leader RR is the RR with the highest priority to become a leader in the domain. In an embodiment, when more than one RR has the same highest priority, the RR with the highest Node ID among those RRs is the leader RR in the domain.
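The election rule above (highest priority wins, with the highest Node ID breaking ties among equally high priorities) can be sketched as follows. This is an illustration only; representing Node IDs as comparable integers and the input as (node ID, priority) pairs are assumptions made here, not part of the disclosure.

```python
# Hedged sketch of the leader-RR election rule: the highest Priority
# wins, and among RRs tied at the highest priority, the highest Node ID
# wins. Node IDs are assumed to be directly comparable values.
def elect_leader(rrs):
    """rrs: list of (node_id, priority) pairs; returns the leader node_id."""
    # max() compares (priority, node_id) tuples, so priority dominates
    # and the Node ID breaks ties among equal priorities.
    return max(rrs, key=lambda rr: (rr[1], rr[0]))[0]
```

For example, with RRs 1 and 2 both advertising priority 100 and RR 3 advertising priority 50, RR 2 is elected because it has the highest Node ID among the highest-priority RRs.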
The Node Flood TLV 900 includes a Type field 902, a Length field 904, a Reserved field 906, and a Flood-behavior field 908. The Type field 902 is a 2-byte field that specifies a type value indicating that the TLV is a Node Flood TLV. The type value is TBD. The Length field 904 is a 2-byte field that specifies a length value of 4 indicating the number of bytes in the Node Flood TLV 900 after the Length field 904. The Reserved field 906 is a 3-byte field that is currently not used and is set to zero in transmission. The Reserved field 906 should be ignored on reception. The Flood-behavior field 908 is a 1-byte field that specifies a flooding behavior value that indicates a particular flooding behavior for a node. The following flooding behavior values and corresponding flooding behavior are defined.
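The Leader Priority TLV 800 and the Node Flood TLV 900 share the same wire layout: a 2-byte Type, a 2-byte Length equal to 4, three reserved zero bytes, and one value byte. A minimal encode/decode sketch of that layout follows; the type code used is a placeholder, since the actual type values are TBD and assigned by IANA.

```python
import struct

# Sketch of the shared 8-byte layout of the Leader Priority TLV 800 and
# Node Flood TLV 900: 2-byte Type, 2-byte Length (= 4), 3 reserved zero
# bytes, then one value byte (Priority or Flood-behavior).
def encode_tlv(tlv_type, value_byte):
    # ">HH3sB": big-endian Type, Length, 3 reserved bytes, 1 value byte.
    return struct.pack(">HH3sB", tlv_type, 4, b"\x00\x00\x00", value_byte)

def decode_tlv(data):
    tlv_type, length, _reserved, value_byte = struct.unpack(">HH3sB", data)
    assert length == 4   # VALUE part: 3 reserved bytes + 1 value byte
    return tlv_type, value_byte   # reserved bytes are ignored on reception
```

Note that the reserved bytes are written as zero and simply discarded on decode, matching the "set to zero in transmission, ignored on reception" behavior described above.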
As discussed above, a leader RR may be elected for a BGP-SPF routing domain based on the priority of the RR for becoming a leader (e.g., as advertised using the Leader Priority TLV 800 in
In the depicted embodiment, the Leader Preference TLV 1000 includes a Type field 1002, a Length field 1004, a Reserved field 1006, a Priority field 1008, and an Algorithm field 1010. The Type field 1002 is a 2-byte field that specifies a type value indicating that the TLV is a Leader Preference TLV. The type value is TBD. The Length field 1004 is a 2-byte field that specifies a length value of 4 indicating the number of bytes in the Leader Preference TLV 1000 after the Length field 1004. The Reserved field 1006 is a 2-byte field that is currently not used and is set to zero in transmission. The Reserved field 1006 should be ignored on reception. The Priority field 1008 is a 1-byte field that specifies a priority value that indicates a priority of a BGP-SPF node to become a leader of a BGP-SPF routing domain. In an embodiment, the priority value is an unsigned integer from 0 to 255 in one byte indicating the priority of the BGP-SPF node to become a leader BGP-SPF node. In an embodiment, the leader is the BGP-SPF node with the highest priority specified in the Priority field 1008. In an embodiment, when more than one BGP-SPF node has the same highest priority, the BGP-SPF node with the highest Node ID among those nodes is the leader. The Algorithm field 1010 is a 1-byte field that can specify whether the FT is computed by the leader (i.e., centralized mode) or computed by each node (i.e., distributed mode), and, if distributed, what algorithm to use to compute the FT. For example, in an embodiment, the Algorithm field 1010 may specify a numeric identifier in the range 0-254, where 0 indicates centralized FT computation by the leader, values 1-127 indicate distributed FT computation using a particular standardized distributed algorithm, and values 128-254 indicate distributed FT computation using a particular private distributed algorithm.
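The value ranges of the Algorithm field 1010 described above can be illustrated with a small classifier. The label strings returned here are assumptions chosen for illustration, not names defined by the disclosure.

```python
# Illustrative mapping of the Algorithm field 1010 value ranges:
# 0 = centralized FT computation by the leader, 1-127 = standardized
# distributed algorithms, 128-254 = private distributed algorithms.
def classify_algorithm(value):
    if value == 0:
        return "centralized"            # FT computed by the leader
    if 1 <= value <= 127:
        return "distributed-standard"   # standardized distributed algorithm
    if 128 <= value <= 254:
        return "distributed-private"    # private distributed algorithm
    raise ValueError("Algorithm value must be in the range 0-254")
```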
In the depicted embodiment, the Algorithm Support TLV 1100 includes a Type field 1102, a Length field 1104, an Algorithm field 1106, and an Algorithm field 1108. The Type field 1102 is a 2-byte field that specifies a type value indicating that the TLV is an Algorithm Support TLV. The type value is TBD. The Length field 1104 is a 2-byte field that specifies a length value indicating the number of bytes in the Algorithm Support TLV 1100 after the Length field 1104. The length value varies based on the number of algorithms the BGP-SPF node supports for computing an FT. For example, if the BGP-SPF node only supports two algorithms for computing an FT as shown in
In the depicted embodiment, the Node IDs TLV 1200 includes a Type field 1202, a Length field 1204, a Reserved field 1206, a Last (L) field 1208, a Starting index field 1210, and Node ID fields 1212A-1212N, where A is the first node in the Node IDs TLV 1200 and N is the last node in the Node IDs TLV 1200. The Type field 1202 is a 2-byte field that specifies a type value indicating that the TLV is a Node IDs TLV. The type value is TBD. The Length field 1204 is a 2-byte field that specifies a length value indicating the number of bytes in the Node IDs TLV 1200 after the Length field 1204. The length value varies based on the number of nodes in the Node IDs TLV. The Reserved field 1206 is a 17-bit field that is currently not used and is set to zero in transmission. The Reserved field 1206 should be ignored on reception. The L field 1208 is a 1-bit field following the Reserved field 1206. In an embodiment, the L field 1208 is set (i.e., set to 1) when the index of the last node ID specified in Node ID field 1212N in the Node IDs TLV 1200 is equal to the last index in the full list of node IDs for the BGP-SPF domain. In other words, when the L field 1208 is set, a node that receives the Node IDs TLV 1200 can initiate decoding the FT because the node has received the last of the node mapping information needed to decode the entire FT.
The Starting index field 1210 specifies the index of the first node ID specified in the Node ID field 1212A of the Node IDs TLV 1200. Node ID fields 1212A-1212N specify the BGP identifier, also referred to as the BGP Router-ID, of a node in the BGP-SPF domain.
The indexes of the other nodes listed in Node ID field 1212B (not shown) to Node ID field 1212N are not encoded in the Node IDs TLV 1200. Instead, to determine the index of the other nodes in the Node IDs TLV 1200, a node simply increments the value specified in the Starting index field 1210 to obtain the index of each node listed in Node ID field 1212B to Node ID field 1212N. For example, if the value specified in the Starting index field 1210 is 100, then the index of the node specified in the Node ID field 1212A is 100, the index of the node specified in the Node ID field 1212B is 101, the index of the node specified in the Node ID field 1212C (not shown) is 102, and so on.
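The implicit indexing scheme above, in which only the starting index is carried on the wire and each subsequent Node ID's index is the previous index plus one, can be sketched as follows. The function name and the dictionary return type are illustrative assumptions.

```python
# Sketch of the implicit index scheme of the Node IDs TLV 1200: only the
# Starting index is encoded; subsequent Node IDs take consecutive indexes.
def index_node_ids(starting_index, node_ids):
    """Return {index: node_id} for the Node IDs carried in one TLV."""
    return {starting_index + i: nid for i, nid in enumerate(node_ids)}
```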
When there are a number of single link paths (e.g., N single link paths), using one Paths TLV to represent them is more efficient than using N Paths TLVs, where each Paths TLV 1300 encodes a single link path. Using one TLV consumes 4+6*(N−1)+4=6*N+2 bytes, whereas using N TLVs occupies N*(4+4)=8*N bytes. The space used by the former is about ¾ (three quarters) of the space used by the latter when N is large (e.g., 50).
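The byte counts stated above can be checked directly. The function names below are illustrative; the formulas are taken verbatim from the comparison in the preceding paragraph.

```python
# Byte counts for encoding N single link paths, per the comparison above.
def one_tlv_bytes(n):
    return 4 + 6 * (n - 1) + 4      # one combined Paths TLV: 6*N + 2

def n_tlvs_bytes(n):
    return n * (4 + 4)              # N separate Paths TLVs: 8*N
```

At N=50 the combined encoding uses 302 bytes versus 400 bytes for 50 separate TLVs, a ratio of 0.755, which is the "about ¾" savings noted above.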
In an embodiment, a node uses the CUF TLV 1400 to indicate that a connection/link is a part of the FT and is used for flooding, where a session over the connection/link is identified by the node's (local) node ID and the remote node ID. The node sends the CUF TLV 1400 to every node in the BGP-SPF domain. A node that receives the CUF TLV 1400 can verify that the connection specified in the CUF TLV 1400 is on the FT to ensure that the FT stored and used by the receiving node is accurate and has not changed.
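The receiver-side check described above can be sketched as a simple membership test. Representing the stored FT as a set of undirected links (frozensets of the two endpoint node IDs) is an assumption made here for illustration.

```python
# Hedged sketch of the CUF TLV 1400 receiver check: confirm that the
# (local, remote) connection announced in the TLV is on the stored FT.
# The FT is assumed to be a set of frozensets, one per undirected link.
def verify_cuf(ft_links, local_id, remote_id):
    """Return True if the announced connection is on the stored FT."""
    # frozenset makes the check direction-independent: (A, B) == (B, A).
    return frozenset((local_id, remote_id)) in ft_links
```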
Additionally, as described above, in some embodiments, in communicating the information indicating the link change to the subset of the set of RRs, the network node may encode the information indicating the link change in a BGP-LS-SPF Link NLRI; encode a BGP update message comprising the BGP-LS-SPF Link NLRI; and communicate the BGP update message to the subset of the set of RRs. In some embodiments, the method 1500 may include receiving flooding behavior instructions indicating the flooding behavior that determines which RRs are in the subset of the set of RRs; and configuring the flooding behavior on the network node. The method 1500 may also include receiving the flooding behavior instructions encoded in a Node Flood Type-Length-Value (TLV); and decoding the Node Flood TLV to determine the flooding behavior. The method 1500 may further include assigning the network node to a group of network nodes in the BGP-SPF domain based on the flooding behavior instructions; and sending the BGP-LS-SPF Link NLRI indicating the link change to the subset of the set of RRs designated for the group.
Additionally, as described above, in some embodiments, the method 1600 may include communicating a priority of the RR to become a leader of the BGP-SPF routing domain. The method 1600 may also include encoding a priority of the RR to become a leader of the BGP-SPF routing domain in a Leader Priority TLV; encoding a BGP update message comprising the Leader Priority TLV; and communicating the BGP update message to the network nodes and other RRs of the BGP-SPF routing domain. The method 1600 may further include receiving priorities of the other RRs of the BGP-SPF routing domain to become a leader; determining that the priority of the RR is a highest priority to become a leader of the BGP-SPF routing domain based on the priorities of the other RRs; and configuring the RR as the leader of the BGP-SPF routing domain.
As described above, in some embodiments, the method 1700 may include obtaining the FT from a leader node of the BGP-SPF domain. The method 1700 may also include receiving a node index mapping from the leader node; and decoding an encoding of the FT using the node index mapping to obtain the FT. The method 1700 may further include receiving updates to the FT from the leader node; and modifying the FT based on the updates. The updates may include new connections encoded in a Paths TLV that is encoded in a Multiprotocol Reachable Network Layer Reachability Information (MP_REACH_NLRI) path attribute in a BGP update message. The updates may also include removed connections encoded in a Paths TLV that is encoded in a Multiprotocol Unreachable Network Layer Reachability Information (MP_UNREACH_NLRI) path attribute in a BGP update message.
The apparatus 1800 comprises receiver units (RX) 1820 or receiving means for receiving data via ingress/input ports 1810; a processor 1830, logic unit, central processing unit (CPU), or other processing means to process instructions; transmitter units (TX) 1840 or transmitting means for transmitting data via egress/output ports 1850; and a memory 1860 or data storing means for storing the instructions and various data.
The processor 1830 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 1830 is communicatively coupled via a system bus with the ingress ports 1810, RX 1820, TX 1840, egress ports 1850, and memory 1860. The processor 1830 can be configured to execute instructions stored in memory 1860. Thus, the processor 1830 provides a means for determining, creating, indicating, performing, providing, or any other action corresponding to the claims when the appropriate instruction is executed by the processor 1830.
The memory 1860 can be any type of memory or component capable of storing data and/or instructions. For example, the memory 1860 may be volatile and/or non-volatile memory such as read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM). The memory 1860 can also include one or more disks, tape drives, and solid-state drives and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. In some embodiments, the memory 1860 can be memory that is integrated with the processor 1830.
In one embodiment, the memory 1860 stores a BGP-SPF Flooding Reduction module 1870. The BGP-SPF Flooding Reduction module 1870 includes data and executable instructions for implementing the disclosed embodiments. For instance, the BGP-SPF Flooding Reduction module 1870 can include instructions for implementing the BGP-SPF Flooding Reduction as described herein. The inclusion of the BGP-SPF Flooding Reduction module 1870 substantially improves the functionality of the apparatus 1800 by reducing the number of link-state messages that the apparatus 1800 both receives and transmits, thereby increasing the efficiency of the apparatus 1800 as well as the overall network.
The disclosed embodiments may be a system, an apparatus, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a non-transitory computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure. The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
This application is a continuation of International Patent Application No. PCT/US2022/045381 filed on Sep. 30, 2022, by Futurewei Technologies, Inc., and titled “Border Gateway Protocol (BGP)—Shortest Path First (SPF) Flooding Reduction,” which claims the benefit of U.S. Provisional Patent Application No. 63/250,738 filed on Sep. 30, 2021. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
| Number | Date | Country |
| --- | --- | --- |
| 63250738 | Sep 2021 | US |
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/US2022/045381 | Sep 2022 | WO |
| Child | 18615379 | | US |