BUILDING AN EFFICIENT EVPN VXLAN BROADCAST DOMAIN BASED ON WORKLOAD

Description

BACKGROUND
Field

Currently, replication of overlay traffic in Ethernet Virtual Private Network (EVPN) Virtual Extensible Local Area Network (VXLAN) is based on one of two models. In a first model (“ingress replication” model), an ingress virtual tunnel endpoint (VTEP) replicates a packet to each remote VTEP in the broadcast domain (BD) via unicast tunnels. However, when the ingress VTEP needs to support very high replication demands, a significant overhead may be incurred. In a second model (Underlay Multicast model), the ingress VTEP replicates a packet once to a multicast group, which replicates the packet to participating VTEPs via multicast tunnels. However, virtual network identifiers (VNIs) advertised by various VTEPs may be mapped to the same multicast group, even if a remote VTEP may not advertise a specific VNI mapped to the multicast group. As a result, a significant amount of underlay multicast tunneled traffic may get dropped when many VNIs share multicast groups.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates an environment depicting an overlay network connected in a mesh topology, in accordance with an aspect of the present application.

FIG. 2A depicts a table representing the ingress replication model for the environment of FIG. 1, in accordance with an aspect of the present application.

FIG. 2B depicts a table representing the underlay multicast model for the environment of FIG. 1, in accordance with an aspect of the present application.

FIG. 3 illustrates an exemplary format of an optional parameter used in a Border Gateway Protocol (BGP) message to indicate capacity and capability, in accordance with an aspect of the present application.

FIG. 4A depicts a table of the current replication count for the devices and networks of the environment of FIG. 1, in accordance with an aspect of the present application.

FIG. 4B depicts an algorithm for selecting the replication type for a broadcast domain, in accordance with an aspect of the present application.

FIG. 4C depicts a table of the efficient broadcast domain model using the algorithm of FIG. 4B on the environment of FIG. 1, in accordance with an aspect of the present application.

FIG. 5 illustrates an environment depicting an overlay network connected in a mesh topology, based on the efficient broadcast domain model represented by the table in FIG. 4C, in accordance with an aspect of the present application.

FIGS. 6A-6C depict tables comparing the performance of the environment of FIG. 1 based on the ingress replication model, the underlay multicast model, and the efficient hybrid broadcast domain model, in accordance with an aspect of the present application.

FIG. 7A presents a flowchart illustrating a method for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application.

FIGS. 7B-7C present flowcharts illustrating a method for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application.

FIG. 8 illustrates a computer system for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application.

FIG. 9 illustrates a non-transitory computer-readable medium for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application.

In the figures, like reference numerals refer to the same figure elements.

DETAILED DESCRIPTION

Aspects of the instant application provide a method, computer system, and computer-readable medium which facilitate building an efficient EVPN VXLAN broadcast domain based on workload.

A Layer 2 overlay network can be implemented by encapsulating Layer 2 frames as payloads in Layer 3 packets, e.g., based on a Virtual Extensible Local Area Network (VXLAN) protocol. The Layer 3 packets can be communicated through a Layer 3 underlay network. By using a Layer 2 network which overlays a Layer 3 network, Layer 2 virtual networks (e.g., virtual local area networks (VLANs)) can span across the Layer 3 network, possibly across different physical domains (e.g., different data centers, different campuses, different geographic sites, etc.). The spanning of a VLAN across different physical domains can refer to stretching or extending the VLAN across the different physical domains.

Network devices (e.g., switches or other types of network devices) can be used in a Layer 2 overlay network for a virtual private network (VPN) over a set of tunnels with corresponding tunnel endpoints. A respective tunnel endpoint can deploy a VPN by mapping a respective client VLAN to a corresponding tunnel network identifier (TNI). If the tunnel is formed based on VXLAN, the TNI can be a virtual network identifier (VNI) of a VXLAN header, and a tunnel endpoint can be a VXLAN tunnel endpoint (VTEP).

A network device used in a Layer 2 overlay network for a VPN can include a data plane entity that performs VXLAN encapsulation and decapsulation. This type of data plane entity can be referred to as a VXLAN tunnel endpoint (VTEP). The VTEP can be part of the data plane of the underlay and overlay network used for forwarding of data by the network device. The network device can also include a control plane entity (which is part of the control plane of the underlay and overlay network) that exchanges control information with other network devices to enable forwarding of data by the network devices. In some examples, the control plane of the underlay and overlay network can operate based on the Ethernet Virtual Private Network (EVPN) technology.

An Ethernet VPN (EVPN) can be deployed as an overlay network over a set of interconnected networks (e.g., VXLANs). The expansion of EVPN VXLAN into campus networks can result in an increase in the number of broadcast domains and remote endpoints, which in turn can result in edge nodes (i.e., ingress nodes or switches) handling and replicating an increasing amount of traffic. Replication requirements for ingress nodes may be based on dynamic traffic conditions, i.e., broadcast domains may experience unpredictable high-density or low-density traffic.

As described above, replication of overlay traffic in EVPN VXLAN is currently based on one of two models. In a first model (ingress replication), an ingress VTEP replicates a packet to each remote VTEP in the broadcast domain via unicast tunnels. However, when the ingress VTEP needs to support very high replication demands, a significant overhead may be incurred. In a second model (underlay multicast), the ingress VTEP replicates a packet once to a multicast group (e.g., via a spine switch), which replicates the packet to participating VTEPs via multicast tunnels. However, virtual network identifiers (VNIs) advertised by various VTEPs may be mapped to the same multicast group, even if a remote VTEP may not advertise a specific VNI mapped to the multicast group (i.e., does not participate in the VNI flood tree). As a result, a significant amount of underlay multicast tunneled traffic may get “ingress discarded” or “in discarded” (i.e., dropped) when many VNIs share multicast groups. An exemplary EVPN and corresponding replication tables for the ingress replication model and the underlay multicast model are described below in relation to FIGS. 1, 2A, and 2B.

The described aspects of the application address the limitations of replication resources and the inefficiencies created by the dropped traffic of the two existing models, by creating an efficient broadcast domain based on the replication requirement. The described aspects provide a system which can use an EVPN attribute referred to as the “Replication Threshold Count” (RTC), which can indicate a replication capacity of a given network device. The system can dynamically classify the broadcast domain for the given network device and VNI for a virtual extensible local area network (VXLAN) into one of three replication types based on a comparison of the RTC and the current count of discovered endpoints by the given network device. The three replication types can include: the ingress replication type or model; the underlay multicast type or model; and a “hybrid replication” type or model. An algorithm for classifying and selecting the replication type is described below in relation to FIGS. 4A-4C.

Thus, in contrast to current replication models, in which only the first or the second type can be used as the replication type for a broadcast domain, the described aspects provide a third hybrid type which allows both the first and the second type to be used as the replication type for the broadcast domain. As described below in relation to FIGS. 6A-6C, using the third hybrid replication type for a broadcast domain can result in significant improvements in performance and efficiency. For example, in terms of replicating flood packets, the hybrid model can result in a performance gain of approximately 200% over the ingress replication model. As another example, in terms of utilizing the underlay traffic and reducing the number of dropped packets, the hybrid model can result in a performance gain of approximately 150% over the underlay multicast model.

The terms “ingress discard” or “in discard” are used interchangeably in this disclosure and refer to packets which are dropped, e.g., using the underlay multicast replication method, due to multiple VNIs advertised by various VTEPs being mapped to the same multicast group, but where not all of the VTEPs advertise all the mapped VNIs. These “ingress discards” or “in discards” are described below in relation to FIGS. 1 and 2B.

In this disclosure, the term “switch” is used in a generic sense and can refer to any standalone network device or fabric switch operating in any network layer. The term “switch” should not be interpreted as limiting examples of the present invention to layer-2 networks. Any device that can forward traffic to an external device or another switch can be referred to as a “switch.” Furthermore, if the switch facilitates communication between networks, the switch can be referred to as a gateway switch. Any physical or virtual device (e.g., a virtual machine or switch operating on a computing device) that can operate as a network device and forward traffic to an end device can be referred to as a “switch.” If the switch is a virtual device, the switch can be referred to as a virtual switch. Examples of a “switch” include, but are not limited to, a layer-2 switch, a layer-3 router, a routing switch, a component of a Gen-Z network, or a fabric switch comprising a plurality of similar or heterogeneous smaller physical and/or virtual switches.

The term “packet” refers to a group of bits that can be transported together across a network. The term “packet” should not be interpreted as limiting examples of the present invention to a particular layer of a network protocol stack. The term “packet” can be replaced by other terminologies referring to a group of bits, such as “message,” “frame,” “cell,” “datagram,” or “transaction.” Furthermore, the term “port” can refer to a port that can receive or transmit data. The term “port” can also refer to the hardware, software, and/or firmware logic that can facilitate the operations of that port.

Environment for Facilitating Building an Efficient Broadcast Domain in an Overlay Network

FIG. 1 illustrates an environment depicting an overlay network 100 connected in a mesh topology, in accordance with an aspect of the present application. Overlay network 100 can be part of an Ethernet network, InfiniBand network, or other network, and may use a corresponding communication protocol, such as Internet Protocol (IP), Fibre Channel over Ethernet (FCoE), or other protocol. Overlay network 100 can include one or more network devices (e.g., switches) and may include heterogeneous network components (e.g., layer-2 and layer-3 hops and tunnels).

For example, network 100 can include five network devices (102, 104, 106, 108, and 110) and end devices (e.g., client devices or servers, not shown) coupled to the network devices. Network devices 102-110 can operate as tunnel endpoints (e.g., VTEPs in a VXLAN) in overlay network 100, and overlay network 100 can be a distributed tunnel fabric in which network devices 102-110 can be coupled to each other via tunnels. Examples of tunneling protocols can include, but are not limited to, VXLAN, Generic Routing Encapsulation (GRE), Network Virtualization using GRE (NVGRE), Generic Networking Virtualization Encapsulation (Geneve), Internet Protocol Security (IPsec), and Multiprotocol Label Switching (MPLS). A VPN, such as an EVPN, can be deployed as overlay network 100 over a set of tunnels or interconnected networks (e.g., VXLANs), as described herein. The tunnels in overlay network 100 can be formed over an underlay network (not shown), which can be a physical network, and a respective link of the underlay network can be a physical link. An index 130 can indicate various types of tunnels: possible unicast tunnels can be depicted with a light dashed line; multicast tunnels for a multicast group G1 can be depicted with a heavy solid line; and multicast tunnels for a multicast group G2 can be depicted with a heavy dashed line.

Each of network devices 102-110 can be indicated with the list of VLANs (or VNIs for VXLANs) advertised in the Inclusive Multicast Ethernet Tag (IMET) Type-3 Route (RT-3) of the Border Gateway Protocol (BGP) update messages transmitted between the nodes. For example: a VTEP1 102 node can advertise VXLANs as indicated by (1,2,3,4); a VTEP2 104 node can advertise VXLANs as indicated by (1,2); a VTEP3 106 node can advertise VXLANs as indicated by (3,4,5); a VTEP4 108 node can advertise VXLANs as indicated by (3,4,5); and a VTEP5 110 node can advertise VXLANs as indicated by (3,4,5).

A node may receive a multicast data packet (e.g., from a source or end device, not shown) and replicate the packet by distributing the packet via either individual tunnels (using the ingress replication model) or via multicast tunnels (using the underlay multicast model).

FIG. 2A depicts a table 200 representing the ingress replication model for the environment of FIG. 1, in accordance with an aspect of the present application. Table 200 can include entries 220-228 with fields indicating the broadcast domain (broadcast domain 202) and the respective VXLANs (VXLAN1 204, VXLAN2 206, VXLAN3 208, VXLAN4 210, and VXLAN5 212) advertised in the network. Each entry in table 200 can indicate the broadcast domain (BD) for each VTEP/VNI/VXLAN domain using the ingress replication model.

The notation “BDx(y)” can indicate the broadcast domain (i.e., the remote tunnel endpoints to which the network device sends replicated traffic) of VTEP x for VXLAN y. For example, entry 220 indicates the broadcast domain of VTEP1 for the respective VXLANs, where VTEP1 102 advertises “(VXLANs 1,2,3,4)”: VXLAN1 has VTEP2 104 (indicated as “V2”) as the remote tunnel endpoint; VXLAN2 has VTEP2 104 (indicated as “V2”) as the remote tunnel endpoint; VXLAN3 has VTEP3 106, VTEP4 108, and VTEP5 110 (indicated as “V345”) as the remote tunnel endpoints; VXLAN4 has VTEP3 106, VTEP4 108, and VTEP5 110 (indicated as “V345”) as the remote tunnel endpoints; and VXLAN5 has no remote tunnel endpoints (indicated as “0”). Thus, when a flood (i.e., multicast traffic) occurs on VTEP1 102 on VXLAN3, VTEP1 102 can replicate the packet to VTEP3 106, VTEP4 108, and VTEP5 110, represented by “BD1(3)=V345.” Unicast tunnels (indicated by light dashed lines in FIG. 1) can be created and added to each specific broadcast domain or replication group based on the advertised VXLANs. FIG. 1 does not indicate the exact unicast tunnels created based on the topology, and instead indicates a unicast tunnel from each ingress node to each of its peer nodes and assumes that each ingress node can be ingress replication-capable. However, in the ingress replication model, each ingress VTEP may need to support very high replication demands and may not have sufficient replication resources for handling the necessary replication requirements.
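As an illustration only, the derivation of BDx(y) under the ingress replication model can be sketched as follows. The dictionary `ADVERTISED` and the function name `ingress_replication_bd` are hypothetical; the advertised VXLAN lists are the ones given for FIG. 1, and a VTEP that does not itself advertise a VXLAN is assumed to have an empty broadcast domain for it (the “0” entries of table 200).

```python
# VXLANs advertised by each VTEP in FIG. 1 (hypothetical in-memory form of
# what would be learned from the IMET RT-3 routes).
ADVERTISED = {
    1: {1, 2, 3, 4},
    2: {1, 2},
    3: {3, 4, 5},
    4: {3, 4, 5},
    5: {3, 4, 5},
}


def ingress_replication_bd(vtep, vxlan, advertised=ADVERTISED):
    """Return the remote VTEPs forming BDx(y) under ingress replication."""
    if vxlan not in advertised[vtep]:
        return []  # assumption: no broadcast domain for a VXLAN not carried locally
    return sorted(
        peer for peer, vxlans in advertised.items()
        if peer != vtep and vxlan in vxlans
    )


if __name__ == "__main__":
    print(ingress_replication_bd(1, 3))  # BD1(3) -> [3, 4, 5], i.e. "V345"
    print(ingress_replication_bd(1, 5))  # BD1(5) -> [], i.e. "0"
```

Each remote VTEP returned corresponds to one unicast tunnel, so the length of the list is the number of copies the ingress VTEP must generate per flooded packet.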

FIG. 2B depicts a table 240 representing the underlay multicast model for the environment of FIG. 1, in accordance with an aspect of the present application. Table 240 can include entries 260-268 with fields indicating the broadcast domain (broadcast domain 242) and the respective VXLANs (VXLAN1 244, VXLAN2 246, VXLAN3 248, VXLAN4 250, and VXLAN5 252) advertised in the network. Each entry in table 240 can indicate the broadcast domain (BD) for each VTEP/VNI/VXLAN domain using the underlay multicast model. In the underlay multicast model, instead of sending a replicated packet directly to an individual remote tunnel endpoint via a unicast tunnel, the network device can send a replicated packet directly to a multicast group to which the VXLANs/VNIs have been previously mapped. The multicast group can operate based on a bidirectional multicast protocol, such as a Bi-directional Protocol Independent Multicast (PIM) or PIM-BIDIR protocol. The network device can maintain a set of multicast groups associated with the PIM-BIDIR protocol, and each multicast group can be identified by a corresponding multicast address (e.g., a range of multicast IP addresses).

For example, using two multicast group IP addresses indicated by “G1” (120) and “G2” (122), the odd-numbered VXLANs (VXLAN1, VXLAN3, VXLAN5) can be mapped to G1 and the even-numbered VXLANs (VXLAN2, VXLAN4) can be mapped to G2. Thus, “BD1(3)=G1” can indicate that the broadcast domain on VTEP1 102 for VXLAN3 248 can be G1, as indicated in entry 260 for VXLAN3 248 (because VXLAN3 is mapped to the multicast group IP address for G1) and can use a multicast tunnel 140 associated with the multicast group IP address for G1 (as indicated by the heavy solid line for “multicast tunnels G1”). VTEP1 102 can thus join the PIM-BIDIR tree for G1.

Similarly, BD1(1), BD2(1), BD3(3), BD3(5), BD4(3), BD4(5), BD5(3), and BD5(5) also indicate that the corresponding broadcast domain is G1. Thus, when a flood occurs on, e.g., VXLAN5 at VTEP5 110, the ingress VTEP5 110 can send out a multicast VXLAN packet with the G1 multicast IP address (where VTEP1 102 and VTEP2 104 have both already joined the PIM-BIDIR tree for G1). However, because VTEP1 102 and VTEP2 104 do not advertise VXLAN5, both VTEP1 102 and VTEP2 104 would drop or “ingress discard” the multicast VXLAN packet sent by the flood on VXLAN5 based on the G1 multicast IP address, since the associated VXLAN5 has not been created on VTEP1 102 and VTEP2 104. For example, as depicted by a multicast tunnel 144 from VTEP1 102 to G2 and by a multicast tunnel 142 from VTEP2 104 to G2 (as indicated by the heavy dashed line for “multicast tunnels G2”), VTEP1 102 and VTEP2 104, respectively, would receive the multicast VXLAN packet sent as a result of the flood on VXLAN5 and would discard the multicast VXLAN packet. This ingress discard limitation associated with the underlay multicast model can result in additional unnecessary traffic, which can reduce the efficiency and traffic utilization of the overall network.
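The ingress discard behavior described above can be reproduced with a short sketch. The names `multicast_group`, `group_members`, and `ingress_discarders` are hypothetical, and the sketch assumes that a VTEP joins the tree of a group whenever it advertises at least one VXLAN mapped to that group, using the odd/even mapping of the example.

```python
# VXLANs advertised by each VTEP in FIG. 1.
ADVERTISED = {
    1: {1, 2, 3, 4},
    2: {1, 2},
    3: {3, 4, 5},
    4: {3, 4, 5},
    5: {3, 4, 5},
}


def multicast_group(vxlan):
    """Odd-numbered VXLANs map to G1 and even-numbered VXLANs to G2."""
    return "G1" if vxlan % 2 else "G2"


def group_members(group, advertised=ADVERTISED):
    """VTEPs assumed to have joined the tree for `group`."""
    return {
        vtep for vtep, vxlans in advertised.items()
        if any(multicast_group(v) == group for v in vxlans)
    }


def ingress_discarders(src_vtep, vxlan, advertised=ADVERTISED):
    """VTEPs that receive a flood on `vxlan` via the shared group and drop it."""
    members = group_members(multicast_group(vxlan), advertised)
    return sorted(
        vtep for vtep in members
        if vtep != src_vtep and vxlan not in advertised[vtep]
    )


if __name__ == "__main__":
    # Flood on VXLAN5 at VTEP5: VTEP1 and VTEP2 are on the G1 tree
    # but do not advertise VXLAN5.
    print(ingress_discarders(5, 5))  # [1, 2]
```

Every VTEP listed by `ingress_discarders` represents underlay bandwidth spent on a copy that is dropped on arrival.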

Using Replication Threshold Count and Current Replication Count to Determine a Broadcast Domain Type

In current replication models, the unicast tunnels and the multicast tunnels depicted in network 100 in relation to FIGS. 2A and 2B cannot exist together in the same broadcast domain. The described aspects address the limitations of using only one of the ingress replication model and the underlay multicast model by creating a hybrid broadcast domain based on the replication requirement. The described aspects can use the “Replication Threshold Count” (RTC) and the “current replication count” to classify a broadcast domain into one of three types: the first type is the ingress replication type of broadcast domain; the second type is the underlay multicast type of broadcast domain; and the third type is a hybrid broadcast domain which can include both the first type and the second type in the same broadcast domain.

Replication Threshold Count

The Replication Threshold Count can be a global pre-defined count which represents the replication capacity of a given VTEP. For example, a low-capacity VTEP device which supports low-speed ports and low replication resources may set the RTC to a lower value while a high-capacity VTEP device which supports high-speed ports and high replication resources may set the RTC to a higher value. The RTC can be configured by an administrator upon creation of the VXLAN segment based on the capacity of the device or the device itself can derive the RTC based on the maximum supported capacities of the device. The RTC can be applicable to all the broadcast domains created in a given ingress VTEP.

If the RTC is not configured, the system can use a default value of, e.g., “2” when a given device is capable of both ingress replication and underlay multicast. This value of “2” can indicate that for at most two discovered remote endpoints in the broadcast domain, the ingress VTEP can continue using the ingress replication model (i.e., set the BD type to the first type), while for three or more discovered remote endpoints in the broadcast domain, the ingress VTEP can use either the underlay multicast model or the hybrid model (i.e., set the BD type to the second type or the third type). The notation “RTCx=2” indicates that the RTC for VTEPx is set to 2.

Optional Parameter in BGP Open Message

Each VTEP can communicate its RTC to its peer VTEPs via the BGP Open message. The BGP Open message can include an optional parameter, which can be used for advertising capacity and capability. FIG. 3 illustrates an exemplary format of an optional parameter 300 used in a Border Gateway Protocol (BGP) message to indicate capacity and capability, in accordance with an aspect of the present application. Optional parameter 300 can include: a capability code 302 (comprising 1 octet), which can indicate a capability as the replication threshold; a capability length 304 (comprising 1 octet), which can indicate a size of the optional parameters; and a capability value 306 (comprising a variable size), which can indicate a value of the replication threshold.
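A minimal sketch of how the three fields of optional parameter 300 might be serialized is given below. The capability code value and the 2-octet width of the capability value are assumptions made purely for illustration; the description above does not specify either, and the function names are hypothetical.

```python
import struct

# Hypothetical placeholder: no code point is named for the replication threshold.
RTC_CAPABILITY_CODE = 0xEF
# Assumption: the RTC is carried as a 2-octet unsigned integer.
RTC_VALUE_FORMAT = "!H"


def encode_rtc_capability(rtc, code=RTC_CAPABILITY_CODE):
    """Build code (1 octet) + length (1 octet) + value (variable)."""
    value = struct.pack(RTC_VALUE_FORMAT, rtc)
    return struct.pack("!BB", code, len(value)) + value


def decode_rtc_capability(data, code=RTC_CAPABILITY_CODE):
    """Parse the triple produced by encode_rtc_capability and return the RTC."""
    if len(data) < 2:
        raise ValueError("capability is truncated")
    got_code, length = struct.unpack_from("!BB", data, 0)
    expected = struct.calcsize(RTC_VALUE_FORMAT)
    if got_code != code or length != expected or len(data) < 2 + length:
        raise ValueError("not a well-formed replication threshold capability")
    (rtc,) = struct.unpack_from(RTC_VALUE_FORMAT, data, 2)
    return rtc


if __name__ == "__main__":
    wire = encode_rtc_capability(2)
    print(wire.hex())                   # ef020002
    print(decode_rtc_capability(wire))  # 2
```

A receiving VTEP would store the decoded value per peer so that it can later be consulted when classifying broadcast domains.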

By using the third type of broadcast domain (i.e., a hybrid BD which allows one or both of ingress replication and underlay multicast), the described aspects can be both self-balancing (as the BGP Open messages are updated upon VTEPs being established in a BGP session) and active (as the broadcast domain can be updated based on the advertised VNI membership in the EVPN IMET updates).

Current Replication Count

The current replication count can indicate the running or on-demand count of the number of VXLAN endpoints discovered by the EVPN IMET RT-3 advertisement updates. The described aspects can leverage the EVPN table to determine the current replication count per a specific broadcast domain. The Replication Threshold Count may be global and static (for a given BGP session), while the current replication count (for a given VXLAN broadcast domain) can be dynamic since it is based on the discovered VTEPs via the dynamically sent advertisement updates. Furthermore, the current replication count may ignore any local directly attached ports and may indicate only the number of remote tunnel endpoints.

FIG. 4A depicts a table 400 of the current replication count for the devices and networks of the environment of FIG. 1, in accordance with an aspect of the present application. Table 400 can include entries 420-428 with fields indicating the broadcast domain (broadcast domain 402) and the respective VXLANs (VXLAN1 404, VXLAN2 406, VXLAN3 408, VXLAN4 410, and VXLAN5 412) advertised in the network. Each entry in table 400 can indicate the current replication count for each VTEP/VNI/VXLAN domain. That is, each entry can correspond to the number of remote endpoints discovered corresponding to the environment of FIG. 1 using the ingress replication model of FIG. 2A.

The notation “Cx (y)” can indicate the current replication count on VTEPx for VLAN y. Thus, as shown in entry 424, C3(5)=2 can indicate that the current replication count on VTEP3 106 for VXLAN5 is 2 (also indicated in entry 224 of table 200 as “V45” which represents the two VTEPs VTEP4 108 and VTEP5 110).
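The counts Cx(y) follow directly from the advertised VXLAN lists of FIG. 1. The sketch below recomputes them; the function names are hypothetical, and a count of 0 is assumed for a VXLAN that the local VTEP does not advertise.

```python
# VXLANs advertised by each VTEP in FIG. 1, as learned from IMET RT-3 updates.
ADVERTISED = {
    1: {1, 2, 3, 4},
    2: {1, 2},
    3: {3, 4, 5},
    4: {3, 4, 5},
    5: {3, 4, 5},
}


def current_replication_count(vtep, vxlan, advertised=ADVERTISED):
    """Cx(y): remote tunnel endpoints discovered for `vxlan` at `vtep`.

    Local directly attached ports are not counted.
    """
    if vxlan not in advertised[vtep]:
        return 0
    return sum(
        1 for peer, vxlans in advertised.items()
        if peer != vtep and vxlan in vxlans
    )


def count_table(advertised=ADVERTISED):
    """One row per VTEP, one column per VXLAN, in ascending order."""
    vxlans = sorted(set().union(*advertised.values()))
    return {
        vtep: [current_replication_count(vtep, v, advertised) for v in vxlans]
        for vtep in sorted(advertised)
    }


if __name__ == "__main__":
    for vtep, row in count_table().items():
        print(f"VTEP{vtep}: {row}")
```

Because the counts are derived from the advertisement state, they change whenever a peer adds or withdraws a VXLAN, which is what makes the current replication count dynamic.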

Selection of Broadcast Domain Type

As described above, the system can use the RTC and the current replication count to classify a broadcast domain (BD) into one of three types. In the first type of BD (ingress replication), all the participating VTEPs in the BD may have a low replication demand or are ingress replication-capable devices. In the second type of BD (underlay multicast), all the participating VTEPs in the BD may have a high replication demand or are underlay multicast replication-capable devices. In the third type of BD (hybrid), some of the participating VTEPs may have a low replication demand or do not support the underlay multicast replication model, while others of the participating VTEPs may have a high replication demand and do support the underlay multicast replication model.

FIG. 4B depicts an algorithm 430 for selecting the replication type for a broadcast domain, in accordance with an aspect of the present application. Algorithm 430 can include a section 432 which defines the terms used in subsequent sections of pseudocode 434, 436, and 438. In section 432, the following terms are defined:

- “RTCx”: Replication Threshold Count configured on VTEP x,
- “RTCx==0”: The given device at VTEP x is only ingress replication-capable,
- “Cx (y)”: Current replication count on VTEP x for VLAN y, and
- “T (BDx(y))”: Type of broadcast domain at VTEP x for VLAN y.

Section 434 indicates that, at a given VTEP x for VLAN y, if the current replication count is less than or equal to (i.e., not greater than) the RTC of the given VTEP x or if the RTC of the given VTEP x has a value of “0,” then the type of the broadcast domain for the given VTEP x for VLAN y is set to the first type (ingress replication BD). Section 436 indicates that, at the given VTEP x for VLAN y, if the current replication count is greater than the RTC of the given VTEP x, then the type of the broadcast domain for the given VTEP x for VLAN y is set to the second type (underlay multicast BD). Section 438 indicates that for all the BDs which are set to the second type, the system can iterate through all the remote endpoints (VTEPz) advertised in BDx(y) (i.e., the broadcast domain at VTEP x for VLAN y). If the remote VTEPz has an RTC with a value of “0,” this indicates that the remote VTEPz is only ingress replication-capable, as published in the EVPN table, which is updated based on the optional parameter of the capability advertisement in the BGP Open message, as described above in relation to FIG. 3. As a result, if “RTCz==0,” then the type of the broadcast domain for the given VTEP x for VLAN y is set to the third type (hybrid BD).
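The three sections of the selection logic can be sketched as a single function. This is one reading of the description, not a reproduction of FIG. 4B: the names `BDType` and `select_bd_type` are hypothetical, and the sketch assumes that the “RTCx==0” test of section 434 takes precedence over the comparison in section 436, since both conditions hold when the local RTC is 0 and at least one remote endpoint exists.

```python
from enum import Enum


class BDType(Enum):
    INGRESS_REPLICATION = 1  # first type
    UNDERLAY_MULTICAST = 2   # second type
    HYBRID = 3               # third type


def select_bd_type(rtc_local, current_count, remote_rtcs):
    """Classify BDx(y).

    rtc_local     : RTCx of the local VTEP (0 = only ingress replication-capable)
    current_count : Cx(y), the current replication count
    remote_rtcs   : RTCz of every remote VTEPz advertised in BDx(y)
    """
    # Section 434: low demand, or the local device cannot use underlay multicast.
    if rtc_local == 0 or current_count <= rtc_local:
        return BDType.INGRESS_REPLICATION
    # Section 436: demand exceeds the local threshold.
    bd_type = BDType.UNDERLAY_MULTICAST
    # Section 438: a remote that is only ingress replication-capable
    # forces a mix of unicast and multicast tunnels.
    if any(rtc == 0 for rtc in remote_rtcs):
        bd_type = BDType.HYBRID
    return bd_type


if __name__ == "__main__":
    print(select_bd_type(2, 2, [2, 2]))     # BDType.INGRESS_REPLICATION
    print(select_bd_type(2, 3, [2, 2, 2]))  # BDType.UNDERLAY_MULTICAST
    print(select_bd_type(2, 3, [2, 0, 2]))  # BDType.HYBRID
```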

Creating the Efficient Broadcast Domain

After determining, setting, or selecting the broadcast domain using the algorithm described above in relation to FIG. 4B, the system can create the efficient broadcast domain based on the workload, i.e., the RTC and the current replication count.

For all broadcast domains categorized as “ingress replication BD,” the system can create these with unicast tunnels along with the physical ports of the VLAN, where the unicast tunnels are based on the discovered tunnel endpoints. In this model, the unicast tunnels can be added to the replication group or broadcast domain.

For all broadcast domains categorized as “underlay multicast BD,” the system can create these by adding the multicast/multipoint tunnels along with the physical ports of the BD VXLAN into the replication group. The device can send a PIM Join request to receive the flood traffic downstream and can send the multicast VXLAN tunnel packet to the multicast IP group address (e.g., via a Rendezvous Point (RP)).

For all broadcast domains categorized as “hybrid BD,” the system can iterate through the broadcast domain associated with the VXLAN to identify the remote endpoints which support “underlay multicast” and “ingress replication” models, e.g., by searching the EVPN table. The system can create the hybrid BD by adding the ingress replication-supported remote tunnels as unicast tunnels and by adding the underlay multicast-supported remote tunnels as multicast tunnels along with the physical local ports. As described above for the second type (i.e., all BDs categorized as “underlay multicast BD”), the device can send a PIM Join request to receive the flood traffic downstream and can send the multicast VXLAN tunnel packet to the multicast IP group address.
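The construction of the replication group for each of the three categories can be sketched as follows. The function name, the string labels for the categories, and the shape of the returned dictionary are all hypothetical; the only rule taken from the description is that a remote endpoint whose advertised RTC is 0 is reached by a unicast tunnel in a hybrid BD, while the remaining remote endpoints are reached through the multicast group.

```python
def build_replication_group(bd_type, remote_rtc, group, local_ports=()):
    """Return the members of one broadcast domain's replication group.

    bd_type     : "ingress", "multicast", or "hybrid"
    remote_rtc  : {remote VTEP id: advertised RTC}, e.g. from the EVPN table
    group       : multicast group mapped to the VXLAN, e.g. "G1"
    local_ports : physical ports of the VLAN on the local device
    """
    remotes = sorted(remote_rtc)
    if bd_type == "ingress":
        unicast, use_group = remotes, False
    elif bd_type == "multicast":
        unicast, use_group = [], True
    elif bd_type == "hybrid":
        # RTC == 0 marks a remote that is only ingress replication-capable.
        unicast = [vtep for vtep in remotes if remote_rtc[vtep] == 0]
        use_group = len(unicast) < len(remotes)
    else:
        raise ValueError(f"unknown broadcast domain type: {bd_type}")
    return {
        "local_ports": list(local_ports),
        "unicast_tunnels": unicast,
        "multicast_group": group if use_group else None,
        # A PIM Join is needed only when the multicast group is in use.
        "send_pim_join": use_group,
    }


if __name__ == "__main__":
    print(build_replication_group("hybrid", {2: 0, 3: 2, 4: 2}, "G1", ["1/1/1"]))
```

In the hybrid case the ingress device generates one copy per unicast tunnel plus a single copy toward the multicast group, rather than one copy per remote endpoint.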

Continuing with the example of FIG. 1, assume that all five of the VTEPs depicted in network 100 (i.e., VTEP1 102, VTEP2 104, VTEP3 106, VTEP4 108, and VTEP5 110) have an RTC configured or set to a default value of “2.” The resulting broadcast domains are depicted below in relation to FIG. 4C.

FIG. 4C depicts a table 470 of the efficient broadcast domain model using the algorithm of FIG. 4B on the environment of FIG. 1, in accordance with an aspect of the present application. Table 470 can include entries 490-498 with fields indicating the broadcast domain (broadcast domain 472) and the respective VXLANs (VXLAN1 474, VXLAN2 476, VXLAN3 478, VXLAN4 480, and VXLAN5 482) advertised in the network. Each entry in table 470 can indicate the broadcast domain (BD) for each VTEP/VNI/VXLAN domain using the efficient broadcast domain model, i.e., based on the above-described selection method and algorithm of FIG. 4B.

As an example, BD5(5) is set to “V34” (VTEP3 106 and VTEP4 108, as illustrated in entry 498 of FIG. 4C) because the current replication count on VTEP5 110 for VXLAN5 is “2” (as indicated by entry 428 of FIG. 4A), which does not exceed the default RTC threshold of “2” for VTEP5 110. Continuing to apply the algorithm of FIG. 4B, BD5(4) is set to “G2” (as illustrated in entry 498 of FIG. 4C) because the current replication count on VTEP5 110 for VXLAN4 is “3” (as indicated by entry 428 of FIG. 4A), which does exceed the default RTC threshold of “2” for VTEP5 110. “BD5(4)=G2” can also indicate that the broadcast domain of VXLAN4 at VTEP5 has the multicast tunnels mapped to the multicast IP group address for G2, where VTEP1 102, VTEP3 106, VTEP4 108, and VTEP5 110 have joined the G2 tree (as described above in relation to FIG. 2B). Thus, the broadcast domain for VTEP5 110 across its advertised VXLANs is a hybrid domain which includes both multicast and unicast tunnels, as BD5(4) includes the multicast tunnel to G2 and BD5(5) includes the unicast tunnels to VTEP3 106 and VTEP4 108.
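The two entries worked through above can be recomputed end to end with the sketch below, which assumes an RTC of 2 on every VTEP and the odd/even group mapping of FIG. 2B. The function name `efficient_bd` and the “V…”/“0” string encoding are illustrative choices modeled on the notation of table 200. Because no VTEP has an RTC of 0 in this example, no individual entry is classified as the third type, so that branch is omitted.

```python
# VXLANs advertised by each VTEP in FIG. 1.
ADVERTISED = {
    1: {1, 2, 3, 4},
    2: {1, 2},
    3: {3, 4, 5},
    4: {3, 4, 5},
    5: {3, 4, 5},
}
# Every VTEP uses the default Replication Threshold Count of 2.
RTC = {vtep: 2 for vtep in ADVERTISED}


def efficient_bd(vtep, vxlan):
    """Return BDx(y) as a string: "V<peers>", a group name, or "0"."""
    if vxlan not in ADVERTISED[vtep]:
        return "0"
    remotes = sorted(
        peer for peer, vxlans in ADVERTISED.items()
        if peer != vtep and vxlan in vxlans
    )
    if not remotes:
        return "0"
    if RTC[vtep] == 0 or len(remotes) <= RTC[vtep]:
        # Count within the threshold: unicast tunnels to each remote VTEP.
        return "V" + "".join(str(peer) for peer in remotes)
    # Count above the threshold: use the multicast group mapped to the VXLAN.
    return "G1" if vxlan % 2 else "G2"


if __name__ == "__main__":
    print(efficient_bd(5, 5))  # V34
    print(efficient_bd(5, 4))  # G2
```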

FIG. 5 illustrates an environment depicting an overlay network 500 connected in a mesh topology, based on the efficient broadcast domain model represented by the table in FIG. 4C, in accordance with an aspect of the present application. Similar to overlay network 100 of FIG. 1, overlay network 500 can include five network devices (502, 504, 506, 508, and 510) and end devices (e.g., client devices or servers, not shown) coupled to the network devices. Network devices 502-510 can operate as tunnel endpoints (e.g., VTEPs in a VXLAN) in overlay network 500, and overlay network 500 can be a distributed tunnel fabric in which network devices 502-510 can be coupled to each other via tunnels.

Each of network devices 502-510 can be indicated with the list of advertised VLANs (or VNIs for VXLANs). For example: a VTEP1 502 node can advertise VXLANs as indicated by (1,2,3,4); a VTEP2 504 node can advertise VXLANs as indicated by (1,2); a VTEP3 506 node can advertise VXLANs as indicated by (3,4,5); a VTEP4 508 node can advertise VXLANs as indicated by (3,4,5); and a VTEP5 510 node can advertise VXLANs as indicated by (3,4,5).

Network 500 can depict the hybrid broadcast domain represented by table 470 in FIG. 4C, including unicast tunnels and multicast tunnels in the same broadcast domain. An index 530 can indicate the various types of tunnels: possible unicast tunnels can be depicted with a light dashed line; hybrid BD unicast tunnels corresponding to table 470 can be depicted with a heavy dotted line; hybrid BD multicast tunnels for G1 can be depicted with a heavy solid line; and hybrid BD multicast tunnels for G2 can be depicted with a heavy dashed line.

For example, network 500 can indicate that VTEP1502, VTEP3506, VTEP4508, and VTEP5510 (which each advertise VXLAN4, which is mapped to G2) have joined the G2 tree and are using multicast tunnels associated with the multicast IP group address for G2, as indicated by the heavy dashed lines from G2522 to the respective VTEPs (VTEP1502, VTEP3506, VTEP4508, and VTEP5510) (e.g., a heavy dashed line 540 from VTEP3506 to G2522). Similarly, network 500 can indicate that VTEP1502, VTEP3506, VTEP4508, and VTEP5510 (which each advertise VXLAN3, which is mapped to G1) have joined the G1 tree and are using multicast tunnels associated with the multicast IP group address for G1, as indicated by the heavy solid lines from G1520 to the respective VTEPs (VTEP1502, VTEP3506, VTEP4508, and VTEP5510) (e.g., a heavy solid line 542 from VTEP1502 to G1520).

At the same time, VTEP1502 has determined and added a unicast tunnel 544 to VTEP2504, via which to transmit multicast VXLAN packets on VXLAN1 and VXLAN2, as indicated by entry 490 in FIG. 4C and by the heavy dotted line (544) from VTEP1502 to VTEP2504. Similarly, VTEP2504 has determined and added the same unicast tunnel 544 to VTEP1502, via which to transmit multicast VXLAN packets on VXLAN1 and VXLAN2, as indicated by entry 492 and by the same heavy dotted line (544) from VTEP2504 to VTEP1502.

As another example, VTEP3506 has determined and added unicast tunnels 552 and 554 to VTEP4508 and VTEP5510, as indicated by entry 494 in FIG. 4C and by the heavy dotted lines (552 and 554) from VTEP3506 to VTEP4508 and VTEP5510. Entries 496 and 498 indicate similar determined and added unicast tunnels 552/556 and 554/556, respectively, from VTEP4508 to VTEP3506 and VTEP5510 and from VTEP5510 to VTEP3506 and VTEP4508, as indicated by the heavy dotted lines (552, 554, and 556).
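The unicast portion of the hybrid broadcast domain described above can be sketched as a small data structure. The following is a minimal illustration only, assuming the per-VTEP unicast peers are as described for entries 490-498; the labels such as "VTEP1" and the helper names are hypothetical and do not come from the figures. It shows that the per-VTEP peer lists are symmetric and collapse into four shared tunnels, which correspond to the heavy dotted lines 544, 552, 554, and 556.

```python
# Unicast peers per VTEP, as described for entries 490-498 of table 470.
# The string labels and this dictionary layout are illustrative assumptions.
UNICAST_PEERS = {
    "VTEP1": {"VTEP2"},
    "VTEP2": {"VTEP1"},
    "VTEP3": {"VTEP4", "VTEP5"},
    "VTEP4": {"VTEP3", "VTEP5"},
    "VTEP5": {"VTEP3", "VTEP4"},
}


def undirected_tunnels(peers):
    """Collapse per-VTEP peer sets into a set of shared (undirected) tunnels."""
    tunnels = set()
    for local, remotes in peers.items():
        for remote in remotes:
            tunnels.add(frozenset((local, remote)))
    return tunnels


def is_symmetric(peers):
    """Return True if every unicast tunnel is listed at both of its endpoints."""
    return all(
        local in peers[remote]
        for local, remotes in peers.items()
        for remote in remotes
    )


tunnels = undirected_tunnels(UNICAST_PEERS)
for pair in sorted(tuple(sorted(t)) for t in tunnels):
    print(pair)
```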

Thus, table 470 and network 500 demonstrate how a broadcast domain for a VTEP over various VXLANs can include an ingress replication model (first type) or an underlay multicast model (second type) (as in entry 492), or a hybrid model (third type) which includes both the ingress replication model and the underlay multicast model (as in entries 490, 494, 496, and 498).

Comparison of Received, Transmitted, and Dropped Packets Using Three Types of Broadcast Domain Models

In FIG. 6A, table 600 indicates the performance of network 100 based on the ingress replication model. Each of entries 610-618 in table 600 can include: a tunnel endpoint (TEP) 602, such as a VTEP; a number of packets received (RX 604) if a flood occurs at the remote VTEPs on the VXLANs also advertised by the corresponding TEP; a number of packets transmitted (TX 606) if a single packet is sent or a flood occurs on the corresponding VTEP; and a number of packets which are dropped or suffer ingress discard (in discards 608) based on the chosen replication model. For example, based on the network 100 of FIG. 1 and the ingress replication model table 200 of FIG. 2A, entry 610 indicates a total of 8 packets received: if a flood occurs on VLAN1 at VTEP2104, VTEP1102 will receive 1 packet; if a flood occurs on VLAN2 at VTEP2104, VTEP1102 will receive 1 packet; if a flood occurs on VLAN3 at VTEP3106, VTEP4108, and VTEP5110, VTEP1102 will receive 3 packets, i.e., one packet from each of these three VTEPs; and if a flood occurs on VLAN4 at VTEP3106, VTEP4108, and VTEP5110, VTEP1102 will receive 3 packets, i.e., one packet from each of these three VTEPs.

Entry 610 also indicates a total of 8 packets transmitted. If a single packet is sent or a flood occurs at VTEP1102: on VLAN1, VTEP1102 will send 1 packet to VTEP2104; on VLAN2, VTEP1102 will send 1 packet to VTEP2104; on VLAN3, VTEP1102 will send 3 packets, one to each of VTEP3106, VTEP4108, and VTEP5110; and on VLAN4, VTEP1102 will send 3 packets, one to each of VTEP3106, VTEP4108, and VTEP5110. As described above in relation to FIG. 1, ingress replication can result in zero in discards (as illustrated in table 600), but an ingress node (or TEP) may suffer from insufficient replication resources.
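The arithmetic behind entry 610 can be reproduced with a short sketch. This is an illustration under stated assumptions rather than the disclosed implementation: the advertisement lists are taken from the description of the five VTEPs, while the dictionary layout and function names are hypothetical. Under ingress replication, a VTEP sends one copy per VNI to every remote VTEP that also advertises that VNI, and receives one copy per shared VNI from every remote VTEP.

```python
# VNIs advertised by each VTEP, as listed in the description.
# The labels and function names below are illustrative assumptions.
ADVERTISED = {
    "VTEP1": {1, 2, 3, 4},
    "VTEP2": {1, 2},
    "VTEP3": {3, 4, 5},
    "VTEP4": {3, 4, 5},
    "VTEP5": {3, 4, 5},
}


def ingress_replication_tx(local, advertised):
    """Copies sent when one flood occurs on each VNI advertised by `local`."""
    return sum(
        1
        for vni in advertised[local]
        for remote, vnis in advertised.items()
        if remote != local and vni in vnis
    )


def ingress_replication_rx(local, advertised):
    """Copies received when each remote floods once on every VNI it shares with `local`."""
    return sum(
        len(advertised[local] & vnis)
        for remote, vnis in advertised.items()
        if remote != local
    )


print(ingress_replication_tx("VTEP1", ADVERTISED))  # 1 + 1 + 3 + 3
print(ingress_replication_rx("VTEP1", ADVERTISED))  # 2 + 2 + 2 + 2
```

Because copies are only ever sent to remote VTEPs that advertise the VNI, this model produces no in discards, at the cost of one replication per remote peer.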

In FIG. 6B, table 630 indicates the performance of network 100 based on the underlay multicast model. Each of entries 640-648 in table 630 can include information similar to table 600 of FIG. 6A, including: a TEP 632; a number of packets received (RX 634); a number of packets transmitted (TX 636); and a number of in discards 638 based on the chosen replication model. For example, based on the network 100 of FIG. 1 and the underlay multicast model table 240 of FIG. 2B, entry 642 indicates a total of 13 packets received: if a flood occurs on VLAN1 at VTEP1102, VTEP2104 will receive 1 packet (via G1); if a flood occurs on VLAN2 at VTEP1102, VTEP2104 will receive 1 packet (via G1); if a flood occurs on VLAN3 at VTEP1102, VTEP3106, VTEP4108, and VTEP5110, VTEP2104 will receive 4 packets (via G1), i.e., one packet from each of these four VTEPs; if a flood occurs on VLAN4 at VTEP1102, VTEP3106, VTEP4108, and VTEP5110, VTEP2104 will receive 4 packets (via G2), i.e., one packet from each of these four VTEPs; and if a flood occurs on VLAN5 at VTEP3106, VTEP4108, and VTEP5110, VTEP2104 will receive 3 packets (via G1).

Entry 642 also indicates a total of 2 packets transmitted by VTEP2104. If a single packet is sent or a flood occurs at VTEP2104: on VLAN1, VTEP2104 will send 1 packet to VTEP1102; and on VLAN2, VTEP2104 will send 1 packet to VTEP1102. As described above in relation to FIG. 1, underlay multicast can result in a significantly reduced number of packets transmitted, but can result in a significantly higher number of in discards (as illustrated in table 630).
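The receive count for VTEP2 under the underlay multicast model can be tallied directly from the floods enumerated for entry 642. The sketch below is illustrative only; the tuple layout and function name are hypothetical. The discard figure it derives rests on the assumption that any copy arriving on a VNI the receiver does not advertise is discarded on ingress, and it is not asserted to be the value shown in table 630.

```python
# (VNI, multicast group, flooding VTEPs) as enumerated for entry 642.
# The layout of these tuples is an illustrative assumption.
FLOODS_SEEN_BY_VTEP2 = [
    (1, "G1", ["VTEP1"]),
    (2, "G1", ["VTEP1"]),
    (3, "G1", ["VTEP1", "VTEP3", "VTEP4", "VTEP5"]),
    (4, "G2", ["VTEP1", "VTEP3", "VTEP4", "VTEP5"]),
    (5, "G1", ["VTEP3", "VTEP4", "VTEP5"]),
]
VTEP2_VNIS = {1, 2}  # VNIs advertised by VTEP2


def rx_and_discards(floods, local_vnis):
    """Return (copies received, copies on VNIs the receiver does not advertise)."""
    rx = sum(len(senders) for _, _, senders in floods)
    discards = sum(
        len(senders) for vni, _, senders in floods if vni not in local_vnis
    )
    return rx, discards


rx, discards = rx_and_discards(FLOODS_SEEN_BY_VTEP2, VTEP2_VNIS)
print(rx, discards)
```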

In FIG. 6C, table 660 indicates the performance of network 100 (or similarly, network 500 of FIG. 5) based on the efficient broadcast domain model. Each of entries 670-678 in table 660 can include information similar to table 600 of FIG. 6A, including: a TEP 662; a number of packets received (RX 664); a number of packets transmitted (TX 666); and a number of in discards 668 based on the chosen replication model. For example, based on the network 100 of FIG. 1 and the hybrid efficient broadcast domain table 470 of FIG. 4C, entry 670 indicates: a total of 8 packets received (664) (similar to entry 610 of the ingress replication model table 600); a total of 4 packets transmitted (666) (similar to entry 640 of the underlay multicast model table 630); and a total of 0 in discards (668) (similar to entry 610 of the ingress replication model table 600). Thus, FIG. 6C demonstrates that the hybrid broadcast domain model (hybrid model or third type) can eliminate ingress discards (similar to using only the ingress replication model) and can reduce the number of transmitted packets (similar to using only the underlay multicast model). When compared to the ingress replication model, the hybrid model can result in a more efficient transmit count due to less traffic being sent unnecessarily via multicast groups (e.g., ~200% improvement in replicating flood packets). When compared to the underlay multicast model, the hybrid model can also result in a more efficient receive count due to less traffic being sent unnecessarily via multicast groups, which can further result in a significantly decreased number of in discards (e.g., ~150% improvement in utilizing underlay traffic).
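The transmit count of entry 670 can be checked with a brief sketch. This is a hedged illustration, assuming the hybrid broadcast domain of VTEP1 consists of a unicast tunnel to VTEP2 for VXLAN1 and VXLAN2 and multicast tunnels to G1 and G2 for VXLAN3 and VXLAN4, as described above in relation to FIG. 5; the dictionary layout and function name are hypothetical. A unicast entry costs one copy per peer, while a multicast entry costs a single copy sent to the group.

```python
# Per-VXLAN broadcast domain of VTEP1 in the hybrid model.
# Each value is ("unicast", [peers]) or ("multicast", group); this layout
# is an illustrative assumption, not a structure from the figures.
BD_VTEP1 = {
    1: ("unicast", ["VTEP2"]),
    2: ("unicast", ["VTEP2"]),
    3: ("multicast", "G1"),
    4: ("multicast", "G2"),
}


def hybrid_tx(broadcast_domain):
    """Copies sent when one flood occurs on each VXLAN of the broadcast domain."""
    total = 0
    for kind, target in broadcast_domain.values():
        if kind == "unicast":
            total += len(target)  # one copy per unicast peer
        else:
            total += 1  # a single copy to the multicast group
    return total


print(hybrid_tx(BD_VTEP1))
```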

Method for Facilitating Building an Efficient Broadcast Domain Based on Workload

FIG. 7A presents a flowchart 700 illustrating a method for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application. During operation, the system determines a replication threshold for a device in an overlay network, wherein the device operates as a tunnel endpoint, and wherein the replication threshold indicates a capacity of replication resources of the device (operation 702). The device can be a tunnel endpoint (TEP), and in a VXLAN, the device can be a virtual TEP (VTEP). The replication threshold can be referred to as the “Replication Threshold Count” or “RTC,” and can be stored in the EVPN table based on an optional parameter included in a BGP Open message transmitted between the endpoints, as described above in relation to FIG. 3. In some aspects, the RTC for a network device can be set to a default value of 2, which can indicate that the network device is both ingress replication-capable and underlay multicast-replication capable. The system obtains a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN) (operation 704), as described above in relation to table 400 of FIG. 4A.

The system selects a replication type for a broadcast domain specific to the device and the respective VXLAN based on the replication threshold and the current replication count (operation 706). The replication type indicates one of: a first type associated with an ingress replication broadcast domain comprising unicast tunnels; a second type associated with an underlay multicast broadcast domain comprising multicast tunnels; and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type (i.e., comprising both unicast and multicast tunnels). The system can select the replication type based on the algorithm (430) described above in relation to FIG. 4B. The system can use a combination of the RTC (as stored in the EVPN table of the device) and the dynamically learned current replication count (based on information exchanged with its peer nodes) to categorize or determine the replication type for a given broadcast domain and respective VXLAN (e.g., BDx(y), as described above in relation to FIGS. 4B and 4C).

The system creates the broadcast domain specific to the device and the respective VXLAN based on the selected replication type (operation 708). If the selected replication type is the first type, the system adds a unicast tunnel to the ingress replication broadcast domain, as described above in relation to FIGS. 1, 2A, and 5. If the selected replication type is the second type, the system adds a multicast tunnel to the underlay multicast broadcast domain, as described above in relation to FIGS. 1, 2B, and 5. If the selected replication type is the third type, the system adds to the hybrid broadcast domain: a unicast tunnel associated with the respective remote endpoint capable of ingress replication; and a multicast tunnel associated with a multicast group for the respective remote endpoint (as described above in relation to FIGS. 4B, 4C, and 5).

FIGS. 7B-7C present flowcharts 720 and 740 illustrating a method for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application. In flowchart 720 of FIG. 7B, the system determines a replication threshold for a device in an overlay network, wherein the device operates as a tunnel endpoint, and wherein the replication threshold indicates a capacity of replication resources of the device (operation 722, similar to operation 702 of FIG. 7A). The system obtains a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN) (operation 724, similar to operation 704 of FIG. 7A).

The system selects a replication type for a broadcast domain specific to the device and the respective VXLAN (based on the replication threshold and the current replication count) (operation 726, similar to operation 706 of FIG. 7A). The replication type can indicate one of three types, as described herein. The system can select the replication type based on the algorithm (430) described above in relation to FIG. 4B. The system can use a combination of the RTC (as stored in the EVPN table of the device) and the dynamically learned current replication count (based on information exchanged with its peer nodes) to determine or define the replication type for a given broadcast domain and respective VXLAN (e.g., BDx(y), as described above in relation to FIGS. 4B and 4C).

If the current replication count is not greater than (i.e., is less than or equal to) the replication threshold (decision 728), the system selects a first type, which is associated with an ingress replication broadcast domain comprising unicast tunnels (operation 730), as described above in relation to section 434 of FIG. 4B. The system adds a unicast tunnel to the ingress replication broadcast domain (operation 732) and the operation returns. If the current replication count is greater than the replication threshold (decision 728), the operation continues at Label A of FIG. 7C.

In flowchart 740 of FIG. 7C, the system selects a second type, which is associated with an underlay multicast broadcast domain comprising multicast tunnels (operation 742), as described above in relation to section 436 of FIG. 4B. The system iterates through remote endpoints advertised in the broadcast domain specific to the device and the respective VXLAN (operation 744), by performing operation 746, decision 748, and operations 750, 752, and 754. The iteration is described above in relation to section 438 of FIG. 4B.

The system determines whether a respective remote endpoint is capable of ingress replication based on the first type of broadcast domain (operation 746). If the respective remote endpoint is not limited to ingress replication, i.e., is not ingress replication-capable only (decision 748), the operation continues at operation 754. That is, the system adds, to the underlay multicast broadcast domain, a multicast tunnel associated with a multicast group for the respective remote endpoint (operation 754).

If the respective remote endpoint is only ingress replication-capable (decision 748), the system selects a third type, which is associated with a hybrid broadcast domain comprising a combination of the first type (unicast tunnels) and the second type (multicast tunnels) (operation 750). Determining whether a remote endpoint is only ingress replication-capable can be based on determining that the replication threshold count for the remote endpoint is set to a value of 0, as described above in relation to section 438 of FIG. 4B. The system adds, to the hybrid broadcast domain, a unicast tunnel associated with the respective remote endpoint capable of ingress replication (operation 752) and also adds, to the hybrid broadcast domain, a multicast tunnel associated with a multicast group for the respective remote endpoint (operation 754). Thus, the system iterates through the remote endpoints for the broadcast domain and determines whether any are only ingress replication-capable, in which case the hybrid replication type is selected, a unicast tunnel is added, and multicast tunnels are added for the remote endpoints which are not only ingress replication-capable, as described above in relation to efficient broadcast domain model table 470 of FIG. 4C and overlay network 500 of FIG. 5.
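The decision flow of flowcharts 720 and 740 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class names, field names, and function signature are hypothetical, and the sketch assumes that each remote endpoint's RTC is already known (e.g., from the EVPN table) and that a single multicast group is mapped to the respective VXLAN.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReplicationType(Enum):
    INGRESS = 1    # first type: unicast tunnels only
    MULTICAST = 2  # second type: multicast tunnels only
    HYBRID = 3     # third type: unicast and multicast tunnels


@dataclass
class BroadcastDomain:
    replication_type: ReplicationType
    unicast_tunnels: list = field(default_factory=list)
    multicast_groups: set = field(default_factory=set)


def build_broadcast_domain(current_count, rtc, remotes, group):
    """Build the broadcast domain for one device and one VXLAN.

    remotes maps a remote endpoint name to that endpoint's RTC;
    group is the multicast group assumed to be mapped to the VXLAN.
    """
    if current_count <= rtc:  # decision 728
        # Operations 730 and 732: ingress replication over unicast tunnels.
        return BroadcastDomain(
            ReplicationType.INGRESS, unicast_tunnels=sorted(remotes)
        )
    bd = BroadcastDomain(ReplicationType.MULTICAST)  # operation 742
    for remote, remote_rtc in sorted(remotes.items()):  # operation 744
        if remote_rtc == 0:  # decision 748: remote is only ingress replication-capable
            bd.replication_type = ReplicationType.HYBRID  # operation 750
            bd.unicast_tunnels.append(remote)  # operation 752
        bd.multicast_groups.add(group)  # operation 754
    return bd


# VTEP5 with the default RTC of 2, as in the example for entry 498.
bd5_vlan5 = build_broadcast_domain(2, 2, {"VTEP3": 2, "VTEP4": 2}, "G1")
bd5_vlan4 = build_broadcast_domain(3, 2, {"VTEP1": 2, "VTEP3": 2, "VTEP4": 2}, "G2")
print(bd5_vlan5.replication_type.name, bd5_vlan5.unicast_tunnels)
print(bd5_vlan4.replication_type.name, sorted(bd5_vlan4.multicast_groups))
```

With the default RTC of 2 on every endpoint, the VLAN5 domain stays on unicast tunnels and the VLAN4 domain moves to group G2. The third type is selected only when some remote endpoint reports an RTC of 0.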

Computer System for Facilitating Building an Efficient Broadcast Domain Based on Workload

FIG. 8 illustrates a computer system 800 for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application. Computer system 800 includes a processor 802, a memory 804, and a storage device 806. Memory 804 can include a volatile memory (e.g., random access memory (RAM)) that serves as a managed memory and can be used to store one or more memory pools. Furthermore, computer system 800 can be coupled to peripheral I/O user devices 810 (e.g., a display device 811, a keyboard 812, and a pointing device 813). Storage device 806 includes a non-transitory computer-readable storage medium and stores an operating system 816, a content-processing system 818, and data 830. Computer system 800 may include fewer or more entities or instructions than those shown in FIG. 8.

Content-processing system 818 can include instructions, which when executed by computer system 800, can cause computer system 800 to perform methods and/or processes described in this disclosure. Specifically, content-processing system 818 may include instructions 820 to identify a replication threshold for a device in an overlay network. The device can operate as a tunnel endpoint in the overlay network, and the replication threshold can indicate a capacity of replication resources of the device. Content-processing system 818 may include instructions 822 to determine a current replication count for the device corresponding to a respective VXLAN, as described above in relation to table 400 of FIG. 4A. Content-processing system 818 may include instructions 824 to define, for a broadcast domain specific to the device and the respective VXLAN, a replication type based on the replication threshold and the current replication count. The replication type can indicate one of three types (i.e., ingress replication, underlay multicast, or hybrid), as described above in relation to FIGS. 4B, 4C, and 5. Content-processing system 818 may include instructions 826 to create the broadcast domain specific to the device and the respective VXLAN based on the defined replication type, as described above in relation to overlay network 500 of FIG. 5.

Data 830 can include any data that is required as input or that is generated as output by the methods, operations, communications, and/or processes described in this disclosure. Specifically, data 830 can store at least: a data packet; a control packet; a replication threshold count; a default value; a current replication count; a replication type; an indicator of a unicast tunnel or a multicast tunnel; a multicast group identifier; a multicast group IP address; a table; a data structure; an EVPN table; a configuration; maximum supported capabilities of a device; an indicator of whether a remote endpoint is only capable of ingress replication; a BGP message; an optional parameter; a code; a length; and a value.

Computer system 800 and content-processing system 818 may include more instructions than those shown in FIG. 8. For example, content-processing system 818 can also store instructions for executing the operations described above in relation to: overlay network 100 of FIG. 1; algorithm 430 of FIG. 4B; efficient broadcast domain model table 470 of FIG. 4C; overlay network 500 of FIG. 5; the operations depicted in the flowcharts of FIGS. 7A-7C; and the instructions of non-transitory computer-readable storage medium 900 in FIG. 9.

Non-Transitory Computer-Readable Medium for Facilitating Building an Efficient Broadcast Domain Based on Workload

FIG. 9 illustrates a non-transitory computer-readable medium (CRM) 900 for facilitating building an efficient EVPN VXLAN broadcast domain based on workload, in accordance with an aspect of the present application. CRM 900 can store instructions that when executed by a computer cause the computer to perform the methods, operations, and functions described herein. Specifically, CRM 900 can store instructions 902 to determine a capacity of replication resources for a device in an overlay network. CRM 900 can store instructions 904 to obtain a current replication count for the device corresponding to a respective VXLAN, wherein the device operates as a tunnel endpoint. CRM 900 can further store instructions 906 to select a replication type (i.e., ingress replication, underlay multicast, or hybrid) for a broadcast domain specific to the device and the respective VXLAN based on the capacity of replication resources and the current replication count. CRM 900 can store instructions 908 to create the broadcast domain specific to the device and the respective VXLAN based on the selected replication type.

CRM 900 can additionally include instructions 910 to iterate through remote endpoints advertised in the broadcast domain to determine ingress replication-capable-only endpoints and select the hybrid broadcast domain. CRM 900 can also store instructions 912 to communicate the replication threshold to remote tunnel endpoints via BGP messages.

CRM 900 may include more instructions than those shown in FIG. 9. For example, CRM 900 can also store instructions for executing the operations described above in relation to: overlay network 100 of FIG. 1; algorithm 430 of FIG. 4B; efficient broadcast domain model table 470 of FIG. 4C; overlay network 500 of FIG. 5; the operations depicted in the flowcharts of FIGS. 7A-7C; and the instructions of content-processing system 818 in FIG. 8.

Aspects and Variations

In general, the disclosed aspects provide a method, computer system, and non-transitory computer-readable storage medium for facilitating building an efficient EVPN VXLAN broadcast domain based on workload. In one aspect, a method is performed, e.g., by a system. The system determines a replication threshold for a device in an overlay network, wherein the device operates as a tunnel endpoint, and wherein the replication threshold indicates a capacity of replication resources of the device. The system obtains a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN). The system selects a replication type for a broadcast domain specific to the device and the respective VXLAN based on the replication threshold and the current replication count, wherein the replication type indicates one of: a first type associated with an ingress replication broadcast domain comprising unicast tunnels; a second type associated with an underlay multicast broadcast domain comprising multicast tunnels; and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type. The system creates the broadcast domain specific to the device and the respective VXLAN based on the selected replication type.

In a variation on this aspect, determining the replication threshold for the device comprises determining at least one of: a configuration set by an administrator based on a capacity of the device; and maximum supported capabilities of the device.

In a further variation, the overlay network comprises an Ethernet Virtual Private Network (EVPN), and the device operates as a virtual tunnel endpoint (VTEP) in the EVPN.

In a further variation, the replication threshold is stored by the network device in a data structure based on control packets transmitted in the EVPN.

In a further variation, selecting the replication type comprises: selecting the first type in response to the current replication count being less than or equal to the replication threshold; and selecting the second type in response to the current replication count being greater than the replication threshold.

In a further variation, subsequent to selecting the second type, the system iterates through remote endpoints advertised in the broadcast domain specific to the device and the respective VXLAN by: determining whether a respective remote endpoint is capable of ingress replication based on the first type of broadcast domain; and in response to determining that the respective remote endpoint is capable of ingress replication, selecting the third type as the replication type for the broadcast domain specific to the device and the respective VXLAN.

In a further variation, creating the broadcast domain comprises the following operations. In response to selecting the first type, the system adds a unicast tunnel to the ingress replication broadcast domain. In response to selecting the second type, the system adds a multicast tunnel to the underlay multicast broadcast domain. In response to selecting the third type: the system adds, to the hybrid broadcast domain, a unicast tunnel associated with the respective remote endpoint capable of ingress replication; and the system adds, to the hybrid broadcast domain, a multicast tunnel associated with a multicast group for the respective remote endpoint.

In a further variation, the system communicates the replication threshold to remote tunnel endpoints via a Border Gateway Protocol (BGP) message.

In a further variation, the BGP message comprises a BGP Open message with an optional parameter. The optional parameter comprises: a code indicating a capability as the replication threshold; a length indicating a size of the optional parameter; and a value of the replication threshold.
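The code, length, and value fields of the optional parameter can be sketched as a simple type-length-value encoding. The following is an illustration under several assumptions that are not stated in the disclosure: the code point 0xEF is a hypothetical placeholder, the code and length fields are each assumed to occupy one octet, the length is assumed to count only the octets of the value, and the replication threshold is assumed to be carried as a two-octet unsigned integer in network byte order.

```python
import struct

# Hypothetical code point for the replication threshold capability;
# the actual code value is not specified in the text.
RTC_CAPABILITY_CODE = 0xEF


def encode_rtc_parameter(rtc):
    """Encode the replication threshold as code (1 octet), length (1 octet), value."""
    value = struct.pack("!H", rtc)  # assumed two-octet value, network byte order
    return struct.pack("!BB", RTC_CAPABILITY_CODE, len(value)) + value


def decode_rtc_parameter(data):
    """Decode a parameter produced by encode_rtc_parameter and return the RTC."""
    if len(data) < 2:
        raise ValueError("parameter too short")
    code, length = struct.unpack_from("!BB", data, 0)
    if code != RTC_CAPABILITY_CODE:
        raise ValueError("not a replication threshold parameter")
    if length != 2 or len(data) < 2 + length:
        raise ValueError("unexpected parameter length")
    (rtc,) = struct.unpack_from("!H", data, 2)
    return rtc


encoded = encode_rtc_parameter(2)  # default RTC of 2
print(encoded.hex())
print(decode_rtc_parameter(encoded))
```

A receiving endpoint could store the decoded value alongside the peer in its EVPN table. An RTC of 0 would then mark the peer as only ingress replication-capable.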

In another aspect, a computer system comprises a processor and a storage device storing instructions that when executed by the processor cause the processor to perform a method, i.e., the instructions are to: identify a replication threshold for a device in an overlay network, wherein the device operates as a tunnel endpoint, and wherein the replication threshold indicates a capacity of replication resources of the device; determine a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN); define, for a broadcast domain specific to the device and the respective VXLAN, a replication type based on the replication threshold and the current replication count, wherein the replication type indicates one of a first type associated with an ingress replication broadcast domain comprising unicast tunnels, a second type associated with an underlay multicast broadcast domain comprising multicast tunnels, and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type; and create the broadcast domain specific to the device and the respective VXLAN based on the defined replication type. The instructions can further perform the operations described herein, including in relation to: overlay network 100 of FIG. 1; algorithm 430 of FIG. 4B; efficient broadcast domain model table 470 of FIG. 4C; overlay network 500 of FIG. 5; the operations depicted in the flowcharts of FIGS. 7A-7C; and the instructions of non-transitory computer-readable storage medium 900 in FIG. 9.

In yet another aspect, a non-transitory computer-readable storage medium stores instructions to: determine a capacity of replication resources for a device in an overlay network; obtain a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN), wherein the device operates as a tunnel endpoint; select a replication type for a broadcast domain specific to the device and the respective VXLAN based on the capacity of replication resources and the current replication count, wherein the replication type indicates one or more of a first type associated with an ingress replication broadcast domain comprising unicast tunnels, a second type associated with an underlay multicast broadcast domain comprising multicast tunnels, and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type; and create the broadcast domain specific to the device and the respective VXLAN based on the selected replication type. The instructions can further perform the operations described herein, including in relation to: overlay network 100 of FIG. 1; algorithm 430 of FIG. 4B; efficient broadcast domain model table 470 of FIG. 4C; overlay network 500 of FIG. 5; the operations depicted in the flowcharts of FIGS. 7A-7C; and the instructions of content-processing system 818 in FIG. 8.

The foregoing description is presented to enable any person skilled in the art to make and use the aspects and examples, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects and applications without departing from the spirit and scope of the present disclosure. Thus, the aspects described herein are not limited to the aspects shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.

Furthermore, the foregoing descriptions of aspects have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the aspects described herein to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the aspects described herein. The scope of the aspects described herein is defined by the appended claims.

Claims

1. A method, comprising: determining a replication threshold for a device in an overlay network, wherein the device operates as a tunnel endpoint, and wherein the replication threshold indicates a capacity of replication resources of the device; obtaining a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN); selecting a replication type for a broadcast domain specific to the device and the respective VXLAN based on the replication threshold and the current replication count, wherein the replication type indicates one of: a first type associated with an ingress replication broadcast domain comprising unicast tunnels; a second type associated with an underlay multicast broadcast domain comprising multicast tunnels; and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type; and creating the broadcast domain specific to the device and the respective VXLAN based on the selected replication type.
2. The method of claim 1, wherein determining the replication threshold for the device comprises determining at least one of: a configuration set by an administrator based on a capacity of the device; and maximum supported capabilities of the device.
3. The method of claim 1, wherein the overlay network comprises an Ethernet Virtual Private Network (EVPN), and wherein the device operates as a virtual tunnel endpoint (VTEP) in the EVPN.
4. The method of claim 3, further comprising: wherein the replication threshold is stored by the network device in a data structure based on control packets transmitted in the EVPN.
5. The method of claim 1, wherein selecting the replication type comprises: selecting the first type in response to the current replication count being less than or equal to the replication threshold; and selecting the second type in response to the current replication count being greater than the replication threshold.
6. The method of claim 5, wherein subsequent to selecting the second type, the method further comprises: iterating through remote endpoints advertised in the broadcast domain specific to the device and the respective VXLAN by: determining whether a respective remote endpoint is capable of ingress replication based on the first type of broadcast domain; and in response to determining that the respective remote endpoint is capable of ingress replication, selecting the third type as the replication type for the broadcast domain specific to the device and the respective VXLAN.
7. The method of claim 6, wherein creating the broadcast domain comprises: in response to selecting the first type, adding a unicast tunnel to the ingress replication broadcast domain; in response to selecting the second type, adding a multicast tunnel to the underlay multicast broadcast domain; and in response to selecting the third type: adding, to the hybrid broadcast domain, a unicast tunnel associated with the respective remote endpoint capable of ingress replication; and adding, to the hybrid broadcast domain, a multicast tunnel associated with a multicast group for the respective remote endpoint.
8. The method of claim 1, further comprising: communicating the replication threshold to remote tunnel endpoints via a Border Gateway Protocol (BGP) message.
9. The method of claim 8, wherein the BGP message comprises a BGP Open message with an optional parameter, and wherein the optional parameter comprises: a code indicating a capability as the replication threshold; a length indicating a size of the optional parameter; and a value of the replication threshold.
10. A computer system, comprising: a processor; and a storage device storing instructions to: identify a replication threshold for a device in an overlay network, wherein the device operates as a tunnel endpoint, and wherein the replication threshold indicates a capacity of replication resources of the device; determine a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN); define, for a broadcast domain specific to the device and the respective VXLAN, a replication type based on the replication threshold and the current replication count, wherein the replication type indicates one of: a first type associated with an ingress replication broadcast domain comprising unicast tunnels; a second type associated with an underlay multicast broadcast domain comprising multicast tunnels; and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type; and create the broadcast domain specific to the device and the respective VXLAN based on the defined replication type.
11. The computer system of claim 10, wherein identifying the replication threshold for the device comprises determining at least one of: a configuration set by an administrator based on a capacity of the device; and maximum supported capabilities of the device.
12. The computer system of claim 10, wherein the overlay network comprises an Ethernet Virtual Private Network (EVPN), and wherein the device operates as a virtual tunnel endpoint (VTEP) in the EVPN.
13. The computer system of claim 10, wherein defining the replication type for the broadcast domain comprises: selecting the first type in response to the current replication count being less than or equal to the replication threshold; and selecting the second type in response to the current replication count being greater than the replication threshold.
14. The computer system of claim 13, wherein subsequent to selecting the second type, the instructions are further to: iterate through remote endpoints advertised in the broadcast domain specific to the device and the respective VXLAN by: determining whether a respective remote endpoint is capable of ingress replication based on the first type of broadcast domain; and in response to determining that the respective remote endpoint is capable of ingress replication, selecting the third type as the replication type for the broadcast domain specific to the device and the respective VXLAN.
15. The computer system of claim 14, wherein creating the broadcast domain comprises: adding a unicast tunnel to the ingress replication broadcast domain responsive to selecting the first type; adding a multicast tunnel to the underlay multicast broadcast domain responsive to selecting the second type; and responsive to selecting the third type: adding, to the hybrid broadcast domain, a unicast tunnel associated with the respective remote endpoint capable of ingress replication; and adding, to the hybrid broadcast domain, a multicast tunnel associated with a multicast group for the respective remote endpoint.
16. The computer system of claim 10, the instructions further to: communicate the replication threshold to remote tunnel endpoints via a Border Gateway Protocol (BGP) Open message which includes an optional parameter, wherein the optional parameter comprises: a code indicating a capability as the replication threshold; a length indicating a size of the optional parameter; and a value of the replication threshold.
17. A non-transitory computer-readable storage medium storing instructions to: determine a capacity of replication resources for a device in an overlay network; obtain a current replication count for the device corresponding to a respective virtual extensible local area network (VXLAN), wherein the device operates as a tunnel endpoint; select a replication type for a broadcast domain specific to the device and the respective VXLAN based on the capacity of replication resources and the current replication count, wherein the replication type indicates one or more of: a first type associated with an ingress replication broadcast domain comprising unicast tunnels; a second type associated with an underlay multicast broadcast domain comprising multicast tunnels; and a third type associated with a hybrid broadcast domain comprising a combination of the first type and the second type; and create the broadcast domain specific to the device and the respective VXLAN based on the selected replication type.
18. The non-transitory computer-readable storage medium of claim 17, wherein selecting the replication type comprises: selecting the first type in response to the current replication count being less than or equal to the replication threshold; and selecting the second type in response to the current replication count being greater than the replication threshold.
19. The non-transitory computer-readable storage medium of claim 18, wherein subsequent to selecting the second type, the instructions are further to: iterate through remote endpoints advertised in the broadcast domain specific to the device and the respective VXLAN by: determining whether a respective remote endpoint is capable of ingress replication based on the first type of broadcast domain; and in response to determining that the respective remote endpoint is capable of ingress replication, selecting the third type as the replication type for the broadcast domain specific to the device and the respective VXLAN.
20. The non-transitory computer-readable storage medium of claim 17, the instructions further to: communicate the replication threshold to remote tunnel endpoints via a Border Gateway Protocol (BGP) Open message based on an optional parameter.
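
By way of illustration only, the selection and creation steps recited in claims 5 through 7 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names ReplicationType, RemoteEndpoint, select_replication_type, and build_broadcast_domain are hypothetical, the ingress_capable flag stands in for whatever capability check a given aspect uses, and the hybrid case reflects one reading of claim 7 in which capable remote endpoints receive unicast tunnels while the remaining endpoints are reached through their multicast group.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationType(Enum):
    INGRESS = 1    # first type: unicast tunnels only
    MULTICAST = 2  # second type: multicast tunnels only
    HYBRID = 3     # third type: combination of both


@dataclass(frozen=True)
class RemoteEndpoint:
    address: str            # remote tunnel endpoint address (illustrative)
    ingress_capable: bool   # hypothetical flag: endpoint can take ingress replication
    multicast_group: str    # multicast group associated with this endpoint


def select_replication_type(count, threshold, remotes):
    """Pick a replication type from the current count and the threshold."""
    if count <= threshold:
        return ReplicationType.INGRESS
    # Count exceeds threshold: the second type is selected first, then
    # upgraded to the third type if any remote endpoint is ingress capable.
    if any(r.ingress_capable for r in remotes):
        return ReplicationType.HYBRID
    return ReplicationType.MULTICAST


def build_broadcast_domain(rep_type, remotes):
    """Return the set of (kind, target) tunnels for the broadcast domain."""
    tunnels = set()
    for r in remotes:
        if rep_type is ReplicationType.INGRESS:
            tunnels.add(("unicast", r.address))
        elif rep_type is ReplicationType.MULTICAST:
            tunnels.add(("multicast", r.multicast_group))
        elif r.ingress_capable:
            # Hybrid (assumed reading): unicast to capable endpoints.
            tunnels.add(("unicast", r.address))
        else:
            # Hybrid (assumed reading): multicast group for the rest.
            tunnels.add(("multicast", r.multicast_group))
    return tunnels


if __name__ == "__main__":
    remotes = [
        RemoteEndpoint("10.0.0.2", True, "239.1.1.1"),
        RemoteEndpoint("10.0.0.3", False, "239.1.1.1"),
    ]
    chosen = select_replication_type(count=6, threshold=5, remotes=remotes)
    print(chosen, sorted(build_broadcast_domain(chosen, remotes)))
```

Because the tunnels are collected in a set, endpoints that share a multicast group contribute a single multicast tunnel, which matches the underlay multicast model in which the ingress VTEP replicates once per group. The addresses and group shown are placeholder values.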

Priority Claims (1)

Number	Date	Country	Kind
202441003286	Jan 2024	IN	national
