The present disclosure relates to communication networks. More specifically, the present disclosure relates to a system and a method for dynamic multi-destination traffic management in a distributed tunnel endpoint.
The exponential growth of the Internet has made it a popular delivery medium for a variety of applications running on physical and virtual devices. Such applications have brought with them an increasing demand for bandwidth. As a result, equipment vendors race to build larger and faster switches with versatile capabilities, such as efficient forwarding of multi-destination (e.g., broadcast, unknown unicast, and multicast) traffic. However, the capabilities of a switch cannot grow infinitely. They are limited by physical space, power consumption, and design complexity, to name a few factors. Furthermore, switches with higher capability are usually more complex and expensive. As a result, increasing the efficiency of the existing capabilities of a switch adds significant value.
Typically, to facilitate a service to a network, a service tunnel is established between a switch in the network and a service node providing the service. To ensure high availability, instead of a single switch establishing the tunnel, the network may establish such service tunnels from a distributed tunnel endpoint (DTE) in the network. A distributed tunnel endpoint can include a plurality of switches operating as a single, logical tunnel endpoint. A tunnel endpoint for a tunnel can originate or terminate tunnel forwarding for the tunnel.
While a distributed tunnel endpoint brings many desirable features to service tunnels, some issues remain unsolved in facilitating efficient forwarding of multi-destination traffic via service tunnels from a distributed tunnel endpoint.
One embodiment of the present invention provides a switch. The switch includes a storage device, a mapping module, and a packet processor. During operation, the mapping module maintains a first mapping and a second mapping. The first mapping, which can be in the storage device, is between a first service tunnel identifier and a first virtual local area network (VLAN) identifier. The second mapping is between the first VLAN identifier and an indicator, which indicates whether the switch is elected as a designated forwarder of multi-destination traffic for the first service tunnel identifier. If the indicator indicates that the switch is the designated forwarder of multi-destination traffic for the first service tunnel identifier, the packet processor determines an egress port, which corresponds to the first service tunnel, for a packet belonging to multi-destination traffic of the first VLAN.
In a variation on this embodiment, the switch also includes a tunnel management module, which operates the switch as a distributed tunnel endpoint in conjunction with a second switch for a plurality of service tunnels. The switch and the second switch are associated with an Internet Protocol (IP) address indicating the distributed tunnel endpoint.
In a further variation, the packet processor encapsulates the packet with an encapsulation header and sets the IP address as a source address of the encapsulation header.
In a further variation, the tunnel management module elects a distribution master from the first and second switches. The distribution master is responsible for generating the first mapping and sharing the first mapping with other switches in the distributed tunnel endpoint. These other switches are precluded from generating the first mapping.
In a variation on this embodiment, the mapping module maintains, in the storage device, a third mapping between a second VLAN identifier and a second indicator, which indicates that the switch is not a designated forwarder of multi-destination traffic for a second service tunnel identifier.
In a further variation, the packet processor is precluded from determining a second egress port corresponding to the second service tunnel for a packet belonging to multi-destination traffic of the second VLAN.
In a variation on this embodiment, the first mapping is based on one or more of: a number of tunnels, a number of VLANs, and a traffic volume of a respective tunnel.
In a variation on this embodiment, the second mapping is stored in the forwarding table of the switch. The first VLAN identifier then includes a multicast group identifier for the first VLAN.
In the figures, like reference numerals refer to the same figure elements.
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
In embodiments of the present invention, the problem of efficiently forwarding multi-destination traffic from a distributed tunnel endpoint to service nodes via service tunnels is solved by (i) dynamically mapping virtual local area networks (VLANs) to the tunnels; and (ii) allocating a designated forwarder of multi-destination traffic for each tunnel. Multi-destination traffic can also be referred to as broadcast, unknown unicast, and multicast (BUM) traffic. Typically, such traffic is forwarded to multiple destinations. Here, a service node can be any node (e.g., a server or a network appliance) that provides a service to a switch or a network. A distributed tunnel endpoint can include a plurality of switches operating as a single, logical tunnel endpoint sharing a common tunnel address (e.g., a virtual Internet Protocol (IP) address).
With existing technologies, if a distributed tunnel endpoint operates as a tunnel endpoint for a plurality of tunnels, one of these tunnels is elected as a designated forwarder of BUM traffic. This tunnel is designated to carry BUM traffic belonging to a respective VLAN associated with the distributed tunnel endpoint. As a result, other tunnels may remain underutilized. Furthermore, if the designated tunnel or the service node associated with the designated tunnel fails, another tunnel (and its service node) is selected as the new designated forwarder of BUM traffic. This leads to reprogramming of a respective VLAN (e.g., reprogramming of multicast group identifiers associated with that VLAN) for that newly selected tunnel. Such a migration can be intensive for a large number of VLANs and cause delay in traffic switchover.
To solve this problem, switches operating as the distributed tunnel endpoint select one of the switches as a distribution master (can also be referred to as a DN master). In some embodiments, the switch with the lowest (or highest) switch identifier value is selected as the distribution master. Since any switch in the distributed tunnel endpoint can operate as the tunnel endpoint (e.g., initiate or terminate tunnel forwarding), the distribution master is configured with a respective tunnel. In some embodiments, the distribution master maintains a list of VLANs configured in the distributed tunnel endpoint. This allows the distribution master to map a respective VLAN to one of the service tunnels.
This mapping can be based on one or more of: the number of tunnels, the number of VLANs, and the traffic volume of a respective tunnel. The distribution master then includes this mapping in a notification message and sends the notification message to other switches in the distributed tunnel endpoint. Upon receiving the mapping, a respective switch can forward a packet belonging to a VLAN via the tunnel mapped to the VLAN. However, this can lead to traffic redundancy. To avoid redundant traffic, a switch forwards a packet belonging to a VLAN if the switch is a replicator for the tunnel mapped to the VLAN.
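As an illustration of how a distribution master might compute such a mapping, the following Python sketch assigns each VLAN to the tunnel that has received the fewest VLANs so far and breaks ties by traffic volume. The function name, the load units, and the greedy policy are assumptions made for this example; the disclosure does not prescribe a particular algorithm.

```python
def map_vlans_to_tunnels(vlans, tunnel_load):
    """Assign each VLAN to one service tunnel (illustrative policy only).

    vlans:       iterable of VLAN identifiers.
    tunnel_load: dict of tunnel identifier -> current traffic volume
                 (hypothetical units; any comparable number works).
    Returns a dict of VLAN identifier -> tunnel identifier.
    """
    if not tunnel_load:
        raise ValueError("at least one service tunnel is required")

    # Number of VLANs placed on each tunnel during this run.
    assigned = {tunnel: 0 for tunnel in tunnel_load}
    mapping = {}
    for vlan in sorted(vlans):
        # Prefer the tunnel with the fewest VLANs, then the lowest traffic
        # volume, then the lowest identifier so the result is deterministic.
        tunnel = min(
            tunnel_load,
            key=lambda t: (assigned[t], tunnel_load[t], t),
        )
        mapping[vlan] = tunnel
        assigned[tunnel] += 1
    return mapping
```

Because the result is deterministic for a given input, every switch that receives the mapping in a notification message sees the same VLAN-to-tunnel assignment.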
For example, one of the switches in the distributed tunnel endpoint is elected as a replicator of BUM traffic for a specific tunnel. The elected replicator becomes responsible for forwarding BUM traffic belonging to VLANs designated for that tunnel. Furthermore, the replicator is configured with multicast forwarding information for these VLANs. For example, forwarding entries for the multicast group identifiers of these VLANs are configured in the forwarding table of the replicator (e.g., in a content addressable memory (CAM)). In this way, the VLANs are distributed among the tunnels and one of the switches in the distributed tunnel endpoint is designated as the replicator for BUM traffic for the VLANs designated for a specific tunnel. Forwarding of BUM traffic of different VLANs thus becomes load balanced among different tunnels and different switches in the distributed tunnel endpoint.
In some embodiments, the switch can be a member switch of a network of interconnected switches (e.g., a fabric switch). In a fabric switch, any number of switches coupled in an arbitrary topology can be controlled as a single logical switch. The fabric switch can be an Ethernet fabric switch or a virtual cluster switch (VCS), which can operate as a single Ethernet switch. In some embodiments, a respective switch in the fabric switch is an Internet Protocol (IP) routing-capable switch (e.g., an IP router). In some further embodiments, a respective switch in the fabric switch is a Transparent Interconnection of Lots of Links (TRILL) routing bridge (RBridge).
It should be noted that a fabric switch is not the same as conventional switch stacking. In switch stacking, multiple switches are interconnected at a common location (often within the same rack), based on a particular topology, and manually configured in a particular way. These stacked switches typically share a common address, e.g., an IP address, so they can be addressed as a single switch externally. Furthermore, switch stacking requires a significant amount of manual configuration of the ports and inter-switch links. The need for manual configuration prohibits switch stacking from being a viable option in building a large-scale switching system. The topology restriction imposed by switch stacking also limits the number of switches that can be stacked. This is because it is very difficult, if not impossible, to design a stack topology that allows the overall switch bandwidth to scale adequately with the number of switch units.
In contrast, a fabric switch can include an arbitrary number of switches with individual addresses, can be based on an arbitrary physical topology, and does not require extensive manual configuration. The switches can reside in the same location, or be distributed over different locations. These features overcome the inherent limitations of switch stacking and make it possible to build a large “switch farm,” which can be treated as a single, logical switch. Due to the automatic configuration capabilities of the fabric switch, an individual physical switch can dynamically join or leave the fabric switch without disrupting services to the rest of the network.
Furthermore, the automatic and dynamic configurability of the fabric switch allows a network operator to build its switching system in a distributed and “pay-as-you-grow” fashion without sacrificing scalability. The fabric switch's ability to respond to changing network conditions makes it an ideal solution in a virtual computing environment, where network loads often change with time.
It should also be noted that a fabric switch is distinct from a VLAN. A fabric switch can accommodate a plurality of VLANs. A VLAN is typically identified by a VLAN tag. In contrast, the fabric switch is identified by a fabric identifier (e.g., a cluster identifier), which is assigned to the fabric switch. Since a fabric switch can be represented as a logical chassis, the fabric identifier can also be referred to as a logical chassis identifier. A respective member switch of the fabric switch is associated with the fabric identifier. In some embodiments, a fabric switch identifier is pre-assigned to a member switch. As a result, when the switch joins a fabric switch, other member switches identify the switch as a member switch of the fabric switch.
In this disclosure, the term “fabric switch” refers to a number of interconnected physical switches which can form a single, scalable network of switches. The member switches of the fabric switch can operate as individual switches. The member switches of the fabric switch can also operate as a single logical switch in the provision and control plane, the data plane, or both. “Fabric switch” should not be interpreted as limiting embodiments of the present invention to a plurality of switches operating as a single, logical switch. In this disclosure, the terms “fabric switch” and “fabric” are used interchangeably.
Although the present disclosure is presented using examples based on an encapsulation protocol, embodiments of the present invention are not limited to networks defined using one particular encapsulation protocol associated with a particular Open System Interconnection Reference Model (OSI reference model) layer. For example, embodiments of the present invention can also be applied to a multi-protocol label switching (MPLS) network. In this disclosure, the term “encapsulation” is used in a generic sense, and can refer to encapsulation in any networking layer, sub-layer, or a combination of networking layers.
The term “end host” can refer to any device external to a network (e.g., does not perform forwarding in that network). Examples of an end host include, but are not limited to, a physical or virtual machine, a conventional layer-2 switch, a layer-3 router, or any other type of network device. Additionally, an end host can be coupled to other switches or hosts further away from a layer-2 or layer-3 network. An end host can also be an aggregation point for a number of network devices to enter the network. An end host hosting one or more virtual machines can be referred to as a host machine. In this disclosure, the terms “end host” and “host machine” are used interchangeably.
The term “VLAN” is used in a generic sense, and can refer to any virtualized network. Any virtualized network comprising a segment of physical networking devices, software network resources, and network functionality can be referred to as a “VLAN.” “VLAN” should not be interpreted as limiting embodiments of the present invention to layer-2 networks. “VLAN” can be replaced by other terminologies referring to a virtualized network or network segment, such as “Virtual Private Network (VPN),” “Virtual Private LAN Service (VPLS),” or “Easy Virtual Network (EVN).”
The term “packet” refers to a group of bits that can be transported together across a network. “Packet” should not be interpreted as limiting embodiments of the present invention to layer-3 networks. “Packet” can be replaced by other terminologies referring to a group of bits, such as “frame,” “cell,” or “datagram.”
The term “switch” is used in a generic sense, and can refer to any standalone or fabric switch operating in any network layer. “Switch” can be a physical device or software running on a computing device. “Switch” should not be interpreted as limiting embodiments of the present invention to layer-2 networks. Any device that can forward traffic to an external device or another switch can be referred to as a “switch.” Examples of a “switch” include, but are not limited to, a layer-2 switch, a layer-3 router, a TRILL RBridge, or a fabric switch comprising a plurality of similar or heterogeneous smaller physical switches.
The term “edge port” refers to a port on a network which exchanges data frames with a device outside of the network (i.e., an edge port is not used for exchanging data frames with another member switch of a network). The term “inter-switch port” refers to a port which sends/receives data frames among member switches of the network. A link between inter-switch ports is referred to as an “inter-switch link.” The terms “interface” and “port” are used interchangeably.
The term “switch identifier” refers to a group of bits that can be used to identify a switch. Examples of a switch identifier include, but are not limited to, a media access control (MAC) address, an Internet Protocol (IP) address, an RBridge identifier, or a combination thereof. In this disclosure, “switch identifier” is used as a generic term, is not limited to any bit format, and can refer to any format that can identify a switch.
The term “tunnel” refers to a data communication where one or more networking protocols are encapsulated using another networking protocol. Although the present disclosure is presented using examples based on a layer-3 encapsulation of a layer-2 protocol, “tunnel” should not be interpreted as limiting embodiments of the present invention to layer-2 and layer-3 protocols. A “tunnel” can be established for and using any networking layer, sub-layer, or a combination of networking layers.
In some further embodiments, network 100 is an IP network and a respective switch of network 100, such as switch 103, is an IP-capable switch, which calculates and maintains a local IP routing table (e.g., a routing information base or RIB), and is capable of forwarding packets based on its IP addresses. Under such a scenario, communication among the switches in network 100 is based on IP or IP-based tunneling. For example, upon receiving an Ethernet frame from end device 112, switch 103 encapsulates the received Ethernet frame in an IP header (and/or a tunneling header) and forwards the IP packet. Examples of a tunneling protocol include, but are not limited to, virtual extensible LAN (VXLAN), generic routing encapsulation (GRE), layer-2 tunneling protocol (L2TP), and multi-protocol label switching (MPLS).
In some embodiments, network 100 is a fabric switch (under such a scenario, network 100 can also be referred to as fabric switch 100). Fabric switch 100 is identified by and assigned with a fabric switch identifier (e.g., a fabric label). A respective member switch of fabric switch 100 is associated with that fabric switch identifier. This allows the member switch to indicate that it is a member of fabric switch 100. In some embodiments, whenever a new member switch joins fabric switch 100, the fabric switch identifier is associated with that new member switch. Furthermore, a respective member switch of fabric switch 100 is assigned a switch identifier (e.g., an RBridge identifier, a Fibre Channel (FC) domain ID (identifier), or an IP address). This switch identifier identifies the member switch in fabric switch 100. The fabric label can be included in a header of a packet for any inter-fabric and/or intra-fabric communication.
Switches in network 100 use edge ports to communicate with end devices (e.g., non-member switches) and inter-switch ports to communicate with other member switches. For example, switch 103 is coupled to end device 112 via an edge port and to switches 101, 102, and 104 via inter-switch ports and one or more links. Data communication via an edge port can be based on Ethernet and via an inter-switch port can be based on an encapsulation protocol (e.g., VXLAN or TRILL). It should be noted that control message exchange via inter-switch ports can be based on a different protocol (e.g., the IP or FC protocol).
In this example, switches 101 and 102, in conjunction with each other, operate as a distributed tunnel endpoint 120 (e.g., a VXLAN tunnel endpoint (VTEP)). Here, switches 101 and 102 operate as a single, logical tunnel endpoint sharing a common tunnel address, which is virtual switch identifier 152 (e.g., a virtual IP address). In some embodiments, an administrator configures distributed tunnel endpoint 120 for switches 101 and 102. During operation, distributed tunnel endpoint 120 establishes service tunnels 132 and 134 with service nodes 142 and 144, respectively. Here, service node 142 or 144 can be a physical or a virtual node that provides a service to a switch or a network. Examples of a service node include, but are not limited to, a server, a tunnel gateway (e.g., a VXLAN gateway), a virtual machine, a storage device, and a network appliance.
With existing technologies, since distributed tunnel endpoint 120 operates as a tunnel endpoint for a plurality of tunnels, one of tunnels 132 and 134 is elected as a designated forwarder of BUM traffic. Suppose that tunnel 132 is elected as the designated forwarder to carry BUM traffic belonging to a respective VLAN associated with distributed tunnel endpoint 120. As a result, tunnel 132 can become bottlenecked while tunnel 134 can remain underutilized. Furthermore, if tunnel 132 or service node 142 fails, tunnel 134 (and service node 144) is selected as the new designated forwarder of BUM traffic. This leads to reprogramming of a respective VLAN (e.g., reprogramming of multicast group identifiers associated with that VLAN) for newly selected tunnel 134. Such a migration can be intensive for a large number of VLANs and cause delay in traffic switchover from service node 142 to service node 144.
To solve this problem, switches 101 and 102, which operate in conjunction with each other as distributed tunnel endpoint 120, select one of switches 101 and 102 as a distribution master based on selection criteria. In some embodiments, the selection criteria include: the switch with the lowest (or highest) switch identifier value and with at least one active service tunnel. If a new switch joins network 100 with superior selection criteria, barring a failure, the already selected switch remains the distribution master. For example, if the switch identifier is an IP address or a TRILL RBridge identifier, the value of the bits representing the switch identifier can be used to determine the distribution master. Suppose that switch 101 is selected as the distribution master. Since both switches 101 and 102 can operate as the tunnel endpoint for tunnels 132 and 134, both switches 101 and 102 are configured with tunnels 132 and 134. In some embodiments, switches 101 and 102 maintain a list of VLANs configured in distributed tunnel endpoint 120.
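The selection criteria described above can be sketched in Python as follows. The sketch assumes the lowest-identifier variant, represents each switch by an integer identifier, and uses a hypothetical function name and input format. It also models the rule that an already selected master keeps its role when a switch with a superior identifier joins.

```python
def select_distribution_master(active_tunnels_by_switch, current_master=None):
    """Pick the distribution master among the switches of the endpoint.

    active_tunnels_by_switch: dict of switch identifier -> number of
        active service tunnels on that switch (hypothetical input format).
    current_master: identifier of the already selected master, if any.
    Returns the identifier of the master, or None if no switch qualifies.
    """
    # Only switches with at least one active service tunnel are eligible.
    eligible = [
        switch_id
        for switch_id, active in active_tunnels_by_switch.items()
        if active > 0
    ]
    if not eligible:
        return None
    # Barring a failure, the current master stays in place even if a newly
    # joined switch has a lower identifier.
    if current_master in eligible:
        return current_master
    # Otherwise the lowest identifier wins (the "lowest" variant is assumed).
    return min(eligible)
```

For example, with switches 101 and 102 both holding active tunnels, the function returns 101; if a switch with identifier 100 later joins while 101 is still the master, 101 is retained.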
Based on the list, switch 101 can map a respective VLAN to one of service tunnels 132 and 134. This mapping can be based on one or more of: the number of tunnels (i.e., two in
Upon receiving the mapping, a respective switch forwards a packet belonging to a VLAN via the tunnel mapped to the VLAN. Suppose that switch 102 receives a packet requiring service from a service node (e.g., requiring access to a tunnel gateway). If the packet belongs to a VLAN mapped to tunnel 134, switch 102 determines from the mapping that the packet should be forwarded via tunnel 134. Switch 102 then encapsulates the packet with a tunnel encapsulation header (e.g., a VXLAN header), and sets virtual switch identifier 152 and the identifier of service node 144 as source and destination addresses of the encapsulation header, respectively. Switch 102 identifies a local port associated with tunnel 134 as the egress port for the encapsulated packet (e.g., from a local forwarding table) and transmits the encapsulated packet via the port.
Furthermore, to avoid redundant traffic, one of switches 101 and 102 in the distributed tunnel endpoint is elected as a replicator for BUM traffic for a specific tunnel. For example, switches 101 and 102 can be elected as the replicator of BUM traffic for tunnels 132 and 134, respectively. Switches 101 and 102 then become responsible for forwarding BUM traffic belonging to VLANs designated for tunnels 132 and 134, respectively. As a result, switch 101 is configured with multicast forwarding information for the VLANs mapped to tunnel 132. For example, forwarding entries for the multicast group identifiers of these VLANs are configured in the forwarding table of switch 101 (e.g., in a CAM of switch 101).
Similarly, switch 102 is configured with multicast forwarding information for the VLANs mapped to tunnel 134. In this way, the VLANs are distributed among tunnels 132 and 134, and switches 101 and 102 are designated as the replicator for the VLANs designated for tunnels 132 and 134, respectively. Forwarding BUM traffic of different VLANs, therefore, becomes load balanced among tunnels 132 and 134, and among switches 101 and 102. In some embodiments, an administrator provides a replicator for a tunnel (e.g., during the tunnel configuration). In some further embodiments, the distribution master, which is switch 101, can determine a replicator for a tunnel based on one or more of: the number of tunnels, the number of VLANs, and the traffic volume of a respective tunnel.
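One simple way for a distribution master to determine replicators from the number of tunnels is a round-robin assignment of tunnels to switches, sketched below in Python. The function name and the round-robin policy are assumptions for illustration; an administrator-provided assignment or a traffic-volume-aware policy would fit the description equally well.

```python
def assign_replicators(tunnel_ids, switch_ids):
    """Assign one replicator switch to each service tunnel (round-robin).

    tunnel_ids: iterable of service tunnel identifiers.
    switch_ids: iterable of identifiers of the switches that form the
                distributed tunnel endpoint.
    Returns a dict of tunnel identifier -> replicator switch identifier.
    """
    switches = sorted(switch_ids)
    if not switches:
        raise ValueError("the distributed tunnel endpoint has no switches")
    # Sorting both lists makes the assignment deterministic, so it can be
    # recomputed and compared on any switch.
    return {
        tunnel: switches[index % len(switches)]
        for index, tunnel in enumerate(sorted(tunnel_ids))
    }
```

With tunnels 132 and 134 and switches 101 and 102, this yields switch 101 as the replicator for tunnel 132 and switch 102 as the replicator for tunnel 134, matching the example above.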
In some embodiments, virtual switch 122 is further associated with a virtual MAC address 154. In response to receiving an Address Resolution Protocol (ARP) query for virtual switch identifier 152, switch 101 (or switch 102) responds with an ARP response comprising virtual MAC address 154. In some embodiments, the distribution master can be designated for responding to ARP queries for virtual switch identifier 152.
During operation, switch 101, operating as the distribution master, maps a respective VLAN to a corresponding tunnel. For example, VLANs 212 and 214 are mapped to tunnel 132, and VLANs 216 and 218 are mapped to tunnel 134. Switch 101 includes this mapping in VLAN distribution table 202. Switch 101 further maps the replicator of BUM traffic for a respective tunnel in VLAN distribution table 202. If switches 101 and 102 are selected as replicators for tunnels 132 and 134, respectively, switch 101 becomes responsible for forwarding BUM traffic of VLANs 212 and 214, and switch 102 becomes responsible for forwarding BUM traffic of VLANs 216 and 218. Switch 101 then includes VLAN distribution table 202 in a notification message and forwards the notification message to switch 102. In some embodiments, switch 101 forwards the notification message to a respective other switch in network 100.
Switch 102 receives the notification message and stores VLAN distribution table 202 in a local storage device. This allows switch 102 to select service tunnels for BUM traffic based on VLAN distribution table 202. Suppose that switch 102 receives a packet belonging to BUM traffic of VLAN 214. Switch 102 determines that switch 101 is the replicator for tunnel 132 mapped to VLAN 214 from VLAN distribution table 202. Switch 102 then forwards the packet to switch 101. Furthermore, based on VLAN distribution table 202, a respective switch maintains a tunnel mapping table. The tunnel mapping table maps a respective VLAN to an indicator, which indicates whether the switch is elected as the replicator for the tunnel mapped to the VLAN. In some embodiments, the tunnel mapping table is in the forwarding table of a switch and can also include an egress port corresponding to the tunnel (not shown in
In some embodiments, the indicator is represented by a service tunnel identifier or a “NIL” entry. For example, tunnel mapping table 204 of switch 101 indicates that VLANs 212 and 214 are mapped to tunnel 132. Since VLANs 216 and 218 are mapped to tunnel 134, and switch 101 is not the replicator for tunnel 134, tunnel mapping table 204 further indicates that VLANs 216 and 218 do not have a forwarding tunnel for BUM traffic from switch 101 (e.g., represented by a “NIL” entry). Similarly, tunnel mapping table 206 of switch 102 indicates that VLANs 216 and 218 are mapped to tunnel 134, and VLANs 212 and 214 do not have a forwarding tunnel for BUM traffic from switch 102. In tables 204 and 206, a VLAN can be represented by a multicast group identifier for that VLAN. Furthermore, tables 204 and 206 can be part of forwarding tables of switches 101 and 102, respectively.
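The relationship between the VLAN distribution table and the per-switch tunnel mapping tables can be sketched in Python as follows. The dictionary layout and the function name are assumptions for illustration; the values reproduce the example of VLANs 212 through 218, tunnels 132 and 134, and switches 101 and 102, with Python's None standing in for the “NIL” entry.

```python
NIL = None  # stands in for the "NIL" entry of a tunnel mapping table


def build_tunnel_mapping_table(vlan_distribution_table, local_switch):
    """Derive a switch's tunnel mapping table from the distribution table.

    vlan_distribution_table: dict of VLAN -> (tunnel, replicator switch).
    local_switch: identifier of the switch building its own table.
    Returns a dict of VLAN -> tunnel identifier, or NIL when the local
    switch is not the replicator for the tunnel mapped to that VLAN.
    """
    return {
        vlan: (tunnel if replicator == local_switch else NIL)
        for vlan, (tunnel, replicator) in vlan_distribution_table.items()
    }


# VLAN distribution table 202 from the example above.
table_202 = {
    212: (132, 101),
    214: (132, 101),
    216: (134, 102),
    218: (134, 102),
}

table_204 = build_tunnel_mapping_table(table_202, local_switch=101)
table_206 = build_tunnel_mapping_table(table_202, local_switch=102)
# table_204 == {212: 132, 214: 132, 216: None, 218: None}
# table_206 == {212: None, 214: None, 216: 134, 218: 134}
```

Keeping only the local view in each tunnel mapping table means a switch needs forwarding entries just for the VLANs it replicates.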
Switches 101 and 102 then update tunnel mapping tables 204 and 206, respectively, based on VLAN distribution table 202. Tunnel mapping table 204 of switch 101 indicates that tunnel 132 is the forwarding tunnel for VLANs 212 and 214. Since VLAN 218 is now mapped to tunnel 232, and switch 101 is the replicator for tunnel 232, tunnel mapping table 204 further indicates that tunnel 232 is the forwarding tunnel for VLAN 218. Tunnel mapping table 204 also indicates that VLAN 216 does not have a forwarding tunnel for BUM traffic from switch 101. Similarly, tunnel mapping table 206 of switch 102 indicates that tunnel 134 is the forwarding tunnel for VLAN 216, and VLANs 212, 214, and 218 do not have a forwarding tunnel for BUM traffic from switch 102.
Here, switches 101 and 102 remain replicators for tunnels 232 and 134, respectively. Correspondingly, switches 101 and 102 update tunnel mapping tables 204 and 206, respectively, based on VLAN distribution table 202. Tunnel mapping table 204 of switch 101 indicates that tunnel 232 is the forwarding tunnel for VLANs 214 and 218. Tunnel mapping table 204 also indicates that VLANs 212 and 216 do not have a forwarding tunnel for BUM traffic from switch 101. Similarly, tunnel mapping table 206 of switch 102 indicates that tunnel 134 is the forwarding tunnel for VLANs 212 and 216, and VLANs 214 and 218 do not have a forwarding tunnel for BUM traffic from switch 102.
Here, switches 101 and 102 remain replicators for tunnels 232 and 134, respectively. Correspondingly, switches 101 and 102 update tunnel mapping tables 204 and 206, respectively, based on VLAN distribution table 202. Tunnel mapping table 204 of switch 101 indicates that tunnel 232 is the forwarding tunnel for VLANs 216 and 218. Tunnel mapping table 204 also indicates that VLAN 212 does not have a forwarding tunnel for BUM traffic from switch 101. Similarly, tunnel mapping table 206 of switch 102 indicates that tunnel 134 is the forwarding tunnel for VLAN 212, and VLANs 216 and 218 do not have a forwarding tunnel for BUM traffic from switch 102.
If the local switch is the distribution master, the switch recalculates the VLAN-to-tunnel mappings based on the tunnel event and one or more mapping parameters (operation 306). For example, if the tunnel event is a deletion of a tunnel, that tunnel is deleted from the mapping and the VLANs are reallocated to the remaining tunnels. Examples of mapping parameters include, but are not limited to, the number of tunnels, the number of VLANs, and the traffic volume of a respective tunnel. If the distribution master is selected for the first time, the distribution master simply calculates the mapping.
The switch then updates the VLAN distribution table based on the recalculated mappings (operation 308) and constructs a notification message comprising the updated VLAN distribution table (operation 310). The switch encapsulates the notification message with an encapsulation header (operation 312) and sends the encapsulated message via one or more egress ports corresponding to the other switches in the distributed tunnel endpoint (operation 314). The switch then determines the tunnels for which the switch is the replicator of BUM traffic (operation 316) and updates the local tunnel mapping table based on the determined tunnels (operation 318).
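The sequence of operations 306 through 318 can be sketched in Python for the case of a tunnel deletion. This is a minimal sketch under stated assumptions: the function name, the dictionary layouts, the message format, and the fewest-VLANs reallocation policy are hypothetical, and encapsulation and egress port selection (operations 312 and 314) are abstracted behind a caller-supplied send function.

```python
def handle_tunnel_deletion(vlan_to_tunnel, replicators, deleted_tunnel,
                           local_switch, send):
    """Distribution-master handling of a tunnel deletion (illustrative).

    vlan_to_tunnel: dict of VLAN -> tunnel; updated in place.
    replicators:    dict of tunnel -> replicator switch identifier.
    deleted_tunnel: identifier of the tunnel that was removed.
    local_switch:   identifier of the distribution master itself.
    send:           callable that delivers a notification message to the
                    other switches (encapsulation is assumed to happen there).
    Returns the local tunnel mapping table (VLAN -> tunnel or None).
    """
    remaining = sorted(t for t in replicators if t != deleted_tunnel)
    if not remaining:
        raise ValueError("no service tunnel remains")

    # Operation 306: count the VLANs each remaining tunnel already carries.
    counts = {tunnel: 0 for tunnel in remaining}
    for tunnel in vlan_to_tunnel.values():
        if tunnel in counts:
            counts[tunnel] += 1

    # Operation 308: move VLANs of the deleted tunnel to the tunnel that
    # currently carries the fewest VLANs.
    for vlan in sorted(vlan_to_tunnel):
        if vlan_to_tunnel[vlan] == deleted_tunnel:
            target = min(remaining, key=lambda t: (counts[t], t))
            vlan_to_tunnel[vlan] = target
            counts[target] += 1

    # Operations 310-314: notify the other switches of the updated table.
    send({"type": "vlan-distribution", "table": dict(vlan_to_tunnel)})

    # Operations 316-318: keep a tunnel only where this switch replicates.
    return {
        vlan: (tunnel if replicators[tunnel] == local_switch else None)
        for vlan, tunnel in vlan_to_tunnel.items()
    }
```

Only the VLANs of the deleted tunnel are moved, so VLANs mapped to unaffected tunnels need no reprogramming.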
The switch then checks whether the local switch is the replicator of BUM traffic for the tunnel (operation 410) (e.g., from the tunnel mapping table). If the local switch is the replicator, the switch determines forwarding information associated with the identified tunnel (operation 414) and encapsulates the packet with a tunnel encapsulation header (operation 416). This tunnel encapsulation header can be based on the identified tunnel. The switch sets the virtual switch identifier of the distributed tunnel endpoint as the ingress address of the tunnel encapsulation header (operation 418) and forwards the encapsulated packet via the identified tunnel (operation 420). Since the switch sets the identifier of the service node associated with the tunnel as the egress address of the tunnel encapsulation header, forwarding the encapsulated packet includes determining an egress port corresponding to the identifier of the service node and transmitting via the port.
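The replicator check and the subsequent encapsulation can be sketched in Python as shown below. The function name, the dictionary-based header, the table layouts, and the example addresses and port names are assumptions for illustration. The sketch returns None when the tunnel mapping table holds no forwarding tunnel for the VLAN on the local switch; what a non-replicator switch does next is outside the scope of this sketch.

```python
def forward_bum_packet(payload, vlan, tunnel_mapping_table, tunnel_info,
                       virtual_switch_id):
    """Decide how the local switch forwards a BUM packet of a VLAN.

    tunnel_mapping_table: dict of VLAN -> tunnel identifier, or None when
        the local switch is not the replicator for that VLAN's tunnel.
    tunnel_info: dict of tunnel identifier -> (service node identifier,
        egress port); a hypothetical stand-in for forwarding information.
    virtual_switch_id: shared identifier of the distributed tunnel endpoint.
    Returns (egress_port, header, payload), or None if this switch is not
    the replicator.
    """
    # Operation 410: replicator check via the tunnel mapping table.
    tunnel = tunnel_mapping_table.get(vlan)
    if tunnel is None:
        return None

    # Operation 414: forwarding information for the identified tunnel.
    service_node_id, egress_port = tunnel_info[tunnel]

    # Operations 416-418: the ingress (source) address is the virtual switch
    # identifier; the egress (destination) address is the service node.
    header = {
        "src": virtual_switch_id,
        "dst": service_node_id,
        "tunnel": tunnel,
    }

    # Operation 420: the caller transmits the packet via egress_port.
    return egress_port, header, payload
```

Using the shared virtual switch identifier as the source address lets the service node treat all switches of the distributed tunnel endpoint as one logical endpoint.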
If the local switch meets the selection criteria, the switch elects the local switch as the new distribution master (operation 508). The switch then recalculates the VLAN distribution table (operation 510) and sends the recalculated VLAN distribution table to the other switches in the distributed tunnel endpoint (operation 512). In some embodiments, the switch sends the recalculated VLAN distribution table to a respective other switch in the network. On the other hand, if the local switch does not meet the selection criteria, another switch is elected as the distribution master. The switch then receives the recalculated VLAN distribution table from the other switch elected as the new distribution master (operation 514).
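The election flow in operations 508 through 514 can be sketched as follows. The selection criteria are not specified in this passage, so the rule used here (the lowest switch identifier among the active switches wins) is purely an assumption, as are the function names and the returned action labels.

```python
def elect_distribution_master(active_switch_ids):
    """Hypothetical selection rule: the lowest active switch id is the master."""
    return min(active_switch_ids)


def handle_master_change(local_switch_id, active_switch_ids, recalculate):
    """Decide what the local switch does once a new master is needed.

    recalculate: zero-argument callable that returns a fresh VLAN
        distribution table (operation 510).
    Returns ("send", table) if the local switch becomes the master and
    should distribute the table (operation 512), or ("wait", master_id)
    if it should expect the table from the elected master (operation 514).
    """
    master = elect_distribution_master(active_switch_ids)
    if master == local_switch_id:
        return "send", recalculate()
    return "wait", master
```

Since every switch evaluates the same deterministic rule over the same set of active switches, all members would agree on the new master without exchanging extra election messages; a real implementation may use different criteria.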
In some embodiments, switch 600 maintains a membership in a fabric switch, as described in conjunction with
Communication ports 602 can include inter-switch communication channels for communication within the fabric switch. Such an inter-switch communication channel can be implemented via a regular communication port and based on any open or proprietary format. Communication ports 602 can also include one or more extension communication ports for communication between neighbor fabric switches. Communication ports 602 can include one or more TRILL ports capable of receiving frames encapsulated in a TRILL header. Communication ports 602 can also include one or more IP ports, each of which is capable of receiving IP packets and can be configured with an IP address. Packet processor 610 can process TRILL-encapsulated frames and/or IP packets (e.g., tunnel encapsulated packets).
During operation, mapping module 620 maintains a first mapping and a second mapping. The first mapping, which can be in a local VLAN distribution table in storage device 650, is between a service tunnel identifier and a VLAN identifier. The second mapping is between the VLAN identifier and an indicator, as described in conjunction with
In some embodiments, tunnel management module 630 operates switch 600 as a distributed tunnel endpoint in conjunction with another switch for a plurality of service tunnels. Switch 600 and the other switch are associated with an IP address indicating the distributed tunnel endpoint. Packet processor 610 encapsulates the packet with an encapsulation header and sets the IP address as a source address of the encapsulation header. Tunnel management module 630 can elect a distribution master from switch 600 and the other switch.
Note that the above-mentioned modules can be implemented in hardware as well as in software. In one embodiment, these modules can be embodied in computer-executable instructions stored in a memory which is coupled to one or more processors in switch 600. When executed, these instructions cause the processor(s) to perform the aforementioned functions.
In summary, embodiments of the present invention provide a switch and a method that facilitate efficient management of multi-destination traffic in a distributed tunnel endpoint. In one embodiment, the switch includes a storage device, a mapping module, and a packet processor. During operation, the mapping module maintains a first mapping and a second mapping. The first mapping, which can be in the storage device, is between a first service tunnel identifier and a first VLAN identifier. The second mapping is between the first VLAN identifier and an indicator, which indicates whether the switch is elected as a designated forwarder of multi-destination traffic for the first service tunnel identifier. If the indicator indicates that the switch is the designated forwarder of multi-destination traffic for the first service tunnel identifier, the packet processor determines an egress port, which corresponds to the first service tunnel, for a packet of multi-destination traffic belonging to the first VLAN.
The methods and processes described herein can be embodied as code and/or data, which can be stored in a computer-readable non-transitory storage medium. When a computer system reads and executes the code and/or data stored on the computer-readable non-transitory storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the medium.
The methods and processes described herein can be executed by and/or included in hardware modules or apparatus. These modules or apparatus may include, but are not limited to, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), a dedicated or shared processor that executes a particular software module or a piece of code at a particular time, and/or other programmable-logic devices now known or later developed. When the hardware modules or apparatus are activated, they perform the methods and processes included within them.
The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. They are not intended to be exhaustive or to limit this disclosure. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. The scope of the present invention is defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
201641013840 | Apr 2016 | IN | national |
This application claims the benefit of Indian Provisional Application No. 201641013840, Attorney Docket Number BRCD-3496.0.1.IN, titled “Load Balancing of VXLAN BUM Traffic Across Nodes,” by inventors S. Jessu Paul Anand, Shivalingayya Chikkamath, and Mythilikanth Raman, filed 21 Apr. 2016, the disclosure of which is incorporated by reference herein. The present disclosure is related to U.S. Pat. No. 8,867,552, application Ser. No. 13/087,239, Attorney Docket Number BRCD-3008.1.US.NP, titled “Virtual Cluster Switching,” by inventors Suresh Vobbilisetty and Dilip Chatwani, issued 21 Oct. 2014, filed 14 Apr. 2011, the disclosure of which is incorporated by reference herein.