The present disclosure relates to data center network connectivity.
An Overlay Transport Virtualization (OTV) protocol has been developed to connect a layer 2 network domain that spans across multiple data center sites. The OTV protocol uses a “MAC-in-Internet Protocol (IP)” encapsulation technique to extend the layer 2 domain logically over a layer 3 IP network. Since the OTV protocol uses IP, it does not require any pseudo-wire or tunnel maintenance and provides multi-point connectivity using any available transport.
To avoid issues with data loops and the Spanning Tree Protocol (STP), the OTV protocol supports load balancing only on a per-virtual local area network (VLAN) basis. For a given VLAN, only one OTV edge switch is permitted to forward packets in and out of the network. This edge switch is known as an authoritative edge device (AED).
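The one-forwarder-per-VLAN rule can be illustrated with a minimal sketch. The modulo assignment and the names `assign_aeds` and `may_forward` are hypothetical and used only for illustration; they do not represent the actual OTV designation procedure.

```python
from typing import Dict, List


def assign_aeds(vlans: List[int], edge_devices: List[str]) -> Dict[int, str]:
    """Spread VLANs across a site's edge devices, exactly one AED per VLAN.

    The modulo rule is a hypothetical placeholder for the real designation.
    """
    if not edge_devices:
        raise ValueError("at least one edge device is required")
    ordered = sorted(edge_devices)
    return {vlan: ordered[vlan % len(ordered)] for vlan in vlans}


def may_forward(edge_device: str, vlan: int, aed_by_vlan: Dict[int, str]) -> bool:
    """Only the AED for a VLAN may forward that VLAN's traffic over the overlay."""
    return aed_by_vlan.get(vlan) == edge_device


aed_by_vlan = assign_aeds([100, 101, 102, 103], ["edge-a", "edge-b"])
```

In this sketch the VLANs are balanced across both edge devices, yet any single VLAN has only one device for which `may_forward` returns true.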
Overview
Techniques are presented herein for designating, in an overlay transport virtualization (OTV) network connected to a data center site, an edge device to act as a backup authoritative edge device (BAED), also referred to herein as a “non-AED,” for an authoritative edge device (AED) for multicast packet encapsulation and forwarding. Data traffic associated with a given virtual local area network (VLAN) is detected from a multicast source in the data center site addressed to recipients in a site group. A mapping is generated between the site group and a core data group in the OTV core network for traffic from the multicast source. Advertisements of the mapping are sent to edge devices in another data center across the OTV network. A similar configuration is presented herein for a BAED for inbound multicast traffic to a data center.
Referring first to
A plurality of access switches 125, 130 within a data center may communicate with each other within their respective layer 2 network using layer 2 protocols (for example, Ethernet), although for simplicity only one access switch per data center site is shown. The edge switches 135(a)-(d) communicate with the access switches 125, 130 at their respective data center using layer 2 network connectivity but communicate with devices in the OTV core network 140 which resides in an IP cloud using layer 3 network connectivity (for example, Transmission Control Protocol—Internet Protocol (TCP/IP)).
It is desirable to extend layer 2 domains over IP, as this allows multiple data centers to be treated as one logical data center or site. This is achieved using an Overlay Transport Virtualization (OTV) protocol. OTV is a “Media Access Control (MAC) in IP” technique for supporting layer 2 virtual private networks (VPNs) over any transport. The overlay nature of OTV allows it to work over any transport as long as this transport can forward IP packets. Any optimizations performed for IP in the transport will benefit the OTV encapsulated traffic. OTV can extend the layer 2 domains across geographically distant data centers by providing built-in filtering capabilities to localize the most common networking protocols (Spanning Tree Protocol, VLAN Trunking Protocol, and Hot Standby Router Protocol (HSRP), etc.) and prevent them from traversing the overlay network, therefore keeping protocol failures from propagating across data center sites. Unlike traditional layer 2 VPNs, which rely on layer 2 flooding to propagate MAC address reachability, OTV uses a protocol to proactively advertise the MAC addresses learned at each site. The protocol advertisement takes place in the background, with no configuration required by the network administrator.
OTV is referred to as an “overlay” method of virtualization versus traditional “in-the-network” type systems in which multiple routing and forwarding tables are maintained in every device between a source and a destination. With OTV, state is maintained at the network edges (edge switches at a data center, for example, edge switches 135(a)-135(d)), but is not required at other devices in a network site or in a core network. OTV operates at edge devices interposed between the data center sites and the OTV core network 140. The edge devices perform layer 2 learning and forwarding functions (similar to a traditional layer 2 switch) on their site-facing interfaces (internal data center interfaces) and perform IP-based virtualization functions on their OTV core network-facing interfaces, for which an overlay network is created. The dual functionality of the edge device provides the ability to connect together layer 2 networks, layer 3 networks, or hybrid (layer 2 and layer 3) networks.
Since Spanning Tree Protocols (STP) are maintained locally (i.e. within each given data center), data loops may result between two or more data centers connected via an OTV network. To avoid this problem, exactly one edge device may be designated to send and receive data packets across the OTV network for a given VLAN. This device is called the authoritative edge device (AED). A given edge device may act as the AED for one or more VLANs, but does not act as the AED for a second set of one or more VLANs, while another edge device may act as the AED for the second set of VLANs, etc. AEDs may be designated by an AED server, not shown in
In the example of
When source AED 135(a) forwards layer 2 traffic onto the layer 3 OTV core network 140, it encapsulates layer 2 packets (or frames) into layer 3 packets. Similarly, when receiver AED 135(c) receives one or more layer 3 packets, it decapsulates them into layer 2 packets before forwarding to the receiver 120. Encapsulation and decapsulation may be programmed into the hardware of each edge device.
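The encapsulation and decapsulation steps can be sketched as follows. This is a simplified model only: the class and field names are assumptions, the addresses are example values, and the actual OTV header layout is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer2Frame:
    dst_mac: str
    src_mac: str
    vlan: int
    payload: bytes


@dataclass(frozen=True)
class OverlayPacket:
    src_ip: str          # core-facing address of the encapsulating edge device
    dst_ip: str          # remote edge device or multicast data group address
    inner: Layer2Frame   # original layer 2 frame carried unchanged


def encapsulate(frame: Layer2Frame, src_ip: str, dst_ip: str) -> OverlayPacket:
    """Wrap a layer 2 frame in a layer 3 packet before it enters the core."""
    return OverlayPacket(src_ip=src_ip, dst_ip=dst_ip, inner=frame)


def decapsulate(packet: OverlayPacket) -> Layer2Frame:
    """Recover the original layer 2 frame at the receiving edge device."""
    return packet.inner
```

The round trip leaves the inner frame untouched, which is what allows the receiver site to treat the frame as if it had originated locally.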
When sending a source-specific multicast message for a given VLAN, source 115 may forward the multicast message to its AED 135(a). The multicast message is forwarded to the OTV core network 140, where it is duplicated for each recipient. The OTV core network 140 maintains one or more multicast trees, also known as source-specific multicast (SSM) trees, for each VLAN. The OTV core network 140 forwards the multicast message only to receivers that have requested to receive that multicast message stream. The group of receivers for a given source-specific multicast stream on a given VLAN for a multicast-enabled OTV core network 140 is known as a data group or core group. Multiple VLANs may also use the same data group and the same multicast tree. This helps reduce the number of multicast trees in the core.
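The way several VLANs can share one data group may be sketched as below. The CRC-based selection from a configured pool is an assumption made for illustration, as are the function name `map_to_data_group` and the example addresses; the disclosure does not specify how a data group is chosen.

```python
import zlib
from typing import List, Tuple

SiteFlow = Tuple[int, str, str]  # (VLAN, source IP, site group)


def map_to_data_group(flow: SiteFlow, data_groups: List[str]) -> str:
    """Pick a core data group for a site flow from a configured pool.

    Hashing is a hypothetical selection rule; it is deterministic, so the
    same flow always maps to the same data group.
    """
    if not data_groups:
        raise ValueError("the data group pool must not be empty")
    vlan, source, site_group = flow
    key = f"{vlan}|{source}|{site_group}".encode()
    return data_groups[zlib.crc32(key) % len(data_groups)]
```

With a pool smaller than the number of flows, flows from different VLANs necessarily land on the same data group and therefore reuse the same multicast tree in the core.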
Reference is now made to
It is useful to distinguish between control plane messages sent to a control group and data messages sent to a data group. The mapping and advertising of steps 215 and 220 may take place in the control plane. Advertising is done to all edge devices across the OTV network listening on the control group, while multicast data packets, once the network is configured, are sent in the data plane to members of the data group. The purpose of the advertising is so that other edge devices connected to the OTV core network may be updated so that mappings are uniform across the system 100. This process of unifying topological information across network devices using a protocol is known as convergence. The control group is an identifier that is associated with all OTV edge devices for a given multicast overlay network. The control group is used to discover all remote sites in the control plane (using neighbor discovery packets, exchanging MAC address reachability, etc.). Control groups are specifically configured to transport the OTV protocol control packets across the data center sites. The data group corresponds to a source-specific multicast group—an identifier of all members that have subscribed to receive multicast data traffic from a given source.
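The control-plane advertisement and the resulting convergence can be modeled with a short sketch. The classes `EdgeDevice` and `ControlGroup` and their methods are hypothetical; real OTV control packets and neighbor discovery are not modeled here.

```python
from typing import Dict, List, Tuple

MappingKey = Tuple[int, str, str]  # (VLAN, source, site group)


class EdgeDevice:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mappings: Dict[MappingKey, str] = {}

    def on_advertisement(self, key: MappingKey, data_group: str) -> None:
        """Record a site group to data group mapping learned from a peer."""
        self.mappings[key] = data_group


class ControlGroup:
    """All edge devices of one overlay that listen for control messages."""

    def __init__(self) -> None:
        self.members: List[EdgeDevice] = []

    def join(self, device: EdgeDevice) -> None:
        self.members.append(device)

    def advertise(self, sender: EdgeDevice, key: MappingKey, data_group: str) -> None:
        """Store the mapping at the sender and deliver it to every other member."""
        sender.mappings[key] = data_group
        for device in self.members:
            if device is not sender:
                device.on_advertisement(key, data_group)

    def converged(self) -> bool:
        """True when every member holds the same set of mappings."""
        return all(d.mappings == self.members[0].mappings for d in self.members)
```

Data packets do not pass through this path; only the mapping state does, which is why every listener on the control group ends up with the same view.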
As mentioned previously, potential multicast recipients subscribe or join to receive multicast messages. This may be done after the sender AED has advertised site group to data group mapping in step 220. This process is shown in
One problem with this approach is shown in
The process is similar if there is an AED failover at a receiver site, for example a failure of AED 135(c). The failed AED would stop receiving and/or decapsulating traffic into the receiver site, causing a traffic outage. The AED server may elect a new edge device, such as device 135(d), to act as the new AED. The new AED would need to discover receivers of multicast messages, and then join the specific multicast tree advertised by the multicast source AED. In addition, the new receiver AED would have to install the necessary layer 3 packet decapsulation routes to stream traffic into the site.
Whether the sending or receiving AED device fails, all of these steps may cause perceptible delays as the replacement AED is converged, and in the interim there may be a complete traffic loss for the VLANs using the failed AED. As the size of the network is scaled, the delays can be significant and non-deterministic.
A solution is shown in
Both the source AED 135(a) and backup source AED 135(b) may periodically send dummy packets to maintain their multicast trees in the OTV core. This is to ensure that, for each advertised data group, the associated multicast tree is maintained in the core even if multicast traffic is not being sent on the channel. This might happen, for example, if no traffic is being streamed to a data group because the edge device is a backup AED, or if there are no receivers on any of the VLANs mapped to the data group.
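A minimal sketch of the decision to send a dummy packet follows. The idle-interval rule, the function name `groups_needing_keepalive`, and the timing values are assumptions; the disclosure only states that dummy packets are sent periodically.

```python
from typing import Dict, List


def groups_needing_keepalive(
    last_sent: Dict[str, float], now: float, interval: float
) -> List[str]:
    """Return advertised data groups on which nothing was sent for `interval` seconds.

    `last_sent` maps each advertised data group to the time of the most
    recent packet (data or dummy) sent on it by this edge device.
    """
    return sorted(g for g, t in last_sent.items() if now - t >= interval)
```

A backup AED that never sends data traffic would see all of its advertised data groups returned on every check, so its multicast trees stay alive in the core.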
Reference is now made to
Reference is now made to
Reference is now made to
Reference is now made to
The memory 1325 stores instructions for OTV packet routing process logic 1335. Thus, the memory 1325 may comprise one or more computer readable storage media encoded with software comprising computer executable instructions and when the software is executed operable to perform the operations described herein for the process logic 1335. The processor 1310 executes the instructions stored in memory 1325 for process logic 1335 in order to perform the operations described herein.
As will be apparent from the foregoing description, the processor 1310 generates messages to be transmitted and processes received messages for communication with at least one edge switch device at another site using the OTV protocol that extends layer 2 network connectivity to layer 3 network connectivity over IP between edge switches at a plurality of sites. The processor 1310 also generates and sends via the OTV protocol a layer 3 message that is configured to advertise the MAC address of the at least one endpoint at a first site to enable the at least one edge switch at a second site to perform multipath routing to the endpoint device at the first site based on each of the edge switches in the edge switch cluster at the first site. Although AED devices above are commonly depicted as “source” and “receiver” AEDs, it should be understood that each AED device may be bidirectional. That is, an AED that is the “source” AED forwarding a source-specific multicast message onto the OTV core network for one multicast may also be a “receiver” AED receiving source-specific multicast messages from the OTV core network for another multicast.
In summary, a method is provided for designating an edge device, connected to an OTV network, in a first data center to act as a BAED for an AED for multicast message encapsulation and forwarding. Traffic may be detected from a multicast source in the first data center destined for recipients in a site group, wherein the traffic is associated with a particular VLAN. A mapping may be generated between the site group and a core group in the OTV network for the traffic from the multicast source, and an advertisement of the mapping may be sent to one or more edge devices in a second data center across the OTV network.
In addition, an apparatus (e.g., an edge switch) is provided that comprises a network interface device configured to enable communications over a layer 2 network and over a layer 3 network. Switch hardware may be configured to perform switching operations in the layer 2 network and the layer 3 network. A processor may also be provided configured to be coupled to the network interface device and to the switch hardware circuitry, the processor configured to operate an edge switch at a first data center site that comprises one or more endpoint devices, the processor further configured to receive a designation as a BAED for an AED for multicast message encapsulation and forwarding. Traffic from a multicast source may be detected in the first data center destined for recipients in a site group, wherein the traffic is associated with a particular VLAN. A mapping may be generated between the site group and a core group in the OTV network for the traffic from the multicast source, and an advertisement of the mapping is generated to be sent to one or more edge devices in a second data center across the OTV network.
Further provided herein is one or more computer readable storage media encoded with software comprising computer executable instructions and when the software is executed operable to receive a designation as a BAED for an AED for multicast message encapsulation and forwarding. Traffic from a multicast source may be detected in the first data center destined for recipients in a site group, wherein the traffic is associated with a particular VLAN. A mapping may be generated between the site group and a core group in the OTV network for the traffic from the multicast source, and an advertisement of the mapping is generated to be sent to one or more edge devices in a second data center across the OTV network.
A scheme is presented to ensure rapid and deterministic convergence in an OTV network upon the failure of an AED device. This is accomplished by a model for the backup-AED devices to pre-learn the sources, pre-map to data groups in the multicast core, join the relevant channels, and communicate, maintain, and program all necessary forwarding states to enable instantaneous encapsulation/decapsulation of multicast frames in the event of the failover of the AED device or as soon as an AED failover is detected. Since the solution is independent of the number of OTV Edge devices, or the scale of multicast routes or VLANs enabled in an OTV network, it guarantees ultra-fast convergence in a scaled setup.
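The pre-programming idea can be sketched as follows. The `Edge` class, its `active` flag, and the `fail_over` function are hypothetical simplifications; failure detection and the hardware programming itself are outside the scope of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Route = Tuple[int, str, str]  # (VLAN, source, site group)


@dataclass
class Edge:
    name: str
    active: bool = False                      # True only for the current AED
    routes: Dict[Route, str] = field(default_factory=dict)

    def program(self, route: Route, data_group: str) -> None:
        """Install forwarding state; done on the AED and the backup alike."""
        self.routes[route] = data_group

    def forward(self, route: Route) -> Optional[str]:
        """Return the data group to encapsulate toward, or None if not forwarding."""
        if not self.active:
            return None
        return self.routes.get(route)


def fail_over(failed: Edge, backup: Edge) -> None:
    """Switch roles; the backup already holds every route, so nothing is relearned."""
    failed.active = False
    backup.active = True
```

Because the failover step only flips a flag, its cost in this model does not grow with the number of routes, which mirrors the claim that convergence is independent of scale.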
This solution has at least the following advantages: The presented scheme guarantees rapid convergence by ensuring that forwarding of multicast frames remains uninterrupted during the failover of an AED device in an OTV network. The solution is deterministic and is independent of the number of OTV sites, edge devices, or number of multicast routes, sources, receivers, or VLANs enabled in the OTV network. This is advantageous over the current behavior where the AED failover causes traffic loss that is significant and of a non-deterministic duration.
The above description is intended by way of example only.
This application claims priority to U.S. Provisional Application No. 61/864,019, filed Aug. 9, 2013, the entirety of which is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
20150043329 A1 | Feb 2015 | US |
Number | Date | Country | |
---|---|---|---|
61864019 | Aug 2013 | US |