1. Field
The present disclosure relates to network design. More specifically, the present disclosure relates to a method and system for constructing a scalable switching system that supports layer-3 routing while facilitating automatic configuration.
2. Related Art
The growth of the Internet has brought with it an increasing demand for bandwidth. As a result, equipment vendors race to build larger and faster switches with versatile capabilities, such as layer-3 forwarding, to move more traffic efficiently. However, the size of a switch cannot grow infinitely. It is limited by physical space, power consumption, and design complexity, to name a few factors. Furthermore, switches with higher capability are usually more complex and expensive. More importantly, because an overly large and complex system often does not provide economy of scale, simply increasing the size and capability of a switch may prove economically unviable due to the increased per-port cost.
One way to increase the throughput of a switch system is to use switch stacking. In switch stacking, multiple smaller-scale, identical switches are interconnected in a special pattern to form a larger logical switch. However, the amount of manual configuration required, together with the topological limitations of switch stacking, becomes prohibitive when the stack reaches a certain size, which precludes switch stacking from being a practical option for building a large-scale switching system.
Meanwhile, layer-2 (e.g., Ethernet) switching technologies continue to evolve. More routing-like functionalities, which have traditionally been the characteristics of layer-3 (e.g., Internet Protocol or IP) networks, are migrating into layer-2. Notably, the recent development of the Transparent Interconnection of Lots of Links (TRILL) protocol allows Ethernet switches to function more like routing devices. TRILL overcomes the inherent inefficiency of the conventional spanning tree protocol, which forces layer-2 switches to be coupled in a logical spanning-tree topology to avoid looping. TRILL allows routing bridges (RBridges) to be coupled in an arbitrary topology without the risk of looping by implementing routing functions in switches and including a hop count in the TRILL header.
While TRILL brings many desirable features to layer-2 networks, some issues remain unsolved when layer-3 processing is desired.
One embodiment of the present invention provides a switch. The switch includes an IP header processor and a forwarding mechanism. The IP header processor identifies a destination IP address in a packet encapsulated with an inner Ethernet header, a TRILL header, and an outer Ethernet header. The forwarding mechanism determines an output port and constructs a new header for the packet based on the destination IP address. The switch also includes a packet processor which determines whether (1) an inner destination media access control (MAC) address corresponds to a local MAC address assigned to the switch; (2) a destination RBridge identifier (RBridge ID) corresponds to a local RBridge identifier assigned to the switch; and (3) an outer destination MAC address corresponds to the local MAC address.
In a variation on this embodiment, the packet processor determines a first virtual local area network (VLAN) tag in the inner Ethernet header, wherein the new header includes a new inner Ethernet header which comprises a second VLAN tag.
In a variation on this embodiment, the switch includes a control mechanism which forms a virtual cluster switch in conjunction with one or more additional switches.
In a variation on this embodiment, the virtual cluster switch is an Ethernet fabric switch functioning as a logical Ethernet switch.
In a variation on this embodiment, the switch includes a switching mechanism which switches the packet between VLANs based on the destination IP address.
In a variation on this embodiment, the RBridge identifier is a virtual RBridge identifier and the destination IP address is a virtual IP address assigned to a virtual IP router associated with the virtual RBridge identifier.
In a variation on this embodiment, the virtual IP router is formed by operating the switch in conjunction with at least another physical switch as a single logical router.
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
In embodiments of the present invention, the problem of providing scalable and flexible layer-3 (e.g., IP) support in a TRILL network is solved by facilitating IP routing in a number of RBridges in the TRILL network. The availability of IP processing within a TRILL network allows cross-layer-2-domain traffic (e.g., traffic across different VLANs) to be forwarded within the TRILL network, which reduces forwarding overhead. Usually, the IP router portion of one of these IP-capable RBridges is assigned as a default gateway router to an end device coupled to the TRILL network. Whenever the end device sends a frame outside its local network (e.g., a VLAN), the frame is forwarded to and processed by the IP router portion of the RBridge. This layer-3 processing occurs within the TRILL network. Note that, in a conventional TRILL network, such layer-3 processing has to be done by an IP router residing outside the TRILL network.
In some embodiments, the end device may be coupled to the TRILL network via an ingress RBridge without IP processing capability. Under such a scenario, the TRILL RBridge portion of an IP-capable RBridge acts as an egress RBridge and the IP router portion of the RBridge can act as the default gateway router. A frame from the end device is received at the ingress RBridge and encapsulated in a TRILL packet, wherein the TRILL packet sets the egress RBridge identifier as the destination RBridge identifier, and the MAC address of the egress RBridge as the inner destination MAC address. The packet is then forwarded through the TRILL network and reaches the egress RBridge, where the outer destination MAC address of the packet is the MAC address of the egress RBridge. The IP router portion of the egress RBridge then processes the IP header in the frame and makes the layer-3 forwarding decision based on the destination IP address of the frame.
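The encapsulation and the egress-side check described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `TrillPacket` class, its field names, the helper functions, and the addresses used in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrillPacket:
    """Hypothetical model of a TRILL-encapsulated frame (field names assumed)."""
    outer_dst_mac: str   # outer Ethernet header: MAC of the next-hop RBridge
    egress_rbridge: int  # TRILL header: destination RBridge identifier
    inner_dst_mac: str   # inner Ethernet header: destination MAC of the frame
    dst_ip: str          # destination IP address of the encapsulated IP packet


def encapsulate(dst_ip, gateway_mac, gateway_rbridge_id, next_hop_mac):
    """Ingress side: wrap a frame that is addressed to the default gateway."""
    return TrillPacket(
        outer_dst_mac=next_hop_mac,
        egress_rbridge=gateway_rbridge_id,
        inner_dst_mac=gateway_mac,
        dst_ip=dst_ip,
    )


def needs_layer3(pkt, local_mac, local_rbridge_id):
    """Egress side: promote to layer 3 only if all three fields are local."""
    return (
        pkt.outer_dst_mac == local_mac
        and pkt.egress_rbridge == local_rbridge_id
        and pkt.inner_dst_mac == local_mac
    )
```

For instance, a packet built with `encapsulate("10.0.2.5", "02:00:00:00:00:06", 306, "02:00:00:00:00:06")` would satisfy `needs_layer3` only at an RBridge whose local MAC address and RBridge identifier are `02:00:00:00:00:06` and `306`.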
In some embodiments, the IP router portion of an IP-capable RBridge may be associated with multiple VLANs associated with the TRILL network. If the destination end device of the frame belongs to one of the associated VLANs, the IP router can obtain the MAC address of the destination end device using ARP requests within that VLAN. The corresponding RBridge of the IP router then sets the RBridge to which the destination end device is coupled as the egress RBridge and forwards the frame to the egress RBridge over the TRILL network.
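As a rough sketch of this step, the function below picks the VLAN whose subnet contains the destination IP address, looks up the destination MAC address, and returns the RBridge to which that MAC address is coupled. The table layouts, the function name, and the use of a pre-populated ARP cache in place of a live ARP exchange are assumptions made for illustration.

```python
import ipaddress


def route_to_vlan(dst_ip, svi_subnets, arp_cache, mac_to_rbridge):
    """Return (vlan, dst_mac, egress_rbridge) for dst_ip, or None.

    svi_subnets:    {vlan_id: "a.b.c.d/len"} for VLANs this router serves
    arp_cache:      {(vlan_id, ip): mac}, standing in for ARP resolution
    mac_to_rbridge: {mac: rbridge_id}, learned MAC-to-RBridge bindings
    """
    addr = ipaddress.ip_address(dst_ip)
    for vlan, subnet in svi_subnets.items():
        if addr in ipaddress.ip_network(subnet):
            mac = arp_cache.get((vlan, dst_ip))
            if mac is None:
                # A real router would issue an ARP request on this VLAN here.
                return None
            return vlan, mac, mac_to_rbridge.get(mac)
    return None  # destination is not in any associated VLAN
```

The returned RBridge identifier is what the routing RBridge would place in the TRILL header as the egress RBridge identifier.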
Although the present disclosure is presented using examples based on the TRILL protocol, embodiments of the present invention are not limited to TRILL networks, or networks defined in a particular Open System Interconnection Reference Model (OSI reference model) layer.
The term “RBridge” refers to routing bridges, which are bridges implementing the TRILL protocol as described in IETF Request for Comments (RFC) “Routing Bridges (RBridges): Base Protocol Specification,” available at http://tools.ietf.org/html/rfc6325, which is incorporated by reference herein. Embodiments of the present invention are not limited to applications among RBridges. Other types of switches, routers, and forwarders can also be used.
In this disclosure, the term “edge port” refers to a port which sends/receives data frames in native Ethernet format. The term “TRILL port” refers to a port which sends/receives data frames encapsulated with a TRILL header and outer MAC header.
The term “end device” refers to a network device that is typically not TRILL-capable. “End device” is a relative term with respect to the TRILL network. However, “end device” does not necessarily mean that the network device is an end host. An end device can be a host, a conventional layer-2 switch, or any other type of network device. Additionally, an end device can be coupled to other switches or hosts further away from the TRILL network. In other words, an end device can be an aggregation point for a number of network devices to enter the TRILL network.
The term “IP-capable RBridge” refers to a physical RBridge that can process and route IP packets. An IP-capable RBridge can be coupled to a layer-3 network and can forward IP packets from end devices to the layer-3 network. A number of IP-capable RBridges can form a virtual RBridge and a corresponding virtual IP router, thereby facilitating a virtual gateway router for end devices that supports redundancy and load-balancing. In this disclosure, an RBridge which forms a virtual RBridge and a virtual IP router is also referred to as a “gateway” RBridge. A gateway RBridge responds to ARP requests for the virtual IP address with a virtual MAC address. In various embodiments, any arbitrary number of gateway RBridges can form the virtual RBridge. As gateway RBridges can process both TRILL and IP packets, in this disclosure the term “gateway RBridge” can refer to a physical RBridge in a TRILL network or a physical router in an IP network.
The term “IP router” refers to the IP-capable portion of an RBridge or a stand-alone IP router. In this disclosure, the terms “IP router” and “router” are used interchangeably.
The term “frame” refers to a group of bits that can be transported together across a network. “Frame” should not be interpreted as limiting embodiments of the present invention to layer-2 networks. “Frame” can be replaced by other terminologies referring to a group of bits, such as “packet,” “cell,” or “datagram.”
The term “RBridge identifier” refers to a group of bits that can be used to identify an RBridge. Note that the TRILL standard uses “RBridge ID” to denote a 48-bit intermediate-system-to-intermediate-system (IS-IS) System ID assigned to an RBridge, and “RBridge nickname” to denote a 16-bit value that serves as an abbreviation for the “RBridge ID.” In this disclosure, “RBridge identifier” is used as a generic term and is not limited to any bit format, and can refer to “RBridge ID” or “RBridge nickname” or any other format that can identify an RBridge.
RBridges in network 100 use edge ports to communicate with end devices and TRILL ports to communicate with other RBridges. For example, RBridge 104 is coupled to end device 122 via an edge port and to RBridges 105, 101, and 102 via TRILL ports. An end device coupled to an edge port may be a host machine or an aggregation node. For example, end devices 122, 124, 126, and 128 are host machines, wherein end devices 122 and 128 are directly coupled to network 100, and end devices 124 and 126 are coupled to network 100 via their aggregation node, a layer-2 bridge 130.
In
During operation that does not involve layer-3 processing in RBridges, an end device coupled to the TRILL network may select the default gateway from a layer-3 network and use the corresponding IP address as a default gateway router address. For example, in
In embodiments of the present invention, as illustrated in
In some embodiments, the TRILL network may be a virtual cluster switch (VCS). In a VCS, any number of RBridges in any arbitrary topology may logically operate as a single switch. Any new RBridge may join or leave the VCS in “plug-and-play” mode without any manual configuration.
Note that TRILL is only used as a transport between the switches within network 100. This is because TRILL can readily accommodate native Ethernet frames. Also, the TRILL standards provide a ready-to-use forwarding mechanism that can be used in any routed network with arbitrary topology. Embodiments of the present invention should not be limited to using only TRILL as the transport. Other protocols (such as multi-protocol label switching (MPLS)), either public or proprietary, can also be used for the transport.
In the example in
In the example in
In some embodiments, layer-3 processing capabilities can be distributed to multiple or all TRILL RBridges. In some embodiments, layer-3 processing capabilities associated with different VLANs can be distributed selectively across multiple RBridges.
In some embodiments, a layer-3 interface on an RBridge corresponding to a VLAN is a Switch Virtual Interface (SVI). For example, RBridge 304 in
However, when end device 311 sends a frame to end device 317, RBridge 304 cannot route the frame to end device 317 because RBridge 304 does not have an SVI on VLAN 322, to which end device 317 belongs. As a result, upon receiving a frame destined to end device 317 from end device 311, RBridge 304 encapsulates the frame using a TRILL header with the egress RBridge identifier corresponding to RBridge 306, because RBridge 306 has SVIs for all VLANs. RBridge 304 then forwards the frame to RBridge 306. The frame is routed through the TRILL network and reaches RBridge 306, at which point the outer destination MAC address matches the MAC address of RBridge 306. Upon receiving the frame, RBridge 306 recognizes that the frame's outer destination MAC address is a local MAC address. RBridge 306 then removes the TRILL encapsulation, encapsulates the IP packet with a new Ethernet header with a destination MAC address corresponding to end device 317 in VLAN 322, and forwards the frame accordingly.
In this example, if end device 317 sends a frame to end device 318, the frame can be routed on layer-3 at RBridge 307 because RBridge 307 has SVIs for VLANs 322 and 326. As the frame does not travel to any other RBridge in network 300, it incurs lower latency while saving bandwidth in network 300. Similarly, if end device 317 sends a frame to end device 312, the frame can be routed on layer-3 at the IP router portion of either RBridge 306 or 307 as both have SVIs for VLANs 322 and 326. If all RBridges in the TRILL network have SVIs for all VLANs, inter-VLAN switching is possible at each RBridge.
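The SVI-based decision in these examples can be sketched as a simple set test: an RBridge can route between two VLANs locally only if it has an SVI on both. The function name, the table layout, and the RBridge labels in the test data are hypothetical, and falling back to the first capable RBridge in sorted order is an assumption rather than a rule stated in the disclosure.

```python
def choose_router(svi_table, local, src_vlan, dst_vlan):
    """Pick the RBridge that performs inter-VLAN routing for a frame.

    svi_table: {rbridge_name: set of VLAN IDs on which it has an SVI}
    Returns `local` when it can route by itself, otherwise another RBridge
    with SVIs on both VLANs, or None if no RBridge qualifies.
    """
    needed = {src_vlan, dst_vlan}
    if needed <= svi_table.get(local, set()):
        return local  # routed locally, no extra hop in the TRILL network
    for rbridge, vlans in sorted(svi_table.items()):
        if rbridge != local and needed <= vlans:
            return rbridge  # frame is TRILL-encapsulated toward this RBridge
    return None
```

Routing at the local RBridge whenever possible reflects the latency and bandwidth benefit noted above.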
If the frame is received on an edge port and the destination is coupled to a local edge port (operation 410), then the RBridge transmits the frame to the destination end device coupled to a local edge port (operation 414).
If the frame is received from a TRILL port (operation 404), the RBridge checks whether it is the egress RBridge of the TRILL packet (operation 408). If not, then the RBridge forwards the TRILL packet to the TRILL network (operation 418). Otherwise, the RBridge transmits the frame to the destination end device coupled to a local edge port (operation 414).
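The branches described in the two preceding paragraphs can be summarized in a short sketch. The function name and the returned action strings are hypothetical. The final branch, for a frame received on an edge port whose destination is not local, is not spelled out in the text above and is an assumption.

```python
def handle_frame_l2(rx_port, dst_is_local=False, is_egress=False):
    """Return the action taken by an RBridge without layer-3 processing."""
    if rx_port == "trill":                        # operation 404
        if not is_egress:                         # operation 408
            return "forward_to_trill_network"     # operation 418
        return "deliver_to_local_edge_port"       # operation 414
    if rx_port != "edge":
        raise ValueError(f"unknown port type: {rx_port!r}")
    if dst_is_local:                              # operation 410
        return "deliver_to_local_edge_port"       # operation 414
    # Assumption: the frame is TRILL-encapsulated and sent into the network.
    return "encapsulate_and_forward_to_trill_network"
```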
If the frame's destination MAC address is not coupled to a local edge port, then the RBridge determines whether the frame's destination MAC address is the RBridge's MAC address (operation 458). If the destination MAC address is not the RBridge's MAC address, then the RBridge encapsulates the frame in a TRILL packet and sets the RBridge identifier of a gateway RBridge as the egress RBridge identifier (operation 466). The RBridge then forwards the TRILL packet to the TRILL network (operation 476). On the other hand, if the frame's destination MAC address is the RBridge's MAC address (operation 458), then the RBridge performs layer-3 processing on the frame (operation 468) and determines the outgoing port (operation 470).
The RBridge then determines the type of the outgoing port (operation 462). If the outgoing port is an edge port, which means the destination end device is coupled locally, the RBridge forwards the frame, which is Ethernet encapsulated with the end device's MAC address as the destination MAC address, to the destination end device (operation 480). In some embodiments, the end device can be a layer-3 (e.g., IP) router. If the outgoing port is a TRILL port, then the end device is connected to a remote RBridge. Hence, the RBridge obtains the RBridge identifier of the RBridge to which the destination end device is coupled, based on the MAC address of the destination end device (operation 472). The RBridge then encapsulates the frame in a TRILL packet and sets the obtained RBridge identifier as the egress RBridge identifier (operation 474). The RBridge then forwards the TRILL packet to the TRILL network (operation 476).
If the frame is received from a TRILL port (operation 454), the RBridge checks whether it is the egress RBridge of the TRILL packet (operation 460). If not, then the RBridge forwards the TRILL packet to the TRILL network (operation 476). Otherwise, the RBridge forwards the frame to the destination end device coupled to a local edge port (operation 480). In some embodiments, the end device can be a layer-3 router, in which case the forwarding includes layer-3 processing on the frame.
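The decision flow for a layer-3-capable RBridge, as walked through above, might be sketched as follows. The function name, parameters, and step labels are hypothetical. The branch for a destination coupled to a local edge port is an assumption, since the surviving text does not give its operation number.

```python
def handle_frame_l3(rx_port, *, dst_mac=None, local_mac=None,
                    local_edge_macs=frozenset(), is_egress=False,
                    out_port_type=None):
    """Return the ordered steps a layer-3-capable RBridge takes for a frame."""
    if rx_port == "trill":                                   # operation 454
        if not is_egress:                                    # operation 460
            return ["forward_trill"]                         # operation 476
        return ["deliver_local"]                             # operation 480
    if dst_mac in local_edge_macs:
        return ["deliver_local"]                             # assumed branch
    if dst_mac != local_mac:                                 # operation 458
        return ["encap_egress=gateway", "forward_trill"]     # 466, 476
    steps = ["layer3_lookup"]                                # 468, 470
    if out_port_type == "edge":                              # operation 462
        return steps + ["deliver_local"]                     # operation 480
    if out_port_type == "trill":
        return steps + ["lookup_egress_by_dst_mac",          # operation 472
                        "encap_egress=remote",               # operation 474
                        "forward_trill"]                     # operation 476
    raise ValueError(f"unknown outgoing port type: {out_port_type!r}")
```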
In some embodiments, a number of TRILL RBridges with IP processing capabilities may act as layer-3 routers for an end device. These RBridges can form a virtual RBridge, which is assigned with a virtual RBridge identifier. Furthermore, these RBridges form a virtual IP router, which is assigned with a virtual IP address and a corresponding virtual MAC address. This virtual IP router operates as a default gateway router, which can provide redundancy and load balancing.
Gateway RBridges 511, 512, and 513 form a virtual RBridge 530 by operating as a single logical RBridge in TRILL network 500. Similarly, the corresponding IP routers 521, 522, and 523 form a virtual IP router 540 by operating as a single logical IP router. An end device 562 coupled to network 500 through RBridge 507 can use virtual IP router 540 as the default gateway router to layer-3 network 550.
In embodiments of the present invention, as illustrated in
All the IP-layer router portions of these gateway RBridges are configured to operate as the layer-3 gateway router (i.e., virtual IP router 540) for end device 562. End device 562 uses virtual IP router 540 as the default gateway. Because virtual RBridge 530 is associated with virtual IP router 540, incoming frames from end device 562 destined to network 550 are marked with virtual RBridge 530's identifier as the egress RBridge identifier. Consequently, all frames from end device 562 to network 550 are delivered to one of the gateway RBridges 511, 512, and 513. Hence, load balancing can be achieved among gateway RBridges 511, 512, and 513 for frames sent to virtual RBridge 530.
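The disclosure does not state how frames addressed to the virtual RBridge are spread across its member gateways. One common approach, shown here purely as an assumption, is to hash a per-flow key so that each flow consistently lands on one member. The function name and the choice of key fields are hypothetical.

```python
import zlib


def pick_gateway(members, src_mac, dst_ip):
    """Choose one physical gateway RBridge for a flow sent to the virtual RBridge.

    Assumption: a CRC32 hash over (source MAC, destination IP) selects the
    member, which keeps a given flow on one gateway while spreading flows.
    """
    ordered = sorted(members)
    if not ordered:
        raise ValueError("virtual RBridge has no members")
    key = f"{src_mac}|{dst_ip}".encode()
    return ordered[zlib.crc32(key) % len(ordered)]
```

Because the selection depends only on the flow key and the member list, removing a failed member changes the mapping for some flows but never selects a gateway that is no longer in the group.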
Also included in network 600 are RBridges 622 and 624, which are layer-3 capable and coupled to an IP network 680. Gateway RBridges 622 and 624 form virtual RBridge 640 with a virtual RBridge identifier 645. Physically co-located IP Routers 632 and 634 within gateway RBridges 622 and 624, respectively, form a virtual IP router 670 which is assigned a virtual IP address 660 and a virtual MAC address 650. Virtual IP address 660 maps to virtual MAC address 650 for ARP requests directed to virtual IP router 670. Furthermore, virtual RBridge identifier 645 is associated with virtual MAC address 650. End devices 652 and 654 can set virtual IP address 660 as their default gateway router address and use ARP to obtain virtual MAC address 650. End devices 652 and 654 send frames with virtual MAC address 650 as the destination address into network 600. The frames are encapsulated in TRILL packets and routed toward virtual RBridge 640 using the corresponding virtual RBridge identifier 645.
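The two mappings described above, virtual IP address to virtual MAC address for ARP and virtual MAC address to virtual RBridge identifier for TRILL encapsulation, can be sketched with a small record type. The class name, method names, and all address values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirtualGateway:
    """Hypothetical record tying together the three virtual identities."""
    virtual_ip: str
    virtual_mac: str
    virtual_rbridge_id: int

    def arp_reply(self, target_ip: str) -> Optional[str]:
        """Answer an ARP request for the virtual IP with the virtual MAC."""
        return self.virtual_mac if target_ip == self.virtual_ip else None

    def egress_for(self, dst_mac: str) -> Optional[int]:
        """Ingress lookup: frames to the virtual MAC go to the virtual RBridge."""
        return self.virtual_rbridge_id if dst_mac == self.virtual_mac else None
```

With `gw = VirtualGateway("10.0.0.1", "02:00:5e:00:01:01", 645)`, an end device that resolves `10.0.0.1` receives `02:00:5e:00:01:01`, and an ingress RBridge seeing that destination MAC address would set `645` as the egress RBridge identifier.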
In some embodiments, a virtual IP address can be assigned for each VLAN associated with a TRILL network. For example, in
Note that in one embodiment, the virtual MAC address is known to all RBridges in network 600. Otherwise, both IP routers 632 and 634 would receive a frame forwarded to virtual MAC address 650, resulting in packet duplication. Hence, after formation of virtual RBridge 640 and virtual IP router 670, all RBridges in network 600 are provided with the knowledge about virtual MAC address 650. That is, virtual MAC address 650 is always “known” to all ingress RBridges in network 600, and frames destined to virtual MAC address 650 are routed through network 600 using TRILL unicast.
In some embodiments, only one gateway RBridge is elected to reply to ARP requests for the virtual IP address. This election can also be VLAN specific.
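The election rule itself is not specified in the disclosure. The sketch below assumes a simple deterministic rule: the lowest identifier wins by default, and a per-VLAN election rotates through the sorted members by VLAN ID. The function name and the rule are illustrative assumptions only.

```python
def elect_arp_responder(gateways, vlan=None):
    """Pick the single gateway RBridge that answers ARP for the virtual IP.

    Assumed rule: lowest identifier wins, and a VLAN-specific election
    indexes the sorted member list by VLAN ID.
    """
    ordered = sorted(gateways)
    if not ordered:
        return None
    if vlan is None:
        return ordered[0]
    return ordered[vlan % len(ordered)]
```

Since every gateway evaluates the same rule over the same member list, all of them agree on the responder without any further exchange.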
In some embodiments, all RBridges in network 600 are associated with virtual RBridge 640 and a virtual IP router 670, and share a virtual RBridge identifier 645, a virtual IP address 660, and a virtual MAC address 650. In some embodiments, all RBridges in network 600 may be connected to IP network 680.
Suppose that a failure 864 occurs to link 831 adjacent to gateway RBridge 811. As a result, link 831 is removed from routing decisions in network 800. All frames from end device 870 are still using the virtual MAC address as the destination address, and thus are still forwarded to any of the gateway RBridges via alternative links (e.g., links 832, 833, and 834).
Suppose that a failure 862 occurs during operation that fails link 836 adjacent to IP router 821. Consequently, IP router 821 is disconnected from network 880 and is incapable of forwarding frames to network 880. Under such a scenario, IP router 821 is removed from virtual IP router 850. As a result, IP router 821 stops operating as a layer-3 gateway router for end device 870. However, gateway RBridge 811 still remains connected to network 800 and continues to operate as a regular TRILL RBridge. As virtual IP router 850 still operates as a default gateway for end device 870, IP routers 822 and 823 can continue to operate as layer-3 gateway routers (as virtual IP router 850) for end device 870. Hence, all frames from end device 870 to network 880 are then distributed among gateway RBridges 812 and 813.
In some embodiments, with failure 862, an elected gateway RBridge stops responding to ARP requests for the virtual IP address and notifies other gateway RBridges. Consequently, the other gateway RBridges then elect among themselves another gateway RBridge to respond to ARP requests.
In some embodiments, with failure 862, IP router 821 might not immediately remove its membership from virtual IP router 850 and might continue to receive layer-3 traffic from end devices. Under such circumstances, gateway RBridge 811, the TRILL counterpart of IP router 821, forwards the layer-3 traffic with TRILL encapsulation to other gateway RBridges (e.g., gateway RBridge 812) which, in turn, forward the traffic to network 880. If all similar IP routers suffer link failures and lose their connection to network 880, IP router 821, along with the IP routers of the other gateway RBridges with link failures, is removed from virtual IP router 850. Even so, all gateway RBridges continue operating as TRILL RBridges.
Suppose that a node failure 866 occurs at gateway RBridge 811 (and essentially IP router 821 as they are the same physical device). As a result, links 831, 833, 835, and 836 fail as well. Consequently, gateway RBridge 811 and IP router 821 are disconnected from both network 800 and network 880, and are incapable of transmitting to or receiving from either network. Under such a scenario, IP router 821 is removed from virtual IP router 850 and gateway RBridge 811 is removed from virtual RBridge 840. As a result, IP router 821 stops operating as a layer-3 gateway node. Furthermore, gateway RBridge 811 is disconnected from network 800 and removed from all TRILL routes in network 800.
With failure 866, as virtual IP router 850 still operates as a default gateway for end device 870, routers 822 and 823 continue operating as layer-3 gateway nodes for end device 870. Hence, all frames from end device 870 to network 880 are distributed between gateway RBridges 812 and 813. Furthermore, if IP router 821 had been an elected router, it stops responding to ARP requests for the virtual IP address. Other RBridges coupled to the failed gateway RBridge can detect the failure and notify all RBridges, including other active gateway RBridges. Consequently, the active gateway RBridges can elect another gateway RBridge to respond to ARP requests.
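The difference between the two failure cases, a layer-3 uplink failure such as failure 862 and a node failure such as failure 866, can be sketched as membership updates on two sets. The class and method names are hypothetical, and using the lowest remaining identifier as the new ARP responder is an assumption.

```python
class GatewayGroup:
    """Hypothetical membership tracker for a set of gateway RBridges."""

    def __init__(self, gateways):
        self.trill_nodes = set(gateways)  # still part of the TRILL network
        self.l3_members = set(gateways)   # still part of the virtual IP router

    def uplink_failure(self, gateway):
        """Link to the layer-3 network fails: stop routing, keep bridging."""
        self.l3_members.discard(gateway)

    def node_failure(self, gateway):
        """The whole device fails: remove it from both roles."""
        self.l3_members.discard(gateway)
        self.trill_nodes.discard(gateway)

    def forwarding_candidates(self):
        """Gateways that can still forward frames to the layer-3 network."""
        return sorted(self.l3_members)

    def arp_responder(self):
        """Assumed re-election rule: lowest remaining identifier."""
        return min(self.l3_members) if self.l3_members else None
```

Starting from gateways 811, 812, and 813, an uplink failure at 811 leaves it in `trill_nodes` but restricts `forwarding_candidates()` to 812 and 813, which mirrors the behavior described for failure 862.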
TRILL ports 904 include inter-switch communication channels for communication with one or more RBridges. This inter-switch communication channel can be implemented via a regular communication port and based on any open or proprietary format. Furthermore, the inter-switch communication between RBridges is not required to be direct port-to-port communication.
During operation, TRILL ports 904 receive TRILL frames from (and transmit frames to) other RBridges. TRILL header processing module 922 processes TRILL header information of the received frames and performs routing on the received frames based on their TRILL headers, as described in conjunction with
In some embodiments, RBridge 900 may form a virtual RBridge and a virtual IP address, wherein TRILL management and forwarding module 920 further includes a virtual RBridge configuration module 924, and IP management module 930 further includes a virtual IP router configuration module 938. TRILL header processing module 922 generates the TRILL header and outer Ethernet header for ingress frames corresponding to the virtual RBridge. Virtual RBridge configuration module 924 manages the communication with gateway RBridges and handles various inter-switch communications, such as link and node failure notifications. Virtual RBridge configuration module 924 allows a user to configure and assign the identifier for the virtual RBridges, and decides whether a frame has to be promoted to layer-3, as described in conjunction with
Furthermore, virtual IP router configuration module 938 handles various inter-switch communications, such as layer-3 link failure notifications. Virtual IP router configuration module 938 allows a user to configure and assign virtual IP addresses and a virtual MAC address.
ARP module 934 is responsible for ARP request replies, as described in conjunction with
In some embodiments, gateway RBridge 900 may include a number of edge ports 902, as described in conjunction with
In some embodiments, gateway RBridge 900 may include a VCS configuration module 944 that includes a virtual switch management module 940 and a logical switch 942 as described in conjunction with
Note that the above-mentioned modules can be implemented in hardware as well as in software. In one embodiment, these modules can be embodied in computer-executable instructions stored in a memory which is coupled to one or more processors in gateway RBridge 900. When executed, these instructions cause the processor(s) to perform the aforementioned functions.
In summary, embodiments of the present invention provide a switch, a method and a system for providing layer-3 support in a TRILL network. In one embodiment, the switch includes an IP header processor and a forwarding mechanism. The IP header processor identifies a destination IP address in a packet encapsulated with an inner Ethernet header, a TRILL header, and an outer Ethernet header. The forwarding mechanism determines an output port and constructs a new header for the packet based on the destination IP address. The switch also includes a packet processor which determines whether (1) an inner destination media access control (MAC) address corresponds to a local MAC address assigned to the switch; (2) a destination RBridge identifier corresponds to a local RBridge identifier assigned to the switch; and (3) an outer destination MAC address corresponds to the local MAC address. Such configuration provides a scalable and flexible solution to enable layer-3 processing in the switch.
The methods and processes described herein can be embodied as code and/or data, which can be stored in a computer-readable non-transitory storage medium. When a computer system reads and executes the code and/or data stored on the computer-readable non-transitory storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the medium.
The methods and processes described herein can be executed by and/or included in hardware modules or apparatus. These modules or apparatus may include, but are not limited to, an application-specific integrated circuit (ASIC) chip, a field-programmable gate array (FPGA), a dedicated or shared processor that executes a particular software module or a piece of code at a particular time, and/or other programmable-logic devices now known or later developed. When the hardware modules or apparatus are activated, they perform the methods and processes included within them.
The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. They are not intended to be exhaustive or to limit this disclosure. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. The scope of the present invention is defined by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/481,643, Attorney Docket Number BRCD-3093.0.1.US.PSP, titled “Layer-3 Support in Virtual Cluster Switching,” by inventors Phanidhar Koganti, Anoop Ghanwani, Suresh Vobbilisetty, Rajiv Krishnamurthy, Nagarajan Venkatesan, and Shunjia Yu, filed 2 May 2011, and U.S. Provisional Application No. 61/503,265, Attorney Docket Number BRCD-3093.0.2.US.PSP, titled “IP Routing in VCS,” by inventors Phanidhar Koganti, Anoop Ghanwani, Suresh Vobbilisetty, Rajiv Krishnamurthy, Nagarajan Venkatesan, and Shunjia Yu, filed 30 Jun. 2011, which are incorporated by reference herein. The present disclosure is related to U.S. patent application Ser. No. 13/087,239, (attorney docket number BRCD-3008.1.US.NP), titled “Virtual Cluster Switching,” by inventors Suresh Vobbilisetty and Dilip Chatwani, filed 14 Apr. 2011, and U.S. patent application Ser. No. 12/725,249, (attorney docket number BRCD-112-0439US), titled “Redundant Host Connection in a Routed Network,” by inventors Somesh Gupta, Anoop Ghanwani, Phanidhar Koganti, and Shunjia Yu, filed 16 Mar. 2010, the disclosures of which are incorporated by reference herein.
Number | Date | Country
---|---|---
61481643 | May 2011 | US
61503265 | Jun 2011 | US