This disclosure relates in general to optimization of traffic flow over etherchannel in multiple topologies.
Switches comprise backside ports and frontside ports. Backside ports are used, for example, to connect one switch to another to form a switch stack, or stacked switch. Backside ports typically have a maximum link distance of five meters or less but communicate at very high speed. Frontside ports are the ports typically used to attach devices to the switch. The advantage of frontside Ethernet ports is that they can connect devices over long distances, though at speeds slower than the connection speeds of backside ports.
In the past, switches that were spaced far apart could be connected together in a ring using two of the frontside ports. As only two frontside ports were available for frontside stacking, ring topologies have been the only topologies available, making mesh and other interesting configurations impossible. In addition, prior switches could not support hierarchical etherchannel communications, such as that which may be implemented in “ring of rings” topologies. Furthermore, in hierarchical topologies that implement multiple levels of stacks between access nodes and a core node, there have not been methods to efficiently load balance the traffic or avoid traffic flowing over unnecessary links. In addition, prior systems failed to perform déjà vu checks based on global port numbers and intermediate port numbers.
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
Overview
Methods and systems are disclosed. The method includes: designating a first plurality of links from a first stack segment to a second stack segment as a first etherchannel link; designating a second plurality of links from the first stack segment to a third stack segment as a second etherchannel link, where the second stack segment and the third stack segment are in communication with a fourth stack segment; designating the first etherchannel link and the second etherchannel link as members of a hierarchical etherchannel link; and sending a packet from the first stack segment to the fourth stack segment using the hierarchical etherchannel link.
Additional embodiments include a method of transmitting a packet via a plurality of stack segments, including: receiving a native Ethernet packet; appending a source global port number to a header on the packet; transmitting the packet and the header from a first stack segment to a second stack segment; and performing a déjà vu check on the source global port number at the second stack segment.
Also disclosed is a method of communicating between an access node and a core node via an aggregation node, including: hashing a packet at the access node to select a link within an etherchannel upon which to transmit the packet to the aggregation node; and hashing a packet at the aggregation node to select a link within a second etherchannel upon which to transmit the packet to the core node.
In a “ring of rings,” or “stack of stack” topology such as that shown in
But if a Stackwise Virtual Link goes down completely, its traffic must be redistributed to the other segment. This traffic convergence, in which the system redirects forwarding-plane traffic, can be accomplished by using Hierarchical Etherchannels (HEC).
In HEC, Stackwise Virtual Links from the first stack segment to a first intermediate stack segment are part of etherchannel 1. Similarly, Stackwise Virtual Links from the first stack segment to a second intermediate stack segment are part of etherchannel 2. Etherchannel 1 and etherchannel 2 are members of a hierarchical etherchannel group, which vastly improves traffic convergence by performing convergence in the forwarding ASIC.
For the first phase of implementation, traffic redirection may be managed by software tables, as prior hardware may not support Hierarchical Etherchannel tables. Should etherchannel 2 be broken, the software walks through the forwarding tables and modifies the required forwarding entries, removing etherchannel 2 as a destination and adding etherchannel 1 as the alternative destination. With hardware/ASIC support for Hierarchical Etherchannel (“EC”) tables, in which both EC destination indices can be added as part of a Hierarchical EC table, hierarchical load balancing will kick in when a complete EC bundle goes down, and traffic will be redirected to the alternate etherchannel. Software forwarding entries will continue to point to the HEC table forwarding entry.
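The software fallback described above can be sketched as a simple walk over the forwarding entries. This is an illustrative sketch only; the table layout and the names `forwarding_table`, `EC1`, and `EC2` are assumptions, not structures defined in this disclosure.

```python
# Sketch of the software fallback: when an etherchannel bundle goes down,
# walk the forwarding table and swap the failed EC destination for the
# surviving member of the hierarchical etherchannel.
def redirect_on_ec_failure(forwarding_table, failed_ec, alternate_ec):
    """Point every entry that targets failed_ec at alternate_ec instead."""
    modified = 0
    for entry in forwarding_table.values():
        if entry["destination"] == failed_ec:
            entry["destination"] = alternate_ec
            modified += 1
    return modified

table = {
    "aa:bb:cc:00:00:01": {"destination": "EC2"},
    "aa:bb:cc:00:00:02": {"destination": "EC1"},
    "aa:bb:cc:00:00:03": {"destination": "EC2"},
}
# Etherchannel 2 goes down: its entries are rewritten to etherchannel 1.
changed = redirect_on_ec_failure(table, failed_ec="EC2", alternate_ec="EC1")
```

With hardware HEC tables, this walk becomes unnecessary: the forwarding entries keep pointing at the HEC table entry, and the redirect happens in the ASIC.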
By making etherchannel ports members of a Hierarchical Etherchannel group, the ability to converge data traffic in a stack of stacks topology is significantly improved.
Additional features of the present disclosure include the ability to load balance traffic that flows between a core, an aggregation layer, and an access layer. When traffic flows from access to core via the aggregation layer, traffic from the access layer is hashed and may arrive via any EC member connected to the aggregation layer. From the aggregation layer to the core, traffic can again hash out from any EC member connected to the core.
When traffic flows from the core through the aggregation layer to the access layer, traffic from the core is hashed and may arrive via any EC member connected to the aggregation layer. From the aggregation layer to the access layer, traffic can again hash out from any EC member connected to the access layer.
Stackwise Virtual enables one or more front panel ports to be combined and designated as a stack port, as indicated in the topology shown in
Users can choose whether their traffic goes over a set of local links or works in regular etherchannel mode. Traffic will be sent over the SVL if all local EC members go down. Each of these flows is discussed below to highlight the differences.
For unicast traffic, the hash programming for the EC member is done to make sure a local link is available on egress. The egress destination programming will make sure that no packets are sent across the SVL. This is comparable to current frontside stacking implementations.
For multicast and broadcast traffic, the EC is added to the multicast group, and the hash programming is done to make sure a local link is available on egress. However, it is possible that a particular multicast group has non-EC receivers on the remote switch. In this case, the system allows a copy of the packet to go to the other end but then makes use of hardware pruning to weed out the EC member ports (using a frontside stack drop table). Hardware pruning achieves local forwarding behavior without having to duplicate egress forwarding resources. This is an important differentiation of this solution over previous ones: it is much improved and can now span network domains (access and aggregation).
Control frames from any Stackwise Virtual member over a remote EC port will not be impacted by the above mechanism since there is a bit in the SVL header to allow override of drop decisions.
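The pruning and override behavior described in the two paragraphs above can be sketched as follows. The table contents, port names, and function signature are illustrative assumptions; only the mechanism (drop-table pruning of EC member ports on SVL-received copies, with an override bit for control frames) comes from the disclosure.

```python
# Ports listed here are EC members that the remote switch should prune from
# copies that arrived over the SVL (the "frontside stack drop table").
frontside_stack_drop_table = {"ec-port-1", "ec-port-2"}

def prune_egress(egress_ports, came_over_svl, override_drop=False):
    """Weed EC member ports out of the egress set for SVL-received copies.

    override_drop models the bit in the SVL header that lets control
    frames bypass the drop decision.
    """
    if came_over_svl and not override_drop:
        return [p for p in egress_ports if p not in frontside_stack_drop_table]
    return list(egress_ports)

ports = ["ec-port-1", "host-port-7", "host-port-9"]
```

A copy arriving over the SVL is delivered only to the non-EC receivers, while a control frame with the override bit set reaches every port.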
The solution proposed above has a faster re-convergence time, and does not require significant processor and memory resources to recalculate and reprogram in the case of a link failure. In addition, this solution can be used for any other protocol or technology requirement mandating redundant links with or without the use of hashing mechanisms, and is thereby extensible as well.
Traffic flowing from Stack Segment 4 (140) to Stack Segment 2 (120) may flow over hierarchical etherchannel 1 (170), which includes etherchannel 1 (150) and etherchannel 2 (160). So long as both etherchannel 1 (150) and etherchannel 2 (160) are up and running, a hash is performed on data packets in order to select whether the packet will flow via etherchannel 1 (150) or etherchannel 2 (160). Thus, hashing can be used as one method of load balancing the system. Once an etherchannel (1 or 2) is selected, the particular link within that etherchannel is also selected based on load balancing across that etherchannel's links, typically via use of a hash.
Should one of the etherchannels go down, the address for that etherchannel is masked off and traffic is redirected to the healthy etherchannel. Should one of the links within an etherchannel go down, that link is masked and traffic continues to flow down that etherchannel on the links that are still healthy.
A check is made to see whether one of the etherchannels is down (stage 220). If so, a mask is applied to block out the down etherchannel and the healthy etherchannel is selected (stage 240). If no etherchannels are down, an etherchannel that is part of the hierarchical etherchannel is selected for load-balancing purposes, typically through the use of a hashing algorithm (stage 230).
Next, a test is made on the selected etherchannel to see whether any of its links are down (stage 250). If so, that link or links are masked off and an active link is selected to carry the traffic (stage 270). If not, a link is selected based on load balancing, typically through the use of a hashing algorithm.
In summary, two hashes are typically performed: a first to select an etherchannel from among the etherchannels that form the hierarchical etherchannel, and a second to select a link within the selected etherchannel to carry the traffic.
When traffic flows from an access node, such as access node 330a, to the core 310 via the aggregation layer 320, the traffic is initially hashed to select a link within the etherchannel to send the packet through. For example, the hash result of a packet sent from access node 330a may result in the packet traveling up the left-most link to aggregation node 320a. From aggregation node 320a to core 310, another hash is performed to select one of the two links in the interconnecting etherchannel upon which to send the packet.
When traffic flows from the core 310 to an access node via the aggregation layer 320, the traffic is initially hashed to select a link within the etherchannel to send the packet through. For example, the hash result of a packet sent from core 310 may result in the packet traveling up the left-most link to aggregation node 320a. From aggregation node 320a to access node 330a, another hash is performed to select one of the two links in the interconnecting etherchannel upon which to send the packet.
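The hop-by-hop hashing in the two paragraphs above can be sketched with an independent hash at each node. The topology dictionary and link names below are illustrative assumptions keyed loosely to the reference numerals in the text.

```python
import zlib

# Each node hashes a flow independently over its own etherchannel members.
uplinks = {
    "access-330a": ["agg-left-link", "agg-right-link"],  # access -> aggregation
    "agg-320a":    ["core-link-1", "core-link-2"],       # aggregation -> core
}

def egress_link(node, flow_key):
    """Hash the flow key over this node's EC members to pick an egress link."""
    members = uplinks[node]
    return members[zlib.crc32(flow_key) % len(members)]

flow = b"10.1.1.1->10.2.2.2:tcp:443"
first_hop = egress_link("access-330a", flow)   # access chooses its uplink
second_hop = egress_link("agg-320a", flow)     # aggregation chooses its uplink
```

Each hop's choice is made locally, so traffic spreads over every EC member at both the access-to-aggregation and aggregation-to-core stages.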
In addition, an intermediate source GPN and an intermediate destination GPN are affixed to the frame descriptor as the packet travels in and out of the various stack segments. Intermediate déjà vu checks are performed using the intermediate source GPN and intermediate destination GPN to make sure that a packet is not being sent out of a port upon which it entered. The source GPN and destination GPN are used at the destination on stack segment 3 (630) for a déjà vu check to ensure the packet is not leaving on the same port from which it entered.
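A déjà vu check of this kind can be sketched as a comparison between the candidate egress GPN and the GPNs recorded in the packet's header. The header field names below are illustrative assumptions; only the check itself (refuse to forward a packet out of the port it entered) is from the disclosure.

```python
# Sketch of a déjà vu check on global port numbers (GPNs).
def deja_vu_ok(header, egress_gpn):
    """Return True if the packet may be forwarded out of egress_gpn."""
    if header["source_gpn"] == egress_gpn:
        return False  # packet would leave on the port it entered: drop
    if header.get("intermediate_source_gpn") == egress_gpn:
        return False  # same check at intermediate stack segments
    return True

hdr = {"source_gpn": 17, "intermediate_source_gpn": 42}
```

The global check uses the source GPN appended at ingress, while the intermediate checks use the GPNs affixed to the frame descriptor at each stack segment boundary.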
Any process, descriptions or blocks in flow charts or flow diagrams should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved. In some embodiments, steps of processes identified in
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the switching systems and methods. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. Although all such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims, the following claims are not necessarily limited to the particular embodiments set out in the description.
This application is a division of U.S. patent application Ser. No. 16/833,770, filed on Mar. 30, 2020, which is a division of U.S. patent application Ser. No. 15/636,933, filed on Jun. 29, 2017, now U.S. Pat. No. 10,608,957, the disclosure of each of which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
7274694 | Cheng | Sep 2007 | B1 |
8244909 | Hanson | Aug 2012 | B1 |
9237100 | Mizrahi | Jan 2016 | B1 |
10608957 | Cheng | Mar 2020 | B2 |
11516150 | Cheng | Nov 2022 | B2 |
20030140124 | Burns | Jul 2003 | A1 |
20050068993 | Russell | Mar 2005 | A1 |
20050198371 | Smith | Sep 2005 | A1 |
20050259646 | Smith | Nov 2005 | A1 |
20060039384 | Dontu | Feb 2006 | A1 |
20070207591 | Rahman | Sep 2007 | A1 |
20080259917 | Hua | Oct 2008 | A1 |
20080259951 | Cardona | Oct 2008 | A1 |
20080298236 | Ervin | Dec 2008 | A1 |
20090052326 | Bergamasco | Feb 2009 | A1 |
20100146323 | Hu | Jun 2010 | A1 |
20110176544 | Wong | Jul 2011 | A1 |
20110261811 | Battestilli | Oct 2011 | A1 |
20120198270 | Jain | Aug 2012 | A1 |
20120224486 | Battestilli | Sep 2012 | A1 |
20120307641 | Arumilli | Dec 2012 | A1 |
20130155902 | Feng | Jun 2013 | A1 |
20130336164 | Yang | Dec 2013 | A1 |
20130347103 | Veteikis | Dec 2013 | A1 |
20140025736 | Wang | Jan 2014 | A1 |
20140115137 | Keisam | Apr 2014 | A1 |
20140198793 | Allu | Jul 2014 | A1 |
20140226644 | Raman | Aug 2014 | A1 |
20140254597 | Tian | Sep 2014 | A1 |
20150036479 | Gopalarathnam | Feb 2015 | A1 |
20150188806 | Anumala | Jul 2015 | A1 |
20150195218 | Smith | Jul 2015 | A1 |
20150350111 | Xiu | Dec 2015 | A1 |
20160212041 | Krishnamurthy | Jul 2016 | A1 |
20160301597 | Jayakumar | Oct 2016 | A1 |
20160359641 | Bhat | Dec 2016 | A1 |
20170085622 | Gopinath | Mar 2017 | A1 |
20170171067 | Kapadia | Jun 2017 | A1 |
20180176118 | Adler | Jun 2018 | A1 |
20180359811 | Verzun | Dec 2018 | A1 |
20190007262 | Cheng | Jan 2019 | A1 |
20190007302 | Cheng | Jan 2019 | A1 |
20190007343 | Cheng | Jan 2019 | A1 |
20200228464 | Cheng | Jul 2020 | A1 |
20220329525 | Lu | Oct 2022 | A1 |
20230043073 | Cheng | Feb 2023 | A1 |
Entry |
---|
IP.com Search in NPL, Jul. 2019, 1 page. |
IP.com Search in Patents, Jul. 2019, 4 pages. |
Jo J. Y., et al., “Hash-based Internet Traffic Load Balancing,” Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration (IRI 2004), Nov. 2004, 6 pages. |
Srebrny P. H., et al., “No More Deja Vu—Eliminating Redundancy with CacheCast: Feasibility and Performance Gains,” IEEE/ACM Transactions on Networking, Dec. 2013, vol. 21, No. 6, 14 pages. |
U.S. Appl. No. 15/637,034, filed Jun. 29, 2017, Title: “Mechanism for Dual Active Detection Link Monitoring in Virtual Switching System with Hardware Accelerated Fast Hello,” 22 pages. |
U.S. Appl. No. 15/637,146, filed Jun. 29, 2017, Title: “Systems and Methods for Enabling Frontside Stacking of Switches,” 23 pages. |
Number | Date | Country | |
---|---|---|---|
20230043073 A1 | Feb 2023 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16833770 | Mar 2020 | US |
Child | 17971842 | US | |
Parent | 15636933 | Jun 2017 | US |
Child | 16833770 | US |