The present disclosure relates to load balancing in a network switch device.
An EtherChannel is a logical bundling of two or more physical ports between two switches to achieve higher aggregate data transmission bandwidth. The assignment of an output port within an EtherChannel group is usually made at the time a frame enters the switch, using a combination of hashing schemes and lookup tables, which are inherently static in nature. Moreover, conventional port mapping does not take into account individual output port utilization, i.e., queue level. This can result in poor frame forwarding decisions to the output ports within an EtherChannel group, leading to underutilization of some ports and dropping of frames due to congestion in other output ports.
Overview
Dynamic load balancing techniques among ports of a network device are provided. At a device configured to forward packets in a network, a plurality of queues are generated, each associated with a corresponding one of a plurality of output ports of the device and from which packets are to be output from the device into the network. It is detected when a number of packets or bytes in at least one queue exceeds a threshold. When the number of packets in the at least one queue exceeds the threshold, new packets that are to be enqueued to the at least one queue are instead enqueued to a plurality of sub-queues such that packets are assigned to different ones of the plurality of sub-queues. Each of the plurality of sub-queues is associated with a corresponding one of the plurality of output ports. Packets of the plurality of sub-queues are output from corresponding ones of the plurality of output ports.
Referring first to
The switches 20(1) and 20(2) are configured to implement EtherChannel techniques. EtherChannel is a port link aggregation technology or port-channel architecture that allows grouping of several physical Ethernet links to create one logical Ethernet link for the purpose of providing fault-tolerance and high-speed links between switches, routers and servers. An EtherChannel can be created from between two and eight Ethernet ports, with an additional one to eight inactive (failover) ports which become active as the other active ports fail.
At least one of the switches, e.g., switch 20(1), is configured to dynamically allow for the segregation of outgoing flows to optimally load balance traffic among the output ports within an EtherChannel group and, as a result, maximize individual link utilization while guaranteeing in-order packet delivery. These techniques can target problem output ports that are, for example, experiencing congestion. These techniques can be invoked when one or more physical ports in an EtherChannel group are overutilized, i.e., congested. Overutilization of a port indicates that other ports in the same EtherChannel group are underutilized. In some implementations, these techniques are only invoked when one or more physical ports are overutilized.
Reference is now made to
The queuing subsystem 58 comprises a memory 59 that is referred to herein as the link list memory. In one form, the memory 59 is implemented by a plurality of registers, but it may be implemented by allocated memory locations in the memory arrays 56, by a dedicated memory device, etc. In general, the memory 59 serves as a means for storing a queue link list defining the plurality of queues of packets stored in the memory arrays 56 and for storing a sub-queue link list defining the plurality of sub-queues.
The link list memory 59 comprises memory locations (e.g., registers) allocated for at least one queue 70 (herein also referred to as a “regular” queue) and a plurality of sub-queues 72(0)-72(L−1). The regular queue stores an identifier for each packet stored in memory 56 that is part of the regular queue in order from head (H) to tail (T) of the queue. Likewise, each sub-queue stores an identifier for each packet stored in memory 56 that is part of a sub-queue also in order from H to T for each sub-queue. Each of the sub-queues 72(0)-72(L−1) is associated with a corresponding one of a plurality of physical output ports, designated as Port 0 to Port L−1. These ports correspond to the ports 22(4)-22(7), for example, shown in
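The link list structures described above can be modeled in software. The following Python sketch (the class and variable names are hypothetical illustrations, not taken from the disclosure) models the regular queue and the per-port sub-queues as ordered lists of packet identifiers maintained from head (H) to tail (T):

```python
from collections import deque

class LinkListQueue:
    """Ordered list of packet identifiers, maintained head (H) to tail (T).
    Each identifier refers to a packet stored in the packet memory."""
    def __init__(self):
        self._ids = deque()

    def enqueue(self, packet_id):
        # new packets are added at the tail
        self._ids.append(packet_id)

    def dequeue(self):
        # packets are read out from the head, preserving arrival order
        return self._ids.popleft()

    def __len__(self):
        return len(self._ids)

# one regular queue plus one sub-queue per physical output port
NUM_PORTS = 8  # ports in the EtherChannel group (L = 8 in this sketch)
regular_queue = LinkListQueue()
sub_queues = [LinkListQueue() for _ in range(NUM_PORTS)]
```

In hardware the same structure is held in registers or allocated memory locations, as the disclosure notes; the deque here simply stands in for that storage.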
The queuing subsystem 58 also comprises an 8-bit to 3-bit hashing circuit 74, a round robin (RR) arbiter 76 and an adder or sum circuit 78. The 8-bit to 3-bit hashing circuit 74 is configured to perform a 3-bit hash computation on packet headers to determine to which of a plurality of sub-queues a packet is assigned when it is determined to use sub-queues, as will become more apparent hereinafter. The 8-bit to 3-bit hashing circuit 74 is provided because the 8-bit hashing circuit 52 is a common component in switches, and rather than re-design the switch to provide a lesser degree of hashing for enqueuing packets to the plurality of sub-queues, the additional hashing circuit 74 is provided. The hashing circuit 52 serves as a means for adding entries to a queue link list for at least one queue as new packets are added to the at least one queue. Moreover, the hashing circuit 52 in combination with the hashing circuit 74 serves as a means for adding entries to the sub-queue link list for the plurality of sub-queues such that packets are assigned to different ones of the plurality of sub-queues when congestion is detected on at least one port that is part of an EtherChannel group.
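The reduction from an 8-bit hash to a 3-bit hash can be sketched as follows. The disclosure does not specify the exact reduction used by circuit 74, so the XOR-folding below is an assumption; it is one common way to fold a wider hash into fewer bits while keeping the mapping deterministic, so that all packets of a flow produce the same 3-bit value:

```python
def hash8(header_bytes):
    """Toy 8-bit header hash, standing in for the switch's 8-bit
    hashing circuit 52 (the real hash function is not specified)."""
    h = 0
    for b in header_bytes:
        h ^= b
    return h & 0xFF

def hash8_to_3(h8):
    """Fold an 8-bit hash down to 3 bits by XOR-ing bit groups.
    Deterministic, so a given flow always yields the same value."""
    return (h8 ^ (h8 >> 3) ^ (h8 >> 6)) & 0x7
```

Because both stages are deterministic functions of the packet header, every packet of a given flow lands on the same 3-bit value, which is what preserves per-flow ordering downstream.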
The RR arbiter 76 selects a packet from one of the plurality of sub-queues of the same COS from ports of the same EtherChannel group and directs it to the adder 78. The RR arbiter 76 comprises a digital logic circuit, for example, that is configured to select a packet from one of the sub-queues of the same COS from ports of the same EtherChannel group according to any of a variety of round robin selection techniques. The other input to the adder 78 is an output from the regular queue 70.
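A simple round robin selection of the kind the RR arbiter 76 performs can be sketched as below (a minimal illustration; the class name and the skip-empty-queues policy are assumptions, since the disclosure permits any of a variety of round robin techniques):

```python
from collections import deque

class RoundRobinArbiter:
    """Grants the next non-empty sub-queue in circular order,
    resuming the scan just after the last granted queue."""
    def __init__(self, queues):
        self.queues = queues   # list of deques (the sub-queues)
        self._next = 0         # index where the next scan begins

    def select(self):
        """Return the index of the next non-empty queue, or None."""
        n = len(self.queues)
        for i in range(n):
            idx = (self._next + i) % n
            if len(self.queues[idx]) > 0:
                self._next = (idx + 1) % n  # resume after the grant
                return idx
        return None
```

Skipping empty sub-queues keeps the arbiter work-conserving: a port whose sub-queue is empty simply forfeits its turn rather than stalling the others.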
The queue level monitor 60 is a circuit that compares the current number of packets in the regular queue and in the sub-queues with a predetermined threshold. In another form, the queue level monitor 60 determines the total number of bytes in a queue or sub-queue. Thus, it should be understood that references made herein to the queue level monitor circuit comparing numbers of packets with a threshold may involve comparing numbers of bytes with a threshold. In one example, the queue level monitor 60 comprises a counter and a comparator that is configured to keep track of the amount of data (in bytes) stored in memory 56 for each queue. There can be a dedicated queue level monitor 60 for each regular queue. Thus, since only one regular queue is shown in
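The counter-and-comparator arrangement of the queue level monitor 60 can be sketched as follows. The threshold values and method names here are illustrative assumptions; the disclosure leaves the exact thresholds to configuration:

```python
class QueueLevelMonitor:
    """Tracks the number of bytes enqueued for one regular queue and
    compares it against high (congestion) and low (drained) thresholds."""
    def __init__(self, high_threshold, low_threshold):
        self.high = high_threshold
        self.low = low_threshold
        self.bytes_in_queue = 0   # the counter

    def on_enqueue(self, packet_len):
        self.bytes_in_queue += packet_len

    def on_dequeue(self, packet_len):
        self.bytes_in_queue -= packet_len

    def congested(self):
        # the comparator: level above the high threshold flags congestion
        return self.bytes_in_queue > self.high

    def drained(self):
        # level at or below the low threshold permits sub-queue collapse
        return self.bytes_in_queue <= self.low
```

Counting bytes rather than packets, as the disclosure notes, accounts for variable payload sizes; a packet-count variant would simply increment and decrement by one instead of by the packet length.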
The read logic circuit 62 is configured to read packets from the memory 56 to be transmitted from the switch via the output 64. The order that the read logic circuit 62 follows to read packets from the memory 56 is based on the identifiers supplied from the link list memory 59 in the regular queue or plurality of sub-queues as described further hereinafter.
The read logic circuit 62 and output circuit 64 serve as a means for outputting packets from the memory 56. As will become apparent hereinafter, the read logic circuit 62 and output circuit 64 serve as a means for outputting packets from the memory 56 for the plurality of sub-queues according to the sub-queue link list in memory 59 after all packets in the queue link list in memory 59 for at least one queue have been output from the memory 56.
The hashing circuit 52 serves as a means for adding entries to a queue link list for at least one queue as new packets are added to the at least one queue. Moreover, the hashing circuit 52 in combination with the hashing circuit 74 serves as a means for adding entries to the sub-queue link list for the plurality of sub-queues such that packets are assigned to different ones of the plurality of sub-queues when at least one queue exceeds the aforementioned threshold indicative of a congested port.
There is also a priority arbiter logic circuit 80 that is configured to schedule which of a plurality of regular queues is serviced based on a software configuration. Multiple COS queues are described hereinafter in connection with
Requests from the queues (when multiple regular queues are employed) are sent to the priority arbiter 80. The priority arbiter 80 generates a queue number grant and sends it back to the queuing subsystem 58. The RR arbiter 76 generates a packet pointer for a packet (from the selected sub-queue corresponding to one of the ports of the EtherChannel group for the same COS) and sends the packet pointer information to the read logic circuit 62, which retrieves the appropriate packet from the packet memory 56 for output via the output circuit 64. The read logic circuit 62 also feeds back information concerning the output packet to the priority arbiter 80 in order to update its own internal counters.
The load balancing sub-queues can be activated by a combination of register configurations and congestion indication by the queue level monitoring logic. For example, there are configuration registers (not shown) that can be allocated to enable/disable the LB sub-queues, and to specify the number of ports in an EtherChannel group and the hashing-to-port mapping.
The general sequence of events for operation of the priority arbiter 80 and related logic circuits shown in
The flows that were being enqueued to the congested queue are separated into the sub-queues using a hashing scheme (e.g., the 8-bit to 3-bit hashing scheme) that preserves in-order packet delivery within a flow and ensures that any particular flow will always be forwarded to the same sub-queue. The 3-bit hash is again collapsed into values that range from 0 to N−1, which in turn index to one of the sub-queues. The 8-bit to 3-bit rehashing scheme minimizes clumping of flows into a single queue. All the sub-queues corresponding to the ports of the EtherChannel group, each forwarding flows to a particular physical port, are then serviced in a round robin (RR), weighted round robin (WRR) or deficit WRR (DWRR) fashion. This effectively relieves the congestion and rebalances the flows to the other links within the EtherChannel group. Once the level of the original (problem) queue falls below a certain threshold (indicating that the links are no longer overutilized), the logical sub-queues are collapsed back into a single queue. Creation and collapsing of the sub-queues are initiated by the level of fullness of any queue. The sub-queues can then be reused for other problem queues in the same manner.
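The two-stage mapping above, from flow to 3-bit hash to a sub-queue index in 0..N−1, can be sketched as follows. The toy hash and the modulo collapse are assumptions for illustration; the disclosure only requires that the mapping be deterministic so every packet of a flow lands on the same sub-queue:

```python
def flow_hash3(flow_key):
    """Toy 3-bit hash of a flow key string (stand-in for the 8-bit to
    3-bit hashing path); any deterministic hash preserves flow affinity."""
    h = 0
    for b in flow_key.encode():
        h ^= b                          # fold key bytes into 8 bits
    return (h ^ (h >> 3) ^ (h >> 6)) & 0x7  # collapse 8 bits to 3

def sub_queue_index(flow_key, num_ports):
    """Collapse the 3-bit value (0-7) into the range 0..N-1.
    A modulo mapping is assumed here; a configured hashing-to-port
    map would serve equally well."""
    return flow_hash3(flow_key) % num_ports
```

Because the function of the flow key is pure, repeated packets of one flow always index the same sub-queue, and thus the same output port, which is what guarantees in-order delivery within that flow.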
The sub-queuing techniques described herein are applicable when there is one or a plurality of classes of services of packet flows handled by the switch.
Creation of Sub-Queues
Reference is now made to
Packets are enqueued to one of the COS regular queues 70(0) to 70(7) based on their COS. For example, packets in COS 0 are all enqueued to queue 70(0), packets in COS 1 are enqueued to queue 70(1), and so on. The priority arbiter 80 selects packets from the plurality of COS regular queues 70(0)-70(7) after the adders shown at 78(0)-78(7) merge each regular queue 70(0)-70(7) with the sub-queues (of the same COS) from other ports that are in the same EtherChannel group. There is a RR arbiter for each COS, e.g., RR arbiter 76(0), . . . , 76(7) in this example. The RR arbiters 76(0)-76(7) select packets from the plurality of sub-queues from other ports (for a corresponding COS) according to a round robin scheme. The outputs of the respective RR arbiters 76(0)-76(7) are coupled to a corresponding one of the adders 78(0)-78(7) associated with the regular queues 70(0)-70(7), respectively, depending on which of the COS regular queues is selected for sub-queuing.
In this example, the states of the 8 regular queues 70(0)-70(7) are sent to the priority arbiter 80. The priority arbiter 80 then checks the software configuration parameters (which are tied to the classes of services served by the device) to determine which is the next COS queue to be serviced. A higher priority COS will be serviced more often than a lower priority COS. The priority arbiter 80 then sends an indication of the queue to be serviced next, referred to as the queue number grant in
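The priority arbiter's weighted servicing, in which a higher priority COS is granted more often than a lower priority COS, can be sketched as below. The class name, the credit mechanism, and the weight values are illustrative assumptions; the disclosure only states that scheduling follows software-configured priorities:

```python
class PriorityArbiter:
    """Grants COS queues in proportion to configured weights, so a
    higher-weight (higher-priority) COS is serviced more often."""
    def __init__(self, weights):
        self.weights = weights          # one weight per COS queue
        self.credits = list(weights)    # grants remaining this round

    def grant(self, queue_lengths):
        """Return the COS number to service next, or None if all empty."""
        while True:
            candidates = [q for q, n in enumerate(queue_lengths)
                          if n > 0 and self.credits[q] > 0]
            if candidates:
                q = max(candidates, key=lambda q: self.weights[q])
                self.credits[q] -= 1
                return q
            if all(n == 0 for n in queue_lengths):
                return None
            self.credits = list(self.weights)   # start a new round
```

With weights of 1 and 3, for example, the higher-priority COS receives three grants for every one grant of the lower-priority COS while both remain backlogged.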
Any of the COS regular queues 70(0)-70(7) (most likely the lowest priority queue) can accumulate packets (grow) beyond a configured predetermined threshold. A sequence of events or operations labeled “1” to “4” in
At “2”, the COS queue 70(0) is declared to be congested and new packets are no longer enqueued into COS queue 70(0). Instead, they are enqueued into the LB sub-queues 72(0)-72(7). Packets to other COS queues continue to be sent to their respective COS queues. An 8-bit to 3-bit hashing number and port map is used to select the one of the sub-queues 72(0)-72(7) into which a packet is enqueued. The LB sub-queues are not de-queued yet. A plurality of COS sub-queues are effectively created on the fly and, as explained above, the number of sub-queues created depends on the number of ports in the EtherChannel group under evaluation. In this example, there are 8 LB sub-queues because there are 8 physical ports in the EtherChannel group. The sub-queue number specifies to which output port the packet will eventually be forwarded.
At “3”, COS queue 70(0) continues to be de-queued via the grant operation of the priority arbiter 80 until COS queue 70(0) is empty.
At “4”, after the COS 70(0) queue is empty, packets from the sub-queues 72(0)-72(7) are de-queued by the RR arbiter 76(0) of the respective ports 0-7 in the EtherChannel group. Since the COS queue 70(0) is completely de-queued before the sub-queues are de-queued, packets within a given flow are ensured to always be de-queued in order.
If the 3-bit hash function puts all the flows into a single one of the sub-queues (i.e., all flows are assigned to one and the same port), then the queuing and de-queuing operations will operate as if there are no sub-queues.
Sub-Queue Collapsing
At “7”, packets continue to be de-queued from the sub-queues 72(0)-72(7) until all of the sub-queues 72(0)-72(7) are empty. At “8”, after all the sub-queues 72(0)-72(7) are empty, the original COS queue is again de-queued. This ensures that packets within a flow are always de-queued in proper order.
At this point, the sub-queues 72(0)-72(7) are declared to be free and available for use by any COS queue that is determined to be congested.
Reference is now made to
At 120, the switch adds entries to the plurality of queue link lists as new packets are added to the plurality of queues based on the hashing by the hashing circuit 52. When multiple classes of service are supported by the switch, the adding operation 120 involves adding entries to corresponding ones of the plurality of queue link lists for new packets based on the classes of service of the new packets.
At 125, the read logic circuit 62 reads packets from the memory arrays 56 for output via output circuit 64 for the plurality of queues according to entries in the plurality of queue link lists stored in the memory 59.
At 130, the queue level monitor circuit 60 detects when the number of packets (or bytes) enqueued in at least one queue exceeds a threshold indicating overutilization of the output port corresponding to that queue. The queue level monitor circuit 60 may make this determination based on the number of packets in the at least one queue exceeding a threshold or the number of bytes in the queue exceeding a threshold (to account for packets of a variety of payload sizes such that some packets may comprise more bytes than other packets). The detecting operation at 130 may detect when any one of the plurality of queues exceeds a threshold. When this occurs, at 135, packets intended for that queue are no longer enqueued to it and adding of entries to the queue link list for the at least one queue is terminated.
At 140, when the at least one queue exceeds the threshold, a sub-queue link list is generated and stored in memory 59. The sub-queue link list defines a plurality of sub-queues 72(0)-72(L−1) each associated with a corresponding one of the plurality of output ports in an EtherChannel group. Moreover, the plurality of sub-queues is generated when any one of the plurality of queues is determined to exceed the threshold. At 145, for new packets that are to be enqueued to the at least one queue, entries are added to the sub-queue link list for the plurality of sub-queues 72(0)-72(L−1) to enqueue packets to the plurality of sub-queues such that packets are assigned to different ones of the plurality of sub-queues when the at least one queue exceeds a threshold. For example, the assignment of packets to sub-queues is made by the 8-bit to 3-bit hashing circuit 74 that performs a hashing computation that is configured to ensure that packets for a given flow of packets are assigned to the same sub-queue to maintain in-order output of packets within a given flow.
While operation 145 is performed for newly received packets for the at least one queue, packets are output from the memory 56 that were in the at least one queue. Eventually, the at least one queue will become empty.
At 150, after all packets in the queue link list for the at least one queue have been output from the memory 56, packets are output for the plurality of sub-queues 72(0)-72(L−1), via read logic circuit 62 and output circuit 64, from the memory 56 according to the sub-queue link list in memory 59, and ultimately from corresponding ones of the plurality of output ports. Packets of the plurality of sub-queues may be output in a RR, WRR, or DWRR manner.
At 155, when traffic intended for the at least one queue (that is currently using the plurality of sub-queues 72(0)-72(L−1)) falls below a predetermined threshold, enqueuing of entries to the sub-queue link list for the plurality of sub-queues is terminated. The queue level monitor circuit 60 generates a control signal to terminate enqueuing of packets to the plurality of sub-queues when the number of packets in the plurality of sub-queues falls below a predetermined threshold. Packets can then be enqueued to the original queue link list for the at least one queue. Thus, at 160, adding of entries to the queue link list for new packets to be added to the at least one queue is resumed. At 165, packets continue to be output from the plurality of sub-queues, and at 170, after all packets in the sub-queue link list for the plurality of sub-queues have been output from memory 56, via read logic circuit 62 and output circuit 64, packets are output from the memory 56 for the at least one queue according to the queue link list for that queue. Also, after the plurality of sub-queues are empty, they can be freed up for use for another congested output port.
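The overall create/drain/collapse cycle of operations 130-170 can be condensed into a small simulation. Everything below is an illustrative sketch under stated assumptions: the class name, the thresholds, and the use of an integer flow identifier modulo the port count (standing in for the header hash) are not taken from the disclosure:

```python
from collections import deque

class EtherChannelQueue:
    """One regular queue that splits into per-port sub-queues under
    congestion (operations 130-145) and collapses back once the
    sub-queues drain (operations 155-170)."""
    def __init__(self, num_ports, high, low):
        self.regular = deque()
        self.subs = [deque() for _ in range(num_ports)]
        self.high, self.low = high, low
        self.split = False          # are the sub-queues active?
        self.rr = 0                 # round-robin pointer

    def enqueue(self, flow_id, pkt):
        if not self.split and len(self.regular) > self.high:
            self.split = True       # congestion detected: create sub-queues
        if self.split:
            # integer flow_id stands in for the header hash; one flow
            # always maps to one sub-queue, preserving per-flow order
            self.subs[flow_id % len(self.subs)].append(pkt)
        else:
            self.regular.append(pkt)

    def dequeue(self):
        # the regular queue drains completely before any sub-queue,
        # which preserves per-flow packet order across the split
        if self.regular:
            return self.regular.popleft()
        pkt = None
        for i in range(len(self.subs)):          # round-robin service
            q = self.subs[(self.rr + i) % len(self.subs)]
            if q:
                self.rr = (self.rr + i + 1) % len(self.subs)
                pkt = q.popleft()
                break
        if pkt is None:
            return None
        # collapse once the sub-queues drain; with low == 0 they are
        # empty at collapse, so ordering is preserved in this sketch
        if self.split and sum(len(q) for q in self.subs) <= self.low:
            self.split = False
        return pkt
```

A real implementation would also track byte counts and free the collapsed sub-queues for reuse by another congested port, as the disclosure describes; the sketch keeps only the state transitions.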
In summary, operations 130-145 are associated with creation of the plurality of sub-queues, operation 150 involves de-queuing of the plurality of sub-queues and operations 155-170 are associated with the collapsing of the plurality of sub-queues.
Reference is now made to
In this example, a switch has 8 ports labeled Port 1 to Port 8. Port 5 to Port 8 are configured to be an EtherChannel group. Port 1 is receiving flows A, B, C, D and Port 2 is receiving flows E, F, G, H, I while all the other ports are inactive. These flows are all associated with the same COS for purposes of this example. There is input port logic 90 associated with Ports 1-4, respectively, and queues 92(5)-92(8) associated with Ports 5-8, respectively. The input port logic 90 shown in
The same example of
In
Turning now to
The memory 28 may comprise read only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. The memory 28 stores executable software instructions for packet sub-queuing process logic 100 as well as the link lists for the regular queues and for the sub-queues as well as the packets to be output. Thus, the memory 28 may comprise one or more computer readable storage media encoded with software comprising computer executable instructions and when the software is executed operable to perform the operations described in connection with
The sub-queuing techniques described herein provide a dynamic scheme to optimally utilize the physical links within an EtherChannel. These techniques are used when congestion is detected on a physical port and are applied only to the problem port. Furthermore, these techniques improve over the inefficient static input port assignment in an EtherChannel, resulting in optimal link utilization, improved latency, and reduced congestion and dropped packets.
The above description is intended by way of example only.