The present invention relates generally to packet communication networks, and particularly to methods and systems for control of congestion in such networks.
Network congestion occurs when a link or node in the network is required to carry more data traffic than it is capable of transmitting or forwarding, with the result that its quality of service deteriorates. Typical effects of congestion include queuing delay, packet loss, and blocking of new connections. Modern packet networks use congestion control (including congestion avoidance) techniques in efforts to mitigate congestion before catastrophic results set in.
A number of congestion avoidance techniques are known in the art. In random early detection (RED, also known as random early discard or random early drop), for example, network nodes, such as switches, monitor their average queue size and drop packets based on statistical probabilities: If a given queue (or set of queues) is almost empty, all incoming packets are accepted. As the queue grows, the probability of dropping an incoming packet grows accordingly, reaching 100% when the buffer fill level passes the applicable threshold. Weighted RED (WRED) works in a similar fashion, except that different traffic classes are assigned different congestion avoidance thresholds, so that for a given queue length, low-priority packets have a greater probability of being dropped than high-priority packets. Congestion control techniques of this sort, which operate on a fraction of packets that is determined by statistical probabilities, are referred to herein as statistical congestion control techniques.
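The statistics behind RED and WRED can be made concrete with a short Python sketch of the textbook form. The average queue size is an exponentially weighted moving average (EWMA). The drop probability is zero below a minimum threshold, rises linearly toward a ceiling `max_p` at a maximum threshold, and becomes 100% beyond it. WRED is modeled simply as one such profile per traffic class. The class and function names, the EWMA weight, and the threshold values are illustrative assumptions, not parameters taken from this description. Refinements used in practical RED implementations, such as spacing out drops by counting packets since the last drop, are omitted.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class RedProfile:
    """Per-class RED parameters (thresholds in packets; values are illustrative)."""
    min_th: float   # below this average queue size, nothing is dropped
    max_th: float   # at or above this average queue size, everything is dropped
    max_p: float    # drop probability approached just below max_th


def drop_probability(avg_qlen: float, prof: RedProfile) -> float:
    """Linear RED ramp: 0 below min_th, rising toward max_p at max_th, 1.0 beyond."""
    if avg_qlen < prof.min_th:
        return 0.0
    if avg_qlen >= prof.max_th:
        return 1.0
    return prof.max_p * (avg_qlen - prof.min_th) / (prof.max_th - prof.min_th)


class WredQueue:
    """One queue shared by several traffic classes, each with its own RED profile."""

    def __init__(self, profiles: Dict[str, RedProfile], weight: float = 0.002, seed: int = 0):
        self.profiles = profiles    # traffic class -> RedProfile
        self.weight = weight        # EWMA weight for the average queue size
        self.avg = 0.0              # smoothed (average) queue size
        self.qlen = 0               # instantaneous queue size, in packets
        self.rng = random.Random(seed)

    def on_arrival(self, traffic_class: str) -> bool:
        """Update the average queue size; return True if the packet is enqueued."""
        self.avg = (1 - self.weight) * self.avg + self.weight * self.qlen
        p = drop_probability(self.avg, self.profiles[traffic_class])
        if self.rng.random() < p:
            return False            # statistically selected for discard
        self.qlen += 1
        return True

    def on_departure(self) -> None:
        self.qlen = max(0, self.qlen - 1)
```

Suppose low-priority traffic uses `RedProfile(20, 40, 0.2)` and high-priority traffic uses `RedProfile(30, 60, 0.1)`. At an average queue size of 35 packets, the drop probability is 0.15 for low-priority packets and about 0.017 for high-priority packets, which matches the WRED behavior described above.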
Another congestion avoidance technique is Explicit Congestion Notification (ECN), which is an extension to the Internet Protocol (IP) and the Transmission Control Protocol (TCP). ECN was initially defined by Ramakrishnan, et al., in “The Addition of Explicit Congestion Notification (ECN) to IP,” which was published as Request for Comments (RFC) 3168 of the Internet Engineering Task Force (2001) and is incorporated herein by reference. ECN provides end-to-end notification of network congestion by signaling impending congestion in the IP header of transmitted packets. The receiver of an ECN-marked packet of this sort echoes the congestion indication to the sender, which reduces its transmission rate as though it had detected a dropped packet. ECN functionality has recently been extended to other transport and tunneling protocols.
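The sender-side reaction can be illustrated with a deliberately simplified, Reno-like model in Python. Following RFC 3168, the sender treats an ECN-Echo (ECE) acknowledgment like a loss indication and halves its congestion window, but it does so at most once per window of data. The class and its fields are hypothetical. Slow start and retransmission are left out. So is the Congestion Window Reduced (CWR) flag, which the sender sets to tell the receiver that it has reacted.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Toy TCP-like sender reacting to ECN-Echo (ECE) feedback from the receiver."""
    cwnd: float = 10.0       # congestion window, in segments (illustrative start value)
    min_cwnd: float = 2.0    # floor for the window (illustrative)
    recover_seq: int = -1    # highest sequence number sent at the last reduction

    def on_ack(self, ack_seq: int, ece: bool, highest_sent: int) -> None:
        if ece:
            if ack_seq > self.recover_seq:
                # Treat the echoed congestion mark like a loss: halve the window,
                # then ignore further ECE until the data outstanding now is acked.
                self.cwnd = max(self.min_cwnd, self.cwnd / 2)
                self.recover_seq = highest_sent
            # ECE within the same window of data: no further reduction.
        else:
            # Simplified congestion-avoidance growth (about +1 segment per RTT).
            self.cwnd += 1.0 / self.cwnd
```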
Embodiments of the present invention that are described hereinbelow provide improved methods for congestion control in a network and apparatus implementing such methods.
There is therefore provided, in accordance with an embodiment of the invention, communication apparatus, including multiple interfaces configured to be connected to a packet data network so as to serve as both ingress and egress interfaces in receiving and forwarding of data packets from and to the network by the apparatus. A memory is coupled to the interfaces and configured as a buffer to contain the data packets received through the ingress interfaces in multiple queues while awaiting transmission to the network via the egress interfaces. Congestion control logic includes a packet discard machine, which is configured to drop a first fraction of the data packets from at least a first queue in the buffer in response to a status of the queues, and a packet marking machine, which is configured to apply a congestion notification to a second fraction of the data packets from at least a second queue in the buffer in response to the status of the queues. Machine control circuitry is coupled to selectively enable and disable at least the packet discard machine.
In some embodiments, the machine control circuitry is further coupled to selectively enable and disable the packet marking machine.
In a disclosed embodiment, the packet discard machine and the packet marking machine are configured to drop and apply the congestion notification to respective fractions of the data packets in a same one or more of the queues.
In some embodiments, the congestion notification includes setting an explicit congestion notification (ECN) or a traffic class (TC) field in a header of the data packets.
In the disclosed embodiments, the congestion control logic includes a profile calculator, which is configured to compute the first and second fractions responsively to respective statuses of the first and second queues. Typically, the profile calculator is configured to compute the first and second fractions by comparing lengths of the queues to respective buffer allocations of the queues in the memory, and/or based on respective transmission rates of the queues. Additionally or alternatively, the apparatus includes packet classification logic, which is configured to assign the data packets received through the ingress interfaces to the multiple queues, and to convey information regarding the received data packets to the profile calculator.
There is also provided, in accordance with an embodiment of the invention, a method for communication, which includes, in a network element having multiple interfaces connected to a packet data network so as to serve as both ingress and egress interfaces and a memory coupled to the interfaces, placing data packets received through the ingress interfaces in multiple queues in the memory while the data packets await transmission to the network. Congestion control is applied to the data packets that are queued for transmission using a packet discard machine, which is configured to drop a first fraction of the data packets from at least a first queue in the buffer in response to a status of the queues, and using a packet marking machine, which is configured to apply a congestion notification to a second fraction of the data packets from at least a second queue in the buffer in response to the status of the queues. At least the packet discard machine is selectively enabled and disabled, so that when the packet discard machine is disabled, the data packets are not dropped by the network element in response to congestion indicated by the status of the queues.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
In network elements, such as switches, that are known in the art, packet marking by ECN operates in conjunction with packet discard by RED (including WRED), under the control of a single logical congestion avoidance machine in accordance with the model defined in the above-mentioned RFC 3168. Therefore, ECN packet marking cannot be enabled for applicable packets without also allowing the congestion avoidance machine to drop packets that are not subject to ECN marking when congestion is severe. Conversely, when it is necessary to avoid dropping packets of a certain type, such as TCP control packets (for example, SYN and SYN/ACK packets) or other lossless traffic classes, marking of packets for purposes of congestion avoidance is also disabled.
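The limitation of the coupled model fits in a few lines. In this hypothetical Python sketch, a single congestion avoidance decision first selects a packet with probability `p`. The packet's ECN capability alone then determines whether the selected packet is marked or dropped. It is a simplification for illustration, not a model of any particular switch.

```python
import random


def coupled_avoidance(enabled: bool, ect: bool, p: float, rng: random.Random) -> str:
    """One combined drop/mark machine (illustrative sketch).

    A single probabilistic decision selects the packet; the packet's ECN
    capability (ect) then decides between marking and dropping it.
    """
    if not enabled or rng.random() >= p:
        return "forward"
    return "mark" if ect else "drop"
```

Both outcomes depend on one `enabled` flag and one probability. As a result, no setting can mark ECN-capable packets while guaranteeing that packets which are not ECN-capable are never dropped.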
Embodiments of the present invention that are described herein provide a more flexible model for congestion avoidance, in which the packet discard and packet marking mechanisms are applied separately and independently. In the disclosed embodiments, congestion control logic in communication apparatus, such as a network switch, comprises both a packet discard machine and a packet marking machine. (The term “machine,” as used in the present description and in the claims, refers to a distinct logic circuit that performs a certain, well-defined task.) Machine control circuitry in the apparatus is coupled to selectively enable and disable at least the packet discard machine, and possibly the packet marking machine, as well.
This separation of the packet discard and marking machines enables the system operator to configure the apparatus for different sorts of congestion responses: mark only, drop only, or both mark and drop appropriate fractions of the packets in case of congestion. Furthermore, the machine control circuitry can set the packet discard and marking machines to apply different congestion responses to different queues, as well as to different types of traffic, so that TCP control packets, for example, are marked (but not dropped) in case of congestion, while other sorts of packets may be dropped. Separation of the packet discard and marking machines can also enhance the efficiency of congestion control, since packet discard can be applied, for example, early in the processing pipeline of a network switch in order to free buffer space promptly, while packet marking can be applied late in the processing pipeline to enable rapid response to changes in congestion level.
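The efficiency argument in the last sentence can be illustrated with a toy queue. The drop decision is taken when a packet is enqueued, and the marking decision is taken when it is dequeued. In this Python sketch, the class name, the `ramp` helper, and the occupancy thresholds are all assumptions chosen for illustration: drops ramp from 50% to 100% of the allocation, and marking ramps from 0% to 50%. Overflow handling and flow control are not modeled.

```python
import random
from collections import deque


def ramp(x: float, start: float, end: float) -> float:
    """Linear 0..1 ramp between start and end, clamped."""
    return max(0.0, min(1.0, (x - start) / (end - start)))


class SplitPipelineQueue:
    """Illustrative egress queue with the drop and mark decisions at different stages."""

    def __init__(self, allocation: int, seed: int = 0):
        self.allocation = allocation   # buffer allocation, in packets
        self.q = deque()
        self.rng = random.Random(seed)

    def occupancy(self) -> float:
        return len(self.q) / self.allocation

    def enqueue(self, pkt: dict, drop_enabled: bool) -> bool:
        # Early (ingress) stage: a dropped packet never occupies buffer space.
        p_drop = ramp(self.occupancy(), 0.5, 1.0)
        if drop_enabled and self.rng.random() < p_drop:
            return False
        self.q.append(pkt)
        return True

    def dequeue(self, mark_enabled: bool) -> dict:
        pkt = self.q.popleft()
        # Late (egress) stage: the marking decision sees the queue as it is
        # when the packet leaves, not as it was when the packet arrived.
        p_mark = ramp(self.occupancy(), 0.0, 0.5)
        if mark_enabled and pkt.get("ect", False) and self.rng.random() < p_mark:
            pkt["ce"] = True
        return pkt
```

`dequeue` re-evaluates occupancy at transmission time. A packet that arrived while the queue was nearly empty can therefore still be marked if the queue has grown by the time the packet leaves, and the reverse also holds.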
Allocations 38 (i.e., the amount of buffer that the queue is permitted to use, or equivalently, the control threshold for purposes of congestion control) may be static, or they may vary over time. Furthermore, different queues may receive respective allocations 38 of different sizes, depending, for example, on traffic priority levels or other system considerations. Multiple different queues directed to the same egress interface may receive their own, separate allocations 38. Alternatively or additionally, a memory allocation may be shared among multiple queues that are directed to the same egress interface or even to multiple different egress interfaces. Various sorts of dynamic buffer allocations can be handled by decision and queuing logic 40 in switch 20 and will have an impact on the thresholds applied by congestion control logic 42 in the switch, but these buffer allocation mechanisms themselves are beyond the scope of the present description. Buffer allocation mechanisms that can be used in this context are described, for example, in U.S. patent application Ser. No. 14/672,357, filed Mar. 30, 2015, whose disclosure is incorporated herein by reference.
Congestion control logic 42 in this example applies congestion control, such as ECN and/or WRED, based on statistical or other congestion control criteria, to a respective fraction of the packets that are queued for transmission to network 24 from each queue in memory 36. Logic 42 typically sets the fraction of the packets to be marked or dropped in this context for each queue at any given time based on a relation between the length of the queue and the size of the respective allocation 38. Thus, in response to the status of the queues and depending upon congestion conditions, congestion control logic 42 can drop a certain fraction of the data packets from a certain queue or set of queues in the buffer, while applying a congestion notification marking to another fraction of the data packets from another queue or set of queues. These two sets of queues may intersect, meaning that in some or all of the queues, some packets may be dropped while others are marked with a congestion notification.
In the example shown in
Although the present description relates, for the sake of concreteness and clarity, to the specific switch 20 that is shown in
Upon receiving an incoming packet, an ingress port 22A (such as one of ports 22 in
When a descriptor reaches the head of its queue, queuing system 52 passes the descriptor to a packet modifier 54 for execution. In response to the descriptor, packet modifier 54 reads the appropriate packet data from memory 36, and makes whatever changes are called for in the packet header for transmission to network 24 through egress port 22B. These changes may include marking the packet header, for example by setting ECN field 34 as a congestion notification, in response to instructions from congestion control logic 42.
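The following Python sketch illustrates the header change made by packet modifier 54. It sets the Congestion Experienced (CE) codepoint in the ECN field, which occupies the two least significant bits of the IPv4 Type of Service octet. RFC 3168 defines Not-ECT = 00, ECT(1) = 01, ECT(0) = 10, and CE = 11. Packets marked Not-ECT are left unchanged, since they have not declared that their endpoints understand ECN. The ECN bits are covered by the IPv4 header checksum, so the checksum is recomputed after marking; a hardware modifier would more likely update it incrementally. IPv6, which has no header checksum, is not handled, and the function names are hypothetical.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11   # RFC 3168 codepoints


def ipv4_header_checksum(header: bytes) -> int:
    """Internet checksum over the header's 16-bit words (header length is even)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_ce(header: bytearray) -> bool:
    """Set CE in an IPv4 header in place; return False if the packet is Not-ECT."""
    if (header[1] & 0x03) == NOT_ECT:
        return False                          # not ECN-capable: leave untouched
    header[1] |= CE                           # ECN = low two bits of the TOS octet
    header[10:12] = b"\x00\x00"               # zero the checksum field, then recompute
    header[10:12] = ipv4_header_checksum(header).to_bytes(2, "big")
    return True
```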
Congestion control logic 42 comprises a profile calculator 56, which computes congestion control probabilities for each queue to which an incoming packet may be assigned. These probabilities are expressed as fractions, which are input from profile calculator 56 to a packet discard machine 58 and a packet marking machine 62 for purposes of the drop and ECN decisions that are to be made in case of congestion. In other words, for any given queue at any given time, the probability value provided by profile calculator 56 to packet discard machine 58 indicates the fraction of the packets in the queue that are to be dropped; while the probability value provided to packet marking machine 62 (which may be the same as or different from that provided to the packet discard machine) indicates the fraction of the packets in the queue that are to be marked with a congestion notification.
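One possible shape for the output of profile calculator 56 is sketched below in Python. For each queue, the calculator maps occupancy (queue length divided by buffer allocation) through two independent linear profiles. One profile yields the drop fraction for packet discard machine 58, and the other yields the marking fraction for packet marking machine 62. The linear shape, the `Ramp` class, and the breakpoints in the example are assumptions; the description leaves the form of the profiles open.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Ramp:
    """Linear profile over queue occupancy (fraction of the buffer allocation)."""
    start: float   # occupancy at which the probability starts to rise
    end: float     # occupancy at which the probability reaches max_p
    max_p: float   # probability at and above `end`

    def probability(self, occupancy: float) -> float:
        if occupancy <= self.start:
            return 0.0
        if occupancy >= self.end:
            return self.max_p
        return self.max_p * (occupancy - self.start) / (self.end - self.start)


def profile(queue_len: int, allocation: int, mark: Ramp, drop: Ramp) -> Tuple[float, float]:
    """Return (drop fraction, mark fraction) for one queue; computed independently."""
    occupancy = queue_len / allocation
    return drop.probability(occupancy), mark.probability(occupancy)
```

Take `mark = Ramp(0.25, 0.75, 1.0)` and `drop = Ramp(0.75, 1.0, 1.0)`. A queue at half its allocation then yields a drop fraction of 0 and a marking fraction of 0.5, so marking begins well before any packets are dropped.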
Profile calculator 56 computes and updates these probability values based on queue status information provided by queuing system 52, as well as packet header information analyzed by packet classifier 50. For example, the packet classifier may refer for this purpose to the IP and transport header fields indicating the traffic class and congestion status. As another example, when MPLS is in use, the packet classifier can use the corresponding fields in the MPLS header (as provided by IETF RFC 5129, entitled “Explicit Congestion Marking in MPLS,” by Davie et al.), and particularly the QoS and congestion notification information in the MPLS Traffic Class (TC) field (as defined in IETF RFC 5462, by Andersson et al.). The queue status information typically includes the lengths and/or the respective transmission rates of the queues in question, and the probability values depend on a comparison of these lengths to the available buffer allocations 38 of the queues. The packet header fields of relevance include, inter alia, the ECN and differentiated services code point (DSCP) fields in the IP header. Packet classifier 50 may also indicate to packet discard machine 58 and packet marking machine 62 whether a given queue or packet type is eligible for packet dropping, marking, or both.
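The header fields that packet classifier 50 consults can be extracted with simple bit operations. This Python sketch does three things:

- splits the IPv4 TOS octet (or IPv6 Traffic Class octet) into the 6-bit DSCP and 2-bit ECN fields;
- decodes a 32-bit MPLS label stack entry (20-bit label, 3-bit TC, bottom-of-stack bit, 8-bit TTL);
- applies a hypothetical eligibility policy under which TCP segments carrying SYN (including SYN/ACK) are never dropped and only ECN-capable packets are marked.

The sketch returns the raw TC value only. How TC values encode congestion state under RFC 5129 depends on the configuration of the MPLS domain and is not modeled here.

```python
from typing import NamedTuple, Tuple


class MplsEntry(NamedTuple):
    label: int  # 20-bit label value
    tc: int     # 3-bit Traffic Class field (RFC 5462)
    s: int      # bottom-of-stack bit
    ttl: int    # 8-bit time to live


def parse_ip_tos(tos: int) -> Tuple[int, int]:
    """Split an IPv4 TOS / IPv6 Traffic Class octet into (DSCP, ECN)."""
    return tos >> 2, tos & 0x03


def parse_mpls_entry(entry: int) -> MplsEntry:
    """Decode a 32-bit MPLS label stack entry: label | TC | S | TTL."""
    return MplsEntry(
        label=(entry >> 12) & 0xFFFFF,
        tc=(entry >> 9) & 0x7,
        s=(entry >> 8) & 0x1,
        ttl=entry & 0xFF,
    )


TCP_SYN = 0x02  # SYN bit in the TCP flags octet


def eligibility(ecn: int, tcp_flags: int) -> Tuple[bool, bool]:
    """Hypothetical policy returning (droppable, markable); pass tcp_flags=0 for non-TCP."""
    droppable = not (tcp_flags & TCP_SYN)   # never drop SYN or SYN/ACK
    markable = ecn != 0                      # only ECT(0), ECT(1) or CE packets
    return droppable, markable
```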
Congestion control logic 42 also comprises machine control circuitry, including a drop enable circuit 60 and, optionally, an ECN enable circuit 64. Drop enable circuit 60 is coupled to selectively enable and disable packet discard machine 58, while ECN enable circuit 64 selectively enables and disables packet marking machine 62. When drop enable circuit 60 disables packet discard machine 58, for example, congestion control logic 42 will still mark packets in case of congestion but will not drop packets. Thus, by setting circuits 60 and 64, the system operator of switch 20 is able to determine how the switch will respond to congestion: by dropping packets, by marking packets, by both, or by neither. These settings may change over time, either automatically or under direct operator control, on the basis of network configuration and status, as well as other system requirements.
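The effect of drop enable circuit 60 and ECN enable circuit 64 can be modeled as two independent gates placed in front of two independent probabilistic decisions. In this Python sketch, the function name, its parameters, and the order of evaluation (discard first, then marking) are assumptions. The `droppable` and `markable` flags stand in for the eligibility information that packet classifier 50 may supply.

```python
import random


def congestion_response(p_drop: float, p_mark: float,
                        droppable: bool, markable: bool,
                        drop_enabled: bool, mark_enabled: bool,
                        rng: random.Random) -> str:
    """Return "drop", "mark" or "forward" for one packet (illustrative sketch)."""
    # Packet discard machine: acts only if enabled and the packet is eligible.
    if drop_enabled and droppable and rng.random() < p_drop:
        return "drop"
    # Packet marking machine: a separate decision with its own probability.
    if mark_enabled and markable and rng.random() < p_mark:
        return "mark"
    return "forward"
```

Setting `drop_enabled=False` with `mark_enabled=True` gives the mark-only behavior described above. In that configuration the function can return `"mark"` or `"forward"` but never `"drop"`, whatever the value of `p_drop`.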
When enabled by drop enable circuit 60, packet discard machine 58 chooses, based on the probability value from profile calculator 56, the appropriate fraction of packets to drop from each queue. These packets are deleted from memory 36 and from the respective queues in queuing system 52.
By the same token, when enabled by ECN enable circuit 64, packet marking machine 62 chooses, based on the probability value from profile calculator 56, the appropriate packets in each queue to mark with a congestion notification, and instructs packet modifier 54 to modify the packet headers accordingly. The congestion notification may be marked, for example, in the ECN field of the IP header, as explained above, or in another appropriate header field, such as the MPLS TC field. The packets are then transmitted via egress port 22B to network 24.
It will be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
Number | Name | Date | Kind |
---|---|---|---|
6108713 | Sambamurthy et al. | Aug 2000 | A |
6178448 | Gray et al. | Jan 2001 | B1 |
6594263 | Martinsson et al. | Jul 2003 | B1 |
7321553 | Prasad et al. | Jan 2008 | B2 |
7821939 | Decusatis et al. | Oct 2010 | B2 |
8078743 | Sharp et al. | Dec 2011 | B2 |
8345548 | Gusat et al. | Jan 2013 | B2 |
8473693 | Muppalaneni et al. | Jun 2013 | B1 |
8576715 | Bloch et al. | Nov 2013 | B2 |
8767561 | Gnanasekaran et al. | Jul 2014 | B2 |
8811183 | Anand et al. | Aug 2014 | B1 |
8879396 | Guay et al. | Nov 2014 | B2 |
8989017 | Naouri et al. | Mar 2015 | B2 |
8995265 | Basso et al. | Mar 2015 | B2 |
9014006 | Haramaty et al. | Apr 2015 | B2 |
9325619 | Guay et al. | Apr 2016 | B2 |
9356868 | Tabatabaee et al. | May 2016 | B2 |
9426085 | Anand et al. | Aug 2016 | B1 |
20020191559 | Chen et al. | Dec 2002 | A1 |
20030108010 | Kim et al. | Jun 2003 | A1 |
20030223368 | Allen, Jr. | Dec 2003 | A1 |
20040008714 | Jones | Jan 2004 | A1 |
20050053077 | Blanc et al. | Mar 2005 | A1 |
20050169172 | Wang et al. | Aug 2005 | A1 |
20050216822 | Kyusojin et al. | Sep 2005 | A1 |
20050226156 | Keating et al. | Oct 2005 | A1 |
20050228900 | Stuart et al. | Oct 2005 | A1 |
20060088036 | De Prezzo | Apr 2006 | A1 |
20060092837 | Kwan et al. | May 2006 | A1 |
20060092845 | Kwan et al. | May 2006 | A1 |
20070097257 | El-Maleh et al. | May 2007 | A1 |
20070104102 | Opsasnick | May 2007 | A1 |
20070104211 | Opsasnick | May 2007 | A1 |
20070201499 | Kapoor | Aug 2007 | A1 |
20070291644 | Roberts et al. | Dec 2007 | A1 |
20080037420 | Tang et al. | Feb 2008 | A1 |
20080175146 | Van Leekwuck et al. | Jul 2008 | A1 |
20080192764 | Arefi et al. | Aug 2008 | A1 |
20090207848 | Kwan et al. | Aug 2009 | A1 |
20100220742 | Brewer et al. | Sep 2010 | A1 |
20130014118 | Jones | Jan 2013 | A1 |
20130039178 | Chen et al. | Feb 2013 | A1 |
20130250757 | Tabatabaee et al. | Sep 2013 | A1 |
20130250762 | Assarpour | Sep 2013 | A1 |
20130275631 | Magro et al. | Oct 2013 | A1 |
20130286834 | Lee | Oct 2013 | A1 |
20130305250 | Durant | Nov 2013 | A1 |
20140133314 | Mathews et al. | May 2014 | A1 |
20140269324 | Tietz et al. | Sep 2014 | A1 |
20150026361 | Matthews et al. | Jan 2015 | A1 |
20150124611 | Attar et al. | May 2015 | A1 |
20150127797 | Attar et al. | May 2015 | A1 |
20150180782 | Rimmer et al. | Jun 2015 | A1 |
20150200866 | Pope et al. | Jul 2015 | A1 |
20150381505 | Sundararaman | Dec 2015 | A1 |
20160135076 | Grinshpun et al. | May 2016 | A1 |
20170118108 | Avci et al. | Apr 2017 | A1 |
20170142020 | Sundararaman | May 2017 | A1 |
20170180261 | Ma et al. | Jun 2017 | A1 |
20170187641 | Lundqvist | Jun 2017 | A1 |
Number | Date | Country |
---|---|---|
1720295 | Nov 2006 | EP |
2466476 | Jun 2012 | EP |
2009107089 | Sep 2009 | WO |
2013136355 | Sep 2013 | WO |
2013180691 | Dec 2013 | WO |
Entry |
---|
CISCO Systems, Inc., “Priority Flow Control: Build Reliable Layer 2 Infrastructure”, 8 pages, 2015. |
CISCO Systems, Inc., “Advantage Series White Paper Smart Buffering”, 10 pages, 2016. |
Hoeiland-Joergensen et al., “The FlowQueue-CoDel Packet Scheduler and Active Queue Management Algorithm”, Internet Engineering Task Force (IETF) as draft-ietf-aqm-fq-codel-06, 23 pages, Mar. 18, 2016. |
U.S. Appl. No. 14/718,114 Office Action dated Sep. 16, 2016. |
U.S. Appl. No. 14/672,357 Office Action dated Sep. 28, 2016. |
Gran et al., “Congestion Management in Lossless Interconnection Networks”, Submitted to the Faculty of Mathematics and Natural Sciences at the University of Oslo in partial fulfillment of the requirements for the degree Philosophiae Doctor, 156 pages, Sep. 2013. |
Pfister et al., “Hot Spot Contention and Combining in Multistage Interconnect Networks”, IEEE Transactions on Computers, vol. C-34, pp. 943-948, Oct. 1985. |
Zhu et al., “Congestion control for large-scale RDMA deployments”, SIGCOMM'15, pp. 523-536, Aug. 17-21, 2015. |
Hahne et al., “Dynamic Queue Length Thresholds for Multiple Loss Priorities”, IEEE/ACM Transactions on Networking, vol. 10, No. 3, pp. 368-380, Jun. 2002. |
Choudhury et al., “Dynamic Queue Length Thresholds for Shared-Memory Packet Switches”, IEEE/ACM Transactions on Networking, vol. 6, No. 2, pp. 130-140, Apr. 1998. |
Gafni et al., U.S. Appl. No. 14/672,357, filed Mar. 30, 2015. |
Ramakrishnan et al., “The Addition of Explicit Congestion Notification (ECN) to IP”, Request for Comments 3168, Network Working Group, 63 pages, Sep. 2001. |
IEEE Standard 802.1Q™-2005, “IEEE Standard for Local and metropolitan area networks Virtual Bridged Local Area Networks”, 303 pages, May 19, 2006. |
InfiniBand™ Architecture Specification, vol. 1, Release 1.2.1, Chapter 12, pp. 657-716, Nov. 2007. |
IEEE Std 802.3, Standard for Information Technology—Telecommunications and information exchange between systems—Local and metropolitan area networks—Specific requirements; Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications Corrigendum 1: Timing Considerations for PAUSE Operations, Annex 31B (MAC Control PAUSE operation), pp. 763-772, year 2005. |
IEEE Std 802.1Qbb., IEEE Standard for Local and metropolitan area networks—“Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks—Amendment 17: Priority-based Flow Control”, 40 pages, Sep. 30, 2011. |
Elias et al., U.S. Appl. No. 14/718,114, filed May 21, 2015. |
Elias et al., U.S. Appl. No. 14/994,164, filed Jan. 13, 2016. |
Shipner et al., U.S. Appl. No. 14/967,403, filed Dec. 14, 2015. |
U.S. Appl. No. 14/994,164 Office Action dated Jul. 5, 2017. |
U.S. Appl. No. 14/967,403 Office Action dated Nov. 9, 2017. |
U.S. Appl. No. 15/081,969 Office Action dated Oct. 5, 2017. |
European Application # 17172494.1 Search Report dated Oct. 13, 2017. |
European Application # 17178355 Search Report dated Nov. 13, 2017. |
U.S. Appl. No. 15/063,527 Office Action dated Feb. 8, 2018. |
U.S. Appl. No. 15/161,316 Office Action dated Feb. 7, 2018. |
U.S. Appl. No. 15/081,969 Office Action dated May 17, 2018. |
U.S. Appl. No. 15/432,962 Office Action dated Apr. 26, 2018. |
U.S. Appl. No. 15/161,316 Office Action dated Jul. 20, 2018. |
Number | Date | Country |
---|---|---|
20170272372 A1 | Sep 2017 | US |