This application relates to a method and system to adjust Congestion Notification (CN) control loop parameters at a congestion point.
Congestion is the networking term that refers to a situation where too much network traffic is clogging network pathways. Common causes of congestion may include too many users on a single network segment or collision domain, high demand from bandwidth-intensive networked applications, a rapidly growing number of users accessing the Internet, the increased power of personal computers (PCs) and servers, etc.
One common indicator of network congestion is increased network delay. All networks have a limited data-carrying capacity. When the load is light, the average time from when a host submits a packet for transmission until it is actually sent on the network is relatively short. When many users are vying for connections and communicating, the average delay increases. This delay has the effect of making the network appear “slower,” because it takes longer to send the same amount of data under congested conditions than it does when the load is light.
In extreme circumstances, an application can fail completely under a heavy network load. Sessions may timeout and disconnect, and applications or operating systems may actually crash, requiring a system restart.
Embodiments of the present invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
In networking, a network flow from a traffic source directed to a traffic destination (sometimes referred to as a traffic sink) may go through one or more interconnect devices, such as, for example, network switches and network routers. An interconnect device that is responsible for forwarding a network flow from the traffic source to the traffic sink may maintain a queue to facilitate storing and processing of the incoming traffic. An interconnect device may be unable to process the incoming messages (or frames) at the rate at which they arrive, which may result in the queue filling up. To address this issue, the interconnect device may be configured to detect that the queue is filled beyond a predetermined threshold and to notify the source of the traffic that it needs to slow down the rate at which it outputs network messages. The state of the queue in which it is filled beyond the predetermined threshold may be referred to as a state of congestion. The interconnect device thus may be referred to as a congestion point (CP), and the queue that facilitates the storing and processing of the incoming messages may be referred to as a congestion point queue.
One example congestion management mechanism is so-called Congestion Notification (CN). CN, in one example embodiment, relies on detecting a congestion condition by monitoring the state of the congestion point queue and sending a feedback message back to the source of traffic in response to the determined state of the congestion point queue. The flow of notification messages (or CN messages) from the congestion point to the reaction point, in response to the traffic flow from the reaction point to the congestion point, may be referred to as a Congestion Notification loop (CN loop). A network within which a CN mechanism has been implemented may be referred to as a CN-enabled network.
An example feedback message may include various information related to the state of the associated congestion point queue. The source of the traffic receives the feedback message and takes appropriate measures based on the queue state information provided with it. CN thus may permit shifting network congestion from the core of the network towards the edge of the network, where there is less traffic aggregation and where more resources may be available to address congestion effectively.
A device at which congestion mitigation measures may be put in place, e.g., the source of the traffic or another interconnect device, may be referred to as a reaction point (RP). Network frames entering a CN-enabled network may be tagged by the reaction point with a congestion management tag (CM-Tag). A CM-Tag identifies, e.g., with a FlowID value, those traffic flows to which congestion mitigation measures (or rate control measures) should be applied. Congestion mitigation measures may include, for example, lowering the rate at which network frames are transmitted from the reaction point. The reaction point may be configured to analyze the responsiveness of the network and to determine the parameters for adjusting the rate of the output network flow based on both the queue state information provided by the congestion point via a feedback message and the determined responsiveness of the network. For example, if the feedback message indicates that the congestion point queue is oversubscribed and the responsiveness of the network is high, the rate of the flow may be decreased less dramatically than if the responsiveness of the network is low. The rate adjustment performed at the reaction point may be long-term (semi-static), short-term (e.g., associated with a single congestion event), or even time-of-day based (using heuristics).
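The text does not specify how a reaction point maps network responsiveness to the size of a rate change. The following Python sketch is one hypothetical way to express the example above; the function name `adjust_rate`, the normalized `queue_offset` and `responsiveness` inputs (both assumed to lie in [0, 1]), and the `max_decrease` and `min_rate_bps` defaults are all assumptions made for illustration.

```python
def adjust_rate(current_rate_bps: float,
                queue_offset: float,
                responsiveness: float,
                max_decrease: float = 0.5,
                min_rate_bps: float = 1e6) -> float:
    """Hypothetical reaction-point rate decrease.

    queue_offset: how oversubscribed the congestion point queue is,
        normalized to [0, 1]; values <= 0 mean no decrease is needed.
    responsiveness: 0.0 (sluggish network) to 1.0 (highly responsive).
    """
    if queue_offset <= 0:
        return current_rate_bps
    offset = min(queue_offset, 1.0)
    r = min(max(responsiveness, 0.0), 1.0)
    # A highly responsive network gets half the cut of a sluggish one,
    # because it is expected to react quickly to smaller corrections.
    decrease_fraction = max_decrease * offset * (1.0 - 0.5 * r)
    # Never throttle below the assumed floor rate.
    return max(current_rate_bps * (1.0 - decrease_fraction), min_rate_bps)
```

With these assumed defaults, a fully oversubscribed queue cuts a 10 Gbps flow to 7.5 Gbps when the network is highly responsive and to 5 Gbps when it is sluggish.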
In one example embodiment, the congestion point may include a CN control loop adjustment system that provides the reaction point with feedback messages generated based on observations, made at the congestion point, of the responsiveness of the network. An example CN control loop adjustment system may be configured to measure the responsiveness of the CN loop by monitoring how the CN messages sent from the congestion point to the reaction point affect both the reduction and the increase of the network flow load. The responsiveness of the CN loop may be determined, in one example embodiment, by monitoring the depth of the congestion point queue over time. CN messages may then be generated to include control parameters that are based on the responsiveness of the CN loop. The congestion point may utilize the determined responsiveness of the CN loop to alter the parameters in the CN message that are related to the state of the congestion point queue. These parameters may include values associated with the relationship between the current depth of the queue and the queue depth that defines congestion, a congestion severity indicator, etc. An example embodiment of a network environment where a system to adjust a CN control loop resides at a congestion point may be described with reference to
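The text leaves the exact responsiveness metric open. As a hypothetical illustration, the Python sketch below scores the CN loop by how much of the queue's deviation from its equilibrium depth has been removed since the last CN message was sent; the function name and the [0, 1] scoring rule are assumptions, not the described system's method.

```python
from typing import Sequence


def loop_responsiveness(depths: Sequence[int], q_eq: int) -> float:
    """Hypothetical CN loop responsiveness estimate.

    depths: congestion point queue depths sampled since the last CN
        message was sent, oldest first.
    q_eq: equilibrium (target) queue depth.

    Returns the fraction of the initial deviation from q_eq that has been
    removed by the latest sample: 1.0 means the queue is back at q_eq,
    0.0 means no progress (or the deviation grew). Because the deviation
    is taken as an absolute value, the same measure covers a queue
    draining after a rate decrease and a queue refilling after an increase.
    """
    if len(depths) < 2:
        return 0.0  # not enough history to judge the loop
    initial = abs(depths[0] - q_eq)
    latest = abs(depths[-1] - q_eq)
    if initial == 0:
        return 1.0  # the queue already sat at equilibrium
    return max(0.0, min(1.0, (initial - latest) / initial))
```

A score near 1.0 suggests the loop is reacting well and feedback could be gentler; a score near 0.0 suggests stronger feedback may be needed.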
The end nodes 150 and 160 may simultaneously send network traffic at a line rate of, e.g., 10 Gbps, to the end node 170. The aggregate rate of traffic originating from the end nodes 150 and 160 may exceed the capacity of the link 142 connecting the core switch 110 to the edge switch 140. Specifically, the depth of a congestion point queue 112, which is associated with the core switch 110 and the link 142, may increase significantly above the target level around which the queue length should oscillate under normal congestion conditions.
The core switch 110 may be configured to support CN or some other congestion management technique. A system to adjust the CN control loop parameters 114 may reside at the core switch 110 and be configured to monitor the depth of the congestion point queue 112. The system to adjust the CN control loop parameters 114 may utilize the depth of the congestion point queue 112 over time in order to determine the responsiveness of the CN control loop. When a congestion condition is detected, the system to adjust the CN control loop parameters 114 may generate and send an appropriate feedback message towards the end nodes 150 and 160.
The feedback messages may be processed at the edge switches 120 and 130, which are shown to include respective rate limiters 122 and 132. It will be noted that, in some embodiments, feedback messages may be processed at the end nodes 150 and 160, provided the end nodes 150 and 160 support CN. The processing of a feedback message at a reaction point may result in the instantiation of a rate limiter, unless a rate limiter has already been installed. In one example embodiment, a rate limiter may be configured to slow down a congesting traffic flow in order to mitigate congestion at the congestion point (e.g., at the core switch 110). In some example embodiments, if congestion improves or dissipates completely, feedback messages may be generated at the congestion point to cause the rate limiters to increase the rate of traffic flow in order to avoid under-utilizing the bandwidth at the congestion point.
It will be noted that a system to manage network traffic congestion by adjusting a CN control loop may be provided at any congestion point, e.g., at the edge switch 140, at the end node 170, etc. Example components of a system to manage network traffic congestion configured to adjust CN control loop parameters at a congestion point may be described with reference to
Referring to
As shown in
As shown in
The length of every frame that arrives at the queue 310 is accumulated in a length variable L. An incoming frame is sampled as soon as the value of L is determined to be greater than the sampling interval S. A new random interval Sr is then selected, and the value of L is reset to zero. The fixed interval Sf may be configurable in the range [0, 256] KB in 1-byte increments. The random interval Sr may be generated in the range [0, 64] KB in 1-byte increments. In some embodiments, mechanisms may be deployed that indicate that some frames should not be subject to congestion management.
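The sampling procedure can be sketched in Python as follows. The text does not spell out how S relates to Sf and Sr, so this sketch assumes S = Sf + Sr, with Sr redrawn after every sample; it also assumes that 1 KB is 1024 bytes. The class and method names are hypothetical.

```python
import random
from typing import Optional

KB = 1024  # assumption: 1 KB taken as 1024 bytes


class FrameSampler:
    """Byte-count-based frame sampler for a congestion point queue.

    Assumes the sampling interval is S = Sf + Sr, with a fresh random Sr
    drawn after every sample.
    """

    def __init__(self, fixed_interval: int, rng: Optional[random.Random] = None):
        if not 0 <= fixed_interval <= 256 * KB:
            raise ValueError("Sf must be in [0, 256] KB")
        self.sf = fixed_interval
        self.rng = rng if rng is not None else random.Random()
        self.length = 0  # L: bytes accumulated since the last sample
        self.sr = self._draw_random_interval()

    def _draw_random_interval(self) -> int:
        # Sr in [0, 64] KB with 1-byte granularity (randint is inclusive).
        return self.rng.randint(0, 64 * KB)

    def on_frame(self, frame_len: int) -> bool:
        """Account for an arriving frame; return True if it is sampled."""
        self.length += frame_len
        if self.length > self.sf + self.sr:
            self.sr = self._draw_random_interval()  # new random interval
            self.length = 0  # reset L
            return True
        return False
```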
The two components of the CN feedback message, Qoff and Qdelta, may be computed. As shown in
A positive value of Qoff indicates that the queue 310 is above the equilibrium threshold Qeq. In certain networks, Qeq may be set differently for different congestion points; a congestion point with a relatively small queue, for example, may have a lower Qeq, which in turn results in a relatively smaller range for the Qoff and Qdelta values generated by that congestion point. In order to generate CN messages carrying normalized feedback, a scaling factor Qscale may be used to multiply the values of Qoff and Qdelta copied into a CN frame: Qoff and Qdelta are calculated, and the actual CN frame carries the Qscale·Qoff and Qscale·Qdelta values. In one example embodiment, when Qoff and Qdelta are both zero, no CN message is generated. A CN frame generated by a congestion point may carry in the payload the CM-Tag copied from the sampled frame, as described below, with reference to
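The computation of the two feedback components and their scaling can be sketched as below. Qoff = Q − Qeq follows from the statement that a positive Qoff means the queue is above equilibrium; treating Qdelta as the change in queue depth since the previous sample, and rounding the scaled values to integers, are assumptions. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    qoff: int    # Qscale * Qoff, as copied into the CN frame
    qdelta: int  # Qscale * Qdelta, as copied into the CN frame


def compute_feedback(q_now: int, q_prev: int, q_eq: int,
                     q_scale: float) -> Optional[Feedback]:
    """Return scaled feedback, or None when no CN message is generated."""
    qoff = q_now - q_eq      # positive: queue above equilibrium
    qdelta = q_now - q_prev  # assumption: change since the previous sample
    # The zero test uses the unscaled values, as described in the text.
    if qoff == 0 and qdelta == 0:
        return None
    return Feedback(qoff=round(q_scale * qoff),
                    qdelta=round(q_scale * qdelta))
```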
Returning to
Each time the congestion point queue monitor 210 samples a frame to determine the current state of the congestion point queue, the network reactivity history generator 220 generates an updated reaction-to-congestion history by consolidating the current state of the congestion point queue with the current reaction-to-congestion history. The feedback message generator 230 then generates a feedback message or, if a feedback message already exists, utilizes a feedback message update module 232 to update the current feedback message based on the updated reaction-to-congestion history. Example operations performed by the congestion management system 200 may be described with reference to
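The text does not say how the history is consolidated. The Python sketch below is a hypothetical model of this pipeline: it keeps a bounded window of sampled queue depths as the reaction-to-congestion history (the window size is an assumption), derives Qdelta from the oldest entry in the window (also an assumption), and either creates a feedback message or updates the existing one in place. The class names map only loosely onto components 210, 220, 230, and 232.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class FeedbackMessage:
    qoff: int
    qdelta: int
    updates: int = 0  # how many times the update path refreshed it


class CongestionManagementSketch:
    """Hypothetical model of the monitor / history / generator pipeline."""

    def __init__(self, q_eq: int, window: int = 8):
        self.q_eq = q_eq
        # Reaction-to-congestion history: the last `window` sampled depths.
        self.history: Deque[int] = deque(maxlen=window)
        self.message: Optional[FeedbackMessage] = None

    def on_sample(self, depth: int) -> FeedbackMessage:
        # Consolidate the current queue state with the existing history.
        self.history.append(depth)
        qoff = depth - self.q_eq
        # Assumption: the trend is measured across the whole history window.
        qdelta = depth - self.history[0]
        if self.message is None:
            # No message yet: generate a new one.
            self.message = FeedbackMessage(qoff, qdelta)
        else:
            # A message exists: update it in place.
            self.message.qoff = qoff
            self.message.qdelta = qdelta
            self.message.updates += 1
        return self.message
```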
As shown in
As mentioned above, the exchange of messages between a device where congestion is detected (congestion point) and a device where congestion mitigation measures are put in place (reaction point) may be performed utilizing a CM-Tag. The frames entering a CN-enabled network may be tagged by the reaction point with a CM-Tag. A CM-Tag, in one example embodiment, identifies traffic flows that are subject to rate control. Thus, if a congestion point receives a frame without a CM-Tag, an exception flag may be raised, which may cause the frame to be dropped. A CM-Tag may also denote a particular traffic flow that cannot be rate-limited (e.g., network control traffic).
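The CM-Tag handling described above can be sketched as a simple classification at the congestion point. The `CMTag` and `Frame` structures, their field names, and the `Action` values are hypothetical; the text does not give the actual CM-Tag encoding.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


@dataclass
class CMTag:
    flow_id: int                 # FlowID of the rate-controlled flow
    rate_limitable: bool = True  # False for, e.g., network control traffic


@dataclass
class Frame:
    length: int
    cm_tag: Optional[CMTag] = None


class Action(Enum):
    EXCEPTION_DROP = auto()   # no CM-Tag: raise exception, may be dropped
    NO_RATE_CONTROL = auto()  # tagged, but must not be rate-limited
    ELIGIBLE = auto()         # tagged flow subject to rate control


def classify(frame: Frame) -> Action:
    """Decide how a congestion point treats an arriving frame."""
    if frame.cm_tag is None:
        return Action.EXCEPTION_DROP
    if not frame.cm_tag.rate_limitable:
        return Action.NO_RATE_CONTROL
    return Action.ELIGIBLE
```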
When congestion is detected at a congestion point, a congestion management system starts generating and sending feedback messages to the reaction point(s) associated with the traffic flows that are believed to have caused congestion. The feedback message, in one example embodiment, is an Ethernet frame known as the CN Frame. An example format of a CN frame 500 may be described with reference to
A feedback message represented by the CN frame 500 may be generated by a congestion point in response to sampling incoming network traffic, as described above, e.g., with reference to
Field 506, sometimes referred to as the priority field of the CN Frame, includes the IEEE 802.1Q Tag or S-Tag, which may be copied from the sampled frame. In one embodiment, the priority carried in field 506 is set either to the priority of the sampled frame or to a configurable priority value. For example, the priority in field 506 may be set to the highest priority in order to minimize the latency experienced by CN Frames.
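Assuming field 506 carries a standard 802.1Q tag control information (TCI) word, in which the top 3 bits are the priority code point (PCP), the next bit is the drop eligible indicator (DEI), and the low 12 bits are the VLAN ID, the priority selection could be sketched as follows. Copying the DEI bit and VLAN ID from the sampled frame, and the function and constant names, are assumptions.

```python
from typing import Optional

HIGHEST_PCP = 7  # largest 3-bit PCP value, used here as the highest priority


def cn_frame_tci(sampled_tci: int, configured_pcp: Optional[int] = None) -> int:
    """Build the TCI for field 506 of a CN frame from the sampled frame's TCI.

    The DEI bit and VLAN ID are copied from the sampled frame. The PCP is
    copied too, unless a configured priority is supplied.
    """
    if not 0 <= sampled_tci <= 0xFFFF:
        raise ValueError("TCI is a 16-bit field")
    pcp = (sampled_tci >> 13) & 0x7 if configured_pcp is None else configured_pcp
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit field")
    # Replace the top 3 bits, keep DEI (bit 12) and VID (bits 0-11).
    return (pcp << 13) | (sampled_tci & 0x1FFF)
```

Calling `cn_frame_tci(tci, HIGHEST_PCP)` corresponds to the option of sending CN Frames at the highest priority.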
Field 510 is shown in
The Version field 512 indicates the version of the CN protocol. An M field 516, in one example embodiment, indicates a condition of mild buffer congestion, while an S field 518 indicates a condition of severe buffer congestion.
A CPIDhsh field 520 is used to carry a hash value associated with the Congestion Point Identifier (CPID) field 522 that follows. The CPIDhsh field 520 may be utilized to minimize the number of false-positive feedback messages generated by multiple congestion points along the path from a source device to a destination device. The CPID field 522 may be utilized to uniquely identify a congested entity (e.g., a congestion point queue) within a contiguous set of devices that support CN. A contiguous set of devices that support CN is sometimes referred to as a Congestion Management Domain. CPID information may be propagated to the reaction point in order to create an unambiguous association between the congestion point and one or more reaction points. Because the CPID is an opaque object, its format may be relevant only to the congestion point that assigns it. In order to ensure global uniqueness, the CPID may include the MAC address of the switch with which the congestion point is associated. The CPID may also include a local identifier to ensure local uniqueness.
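The CPID layout described above (a switch MAC address for global uniqueness plus a local identifier for local uniqueness) and its hash can be sketched as follows. The field widths and the hash function are not given in the text; this sketch assumes a 16-bit local identifier and uses the low 16 bits of CRC-32 (`zlib.crc32`) as a stand-in hash. The function names are hypothetical.

```python
import zlib


def make_cpid(mac: str, local_id: int) -> bytes:
    """Build an opaque CPID: 6-byte switch MAC + assumed 16-bit local id."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    if not 0 <= local_id <= 0xFFFF:
        raise ValueError("local identifier must fit in 16 bits")
    return mac_bytes + local_id.to_bytes(2, "big")


def cpid_hash(cpid: bytes) -> int:
    """Stand-in CPIDhsh: low 16 bits of the CRC-32 of the CPID."""
    return zlib.crc32(cpid) & 0xFFFF
```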
A Qoff field 524 and Qdelta field 526 may include the actual feedback information conveyed by the congestion point to the reaction point. As described above, with reference to
When a reaction point receives a feedback message (e.g., a CN frame) from a congestion point, and the feedback message causes a congestion mitigation action to be performed on a particular traffic flow (e.g., the activation of a rate limiter or an adjustment of one or more parameters of the rate limiter), the CPID field 522 and the CPIDhsh field 520 from the CN Frame may be stored in local registers associated with the corresponding rate limiter. The network flow frames that may be subsequently injected by the reaction point in the network will carry a CM-Tag with a Rate Limited Tag (RLT) Option containing the CPIDhsh from the local register at the reaction point. The CN frame 500 may also include other fields, not shown in
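The reaction-point bookkeeping described above can be sketched in Python: a rate limiter is instantiated for a flow only if one is not already installed, the CPID and CPIDhsh from the CN frame are stored in that limiter's registers, and subsequently injected frames look up the CPIDhsh for their RLT option. The class names, the per-FlowID dictionary, and passing in an already-computed `new_rate_bps` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RateLimiter:
    rate_bps: float
    cpid: bytes = b""  # local register: CPID from the CN frame
    cpid_hsh: int = 0  # local register: CPIDhsh from the CN frame


class ReactionPoint:
    """Hypothetical reaction point with one rate limiter per FlowID."""

    def __init__(self, line_rate_bps: float):
        self.line_rate_bps = line_rate_bps
        self.limiters: Dict[int, RateLimiter] = {}

    def on_cn_frame(self, flow_id: int, cpid: bytes, cpid_hsh: int,
                    new_rate_bps: float) -> RateLimiter:
        """Apply a CN frame that triggers a mitigation action on a flow."""
        limiter = self.limiters.get(flow_id)
        if limiter is None:
            # Instantiate a rate limiter only if none is installed yet.
            limiter = RateLimiter(rate_bps=self.line_rate_bps)
            self.limiters[flow_id] = limiter
        limiter.rate_bps = min(new_rate_bps, self.line_rate_bps)
        limiter.cpid = cpid
        limiter.cpid_hsh = cpid_hsh
        return limiter

    def rlt_option(self, flow_id: int) -> Optional[int]:
        """CPIDhsh to carry in the CM-Tag RLT option, if rate-limited."""
        limiter = self.limiters.get(flow_id)
        return None if limiter is None else limiter.cpid_hsh
```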
It will be noted that, while the embodiments of the inventive techniques have been described with reference to CN, the techniques to adjust control loop parameters at the congestion point may be utilized with congestion management systems other than CN, such as those associated with the Transmission Control Protocol (TCP) and the Frame Relay protocol.
The example computer system 600 includes a processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 604 and a static memory 606, which communicate with each other via a bus 608. The computer system 600 may further include a video display unit 610 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 600 also includes an alphanumeric input device 612 (e.g., a keyboard), optionally a user interface (UI) navigation device 614 (e.g., a mouse), optionally a disk drive unit 616, a signal generation device 618 (e.g., a speaker) and a network interface device 620.
The disk drive unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions and data structures (e.g., software 624) embodying or utilized by any one or more of the methodologies or functions described herein. The software 624 may also reside, completely or at least partially, within the main memory 604 and/or within the processor 602 during execution thereof by the computer system 600, the main memory 604 and the processor 602 also constituting machine-readable media.
The software 624 may further be transmitted or received over a network 626 via the network interface device 620 utilizing any one of a number of well-known transfer protocols, e.g., the Hypertext Transfer Protocol (HTTP).
While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals. Such medium may also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memory (RAM), read-only memory (ROM), and the like.
The embodiments described herein may be implemented in an operating environment comprising software installed on any programmable device, in hardware, or in a combination of software and hardware.
Thus, a method and system to adjust Congestion Notification control loop parameters at a congestion point have been described. Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.