A wide and heterogeneous range of data travels across data networks such as the internet. Each application which sends flows of data may have different requirements in terms of required throughput, delay, and loss. The achieved network characteristics are a function of the network capacity, network conditions, other flows on the network and the congestion control methods, if any, being used by each network application.
A computing system manages communications congestion by selecting a transmission rate differently in different operating modes. In a delay-plus-loss mode, the transmission rate is selected as the lesser of a rate that would be selected by a loss-based algorithm and a rate that would be selected by a delay-based algorithm. In a loss-based mode, the transmission rate is selected as the lesser of, on one hand, a rate that would be selected by a loss-based algorithm and, on the other hand, the maximum of a rate that would be selected by a delay-based algorithm and a rate proportional to the maximum estimated link rate divided by the number of data flows estimated to be competing for link bandwidth.
A database may be maintained of observations of network and link performance over time, where the database contains such information as the maximum estimated link rate capacity, minimum delays, and minimum losses. In a learning mode, the system may observe and record the estimated link capacity rate, minimum delays, and minimum losses.
The system may transition from the delay-plus-loss mode to the loss-based mode when there is a large reduction in the transmission rate in a short time span. Similarly, the system may transition from the loss-based mode to the delay-plus-loss mode when the rate that would be selected by a delay-based algorithm exceeds a predetermined fraction of the maximum estimated link rate capacity divided by the estimated number of competing flows. The system may transition from the loss-based mode to the learning mode when the system has been in the loss-based mode for too long. The system may transition out of the learning mode when learning is complete, and may re-enter the learning mode at random intervals to reassess network capacities.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
An effective rate control strategy may be achieved by automatically switching between delay-based and loss-based rate control through observation of channel properties and changing loss and delay conditions. A multi-mode approach, which combines delay-based and loss-based congestion control algorithms, may be useful, for example, to control communication rates for real-time interactive applications, which prefer delay-based rate control, when those real-time interactive applications are competing for communications link bandwidth with data flows that are utilizing loss-based rate control.
If a real-time application were to use only delay-based control, it may be difficult to obtain a fair share of the bandwidth when competing with one or more loss-based flows. Further, when using purely loss-based control, a real-time application would have poor performance due to increased congestion-induced queuing delay and loss.
These difficulties may be overcome using combined delay-based and loss-based congestion control algorithms with multiple modes of operation. For example, an algorithm may be implemented by a system using a state machine with three communications rate setting modes: a learning mode, a delay-plus-loss mode, and a loss-based mode.
In the learning mode, the system observes network conditions and selects the next mode to use accordingly. In the delay-plus-loss mode, the transmission rate is selected as the lesser of a rate that would be selected by a loss-based algorithm and a rate that would be selected by a delay-based algorithm. In the loss-based mode, the transmission rate is selected as the lesser of, on one hand, a rate that would be selected by a loss-based algorithm and, on the other hand, the maximum of a rate that would be selected by a delay-based algorithm and a rate proportional to the maximum estimated link rate divided by the number of data flows estimated to be competing for link bandwidth.
A database may be maintained of observations of network and link performance over time, where the database contains such information as maximum estimated link rate capacities, minimum delays, and minimum losses. In a learning mode, the system may observe and record the estimated link capacity rate, minimum delays, and minimum losses.
The multi-mode algorithm includes conditions for transitioning between modes. Changes in observed delays or losses may be used to determine when a different mode will be more advantageous for the performance of the application. For example, the system may transition from the delay-plus-loss mode to the loss-based mode when there is a large reduction in the transmission rate in a short time. Similarly, the system may transition from the loss-based mode to the delay-plus-loss mode when the rate that would be selected by a delay-based algorithm exceeds a predetermined fraction of the maximum estimated link rate capacity divided by the estimated number of competing flows. The system may transition from the loss-based mode to the learning mode when the system has been in the loss-based mode for too long. The system may transition out of the learning mode when learning is complete, and may re-enter the learning mode at random intervals to reassess network capacities.
Applications may generally be divided into real-time applications and non-real-time applications. Non-real-time applications are those where communication performance, and perhaps ultimate application performance, is determined primarily by average throughput. In these applications, the delay between when a sender generates a packet and when the receiver consumes the packet may be significantly larger than inherent network latencies without adversely impacting application performance. Examples include file-transfer protocol (FTP), non-interactive web traffic, and video-on-demand (VOD). For non-real-time applications, what often matters most is how much information is transported, rather than precisely when it arrives. Congestion control protocols for non-real-time applications are often based on observed communications losses, i.e., the rate of transmission is determined by examining packet loss. Under such protocols, the transmission rate may be increased until the loss rate is above some threshold, which may occur when transmission backlogs exceed buffer capacities.
Real-time applications are those where the application performance may be principally determined by throughput, delay, and loss. In these applications, the delay between when a sender generates a packet and when the receiver consumes the packet is generally on the order of inherent network latencies. Any network delays or losses may be critical to a user's experience of the application. Examples include VoIP, video conferencing, online gaming, and interactive cloud applications. For real-time applications, congestion control protocols are preferably based on delay-based rate control, i.e., the rate of transmission is determined by examining packet delay.
Ideally, all the applications using a given network link would utilize compatible rate control algorithms to provide for easier coordination of fair sharing of the link. However, makers of non-real-time applications are unlikely to adopt the use of delay-based rate control algorithms, for example. Delay-based rate control algorithms may be significantly more difficult to implement than loss-based rate control algorithms. This is because delay is typically a noisier signal than packet loss. Further, delay-based rate control algorithms may result in lower throughput when competing with loss-based rate control algorithms. Thus for non-real-time applications, where performance is primarily determined by throughput, it is generally preferred to use loss-based rate control algorithms.
At the same time, the use of loss-based rate control algorithms by non-real-time applications may result in significant increases in congestion-induced packet loss and queuing delay on a link. This in turn results in poorer performance of real-time applications that share the network link. Congestion-induced packet loss typically occurs at higher congestion levels than congestion-induced queuing delay. Therefore, to maintain some share of channel capacity, an application which prefers delay-based rate control may nonetheless benefit from using loss-based rate control when sharing a link on which other applications are using loss-based rate control.
For example, a real-time application running on computer 114 may use a link 150 to communicate with computer 134. Link 150 is physically implemented via local link 116 and local router 112 and external link 118 to the network 120, and then via the external link 138 down to router 132 and internal link 146 to computer 134. Throughput, loss, and delay on link 150 will be a function of the physical capacities of the component devices, the traffic flows thereon, and the congestion management algorithms implemented by the applications directing the traffic flows.
In
The computer 241 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media provide storage of computer readable instructions, data structures, program modules and other data for the computer 241. In
The computer 241 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 246. The remote computer 246 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 241, although only a memory storage device 247 has been illustrated in
When used in a LAN networking environment, the computer 241 is connected to the LAN 245 through a network interface or adapter 237. When used in a WAN networking environment, the computer 241 typically includes a modem 250 or other means for establishing communications over the WAN 249, such as the Internet. The modem 250, which may be internal or external, may be connected to the system bus 221 via the user input interface 236, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 241, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
Herein, algorithms and devices are described in terms of controlling the transmission rate R, which is the number of bits or bytes to be transmitted in some given unit of time by a given computer application. It will be appreciated that such algorithms and devices may equally be described as controlling the window W, which is the maximum number of bits or bytes in flight that can be outstanding, i.e., that have not yet been acknowledged or declared to be lost. A simple approximation relating the two is W ≈ R·SRTT, where SRTT is a smoothed version of the current round-trip time RTT.
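By way of illustration, the following minimal Python sketch shows the rate/window conversion implied by W ≈ R·SRTT. The function names and the smoothing weight are illustrative assumptions, not part of this disclosure.

```python
# Illustrative sketch of the W ≈ R * SRTT approximation; the smoothing
# weight and function names are assumptions, not part of this disclosure.

def smoothed_rtt(prev_srtt_s: float, rtt_sample_s: float, alpha: float = 0.125) -> float:
    """Exponentially weighted moving average of round-trip-time samples."""
    return (1.0 - alpha) * prev_srtt_s + alpha * rtt_sample_s

def window_from_rate(rate_bps: float, srtt_s: float) -> float:
    """Approximate window (bits in flight) corresponding to a transmission rate."""
    return rate_bps * srtt_s

def rate_from_window(window_bits: float, srtt_s: float) -> float:
    """Approximate transmission rate corresponding to a window."""
    return window_bits / srtt_s
```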
In learning mode 310, the system observes network properties and conditions. The transitions 301-305 denote times when the system will switch from using one mode to another. All modes may inform, and be informed by, a database 340 containing records of observed network properties, e.g., estimates of maximum link data-carrying capacities and minimum expected link losses and delays. RMAX, δMIN, and εMIN (the maximum link rate, minimum delay, and minimum loss, respectively) may be determined from network observations in learning mode 310. The delay-plus-loss mode 320 and loss mode 330 may similarly inform, and be informed by, the database 340, whereby data from the database 340 is used to initialize operation of each mode and to store observations made during each mode.
When learning is complete, the algorithm 300 may follow transition 301 to switch to the delay-plus-loss mode 320. Learning mode 310 may be invoked from any other mode. For example, the delay-plus-loss mode 320 may reinitiate the learning mode 310, following transition 302, at random intervals. Similarly, the loss mode 330 may initiate the learning mode 310, following transition 305, upon the expiration of a watchdog timer, for instance. Alternatively, learning may continue in the background of the other modes at all times.
During learning mode, congestion signals are not used to control the rate R. To obtain estimates of δMIN and εMIN, the transmission rate is set to a very low rate which is known with high probability not to cause network congestion. The observed queuing delay and loss are assumed to be inherent to the network and are taken as the values for δMIN and εMIN, respectively. To obtain an estimate of RMAX, the transmission rate is set to a very high rate which is known with high probability to cause congestion. The observed received rate is then taken as the estimate of RMAX.
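The learning-mode procedure may be sketched as follows. This is an illustrative Python sketch only; the probe rates, probe duration, and the `channel` measurement hooks (`send_at_rate`, `measure_delay_and_loss`, `measure_received_rate`) are hypothetical placeholders for whatever instrumentation a particular system provides.

```python
def learning_mode(channel, probe_low_bps, probe_high_bps, probe_duration_s):
    """Estimate delta_min, eps_min, and R_max by probing, per the description above.

    `channel` is a hypothetical object exposing measurement hooks; the probe
    rates are assumed to be chosen well below and well above expected capacity.
    """
    # Send at a rate low enough that congestion is very unlikely; the observed
    # queuing delay and loss are taken as inherent to the network.
    channel.send_at_rate(probe_low_bps, probe_duration_s)
    delta_min, eps_min = channel.measure_delay_and_loss()

    # Send at a rate high enough that congestion is very likely; the received
    # rate then approximates the maximum link rate.
    channel.send_at_rate(probe_high_bps, probe_duration_s)
    r_max = channel.measure_received_rate()

    return {"R_max": r_max, "delta_min": delta_min, "eps_min": eps_min}
```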
A system, such as a computer application, that is operating in a delay-plus-loss mode 320 may compute a first transmission rate based on delays, a second rate based on losses, and then select one of these rates as the rate to be used. Let Rδ be the transmission rate suggested by a rate controller which uses a congestion control algorithm based on observed queuing delay, and ΔRδ be the suggested change in transmission rate based on delay. Similarly, let Rε be the rate given by a rate controller which responds to loss data, and ΔRε be the suggested change in rate based on loss.
In delay-plus-loss mode 320, R may be selected by first assigning present values of Rδ and Rε to be equal to prior values plus suggested deltas, and then selecting the minimum of the two suggested rates, according to the following equations:
Rδ := Rδ + ΔRδ   Equation 1
Rε := Rε + ΔRε   Equation 2
R := min(Rδ, Rε)   Equation 3
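Equations 1-3 may be sketched in code as follows. This is an illustrative Python sketch; the delta arguments stand in for whichever delay-based and loss-based controllers supply the suggested rate changes.

```python
def delay_plus_loss_rate(r_delta, r_eps, delta_r_delta, delta_r_eps):
    """Delay-plus-loss mode: update both candidate rates, then take the minimum.

    r_delta / r_eps are the prior delay-based and loss-based rates; the delta
    arguments are the rate changes suggested by the respective controllers.
    """
    r_delta = r_delta + delta_r_delta            # Equation 1
    r_eps = r_eps + delta_r_eps                  # Equation 2
    return min(r_delta, r_eps), r_delta, r_eps   # Equation 3
```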
If a loss-based flow, such as a TCP flow, is introduced somewhere on the link being used by the application, the delay seen by the application will likely increase. This is because a loss-based flow will fill the buffer of the router on the bottleneck link, i.e., until packets are lost because they cannot be buffered. If the router buffer is large, a large delay will be seen, and the rate will be low. For example, consider a rate controller which utilizes the following rate control adjustment in response to queuing delay:
Rδ := Rδ + k2(k0 − δRδ)   Equation 4
In Equation 4, δ is the queueing delay and k0 is the number of bits sent at the operating point. In steady state, the rate of bits sent R will be equal to the parameter k0 over the delay, as given here:
Rδ = k0/δ   Equation 5
If the capacity of the link is 1 Mbps, and there are two delay-based flows which have the same k0 of 5000 bits, each flow will have a rate of 500 kbps, with a steady-state queueing delay of 10 ms.
However, if one of the two flows is instead a loss-based flow, such as a TCP flow, the situation will be different. A TCP flow will fill the buffer completely. If the router buffer is 200 ms in length, with the same k0, the delay-based flow will only get 25 kbps, and the TCP flow will get 975 kbps.
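The two numeric results above follow from the steady-state relation of Equation 5 (R = k0/δ); a short Python check, using only the values given in the text:

```python
k0_bits = 5000.0                 # operating-point parameter from the example

# Two delay-based flows on a 1 Mbps link: each settles at 500 kbps,
# so the steady-state queueing delay is k0 / R.
rate_each_bps = 500e3
print(k0_bits / rate_each_bps)   # 0.01 s, i.e., 10 ms

# Against a TCP flow that keeps a 200 ms router buffer full, the delay-based
# flow settles at R = k0 / delay, leaving roughly 975 kbps to the TCP flow.
buffer_delay_s = 0.2
print(k0_bits / buffer_delay_s)  # 25000.0 bps, i.e., 25 kbps
```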
The arrival of a competing loss-based flow may be sensed by the application, e.g., as a large reduction of R in a short time. Similarly, the presence of one or more competing loss-based flows may be inferred from receiving a small share of the link capacity. If the application is aware that the maximum link rate is close to 1 Mbps, and that the application is currently achieving a throughput of 25 kbps, i.e., 2.5% of the maximum, it is probable that a TCP flow is clogging a buffer. In such cases, the algorithm may follow transition 303 to switch to the loss-based mode 330.
In the loss-based mode 330, Rδ and Rε are again computed separately based on observed delay and loss, respectively. The transmission rate R is computed as follows:
R := min(Rε, max(Rδ, γRMAX/N))   Equation 6
where RMAX is an estimate of the maximum link rate, and N is defined as follows:
N := RMAX/avg(Rε)   Equation 7
Here avg(Rε) is the average rate reported by the loss-based controller over some time window. γ is a constant less than 1, e.g., between 0.5 and 1. Typically γ is set close to one, e.g., γ=0.9. N is an estimate of the number of flows present. A default value of N=1 may be used initially. If no competing flows are estimated to be present, i.e., the flow from the application is expected to be the only flow on the link, N=1, and thus:
R = min(Rε, max(Rδ, γRMAX)) = min(Rε, γRMAX) ≈ Rε   Equation 8
This setting allows the flow managed by the algorithm 300 to compete effectively with a TCP flow. If there is one competing flow, N will converge to around two, and the rate the flow can achieve will be close to γRMAX/2, i.e., 0.45·RMAX for γ=0.9. This may be viewed as an approximate fair sharing of the contested resource. With M competing flows and L flows under the control of algorithm 300, N converges to approximately M+L. Then, the rate R determined by loss-based mode 330 is approximately γRMAX/(M+L).
This again is desirable. When the competing flows depart, N will start approaching L. However, since γ is less than one, the loss-mode rate γRMAX/N = γ·avg(Rε) remains below avg(Rε). If the averaging of Rε is done over a sufficiently long duration, then R will stay below Rε for a sufficient duration of time for the queues to clear. Therefore, Rδ will start increasing and will eventually surpass γ(avg(Rε)). At this point, algorithm 300 may follow transition 304 to switch back to delay-plus-loss mode 320.
An alternative is to allow N only to increase during the loss-based mode. This guarantees convergence back to the delay-plus-loss mode. If N is only allowed to increase, then it can essentially be written as N := max(N, RMAX/avg(Rε)) every time Rε updates.
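Equations 6 and 7, together with the monotone update of N just described, may be sketched as follows (illustrative Python; the averaging of Rε is assumed to be computed elsewhere and passed in):

```python
def loss_mode_rate(r_delta, r_eps, avg_r_eps, r_max, n, gamma=0.9, monotone_n=True):
    """Loss-based mode: R = min(R_eps, max(R_delta, gamma * R_max / N)).

    `n` is the current estimate of the number of flows. With monotone_n=True,
    N is only allowed to increase, which guarantees eventual convergence back
    to the delay-plus-loss mode.
    """
    n_new = r_max / avg_r_eps if avg_r_eps > 0 else n   # Equation 7
    n = max(n, n_new) if monotone_n else n_new
    r = min(r_eps, max(r_delta, gamma * r_max / n))     # Equation 6
    return r, n
```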
In the example of
When learning is complete, in step 440 the results may be recorded in a database. This may include such parameters as maximum throughput, minimum delay, and minimum loss, along with the time, date, and other conditions under which the observations were made.
Next, in step 420, the system enters delay-plus-loss mode 420 and operates according to Equations 1-3 as described for this mode in connection to
In loss-based mode 430, the system operates according to Equation 6 described for this mode in connection to
In step 406, the system checks whether it may be likely to achieve better performance in delay-plus-loss mode 420. For example, the system may check whether Rδ exceeds γ(avg(Rε)). This may be the case, for example, when a previously competing loss-based flow, such as a TCP data flow, is no longer contributing to losses along a shared network link. If a determination to switch modes is reached in step 406, the system switches to delay-plus-loss mode 420.
In step 408, the system may further check whether to reinitiate learning mode 410, e.g., in response to a periodic timer or at random. Otherwise, the system returns to loss-based mode 430 to re-compute a transmission rate in accordance with current network conditions.
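The overall mode selection may be sketched as a simple state machine. This is an illustrative Python sketch only; the `ctx` object, its attribute names, and the threshold values are assumptions standing in for the observations and settings described above.

```python
LEARNING, DELAY_PLUS_LOSS, LOSS_BASED = "learning", "delay-plus-loss", "loss-based"

def next_mode(mode, ctx):
    """Return the next operating mode given the current observations in `ctx`."""
    if mode == LEARNING and ctx.learning_complete:
        return DELAY_PLUS_LOSS                              # learning done
    if mode == DELAY_PLUS_LOSS:
        if ctx.rate_drop_per_s > ctx.drop_threshold_per_s:  # large, fast rate drop
            return LOSS_BASED
        if ctx.random_relearn_timer_expired:                # periodic/random re-learning
            return LEARNING
    if mode == LOSS_BASED:
        if ctx.r_delta > ctx.gamma * ctx.avg_r_eps:         # delay-based rate recovers
            return DELAY_PLUS_LOSS
        if ctx.time_in_loss_mode_s > ctx.loss_mode_watchdog_s:
            return LEARNING                                 # watchdog expired
    return mode
```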
The methods described herein may be implemented in a variety of ways on a variety of computing equipment. For example, a computing system may be configured to determine a final communications transmission rate to be used to send a flow over a shared network link, where the computing system comprises a processor and a memory storing thereon computer-executable instructions, the computing system being configured such that, when executed by the processor, the computer-executable instructions cause the computing system to perform one or more of the methods described herein in reference to
The methods implemented by the computing system may include: computing a first transmission rate based on observed communication delays; computing a second transmission rate based on observed communications losses; estimating a number of competing flows as an estimated link capacity rate divided by the second rate; in a first mode, determining the final rate as the minimum of the first rate and the second rate; and in a second mode, determining a third rate as the estimated link capacity rate divided by the estimated number of competing flows times a factor between 0.5 and 1, determining a fourth rate as the maximum of the first rate and the third rate, and determining the final rate as the minimum of the second rate and the fourth rate.
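A compact illustrative sketch of this determination, with the first, second, third, and fourth rates named as in the preceding paragraph (Python; the parameter names and default factor are assumptions):

```python
def determine_final_rate(first_rate, second_rate, link_capacity_bps,
                         second_mode=False, factor=0.9):
    """first_rate: delay-based; second_rate: loss-based; factor is between 0.5 and 1."""
    if not second_mode:
        # First mode: minimum of the delay-based and loss-based rates.
        return min(first_rate, second_rate)
    # Second mode: estimate competing flows from capacity / loss-based rate.
    competing_flows = max(1.0, link_capacity_bps / second_rate)
    third_rate = factor * link_capacity_bps / competing_flows
    fourth_rate = max(first_rate, third_rate)
    return min(second_rate, fourth_rate)
```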
The observed communication delays and losses, and the estimated link capacity, may be drawn from current observation of network conditions. Alternatively these measurements may be drawn from a stored record of prior observations of network performance, or derived based on current conditions and prior observations.
The method may include transitioning from the first mode to the second mode when the final rate drops more rapidly than a predetermined performance shift rate. For example, a threshold may be set at 20% per second. If the final rate drops more than 20% in one second, the system may transition from the first mode to the second mode. The exact performance shift rate used as a threshold may be determined empirically, e.g., through observation of network performance over time. Alternatively, the performance shift rate may be a factory setting in the system. Further, the performance shift rate may be determined experimentally, e.g., by injecting a test TCP flow along the same link being used by the application utilizing the system to set the communications rate, or simply by testing performance with various rate settings.
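The drop-rate test described above may be sketched as follows (illustrative Python; the 20%-per-second threshold is the example value from the text):

```python
def should_switch_to_second_mode(prev_rate_bps, new_rate_bps, interval_s,
                                 performance_shift_per_s=0.20):
    """Return True when the final rate falls faster than the performance shift
    threshold, e.g., by more than 20% of the previous rate within one second."""
    if prev_rate_bps <= 0 or interval_s <= 0:
        return False
    drop_fraction = (prev_rate_bps - new_rate_bps) / prev_rate_bps
    return (drop_fraction / interval_s) > performance_shift_per_s
```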
The method may include transitioning from the second mode to the first mode when the first rate is greater than the third rate. The method may include a third mode, which is a learning mode that includes observing the estimated link capacity rate, a minimum link loss, and a minimum link delay, and storing a second record comprising the estimated link capacity rate, the minimum link loss, and the minimum link delay. The method may include transitioning from the second mode to the third mode when a period of time operating in the loss-based mode exceeds a predetermined maximum period. For example, if the system has been operating in the loss-based mode for more than 30 seconds, it may automatically switch over to the learning mode. Similarly, the method may include transitioning from the third mode to the first mode when learning is completed. Further, the method may include transitioning from the first mode to the third mode at a random interval. For example, the system may switch to learning mode at a random interval between 1 and 120 minutes.