The present disclosure relates to transmission control protocol (TCP). More particularly, the present disclosure relates to round trip time (RTT) in a TCP environment.
A TCP connection operating in an environment of highly variable RTT can experience many spurious retransmissions, i.e., retransmissions of segments that were not lost but whose acknowledgments had not yet arrived when the retransmission timer expired.
RFC 6298 describes an algorithm for computing the current retransmission timeout (RTO). To compute the current RTO, a TCP sender maintains two state variables: the smoothed round trip time (SRTT) and the round trip time variation (RTTVAR). Rule (2.4) in Section 2 of RFC 6298 states that whenever the RTO (i.e., the retransmission timer value) is computed, if it is less than 1 second, the RTO SHOULD be rounded up to 1 second.
Rounding up to 1 second has the undesirable effect of causing a sending device running TCP to wait longer than necessary before retransmitting, because typical RTTs are far shorter than 1 second. Most RTTs are less than 100 milliseconds. For example, a ping from a home in Santa Clara, Calif., to a server in Boston, Mass., can take around 92 ms. Even a ping from Santa Clara, Calif., to distant Perth, Australia, takes only around 325 ms, roughly one third of the 1-second minimum.
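A minimal Python sketch of the RFC 6298 computation described above shows where the 1-second floor enters. It assumes times in seconds, a clock granularity G of 1 ms (an illustrative choice), and the constants alpha = 1/8, beta = 1/4 and K = 4 given in RFC 6298. The class name `Rfc6298Estimator` is hypothetical, and the optional upper bound on the RTO from rule (2.5) is omitted.

```python
class Rfc6298Estimator:
    """Sketch of the RFC 6298 RTO computation (all times in seconds)."""

    ALPHA = 1 / 8  # SRTT gain (RFC 6298)
    BETA = 1 / 4   # RTTVAR gain (RFC 6298)
    K = 4          # RTTVAR multiplier (RFC 6298)

    def __init__(self, clock_granularity=0.001, min_rto=1.0):
        self.g = clock_granularity  # assumed 1 ms timer granularity G
        self.min_rto = min_rto      # rule (2.4) floor; pass 0.0 to disable
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0              # rule (2.1): RTO before any RTT sample

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # Rule (2.2): first RTT measurement R.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Rule (2.3): update RTTVAR using the old SRTT, then update SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        # Rule (2.4): round the RTO up to the minimum.
        self.rto = max(rto, self.min_rto)
        return self.rto


clamped = Rfc6298Estimator().on_rtt_sample(0.092)
unclamped = Rfc6298Estimator(min_rto=0.0).on_rtt_sample(0.092)
print(f"RTO with floor: {clamped:.3f} s, without floor: {unclamped:.3f} s")
```

With a single 92 ms sample, the unclamped RTO is about 276 ms (SRTT plus 4 x RTTVAR, with RTTVAR initialized to half the sample), yet rule (2.4) raises it to 1 second.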
TCP code that runs in a virtual machine, or in a user-space thread outside of a kernel, may go several hundred milliseconds without running. For example, the TCP code may not run because the central processing unit (CPU) is busy running other threads. While the code is running, measured RTTs may all be a few milliseconds. When the code gets its next slice of runtime, however, some RTT measurements may be much longer, because they include the time during which the thread was not running. The many short RTT measurements taken while the thread runs cause the RTT estimator to estimate a short RTT. When the kernel then swaps the TCP thread out and begins to run a different thread, sessions that have a TCP packet in transit will experience a retransmission after the TCP thread is swapped back in, because the RTO derived from the short RTT estimate has expired in the meantime.
Any RTT estimator that includes short RTTs in its calculation will cause spurious retransmissions in this operating environment. This occurs because short RTTs far outnumber long ones and pull the estimate down, while the long RTTs are very much longer than the short RTTs and therefore exceed an RTO derived from that estimate.
Other algorithms for estimating RTT have been documented. These algorithms typically reduce the RTT estimate for every low RTT sample recorded. Some algorithms, e.g., the Peak-Hopper-RTO algorithm, give higher weight to high samples than to low ones. However, the Peak-Hopper-RTO algorithm does not adequately address the spurious retransmission problem for TCP running in a thread.
Prior-art algorithms use exponential smoothing to calculate the RTO. However, exponential smoothing does not work well in bursty or stochastic environments.
Therefore, there is a need in the art to solve the above-described problems in order to minimize spurious retransmissions.
Disclosed is a method for reducing spurious retransmissions in a transmission control protocol (TCP) environment. In one embodiment, an interval is established. A retransmission timeout (RTO) is set to remain constant during the interval. The maximum of all round trip time (RTT) measurements taken during the interval is used to set a new RTO for the next interval. An interval boundary is determined.
Also disclosed is an apparatus for reducing spurious retransmissions in a transmission control protocol (TCP) environment. The apparatus can include a processor. The processor can be configured, in one embodiment, to: establish an interval; set a retransmission timeout (RTO) to remain constant during the interval; use the maximum of all round trip time (RTT) measurements taken during the interval to set a new RTO for the next interval; and determine an interval boundary.
The interval boundary can be determined when an RTT measured during the interval is higher than the RTT that was used to determine the RTO for the interval.
The maximum of all RTT measurements for the interval can be set as the new RTO for the next interval.
The maximum of all RTT measurements for the interval can be used to calculate the new RTO for the next interval.
The interval boundary can be determined once a certain amount of data has been transmitted into a connection.
The interval boundary can be defined as an end of the interval or a beginning of the next interval.
TCP may be run on a plurality of network elements. The plurality of network elements can include one or more of physical network elements and virtual machines running TCP.
A determination can be made as to when a spurious retransmission is acceptable. The acceptability of a spurious retransmission can be determined according to a packet retransmission rate threshold.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
References in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In the following description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.
As used herein, a network element (e.g., a router, switch, bridge) is a piece of networking equipment, including hardware and software that communicatively interconnects other equipment on the network (e.g., other network elements, end stations). Some network elements are “multiple services network elements” that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video). Subscriber end stations (e.g., servers, workstations, laptops, netbooks, palm tops, mobile phones, smartphones, multimedia phones, Voice Over Internet Protocol (VOIP) phones, user equipment, terminals, portable media players, tablets, GPS units, gaming systems, set-top boxes) access content/services provided over the Internet and/or content/services provided on virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet. The content and/or services are typically provided by one or more end stations (e.g., server end stations) belonging to a service or content provider or end stations participating in a peer to peer service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs. Typically, subscriber end stations are coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge network elements, which are coupled (e.g., through one or more core network elements) to other edge network elements, which are coupled to other end stations (e.g., server end stations).
Different embodiments of the invention may be implemented using different combinations of software, firmware, and/or hardware. Thus, the techniques shown in the figures can be implemented using code and data stored and executed on one or more electronic devices (e.g., an end station, a network element). Such electronic devices store and communicate (internally and/or with other electronic devices over a network) code and data using computer-readable media, such as non-transitory computer-readable storage media (e.g., magnetic disks; optical disks; random access memory; read only memory; flash memory devices; phase-change memory) and transitory computer-readable transmission media (e.g., electrical, optical, acoustical or other forms of propagated signals, such as carrier waves, infrared signals, digital signals). In addition, such electronic devices typically include a set of one or more processors coupled to one or more other components, such as one or more storage devices (non-transitory machine-readable storage media), user input/output devices (e.g., a keyboard, a touchscreen, and/or a display), and network connections. The coupling of the set of processors and other components is typically through one or more busses and bridges (also termed bus controllers). Thus, the storage device of a given electronic device typically stores code and/or data for execution on the set of one or more processors of that electronic device.
In an environment where high spikes in RTT measurements are an order of magnitude higher than the regular measurements, and an order of magnitude less frequent, any RTT estimator that reduces its estimate in response to receiving a short RTT measurement will cause a spurious retransmission when a spike occurs. This environment is seen, for example, when TCP operates in a thread in a performance-stressed router.
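The effect described above can be illustrated with a short simulation. The Python sketch below assumes RFC 6298-style smoothing with the 1-second floor removed, and a hypothetical trace of fifty 2 ms samples followed by a single 300 ms spike; the function name `rto_after_samples` and the trace values are illustrative assumptions, not measurements.

```python
def rto_after_samples(samples, alpha=1 / 8, beta=1 / 4, k=4, g=0.001):
    """RFC 6298-style RTO with no 1-second floor (times in seconds)."""
    srtt = rttvar = None
    rto = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2  # first sample initializes the state
        else:
            # Variation update uses the old SRTT; SRTT then tracks the samples.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        rto = srtt + max(g, k * rttvar)
    return rto


# Hypothetical trace: short RTTs measured while the TCP thread is scheduled,
short_samples = [0.002] * 50
# followed by one RTT that spans a period in which the thread was swapped out.
spike = 0.300

rto = rto_after_samples(short_samples)
print(f"RTO before spike: {rto * 1000:.1f} ms")
print("spurious retransmission" if spike > rto else "no retransmission")
```

With these inputs RTTVAR decays toward zero, the clock granularity term dominates, and the RTO settles near 3 ms, two orders of magnitude below the spike, so the spike triggers a retransmission even though nothing was lost.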
There can be various causes of spurious retransmissions. One such cause is the inability of the receiver 110 to send an ACK to the sender 105 because the TCP stack runs in a user space of the network element, e.g., the receiver 110. This user space runs in a thread that is not always running. When the thread is not running, a delay is added to the RTT as measured by the sender 105, because the receiver 110 can receive a packet from the sender 105 but cannot send a response to the sender 105. If this delay is long enough, a spurious retransmission from the sender 105 to the receiver 110 will occur, because an ACK has not been received from the receiver 110 before the RTO period expires.
At block 310, the RTO is set to remain constant during the interval. At block 315, the maximum of all RTT measurements taken during the interval is used to set a new RTO for the next interval. In one embodiment, the RTO is set to 1.25 times the highest measured RTT. In another embodiment, the RTO can be set using a high-biased exponential smoothing algorithm.
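The two ways of setting the RTO at block 315 might be sketched in Python as follows. `rto_from_interval_max` applies the 1.25 multiplier to the largest RTT of the interval. The disclosure does not specify the high-biased exponential smoothing algorithm, so `high_biased_smooth` is a hypothetical example that uses a large gain when a sample exceeds the current estimate and a small gain otherwise; both gains are assumed values.

```python
RTO_FACTOR = 1.25  # multiplier from one embodiment in the disclosure


def rto_from_interval_max(interval_rtts):
    """Return the next interval's RTO: 1.25 x the largest RTT seen (seconds)."""
    if not interval_rtts:
        raise ValueError("need at least one RTT measurement")
    return RTO_FACTOR * max(interval_rtts)


def high_biased_smooth(prev_estimate, sample, up_gain=0.5, down_gain=1 / 16):
    """Hypothetical high-biased EWMA: rises quickly, decays slowly.

    up_gain and down_gain are illustrative assumptions, not values
    taken from the disclosure.
    """
    gain = up_gain if sample > prev_estimate else down_gain
    return prev_estimate + gain * (sample - prev_estimate)


print(rto_from_interval_max([0.002, 0.003, 0.300]))

# Feed successive interval maxima (hypothetical values) to the smoother.
estimate = 0.010
for interval_max in [0.300, 0.002, 0.002]:
    estimate = high_biased_smooth(estimate, interval_max)
    print(round(estimate, 4))
```

For an interval with RTTs of 2 ms, 3 ms and 300 ms, `rto_from_interval_max` returns 0.375 s. With the assumed gains, the smoothing variant moves up eight times faster than it moves down, so a single high sample raises the estimate quickly while short samples lower it only gradually.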
At block 320, an interval boundary is determined. The interval boundary, i.e., the end of an interval or the beginning of the next interval, is determined when either: 1. an RTT is measured to be higher than the RTT used to determine the RTO for the current interval; or 2. TCP has transmitted a certain amount of data into the connection. In one embodiment, the RTO of the present interval is set to the value of the maximum RTT of the previous interval.
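Blocks 310 through 320 can be combined into a small state machine. The following Python sketch is one possible reading, not the disclosed implementation: the class name, the 1 s initial RTO used before any sample arrives, the 1.25 multiplier from one embodiment, and the byte budget `interval_bytes` for the second boundary condition (here a hypothetical 30 windows of 64 KiB) are all assumptions chosen for illustration. Passing `factor=1.0` gives the embodiment in which the RTO equals the previous interval's maximum RTT.

```python
class IntervalRtoEstimator:
    """Interval-based RTO estimator sketch (all times in seconds)."""

    def __init__(self, interval_bytes, factor=1.25, initial_rto=1.0):
        self.interval_bytes = interval_bytes  # assumed byte budget (condition 2)
        self.factor = factor                  # RTO = factor * max RTT
        self.rto = initial_rto                # held constant within an interval
        self.basis_rtt = None                 # RTT the current RTO was derived from
        self.interval_max = 0.0               # largest RTT seen this interval
        self.bytes_sent = 0                   # data sent this interval

    def _start_next_interval(self):
        # Interval boundary: the maximum RTT of the interval that just
        # ended determines the RTO for the next interval.
        self.basis_rtt = self.interval_max
        self.rto = self.factor * self.interval_max
        self.interval_max = 0.0
        self.bytes_sent = 0

    def on_rtt_sample(self, rtt):
        self.interval_max = max(self.interval_max, rtt)
        # Condition 1: an RTT higher than the one behind the current RTO.
        if self.basis_rtt is None or rtt > self.basis_rtt:
            self._start_next_interval()

    def on_data_sent(self, nbytes):
        self.bytes_sent += nbytes
        # Condition 2: enough data sent; act only if a sample was taken.
        if self.bytes_sent >= self.interval_bytes and self.interval_max > 0.0:
            self._start_next_interval()


WINDOW = 64 * 1024  # hypothetical maximum window, in bytes
est = IntervalRtoEstimator(interval_bytes=30 * WINDOW)

for rtt in [0.002, 0.003, 0.002]:
    est.on_rtt_sample(rtt)
print(f"after short samples: {est.rto * 1000:.2f} ms")

est.on_rtt_sample(0.300)      # spike: condition 1, the RTO rises
after_spike = est.rto
for _ in range(100):
    est.on_rtt_sample(0.002)  # short samples leave the RTO unchanged
after_burst = est.rto
print(f"after spike and burst: {after_burst * 1000:.1f} ms")

est.on_data_sent(30 * WINDOW)  # byte budget reached: condition 2
print(f"after byte budget: {est.rto * 1000:.2f} ms")
```

In this sketch the RTO can rise at any sample (condition 1) but can fall only when the byte budget is reached (condition 2), so a burst of short RTTs never lowers the RTO on its own. There is also no lower bound: after the budget is reached with 2 ms samples, the RTO is 2.5 ms rather than 1 second.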
High RTTs cause spurious retransmissions; low RTTs do not. Therefore, the present disclosure considers high RTT measurements. Of course, if an RTT estimator only ever used the highest measured RTT and network conditions then improved, the RTT estimator would never reduce its estimate to track the improved network conditions. However, the risk of a spurious retransmission increases whenever the RTT estimator decreases its estimate. In one embodiment, a determination is made as to when a spurious retransmission is acceptable, e.g., according to a packet retransmission rate threshold. Whenever a retransmission occurs, at most a full window is retransmitted. In one embodiment, up to one in every 20 to 50 packets can be a retransmission. Therefore, in this embodiment, the RTT estimate is not reduced until 20 to 50 maximum windows of data have been transmitted, so that a spurious retransmission caused by a reduction resends no more than about one window for every 20 to 50 windows sent.
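The arithmetic behind the 20-to-50-window figure can be sketched in Python as follows. The sketch assumes a hypothetical 64 KiB maximum window and assumes that each reduction of the estimate causes at most one spurious retransmission of at most one full window; the helper names are illustrative, not taken from the disclosure.

```python
KIB = 1024


def reduction_byte_budget(max_window_bytes: int, windows: int) -> int:
    """Bytes that must be transmitted before the RTT estimate may be lowered."""
    return max_window_bytes * windows


def worst_case_retransmit_fraction(windows: int) -> float:
    """Upper bound on spurious retransmissions as a fraction of data sent.

    Assumes each lowering of the estimate causes at most one spurious
    retransmission, and each retransmission resends at most one window.
    """
    return 1 / windows


def within_threshold(windows: int, max_fraction: float) -> bool:
    """True if the worst-case fraction does not exceed the chosen threshold."""
    return worst_case_retransmit_fraction(windows) <= max_fraction


window = 64 * KIB  # hypothetical maximum window size, in bytes
for n in (20, 50):
    print(n, reduction_byte_budget(window, n), worst_case_retransmit_fraction(n))
```

Under these assumptions, waiting 20 windows (1,310,720 bytes) bounds the spurious retransmission overhead at 1 in 20, and waiting 50 windows (3,276,800 bytes) bounds it at 1 in 50.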
One advantage of the present disclosure is that spurious retransmissions are reduced in an environment of highly variable RTT. Also, the RTT estimate, and hence the RTO, is not constrained by any minimum value. Thus, for example, if the RTT is never more than 1 millisecond, then the RTT estimator will use 1 millisecond to determine the RTO period, rather than the 1-second minimum recommended by RFC 6298.
The processes described above, including but not limited to those presented in connection with
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The description is thus to be regarded as illustrative instead of limiting.
This application claims priority to U.S. Provisional Application Ser. No. 61/842,581, filed on Jul. 3, 2013, the entire disclosure of which is hereby incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61842581 | Jul 2013 | US