Computer networks have become an essential part of modern life. The convenience and efficiency of providing information, communication or computational power to individuals at their personal computer or other end user device has led to rapid growth of network computing, including internet as well as intranet systems and applications. Computer Networks, Third Edition (1996) by Andrew S. Tanenbaum, which is incorporated by reference herein, describes computer networks in detail.
Most computer network communication uses a layered software architecture for moving information between host computers connected to the network. The layers help to segregate information into manageable pieces. The rules and conventions for each layer are called the protocol of that layer.
One widely implemented reference model of a layered architecture for network computer communication is called TCP/IP. TCP denotes Transmission Control Protocol, and IP denotes Internet Protocol. TCP/IP is described in detail in TCP/IP Illustrated, Volume 1: The Protocols (1994) by W. Richard Stevens and in TCP/IP Illustrated, Volume 2: The Implementation (1995) by Gary R. Wright and W. Richard Stevens, both of which are incorporated by reference herein.
TCP transmits data over a TCP connection in packages called segments; each segment comprises many bytes of data plus a header of control information. To ensure reliable transmission of data, TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. TCP assigns a sequence number to each byte transmitted and uses that sequence number in various procedures that guarantee reliability.
When TCP sends a segment, it starts a timer and waits for the other end to acknowledge reception of the segment. If an acknowledgment is not received before the end of the timeout interval, the sender concludes that the segment was lost and retransmits the segment. If the lost segment later arrives at the receiver, it represents a duplicate of the retransmitted segment. Any such old duplicate segment must be identified and discarded or it will corrupt the data transmission.
A sender must know how long an interval to wait for an acknowledgment before concluding that a segment has timed out. The time required to send a segment and receive an acknowledgment, called the round-trip time (RTT), will be greater on a busy connection, so the sender must adjust its timeout interval to reflect changes in network traffic. TCP continually modifies the timeout interval using a statistical analysis of RTTs for segments transmitted recently.
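The adaptive timeout described above can be illustrated with a minimal sketch. It assumes the smoothed-RTT computation of the standard TCP retransmission timer (RFC 6298), which this text does not itself specify; the class name RtoEstimator, its parameters and the chosen floor value are hypothetical and serve only as an illustration.

```python
class RtoEstimator:
    """Illustrative retransmission-timeout estimator (RFC 6298 style).

    Assumption: the constants below are the ones recommended by RFC 6298,
    not values taken from this document.
    """
    ALPHA = 1 / 8   # gain applied to the smoothed RTT
    BETA = 1 / 4    # gain applied to the RTT variation
    K = 4           # variation multiplier in the timeout formula

    def __init__(self, min_rto=1.0):
        self.srtt = None      # smoothed round-trip time, in seconds
        self.rttvar = None    # smoothed RTT variation, in seconds
        self.min_rto = min_rto
        self.rto = min_rto    # current timeout interval, in seconds

    def update(self, rtt_sample):
        """Fold one measured RTT (seconds) into the estimate; return the new timeout."""
        if self.srtt is None:
            # First measurement seeds both state variables.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # The variation is updated before the smoothed RTT, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt_sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto
```

In this sketch a larger RTT sample, as might be seen on a busy connection, raises both the smoothed RTT and the variation term, so the timeout interval lengthens accordingly.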
TCP achieves faster rates of data transmission by sending multiple segments before waiting for an acknowledgement. Because segments are not acknowledged individually, the measurement of RTT is not very accurate. The TCP Timestamps option provides a means to achieve more accurate measurement of RTT. This option is described in RFC 1323, which is incorporated by reference herein.
The TCP Timestamps option allows the sender to place a timestamp value in every segment. The receiver reflects this value in the acknowledgement, allowing the sender to calculate by a single subtract operation an accurate RTT for each segment. This is called the RTTM (Round-Trip Time Measurement) mechanism.
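The echo-and-subtract step of RTTM can be sketched as follows. The field names tsval and tsecr follow the TSval and TSecr fields of RFC 1323; the dictionary representation of a segment and the helper names make_segment, make_ack and rtt_ticks are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # timestamps are 32-bit unsigned values


def make_segment(sender_clock, payload):
    """Sender stamps the outgoing segment with its current timestamp clock value."""
    return {"tsval": sender_clock & MASK32, "payload": payload}


def make_ack(segment, receiver_clock):
    """Receiver reflects the sender's timestamp in the echo-reply field."""
    return {"tsval": receiver_clock & MASK32, "tsecr": segment["tsval"]}


def rtt_ticks(ack, sender_clock_now):
    """Single subtract, in modular 32-bit arithmetic, yielding the RTT in clock ticks."""
    return (sender_clock_now - ack["tsecr"]) & MASK32
```

Because the subtraction is performed modulo 2**32, the result in this sketch remains correct when the sender's timestamp clock wraps between transmission of the segment and receipt of its acknowledgement.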
TCP is a symmetric protocol, allowing data to be sent at any time in either direction, and therefore timestamp echoing may occur in either direction. For simplicity and symmetry, RFC 1323 specifies that timestamps should always be sent and echoed in both directions. For efficiency, RFC 1323 combines the timestamp and timestamp reply fields into a single TCP Timestamps option field which is part of the header for a TCP segment. Use of the TCP Timestamp option is not mandatory; the hosts negotiate the use of the Timestamp option during establishment of the TCP connection.
The timestamp value to be sent in a Timestamps option is to be obtained from a (virtual) clock that RFC 1323 calls the “timestamp clock”. The values of the timestamp clock must be at least approximately proportional to real time, in order to measure actual RTT.
In addition to allowing more accurate RTT calculations, the Timestamps option makes possible a simple mechanism to reject old duplicate segments. As noted above, old duplicate segments must be rejected so that they do not corrupt data transmission. The mechanism for identifying and rejecting old duplicate segments is called PAWS (Protect Against Wrapped Sequence numbers) and is described in RFC 1323.
PAWS assumes that every received TCP segment (including data and acknowledgement segments) contains a timestamp whose values are monotone non-decreasing in time. The basic idea of PAWS is that a segment can be discarded as an old duplicate if it is received with a timestamp less than (i.e., earlier than) some timestamp recently received on the connection. In both the PAWS and the RTTM mechanisms, the “timestamps” are 32-bit unsigned integers in a modular 32-bit space. Thus, “less than” is defined the same way it is for TCP sequence numbers, and the same implementation techniques apply. If s and t are timestamp values, s<t if 0<(t−s)<2**31, computed in unsigned 32-bit arithmetic.
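The modular comparison defined above can be sketched as follows. The function names ts_less_than and paws_accept and the parameter ts_recent are hypothetical; the sketch shows only the comparison itself and omits the other PAWS bookkeeping described in RFC 1323.

```python
MASK32 = 0xFFFFFFFF   # modular 32-bit timestamp space
HALF_SPACE = 1 << 31  # 2**31


def ts_less_than(s, t):
    """Return True if s < t, i.e. 0 < (t - s) < 2**31 in unsigned 32-bit arithmetic."""
    diff = (t - s) & MASK32
    return 0 < diff < HALF_SPACE


def paws_accept(segment_ts, ts_recent):
    """Reject a segment whose timestamp is earlier than one recently received."""
    return not ts_less_than(segment_ts, ts_recent)
```

For example, ts_less_than(0xFFFFFFF0, 3) holds in this sketch because the difference, taken modulo 2**32, is 19; a timestamp issued shortly after the clock wraps is therefore still treated as later than one issued shortly before.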
RTTM was specified in a symmetrical manner, so that sender timestamps are carried in both data and acknowledgement segments and are echoed in separate fields carried in returning acknowledgement or data segments. PAWS submits all incoming segments to the same test, and therefore protects against duplicate acknowledgement segments as well as data segments.
TCP connections demand significant processing power from a host computer. To reduce the processing load on a host, TCP connections may be offloaded to a network interface device (NID), such as a network interface card, a port that handles specific connections on a multiport card, or an auxiliary processor for a CPU. U.S. Pat. Nos. 6,226,680, 6,434,620, 6,427,171 and 6,807,581, which are incorporated by reference herein, describe devices and methods for network communication wherein the host allocates some of the most common and time-consuming network processes to the NID (“fast-path”), while retaining the ability to handle less time-intensive and more varied processing on the host stack (“slow-path”). Commonly, multiple NIDs may be coupled to a single host.
In a typical embodiment, the host initiates a TCP connection and then transfers the connection to the NID, which has specialized hardware to perform the data transfer portion of the TCP protocol. If the NID encounters a problem, or if the host decides to take control of the connection, the connection is transferred back to the host. After the host solves the problem or performs some other action concerning the connection, the host may then return the connection to the NID to continue the data transfer. A particular TCP connection may “migrate” back and forth several times between the host and the NID before data transfer is completed and the connection is closed.
A TCP connection offloaded to a NID presents significant challenges when that connection is using the TCP Timestamps option. For RTT measurement and PAWS to work correctly, the output segments on a connection must be sent with monotonically non-decreasing timestamps; in other words, a segment sent later must not have a lower timestamp value than a segment sent earlier. If this requirement is not met, accurate calculation of RTT is impossible. Furthermore, the PAWS mechanism will assume that segments with lower timestamp values are old duplicates and will discard those segments. These erroneous discards will cause excessive retransmissions, leading to very poor performance and possibly dropped connections.
Timestamp values that do not increase monotonically can occur when a TCP connection migrates from a host to a NID or vice versa. For example, if a connection migrates from a host to a NID, and the NID clock is behind (slower than) the host clock, the NID might transmit segments with timestamp values that are lower (earlier) than the timestamp values of segments sent previously by the host.
One possible solution to this problem is to provide each connection with its own timestamp timer. This solution has two disadvantages: 1) it increases overhead because of the need to store additional TCP state variables, and 2) the host or NID must maintain and increment a separate timer for each connection using timestamps.
The present invention provides a better solution, which is to synchronize the timestamp clocks for the host and the NID so as to avoid poor performance and dropped connections. In accordance with one embodiment of the present invention, the host and the NID each maintain separate timestamp clocks which are synchronized by transfer of a clock value. The NID or host receives the transferred TCP connection and the transferred clock value, and decides whether to update its own clock to equal the transferred clock value, the decision being guided by the requirement to never allow the timestamp clock to run backward. Methods are disclosed for initializing the NID clock and for preventing acceleration of the host and NID timestamp clocks.
The NID 22 includes a processor 27 and a memory 71. The NID 22 provides a network interface that may be added with an adapter card, for example, or integrated as a part of the host computer. The NID 22 is connected to the host 20 by a conventional bus 52, which may be a host bus or an input/output (I/O) bus such as a peripheral component interconnect (PCI) bus. For the situation in which bus 52 is an I/O bus, the internal NID memory bus 53 and the host memory bus 51 may be coupled to I/O bus 52 with conventional interface mechanisms.
When a TCP connection migrates from a host to a NID or vice versa, a communication control block (CCB) can provide a mechanism for that migration.
A CCB is a data structure containing the set of variables used to represent the state of a particular TCP connection. A portion of the CCB corresponds to most if not all variables of a Transmission Control Block (TCB), whereas other variables are used by the connection migration mechanism. A list of variables for a conventional TCB can be found in a textbook entitled TCP/IP Illustrated, Volume 2 (7th Edition, 1999) by Gary R. Wright and W. Richard Stevens, which is incorporated by reference herein, on pages 803-805. The migration mechanism can vary and need not include transfer of all of the CCB variables.
The TCP Timestamps option uses a “timestamp clock” or timer which is described in RFC 1323.
In an initial state (step 201), a TCP connection is already running at the first processing mechanism, having been established by the host 20. In one embodiment of the invention, the first step of synchronizing the clocks is for the first processing mechanism to transfer its clock value to the second processing mechanism. For efficiency, the clock value is typically “piggybacked” on a message that transfers a TCP connection (step 203); the message contains the CCB, which represents the migrating connection, plus the clock value. Alternatively, the clock value may be piggybacked on some other message (step 204), or it may be transferred as an independent message (step 205).
There is a special case for initial synchronization of clocks 61 and 63. After the second processing mechanism receives the clock value (step 208), it checks whether any TCP connection was previously transferred (step 211). If not, the receiving NID updates (step 213) its clock 63 to have the same value as the clock value received from the host, without performing any additional tests concerning clock values.
When a clock value is sent from the first processing mechanism to the second processing mechanism, that received clock value may be ahead of (greater than) or behind (less than) the current value of the second clock for the second processing mechanism. The PAWS mechanism assumes, however, that timestamp values never decrease; the timestamp clock may not run backwards. So the second processing mechanism checks whether the value received is greater than the current value for the second clock (step 221), before updating the second clock to equal the value received (step 225).
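The receiving side's decision (steps 208 through 225) can be illustrated with a minimal sketch. The class TimestampClock, the flag needs_initial_sync and the method receive are hypothetical names. The use of the modular 32-bit comparison for the "greater than" test is an assumption carried over from the timestamp definition above, not a detail stated for the clock update itself.

```python
MASK32 = 0xFFFFFFFF
HALF_SPACE = 1 << 31


class TimestampClock:
    """Illustrative timestamp clock kept by a host or a NID (hypothetical structure)."""

    def __init__(self, value=0, needs_initial_sync=False):
        self.value = value & MASK32
        # True for a NID that has not yet received any transferred connection.
        self.needs_initial_sync = needs_initial_sync

    def tick(self):
        """Advance the clock by one tick, wrapping at 2**32."""
        self.value = (self.value + 1) & MASK32

    def receive(self, transferred_value):
        """Handle a transferred clock value; return True if the local clock was updated."""
        transferred_value &= MASK32
        if self.needs_initial_sync:
            # Initial synchronization: adopt the value without further tests.
            self.value = transferred_value
            self.needs_initial_sync = False
            return True
        # Otherwise update only if the received value is ahead of the local clock,
        # so that the timestamp clock never runs backward.
        is_ahead = 0 < ((transferred_value - self.value) & MASK32) < HALF_SPACE
        if is_ahead:
            self.value = transferred_value
        return is_ahead
```

In this sketch a NID created with needs_initial_sync=True adopts the first value it receives unconditionally, even if that value is lower than its own; every later transfer can only move the clock forward.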
If a connection migrates more than once between establishment and closing of the connection, then synchronization of clocks may occur at each migration. At clock update, the updated clock will be “nudged” ahead slightly. The repeated “nudging” may cause small increases in estimated RTTs. This small inaccuracy is preferable to enduring the many problems that result when the timestamp clock can run backwards.
Acceleration of timestamp clocks can result from the combination of out of phase clocks and clock updates caused by connection migration. TABLE 1 shows an example of clock acceleration where a single connection moves between host and NID. In this example, the clocks for the host and NID each tick once every 200 milliseconds (msec). The clocks are out of phase, however; the host clock ticks at time 100 msec, while the NID clock ticks at time 200 msec. After 600 msec has elapsed, each clock should have a value of 3. In fact, the NID clock has the value 5 and the host clock has the value 4. If the clocks have the same resolution but are out of phase, as much as a two-fold acceleration can occur, for example if the connection migrated between every tick of the out-of-phase clocks.
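The worst case mentioned above, in which the connection migrates between every tick of the out-of-phase clocks, can be reproduced with a small simulation. This is a sketch of that worst case and not a reproduction of the migration schedule of TABLE 1; the function name simulate and its parameters are hypothetical, and plain integer comparison is used in place of modular arithmetic because the values stay small.

```python
def simulate(total_ms=600, period_ms=200, host_first_tick_ms=100,
             nid_first_tick_ms=200, migrate=True):
    """Return (host_clock, nid_clock) after total_ms milliseconds.

    Both clocks tick once per period_ms but are out of phase. When migrate
    is True, the side that has just ticked transfers its clock value to the
    other side, which adopts the value only if it is greater than its own.
    """
    host = nid = 0
    for t in range(1, total_ms + 1):
        if t >= host_first_tick_ms and (t - host_first_tick_ms) % period_ms == 0:
            host += 1                  # host clock ticks
            if migrate:
                nid = max(nid, host)   # connection moves host -> NID
        if t >= nid_first_tick_ms and (t - nid_first_tick_ms) % period_ms == 0:
            nid += 1                   # NID clock ticks
            if migrate:
                host = max(host, nid)  # connection moves NID -> host
    return host, nid
```

With the default arguments, simulate() returns (6, 6), twice the value of 3 that 600 msec of 200 msec ticks should produce, whereas simulate(migrate=False) returns (3, 3).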
Although such acceleration is unlikely to occur due to a single migrating connection, it may be more problematic for the situation in which multiple migrating connections exist. For example, in some server implementations each NID may maintain thousands of connections. Moreover, for an embodiment in which multiple NIDs that share the same clock are coupled to a single host, as may be the case for a multiport card in which each port handles specific connections, the problem of timestamp clock acceleration may be exacerbated. It is desirable to prevent acceleration because acceleration of the timestamp clock will make RTT measurements less accurate.
Although we have described in detail various embodiments of the present invention, other embodiments and modifications will be apparent to those of skill in the art in light of this text and accompanying drawings. Therefore, the present invention is to be limited only by the following claims, which are intended to include all such embodiments, modifications and equivalents.
Number | Name | Date | Kind |
---|---|---|---|
4366538 | Johnson et al. | Dec 1982 | A |
4485455 | Boone et al. | Nov 1984 | A |
4485460 | Stambaugh | Nov 1984 | A |
4589063 | Shah et al. | May 1986 | A |
4700185 | Balph et al. | Oct 1987 | A |
4991133 | Davis et al. | Feb 1991 | A |
5056058 | Hirata et al. | Oct 1991 | A |
5058110 | Beach et al. | Oct 1991 | A |
5097442 | Ward et al. | Mar 1992 | A |
5163131 | Row et al. | Nov 1992 | A |
5212778 | Dally et al. | May 1993 | A |
5280477 | Trapp | Jan 1994 | A |
5289580 | Latif et al. | Feb 1994 | A |
5303344 | Yokoyama et al. | Apr 1994 | A |
5412782 | Hausman et al. | May 1995 | A |
5418912 | Christenson | May 1995 | A |
5448566 | Richter et al. | Sep 1995 | A |
5485579 | Hitz et al. | Jan 1996 | A |
5506966 | Ban | Apr 1996 | A |
5511169 | Suda | Apr 1996 | A |
5517668 | Szwerinski et al. | May 1996 | A |
5524250 | Chesson et al. | Jun 1996 | A |
5535375 | Eshel et al. | Jul 1996 | A |
5548730 | Young et al. | Aug 1996 | A |
5566170 | Bakke et al. | Oct 1996 | A |
5574919 | Netravali et al. | Nov 1996 | A |
5588121 | Reddin et al. | Dec 1996 | A |
5590328 | Seno et al. | Dec 1996 | A |
5592622 | Isfeld et al. | Jan 1997 | A |
5598410 | Stone | Jan 1997 | A |
5619650 | Bach et al. | Apr 1997 | A |
5629933 | Delp et al. | May 1997 | A |
5633780 | Cronin et al. | May 1997 | A |
5634099 | Andrews et al. | May 1997 | A |
5634127 | Cloud et al. | May 1997 | A |
5642482 | Pardillos | Jun 1997 | A |
5664114 | Krech, Jr. et al. | Sep 1997 | A |
5671355 | Collins | Sep 1997 | A |
5678060 | Yokoyama et al. | Oct 1997 | A |
5682534 | Kapoor et al. | Oct 1997 | A |
5692130 | Shobu et al. | Nov 1997 | A |
5699317 | Sartore et al. | Dec 1997 | A |
5699350 | Kraslavsky | Dec 1997 | A |
5701434 | Nakagawa | Dec 1997 | A |
5701516 | Cheng et al. | Dec 1997 | A |
5727142 | Chen | Mar 1998 | A |
5742765 | Wong et al. | Apr 1998 | A |
5749095 | Hagersten | May 1998 | A |
5751715 | Chan et al. | May 1998 | A |
5752078 | Delp et al. | May 1998 | A |
5758084 | Silverstein et al. | May 1998 | A |
5758089 | Gentry et al. | May 1998 | A |
5758186 | Hamilton et al. | May 1998 | A |
5758194 | Kuzma | May 1998 | A |
5768618 | Erickson et al. | Jun 1998 | A |
5771349 | Picazo, Jr. et al. | Jun 1998 | A |
5774660 | Brendel et al. | Jun 1998 | A |
5778013 | Jedwab | Jul 1998 | A |
5778419 | Hansen et al. | Jul 1998 | A |
5790804 | Osborne | Aug 1998 | A |
5794061 | Hansen et al. | Aug 1998 | A |
5802258 | Chen | Sep 1998 | A |
5802580 | McAlpine | Sep 1998 | A |
5809328 | Nogales et al. | Sep 1998 | A |
5809527 | Cooper et al. | Sep 1998 | A |
5812775 | Van Seeters et al. | Sep 1998 | A |
5815646 | Purcell et al. | Sep 1998 | A |
5828835 | Isfeld et al. | Oct 1998 | A |
5848293 | Gentry et al. | Dec 1998 | A |
5872919 | Wakeland et al. | Feb 1999 | A |
5878225 | Bilansky et al. | Mar 1999 | A |
5892903 | Klaus | Apr 1999 | A |
5898713 | Melzer et al. | Apr 1999 | A |
5913028 | Wang et al. | Jun 1999 | A |
5920566 | Hendel et al. | Jul 1999 | A |
5930830 | Mendelson et al. | Jul 1999 | A |
5931918 | Row et al. | Aug 1999 | A |
5935205 | Murayama et al. | Aug 1999 | A |
5937169 | Connery et al. | Aug 1999 | A |
5941969 | Ram et al. | Aug 1999 | A |
5941972 | Hoese et al. | Aug 1999 | A |
5950203 | Stakuis et al. | Sep 1999 | A |
5970804 | Osborne | Oct 1999 | A |
5987022 | Geiger et al. | Nov 1999 | A |
5991299 | Radogna et al. | Nov 1999 | A |
5996013 | Delp et al. | Nov 1999 | A |
5996024 | Blumenau | Nov 1999 | A |
6005849 | Roach et al. | Dec 1999 | A |
6009478 | Panner et al. | Dec 1999 | A |
6016513 | Lowe | Jan 2000 | A |
6021446 | Gentry et al. | Feb 2000 | A |
6026452 | Pitts | Feb 2000 | A |
6034963 | Minami et al. | Mar 2000 | A |
6038562 | Anjur et al. | Mar 2000 | A |
6041058 | Flanders et al. | Mar 2000 | A |
6041381 | Hoese | Mar 2000 | A |
6044438 | Olnowich | Mar 2000 | A |
6047356 | Anderson et al. | Apr 2000 | A |
6049528 | Hendel et al. | Apr 2000 | A |
6057863 | Olarig | May 2000 | A |
6061368 | Hitzelberger | May 2000 | A |
6065096 | Day et al. | May 2000 | A |
6067569 | Khaki et al. | May 2000 | A |
6070200 | Gates et al. | May 2000 | A |
6078733 | Osborne | Jun 2000 | A |
6097734 | Gotesman et al. | Aug 2000 | A |
6101555 | Goshey et al. | Aug 2000 | A |
6111673 | Chang et al. | Aug 2000 | A |
6115615 | Ota et al. | Sep 2000 | A |
6122670 | Bennett et al. | Sep 2000 | A |
6141701 | Whitney | Oct 2000 | A |
6141705 | Anand et al. | Oct 2000 | A |
6145017 | Ghaffari | Nov 2000 | A |
6157944 | Pedersen | Dec 2000 | A |
6157955 | Narad et al. | Dec 2000 | A |
6172980 | Flanders et al. | Jan 2001 | B1 |
6173333 | Jolitz et al. | Jan 2001 | B1 |
6181705 | Branstad et al. | Jan 2001 | B1 |
6202105 | Gates et al. | Mar 2001 | B1 |
6223242 | Sheafor et al. | Apr 2001 | B1 |
6226680 | Boucher et al. | May 2001 | B1 |
6246683 | Connery et al. | Jun 2001 | B1 |
6247060 | Boucher et al. | Jun 2001 | B1 |
6279051 | Gates et al. | Aug 2001 | B1 |
6289023 | Dowling et al. | Sep 2001 | B1 |
6298403 | Suri et al. | Oct 2001 | B1 |
6324649 | Eyres et al. | Nov 2001 | B1 |
6334153 | Boucher et al. | Dec 2001 | B2 |
6343360 | Feinleib | Jan 2002 | B1 |
6345301 | Burns et al. | Feb 2002 | B1 |
6345302 | Bennett et al. | Feb 2002 | B1 |
6356951 | Gentry et al. | Mar 2002 | B1 |
6370599 | Anand et al. | Apr 2002 | B1 |
6385647 | Willis et al. | May 2002 | B1 |
6389468 | Muller et al. | May 2002 | B1 |
6389479 | Boucher | May 2002 | B1 |
6393487 | Boucher et al. | May 2002 | B2 |
6421742 | Tillier | Jul 2002 | B1 |
6421753 | Hoese et al. | Jul 2002 | B1 |
6427169 | Elzur | Jul 2002 | B1 |
6427171 | Craft et al. | Jul 2002 | B1 |
6427173 | Boucher et al. | Jul 2002 | B1 |
6434620 | Boucher et al. | Aug 2002 | B1 |
6434651 | Gentry, Jr. | Aug 2002 | B1 |
6449656 | Elzur et al. | Sep 2002 | B1 |
6453360 | Muller et al. | Sep 2002 | B1 |
6470415 | Starr et al. | Oct 2002 | B1 |
6473425 | Bellaton et al. | Oct 2002 | B1 |
6480489 | Muller et al. | Nov 2002 | B1 |
6487202 | Klausmeier et al. | Nov 2002 | B1 |
6487654 | Dowling | Nov 2002 | B2 |
6490631 | Teich et al. | Dec 2002 | B1 |
6502144 | Accarie | Dec 2002 | B1 |
6523119 | Pavlin et al. | Feb 2003 | B2 |
6526446 | Yang et al. | Feb 2003 | B1 |
6570884 | Connery et al. | May 2003 | B1 |
6587875 | Ogus | Jul 2003 | B1 |
6591302 | Boucher et al. | Jul 2003 | B2 |
6591310 | Johnson | Jul 2003 | B1 |
6648611 | Morse et al. | Nov 2003 | B2 |
6650640 | Muller et al. | Nov 2003 | B1 |
6657757 | Chang et al. | Dec 2003 | B1 |
6658480 | Boucher et al. | Dec 2003 | B2 |
6678283 | Teplitsky | Jan 2004 | B1 |
6681364 | Calvignac et al. | Jan 2004 | B1 |
6687758 | Craft et al. | Feb 2004 | B2 |
6697868 | Craft et al. | Feb 2004 | B2 |
6751665 | Philbrick et al. | Jun 2004 | B2 |
6757746 | Boucher et al. | Jun 2004 | B2 |
6765901 | Johnson et al. | Jul 2004 | B1 |
6807581 | Starr et al. | Oct 2004 | B1 |
6842896 | Redding et al. | Jan 2005 | B1 |
6912522 | Edgar | Jun 2005 | B2 |
6938092 | Burns | Aug 2005 | B2 |
6941386 | Craft et al. | Sep 2005 | B2 |
6965941 | Boucher et al. | Nov 2005 | B2 |
6996070 | Starr et al. | Feb 2006 | B2 |
7000031 | Fischer et al. | Feb 2006 | B2 |
7042898 | Blightman et al. | May 2006 | B2 |
7076568 | Philbrick et al. | Jul 2006 | B2 |
7089326 | Boucher et al. | Aug 2006 | B2 |
7093099 | Bodas et al. | Aug 2006 | B2 |
7124205 | Craft et al. | Oct 2006 | B2 |
7133940 | Blightman et al. | Nov 2006 | B2 |
7167926 | Boucher et al. | Jan 2007 | B1 |
7167927 | Philbrick et al. | Jan 2007 | B2 |
7174393 | Boucher et al. | Feb 2007 | B2 |
7185266 | Blightman et al. | Feb 2007 | B2 |
7191241 | Boucher et al. | Mar 2007 | B2 |
7191318 | Tripathy et al. | Mar 2007 | B2 |
7237036 | Boucher et al. | Jun 2007 | B2 |
7254696 | Mittal et al. | Aug 2007 | B2 |
7284070 | Boucher et al. | Oct 2007 | B2 |
20010004354 | Jolitz | Jun 2001 | A1 |
20010013059 | Dawson et al. | Aug 2001 | A1 |
20010014892 | Gaither et al. | Aug 2001 | A1 |
20010014954 | Purcell et al. | Aug 2001 | A1 |
20010025315 | Jolitz | Sep 2001 | A1 |
20010048681 | Bilic et al. | Dec 2001 | A1 |
20010053148 | Bilic et al. | Dec 2001 | A1 |
20020073223 | Darnell et al. | Jun 2002 | A1 |
20020112175 | Makofka et al. | Aug 2002 | A1 |
20030056136 | Aweya et al. | Mar 2003 | A1 |
20030066011 | Oren | Apr 2003 | A1 |
20030110344 | Szezepanek et al. | Jun 2003 | A1 |
20030165160 | Minami et al. | Sep 2003 | A1 |
20040054814 | McDaniel | Mar 2004 | A1 |
20040059926 | Angelo et al. | Mar 2004 | A1 |
20040153578 | Elzur | Aug 2004 | A1 |
20040213290 | Johnson et al. | Oct 2004 | A1 |
20040246974 | Gyugyi et al. | Dec 2004 | A1 |
20060239300 | Hannel et al. | Oct 2006 | A1 |
Number | Date | Country |
---|---|---|
WO 9819412 | May 1998 | WO |
WO 9850852 | Nov 1998 | WO |
WO 9904343 | Jan 1999 | WO |
WO 9965219 | Dec 1999 | WO |
WO 0013091 | Mar 2000 | WO |
WO 0104770 | Jan 2001 | WO |
WO 0105107 | Jan 2001 | WO |
WO 0105116 | Jan 2001 | WO |
WO 0105123 | Jan 2001 | WO |
WO 0140960 | Jun 2001 | WO |
WO 0159966 | Aug 2001 | WO |
WO 0186430 | Nov 2001 | WO |