Obtaining a destination address so that a network interface device can write network data without headers directly into host memory

Information

  • Patent Application
  • 20040240435
  • Publication Number
    20040240435
  • Date Filed
    June 29, 2004
  • Date Published
    December 02, 2004
Abstract
A Network Interface device (NI device) coupled to a host computer receives a multi-packet message from a network (for example, the Internet) and DMAs the data portions of the various packets directly into a destination in application memory on the host computer. The address of the destination is determined by supplying a first part of the first packet to an application program such that the application program returns the address of the destination. The address is supplied by the host computer to the NI device so that the NI device can DMA the data portions of the various packets directly into the destination. In some embodiments the NI device is an expansion card added to the host computer, whereas in other embodiments the NI device is a part of the host computer.
Description


TECHNICAL FIELD

[0006] The present invention relates generally to computer or other networks, and more particularly to protocol processing for information communicated between hosts such as computers connected to a network.



BACKGROUND INFORMATION

[0007] One of the most CPU intensive activities associated with performing network protocol processing is the need to copy incoming network data from an initial landing point in system memory to a final destination in application memory. This copying is necessary because received network data cannot generally be moved to the final destination until the associated packets are: A) analyzed to ensure that they are free of errors, B) analyzed to determine which connection they are associated with, and C) analyzed to determine where, within a stream of data, they belong. Until recently, these steps had to be performed by the host protocol stack. With the introduction of the intelligent network interface device (as disclosed in U.S. patent application Ser. Nos. 09/464,283, 09/439,603, 09/067,544, and U.S. Provisional Application Ser. No. 60/061,809), these steps may now be performed before the packets are delivered to the host protocol stack.


[0008] Even with such steps accomplished by an intelligent network interface device, there is another problem to be addressed to reduce or eliminate data copying, and that is obtaining the address of the destination in memory and passing that address to the network interface device. Obtaining this address is often difficult because many network applications are written in such a way that they will not provide the address of the final destination until notified that data for the connection has arrived (with the use of the “select( )” routine, for example). Other attempts to obtain this address involve the modification of existing applications. One such example is the Internet Engineering Task Force (IETF) Remote DMA (RDMA) proposal, which requires that existing protocols such as NFS, CIFS, and HTTP be modified to include addressing information in the protocol headers. A solution is desired that does not require the modification of existing applications or protocols.



SUMMARY

[0009] A multi-packet message (for example, a session layer message) is to be received onto a Network Interface device (NI device) and the data payload of the message is to be placed into application memory in a host computer. The NI device receives the first packet of the message and passes a first part of this first packet to the operating system on the host. In one embodiment, the first part of the first packet includes the session layer header of the message. The operating system passes this first part of the first packet to an application program. The application program uses the first part of the first packet to identify an address of a destination in application memory where the entire data payload is to be placed. The application program returns the address to the operating system and the operating system in turn forwards the address to the NI device. The NI device then uses the address to place the data portions of the various packets of the multi-packet message into the destination in application memory. In one embodiment, the NI device DMAs the data portions of the packets from the NI device directly into the destination. In some embodiments, the NI device DMAs only data into the destination such that the destination contains the data payload in one contiguous block without any session layer header information, without any transport layer header information, and without any network layer header information.


[0010] In some embodiments, the NI device is an interface card that is coupled to the host computer via a parallel bus (for example, the PCI bus). In other embodiments, the NI device is integrated into the host computer. For example, the NI device may be part of a communication processing device (CPD) that is integrated into the host computer.


[0011] Other structures and methods are described in the detailed description below. This summary does not purport to define the invention. The invention is defined by the claims.







BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a diagram of a Network Interface Device (NI device) in accordance with an embodiment of the present invention. The NI device performs fast-path processing on information passing from a packet-switched network (for example, the Internet), through the NI device, and to a host computer.


[0013] FIG. 2 is a diagram that illustrates a method in accordance with an embodiment of the present invention where network data from a multi-packet session message is transferred by the NI device directly into a destination in a host computer.


[0014] FIG. 3 is a flowchart of a method in accordance with an embodiment of the present invention.


[0015] FIG. 4 shows an NI device integrated into a host computer.







DETAILED DESCRIPTION

[0016] FIG. 1 is a diagram of a host computer 100 that is coupled to a packet-switched network 101 (for example, the Internet) via a Network Interface (NI) device 102. In the illustrated example, host computer 100 is an Intel x86-based system (for example, Compaq Proliant). Software executing on host computer 100 includes: 1) a Linux operating system 103, and 2) an application program 104 by the name of “Samba”. Operating system 103 includes a kernel 105. Kernel 105 includes: 1) driver software 106 for interfacing to and controlling NI device 102, and 2) a protocol stack 107. A part of protocol stack 107 is specially customized to support the NI device 102.


[0017] In one specific embodiment, NI device 102 is the Intelligent Network Interface Card (INIC) of FIGS. 21 and 22 of U.S. patent application Ser. No. 09/464,283 (the entire disclosure of Ser. No. 09/464,283 is incorporated herein by reference). The NI device 102 in this specific embodiment is an expansion card that plugs into a card edge connector on the host computer (for example, a personal computer). The card includes an application specific integrated circuit (ASIC) (for example, see ASIC 400 of FIG. 21 of U.S. application Ser. No. 09/464,283) designed by Alacritech, Inc. of 234 East Gish Road, San Jose, Calif. 95112. The card performs “fast-path processing” in hardware as explained in U.S. application Ser. No. 09/464,283. An INIC card (Model Number 2000-100001 called the “Alacritech 100x2 Dual-Server Adapter”) is available from Alacritech, Inc. of 234 East Gish Road, San Jose, Calif. 95112.


[0018] FIG. 2 is a diagram illustrating the transfer of data in a multi-packet session layer message 200 from a buffer 2114 (see FIG. 1) in NI device 102 to a second destination 110 in memory in host computer 100. The portion of the diagram to the left of the dashed line 201 (see FIG. 2) represents NI device 102, whereas the portion of the diagram to the right of the dashed line 201 represents host computer 100. Multi-packet message 200 includes approximately forty-five packets, four of which (202-205) are labeled on FIG. 2. The first packet 202 includes a portion 205 containing transport and network layer headers (for example, TCP and IP headers), a portion 206 containing a session layer header, and a portion 207 containing data. The subsequent packets 203-205 do not contain session layer header information, but rather include a first portion containing transport and network layer headers (for example, TCP and IP headers), and a second portion containing data.


[0019] FIG. 3 is a flowchart of a method in accordance with one specific embodiment of the present invention. In a first step (step 300), the Samba application program 104 initializes application-to-operating system communication by calling the “socket” function. The socket function causes kernel 105 to allocate a communication control block (CCB) that will be used to manage the connection. The Samba application program 104 then uses the “bind” routine to associate the socket with a particular local IP address and IP port. The Samba application program 104 then calls the “listen” routine to wait for an incoming connection to arrive from kernel 105. When an incoming connection arrives, the Samba application program 104 calls the “accept” routine to complete the connection setup. After setting up the socket, the Samba application program 104 uses the “select” routine to tell the kernel 105 to alert application 104 when data for that particular connection has arrived.
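
The call sequence of step 300 can be sketched with Python's `socket` and `select` modules, which wrap the same socket routines. This is only an illustrative sketch and not Samba's code: the loopback address, the kernel-chosen port, the in-process client, and the function name `demo_setup` are all assumptions made for the example.

```python
import select
import socket


def demo_setup(payload: bytes = b"hello") -> bytes:
    """Run the socket/bind/listen/accept/select sequence on loopback."""
    # "socket": ask the kernel for a connection endpoint.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # "bind": associate the socket with a local IP address and port
        # (port 0 lets the kernel pick a free port for this demo).
        srv.bind(("127.0.0.1", 0))
        # "listen": mark the socket as accepting incoming connections.
        srv.listen(1)

        # Hypothetical in-process client standing in for the remote host.
        cli = socket.create_connection(srv.getsockname())
        try:
            # "accept": complete the connection setup.
            conn, _peer = srv.accept()
            try:
                cli.sendall(payload)
                # "select": wait until the kernel reports that data for
                # this connection has arrived (5 second timeout here).
                readable, _, _ = select.select([conn], [], [], 5.0)
                if conn not in readable:
                    raise TimeoutError("no data arrived")
                # Only now does the application hand the kernel a buffer.
                return conn.recv(len(payload))
            finally:
                conn.close()
        finally:
            cli.close()
    finally:
        srv.close()
```

The point of the sketch is the ordering: the application supplies no destination buffer until `select` has reported that data is waiting, which is the behavior the background section identifies as the obstacle to obtaining the destination address early.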


[0020] In a next step (step 301), driver 106 allocates a 256-byte buffer 108 in host memory as a place where NI device 102 can write data. Driver 106 then passes the address of 256-byte buffer 108 to NI device 102 so that NI device 102 can then use that address to write information into 256-byte buffer 108. Driver 106 does this by writing the address of 256-byte buffer 108 into a register 112 on the NI device 102. A status field at the top of the 256-byte buffer 108 contains information indicating whether the 256-byte buffer contains data (and is valid) or not.


[0021] In a next step (step 302), NI device 102 receives the first packet 202 of message 200 (see FIG. 2) from network 101. NI device 102 looks at the IP source address, IP destination address, TCP source port and TCP destination port and from those four values determines the connection identified with the packet. (IP is the network layer. TCP is the transport layer.) NI device 102 then: 1) writes a unique identifier that identifies the connection into a designated field in the 256-byte buffer 108; 2) writes the first 192 bytes of the first packet into the 256-byte buffer (the MAC, IP and TCP headers are not written to the 256-byte buffer); 3) sets the status field of 256-byte buffer 108 to indicate that the 256-byte buffer is full; and 4) interrupts the kernel 105.
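
The handling of the first packet in step 302 can be modeled in software as follows. This is a rough sketch under stated assumptions rather than a description of the INIC hardware: the field layout of the 256-byte buffer (status, connection identifier, byte count, then data), the status encoding, and the names `parse_ipv4_tcp` and `fill_header_buffer` are hypothetical, and the frame is assumed to start at the IPv4 header with the MAC header already removed.

```python
import struct

BUF_SIZE = 256        # size of the host buffer named in the text
MAX_PREFIX = 192      # bytes of the first packet copied into it
STATUS_EMPTY, STATUS_FULL = 0, 1   # hypothetical status encoding


def parse_ipv4_tcp(frame: bytes):
    """Return ((src_ip, dst_ip, src_port, dst_port), payload).

    `frame` starts at the IPv4 header; the MAC header is assumed to
    have been removed already.
    """
    ihl = (frame[0] & 0x0F) * 4                 # IPv4 header length
    src_ip, dst_ip = frame[12:16], frame[16:20]
    src_port, dst_port = struct.unpack_from("!HH", frame, ihl)
    tcp_len = (frame[ihl + 12] >> 4) * 4        # TCP data offset
    return (src_ip, dst_ip, src_port, dst_port), frame[ihl + tcp_len:]


def fill_header_buffer(frame: bytes, connections: dict) -> bytearray:
    """Build the 256-byte buffer the NI device would write to the host."""
    four_tuple, payload = parse_ipv4_tcp(frame)
    conn_id = connections[four_tuple]           # identify the connection
    prefix = payload[:MAX_PREFIX]               # IP/TCP headers excluded
    buf = bytearray(BUF_SIZE)
    # Assumed layout: status (4 bytes), connection id (4), length (4), data.
    struct.pack_into("!III", buf, 0, STATUS_FULL, conn_id, len(prefix))
    buf[12:12 + len(prefix)] = prefix
    return buf
```

In this model the `connections` dictionary plays the role of the connection lookup keyed on the four values, and the interrupt to the kernel is omitted.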


[0022] In a next step (step 303), kernel 105 responds by having the driver 106 look at the status field of the 256-byte buffer 108. If the status field indicates 256-byte buffer 108 is full and valid, then driver 106 passes the address of 256-byte buffer 108 to protocol stack 107. The first part of this 192 bytes is session layer header information, whereas the remainder of the 192 bytes is session layer data. Protocol stack 107 notifies application program 104 that there is data for the application program. Protocol stack 107 does this by making a call to the “remove_wait_queue” routine.


[0023] In a next step (step 304), the Samba application program 104 responds by returning the address of a first destination 109 in host memory. The Samba application program 104 does this by calling a socket routine called “recv”. The “recv” socket routine has several parameters: 1) a connection identifier that identifies the connection the first destination 109 will be for, 2) an address of the first destination 109 where the data will be put, and 3) the length of the first destination 109. (In some embodiments, Samba application program 104 calls “recv” to request less than 192 bytes.) Through this “recv” socket routine, kernel 105 receives from application program 104 the address of the first destination 109 and the length of the first destination 109. Kernel 105 then gives the address of the first destination 109 to the protocol stack 107.


[0024] In a next step (step 305), the protocol stack 107 moves the requested bytes in 256-byte buffer 108 to first destination 109 identified by the address. The first destination is in memory space of the application program 104 so that application program 104 can examine the requested bytes. If the application program 104 requested less than 192 bytes using “recv”, then driver 106 moves that subset of the 192 bytes to first destination 109 leaving the remainder of the 192 bytes in the 256-byte buffer. On the other hand, if the application program 104 requested all 192 bytes using “recv”, then driver 106 moves the full 192 bytes to first destination 109.
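
The partial-read behavior of step 305 can be illustrated with a small model of the staged bytes. This is a sketch only; the class `StagedBytes` and its methods are hypothetical names and do not correspond to any routine in the Linux kernel or in the driver described here.

```python
class StagedBytes:
    """Bytes the NI device staged in the 256-byte buffer (at most 192)."""

    def __init__(self, data: bytes):
        self._data = bytes(data)
        self._pos = 0            # how much the application has consumed

    def recv_into(self, dest: bytearray, nbytes: int) -> int:
        """Copy up to `nbytes` staged bytes into `dest`; return the count."""
        n = min(nbytes, len(dest), len(self._data) - self._pos)
        dest[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n

    def remaining(self) -> bytes:
        """Bytes still held in the buffer after earlier requests."""
        return self._data[self._pos:]
```

For example, an application that asks for only 4 of the 192 staged bytes receives those 4 bytes in its first destination, and `remaining()` still holds the other 188 for a later request.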


[0025] In a next step (step 306), the application examines the requested bytes in first destination 109. Application program 104 analyzes the session layer header portion, determines the amount of session layer data coming in the session layer message, and determines how long a second destination 110 should be so as to contain all the remaining session layer data of message 200. Application program 104 then returns to kernel 105 the address of second destination 110 and the length of the second destination 110. Application program 104 does this by calling the socket routine “recv”. Kernel 105 receives the address of second destination 110 and the length of the second destination 110 and gives that information to the protocol stack 107.


[0026] In a next step (step 307), the protocol stack 107 moves any session layer data in the 192 bytes (not session layer headers) in 256-byte buffer 108 to second destination 110 identified by the second address. This move of data is shown in FIG. 2 by arrow 208.
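
Steps 306 and 307 can be sketched together: the application reads the message length from the session layer header, sizes the second destination accordingly, and the session layer data already on the host is moved into it. The header format below (a one-byte type followed by a three-byte big-endian length) is an assumption chosen for illustration, since the text does not specify the session layer header layout; the function names are likewise hypothetical.

```python
import struct

HEADER_LEN = 4   # assumed: 1-byte type + 3-byte big-endian payload length


def payload_length(header: bytes) -> int:
    """Read the payload length from the hypothetical 4-byte header."""
    (word,) = struct.unpack("!I", header[:HEADER_LEN])
    return word & 0x00FFFFFF     # low 24 bits carry the length


def place_staged_data(staged: bytes):
    """Size the second destination and copy in the data already on the host.

    `staged` is the first part of the message (up to 192 bytes): the
    session layer header followed by the first bytes of session layer data.
    Returns (second_destination, offset_for_next_write).
    """
    total = payload_length(staged)
    second_destination = bytearray(total)       # holds data only
    early = staged[HEADER_LEN:HEADER_LEN + total]
    second_destination[:len(early)] = early     # the move shown by arrow 208
    return second_destination, len(early)
```

The returned offset marks where the NI device would continue writing, so that data arriving later lands immediately after the bytes moved by the protocol stack.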


[0027] In a next step (step 308), the protocol stack 107 writes the address of second destination 110 and the length of second destination 110 into a predetermined buffer 111 in host memory. Driver 106 then writes the address of predetermined buffer 111 to a predetermined register 112 in NI device 102.


[0028] In a next step (step 309), NI device 102 reads the predetermined register 112 and retrieves the address of predetermined buffer 111. Using this address, NI device 102 reads the predetermined buffer 111 by DMA and retrieves the address of second destination 110 and the length of second destination 110.


[0029] In some embodiments, the second destination 110 is actually made up of a plurality of locations having different addresses of different lengths. The application program supplies a single virtual address for the NI device 102 to read (such as explained in step 310), but this virtual address is made up of many different physical pages. Driver 106 determines the addresses of the pages that are associated with this virtual address and passes these physical addresses and their lengths to NI device 102 by placing the addresses in predetermined buffer 111 and writing the address of predetermined buffer 111 to predetermined register 112 in NI device 102.
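
The translation from one virtual address to a list of physical addresses and lengths can be sketched as below. This is an illustrative model, not the driver's actual code: the 4096-byte page size, the `page_table` dictionary standing in for the operating system's lookup, and the merging of physically adjacent pages are all assumptions.

```python
PAGE_SIZE = 4096  # assumed page size


def physical_segments(virt_addr: int, length: int, page_table: dict):
    """Split a virtually contiguous range into (phys_addr, length) pairs.

    `page_table` maps virtual page numbers to physical page numbers and
    stands in for whatever lookup a real driver would perform.
    """
    segments = []
    while length > 0:
        vpn, offset = divmod(virt_addr, PAGE_SIZE)
        chunk = min(PAGE_SIZE - offset, length)   # stay within one page
        phys = page_table[vpn] * PAGE_SIZE + offset
        if segments and segments[-1][0] + segments[-1][1] == phys:
            # Physically adjacent to the previous segment: merge them.
            prev_addr, prev_len = segments.pop()
            segments.append((prev_addr, prev_len + chunk))
        else:
            segments.append((phys, chunk))
        virt_addr += chunk
        length -= chunk
    return segments
```

Each resulting pair corresponds to one address and length entry that would be placed in predetermined buffer 111 for the NI device to read.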


[0030] In a next step (step 310), NI device 102 transfers the data from the remaining portion of first packet 202 (without any session layer headers, and without any TCP or IP headers) directly into second destination 110 using DMA. In this example, the transfer is made across a parallel data bus (for example, across a PCI bus by which the NI device 102 is coupled to the host computer 100). This move of data is shown in FIG. 2 by arrow 209.


[0031] In a next step (step 311), subsequent packets are received onto NI device 102. For each packet, NI device 102 removes the TCP and IP headers and writes the remaining data (without session layer headers, TCP headers, or IP headers) directly to second destination 110 using DMA (for example, NI device 102 may write the data directly into the second destination across the PCI bus by which the NI device 102 is coupled to the host computer 100). The data from the many packets of the session layer message is written into second destination 110 such that there are no session layer headers, transport layer headers, or network layer headers between the data portions from the various packets of message 200.
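
The header-free placement of step 311 can be modeled in a few lines. This sketch substitutes an ordinary byte-array copy for the DMA transfer and assumes that frames start at the IPv4 header and arrive in order; the function names are hypothetical.

```python
def tcp_payload(frame: bytes) -> bytes:
    """Strip the IPv4 and TCP headers from a frame that starts at IP."""
    ip_len = (frame[0] & 0x0F) * 4            # IPv4 IHL, in bytes
    tcp_len = (frame[ip_len + 12] >> 4) * 4   # TCP data offset, in bytes
    return frame[ip_len + tcp_len:]


def place_packets(frames, destination: bytearray, offset: int = 0) -> int:
    """Write each packet's data back to back into `destination`.

    Stands in for the DMA writes; in-order arrival is assumed. Returns the
    offset just past the last byte written.
    """
    for frame in frames:
        data = tcp_payload(frame)
        end = offset + len(data)
        if end > len(destination):
            raise ValueError("message data exceeds the destination")
        destination[offset:end] = data
        offset = end
    return offset
```

Because only the bytes after the TCP header are copied and each write starts where the previous one ended, the destination ends up holding the data portions as one contiguous block.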


[0032] In the above described specific embodiment, there is no session layer header, transport layer header, or network layer header between the data portions from the various packets of message 200 as the data portions are deposited into the second destination 110. This need not be the case, however. In some embodiments, session layer header information does appear in second destination 110. This is so because it is the application program that determines the length of the second destination 110.


[0033] In some embodiments, application program 104 returns a first destination that is larger than 192 bytes. In that case, there is no different second destination. The entire 192 bytes contained in the 256-byte buffer is moved to the first destination. The address of the remainder is given to the NI device as described above with respect to the second destination.


[0034] Although the NI device may be realized on an expansion card and interfaced to the host computer via a bus such as the PCI bus, the NI device can also be integrated into the host computer. For example, the NI device in some embodiments is disposed on the motherboard of the host computer and is substantially directly coupled to the host CPU. The NI device may, for example, be integrated into a memory controller integrated circuit or input/output integrated circuit that is coupled directly to the local bus of the host CPU. The NI device may be integrated into the Intel 82815 Graphics and Memory Controller Hub, the Intel 440BX chipset, or the Apollo VT8501 MVP4 Northbridge chip. FIG. 4 shows an NI device integrated into a host computer 400 in the form of a communication processing device (CPD) 401.


[0035] Although the present invention is described in connection with certain specific embodiments for instructional purposes, the present invention is not limited thereto. Advantages of the present invention may be realized wherein either no header information or just an insubstantial amount of header information is transferred from the network interface device into the second destination. All the data from the session layer message may be deposited into a single contiguous block of host memory (referred to as a destination) in some embodiments or may be deposited into several associated blocks (that together are referred to as a destination) of host memory in other embodiments. Accordingly, various modifications, adaptations, and combinations of various features of the described embodiments can be practiced without departing from the scope of the invention as set forth in the claims.


Claims
  • 1-20. (Cancelled)
  • 21. A set of computer-executable instructions for execution on a host computer, wherein an application layer program is executing on the host computer, and wherein a network interface device is coupled to the host computer, and wherein the network interface device is coupled to receive a multi-packet message from a packet-switched network, the multi-packet message including a first packet and a plurality of subsequent packets, each of the plurality of subsequent packets containing a TCP header portion, an IP header portion and a data portion, the set of computer-executable instructions being for performing steps comprising: passing at least a portion of the first packet of the multi-packet message to the application layer program; receiving from the application layer program an indication of a destination in memory on the host computer; and passing the indication of the destination to the network interface device such that the network interface device writes the data portions of the subsequent packets into the destination without writing any TCP header portion into the destination and without writing any IP header portion into the destination.
  • 22. The set of computer-executable instructions of claim 21, wherein the multi-packet message has a data payload, and wherein the entire data payload is written by the network interface device into the destination.
  • 23. The set of computer-executable instructions of claim 21, wherein the set of computer-executable instructions is an operating system, and wherein the network interface device is an intelligent network interface card (INIC) coupled to the host computer.
  • 24. The set of computer-executable instructions of claim 21, wherein the multi-packet message is of a protocol layer higher than the transport protocol layer.
  • 25. A method for transferring data of a message from a network interface device to a host computer, the network interface device being coupled to receive the message from a packet-switched network, the network interface device being coupled to the host computer, the message consisting of a first packet and a plurality of subsequent packets, wherein the first packet includes a session layer header portion, a TCP header portion and an IP header portion, and wherein each of the plurality of subsequent packets contains a TCP header portion, an IP header portion and a data portion, the method comprising: (a) passing at least a portion of the first packet from the network interface device to an application layer program executing on the host computer, wherein said at least a portion includes the session layer header portion; (b) the application layer program executing on the host computer examining the session layer header portion and generating an indication of a destination in host memory; and (c) the network interface device transferring the data portions of the subsequent packets into the destination without writing any TCP header portion of any of the subsequent packets into the destination and without writing any IP header portion of any of the subsequent packets into the destination.
  • 26. The method of claim 25, wherein the network interface device comprises an expansion card and an application specific integrated circuit (ASIC).
  • 27. The method of claim 25, wherein only a portion of the first packet is passed to the application layer program in (a) such that the application layer program generates the indication of the destination without receiving the entire first packet.
  • 28. The method of claim 26, wherein the host computer includes a motherboard, and wherein the network interface device is disposed on the motherboard.
  • 29. A method for transferring data of a message from a network interface device to a host computer, the network interface device being coupled to receive the message from a packet-switched network, the network interface device being coupled to the host computer, the message consisting of a first packet and a plurality of subsequent packets, wherein the first packet includes a session layer header portion, a TCP header portion and an IP header portion, and wherein each of the plurality of subsequent packets contains a TCP header portion, an IP header portion and a data portion, the method comprising: (a) passing a first part of the first packet, but not a second part of the first packet, from the network interface device to the host computer, the first part of the first packet including the session layer header portion; (b) an application layer program executing on the host computer examining the session layer header portion and generating an indication of a destination in host memory; and (c) the network interface device transferring the data portions of the subsequent packets into the destination without writing any TCP header portion of any of the subsequent packets into the destination and without writing any IP header portion of any of the subsequent packets into the destination.
  • 30. The method of claim 29, further comprising: after (b) and before (c) the network interface device transferring the second part of the first packet from the network interface device and into the destination.
  • 31. The method of claim 29, wherein the indication of the destination in host memory comprises a plurality of addresses and a plurality of lengths.
  • 32. The method of claim 29, wherein the message is communicated over a TCP/IP connection, and wherein the TCP/IP connection is set up before step (a).
  • 33. The method of claim 29, wherein in (b) the application layer program determines from the session layer header how much session layer data is contained in the message, and wherein the application layer determines how big the destination should be in order to contain all the session layer data of the message, and wherein all the session layer data is written into the destination such that no TCP headers are present in the destination and such that no IP headers are present in the destination.
  • 34. The method of claim 33, wherein all the session layer data is written into the destination and such that no session layer headers are present in the destination.
CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of and claims the benefit under 35 U.S.C. § 120 of prior U.S. patent application Ser. No. 09/789,366, filed Feb. 20, 2001, now U.S. Pat. No. 6,757,746.


[0002] Prior U.S. patent application Ser. No. 09/789,366, now U.S. Pat. No. 6,757,746, is a continuation-in-part of and claims the benefit under 35 U.S.C. § 120 of U.S. patent application Ser. No. 09/464,283, filed Dec. 15, 1999, now U.S. Pat. No. 6,427,173, which in turn claims the benefit under 35 U.S.C. § 120 of U.S. patent application Ser. No. 09/439,603, filed Nov. 12, 1999, now U.S. Pat. No. 6,247,060, which in turn claims the benefit under 35 U.S.C. § 120 of U.S. patent application Ser. No. 09/067,544, filed Apr. 27, 1998, now U.S. Pat. No. 6,226,680, which in turn claims the benefit under 35 U.S.C. § 119(e) of the Provisional Application Ser. No. 60/061,809, filed Oct. 14, 1997.


[0003] Prior U.S. patent application Ser. No. 09/789,366, now U.S. Pat. No. 6,757,746, also is a continuation-in-part and claims benefit under 35 U.S.C. § 120 of the following U.S. patent applications: Ser. No. 09/748,936, filed Dec. 26, 2000, now U.S. Pat. No. 6,334,153; Ser. No. 09/692,561, filed Oct. 18, 2000; Ser. No. 09/675,700, filed Sep. 29, 2000; Ser. No. 09/675,484, filed Sep. 29, 2000; Ser. No. 09/514,425, filed Feb. 28, 2000, now U.S. Pat. No. 6,427,171; Ser. No. 09/416,925, filed Oct. 13, 1999, now U.S. Pat. No. 6,470,415; and Ser. No. 09/141,713, filed Aug. 28, 1998, now U.S. Pat. No. 6,389,479.


[0004] Prior U.S. patent application Ser. No. 09/789,366, now U.S. Pat. No. 6,757,746, is also a continuation-in-part of and claims benefit under 35 U.S.C. § 120 of U.S. patent application Ser. No. 09/384,792, filed Aug. 27, 1999, now U.S. Pat. No. 6,434,620, which in turn claims the benefit under 35 U.S.C. § 119 of Provisional Application Ser. No. 60/098,296, filed Aug. 27, 1998.


[0005] The complete disclosures of: U.S. patent application Ser. No. 09/789,366; U.S. patent application Ser. No. 09/464,283; U.S. patent application Ser. No. 09/439,603; U.S. patent application Ser. No. 09/067,544; U.S. patent application Ser. No. 09/748,936; U.S. patent application Ser. No. 09/692,561; U.S. patent application Ser. No. 09/675,700; U.S. patent application Ser. No. 09/675,484; U.S. patent application Ser. No. 09/514,425; U.S. patent application Ser. No. 09/416,925; U.S. application Ser. No. 09/384,792; U.S. application Ser. No. 09/141,713 and Provisional Application Ser. Nos. 60/061,809 and 60/098,296 are incorporated herein by reference.

Provisional Applications (2)
Number Date Country
60061809 Oct 1997 US
60098296 Aug 1998 US
Continuations (4)
Number Date Country
Parent 09789366 Feb 2001 US
Child 10881271 Jun 2004 US
Parent 09439603 Nov 1999 US
Child 09464283 Dec 1999 US
Parent 09067544 Apr 1998 US
Child 09439603 Nov 1999 US
Parent 09789366 Feb 2001 US
Child 10881271 Jun 2004 US
Continuation in Parts (9)
Number Date Country
Parent 09464283 Dec 1999 US
Child 09789366 Feb 2001 US
Parent 09748936 Dec 2000 US
Child 09789366 US
Parent 09692561 Oct 2000 US
Child 09789366 US
Parent 09675700 Sep 2000 US
Child 09789366 US
Parent 09675484 Sep 2000 US
Child 09789366 US
Parent 09514425 Feb 2000 US
Child 09789366 US
Parent 09416925 Oct 1999 US
Child 09789366 US
Parent 09141713 Aug 1998 US
Child 09789366 US
Parent 09384792 Aug 1999 US
Child 09789366 Feb 2001 US