Intelligent network adaptor with end-to-end flow control

Information

  • Patent Grant
  • 8060644
  • Patent Number
    8,060,644
  • Date Filed
    Friday, May 11, 2007
  • Date Issued
    Tuesday, November 15, 2011
Abstract
A host is coupled to a network via an intelligent network adaptor. The host is executing an application configured to receive application data from a peer via the network and the intelligent network adaptor using a stateful connection according to a connection-oriented protocol. The intelligent network adaptor performs protocol processing of the connection. Application data is copied from host memory not configured for access by the application (possibly OS-associated host memory) to host memory associated with the application (application-associated host memory). The application data is received from the peer by the intelligent network adaptor and copied to host memory not configured for access by the application. The operating system selectively provides, to the intelligent network adaptor, information of the memory associated with the application. At least one portion of the application data for the connection is provided directly from the intelligent network adaptor to the memory associated with the application.
Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application is related to the following U.S. Non-Provisional Applications: application Ser. No. 11/747,650, By: Dimitrios Michaelidis, Wael Noureddine, Felix Marti and Asgeir Eiriksson, Entitled “INTELLIGENT NETWORK ADAPTOR WITH ADAPTIVE DIRECT DATA PLACEMENT SCHEME”; application Ser. No. 11/747,790, By: Dimitrios Michaelidis, Wael Noureddine, Felix Marti and Asgeir Eiriksson, Entitled “PROTOCOL OFFLOAD IN INTELLIGENT NETWORK ADAPTOR, INCLUDING APPLICATION LEVEL SIGNALLING”; and application Ser. No. 11/747,793, By: Dimitrios Michaelidis, Wael Noureddine, Felix Marti and Asgeir Eiriksson, Entitled “INTELLIGENT NETWORK ADAPTOR WITH DDP OF OUT-OF-ORDER SEGMENTS”; all of which are filed concurrently herewith and are incorporated by reference herein in their entirety for all purposes.


TECHNICAL FIELD

The present invention relates to efficient receive data communication using an intelligent network adaptor.


BACKGROUND

High speed communications over packet-based networks can place a heavy burden on end-host resources.


A host may typically be connected to a network using at least one network adaptor. The network adaptor receives packets from the network that are destined for the host. The network adaptor causes the packets to be placed in host memory buffers associated with the operating system. When one or more packets have been placed, the adaptor notifies the host processor of this event, typically using an interrupt. The packets are then processed by the host operating system, including delivering application payload data of the packets from the operating system buffers to buffers of a destination application corresponding to the packets.


Each application data bearing packet received from the network includes the application data, encapsulated within a stack of headers according to a network communication stack. For example, the network communication may be via TCP/IP over Ethernet. In this case, the TCP/IP stack encapsulates application data in a TCP header, and the TCP encapsulated data are encapsulated in an IP header. The TCP/IP encapsulated data are encapsulated in a header according to a local area network technology protocol, such as Ethernet.


In the high-speed communications environment, there are several challenges for end-host resources. One challenge is the high packet arrival rate, which implies a high associated interrupt rate as the host is notified of the arrival of each packet. Another challenge is associated with the memory bandwidth resources to copy application payload data from the operating system buffers to application buffers. Yet another challenge is to achieve low communication latency between the network and the host via the network adaptor, such that application payload received from the network by the network adaptor is provided in a relatively expedient manner to the application.


In some aspects, the present invention may be directed to one or more of the challenges identified above, so as to achieve high speed and low latency communication with reduced demands on host processing and memory resources.


The challenges of high-speed communications have led to enhancements to network adaptor capabilities, resulting in so-called “Intelligent Network Adaptors” that, for example, offload some or all network communication protocol processing. In addition, direct data placement (DDP) is known. It refers to the capability of some intelligent network adaptors to process network packets arriving from the network and place payload data contained within the network packets directly into pre-determined locations in host memory.


SUMMARY

An intelligent network adaptor couples a host to a network. The host is executing an application configured to receive application data from a peer via the network and the intelligent network adaptor using a stateful connection between the host and the peer according to a connection-oriented protocol. The intelligent network adaptor performs protocol processing of the connection, including providing a receive window to the peer. The intelligent network adaptor places application data, received from the peer via the stateful connection, from memory of the intelligent network adaptor to host memory. The intelligent network adaptor receives, from the host, an indication of consumption of the application data from application buffers of the host memory. The intelligent network adaptor generates the receive window based at least in part on the received indications of consumption of the application data from the application buffers of the host memory.





BRIEF DESCRIPTION OF FIGURES


FIG. 1 illustrates a host system with an intelligent network adaptor.



FIG. 2 illustrates data delivery by an ordinary network adaptor.



FIG. 3 illustrates data delivery by an intelligent network adaptor capable of performing direct data placement according to the present invention.



FIG. 4 illustrates a buffer handling scheme in host memory.





DETAILED DESCRIPTION

The inventors have realized that a destination in a host may be selected for application data of payload packets that are protocol processed by a network adaptor, and that such selection may contribute to improved overall performance of communication of application payload data from the network, via the network adaptor, to the application executing in the host.



FIG. 1 broadly illustrates a host system 10 running an application process 11, which is communicating with a peer 12 via a network 13. A network adaptor 14 couples the host system 10 to the network.


Generally speaking, network speed increases have not been accompanied by a proportional increase in packet size. As mentioned in the Background, in a high-speed network, the host processor (or processors) may see a high packet arrival rate and a resulting high notification (e.g., interrupt) rate. Handling the notifications can divert a significant part of the host resources away from application processing. Interrupt rate moderation schemes are known, but have limited effectiveness and can introduce delays that may increase communication latency.


Before describing an example of how a payload data destination in the host may be selected, we first discuss more particularly what may be considered a “packet” and how packets may be processed according to one or more protocols. A packet is typically constructed to include application payload data and a sequence of headers (at least one header). Each header may be encapsulated in a lower layer header. The lowest layer is typically called the Media Access Control (MAC) layer (e.g., under a generalized ISO 7-layer model), though other identifiers may be used. A MAC packet is usually called a frame, and includes a MAC header, payload bytes and a trailer. An example MAC layer protocol is Ethernet. An upper layer protocol packet is typically encapsulated in a MAC frame. In the context of the Internet, a common protocol encapsulated in a MAC frame is the Internet Protocol (IP). Another common protocol encapsulated in the MAC frame is the Address Resolution Protocol (ARP). A higher layer (transport layer) packet, such as a TCP or a UDP packet, is typically encapsulated in an IP packet.
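By way of illustration only, the header encapsulation described above may be sketched as follows. This is a minimal sketch that assumes an untagged Ethernet frame carrying IPv4 and TCP; the function name parse_tcp_frame and the returned tuple layout are hypothetical and are not part of the described adaptor.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value identifying an IPv4 packet
IPPROTO_TCP = 6          # IP protocol number identifying TCP


def parse_tcp_frame(frame: bytes):
    """Peel the MAC, IP and TCP headers off a frame and return the payload.

    Returns ((dst_ip, src_ip, dst_port, src_port), payload).
    Assumes an untagged Ethernet frame carrying IPv4/TCP (illustrative only).
    """
    # MAC (Ethernet) header: destination MAC, source MAC, EtherType.
    _dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")

    # IP header: the low nibble of byte 0 is the header length in 32-bit words.
    ip = frame[14:]
    ip_header_len = (ip[0] & 0x0F) * 4
    ip_total_len = struct.unpack("!H", ip[2:4])[0]
    if ip[9] != IPPROTO_TCP:
        raise ValueError("not a TCP packet")
    src_ip = ".".join(str(b) for b in ip[12:16])
    dst_ip = ".".join(str(b) for b in ip[16:20])

    # TCP header: ports first; the high nibble of byte 12 is the data offset.
    tcp = ip[ip_header_len:ip_total_len]  # slicing drops any MAC trailer
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    tcp_header_len = (tcp[12] >> 4) * 4

    return (dst_ip, src_ip, dst_port, src_port), tcp[tcp_header_len:]
```

The sketch ignores options such as VLAN tags and IP fragmentation, which an actual adaptor would need to handle.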


Processing of packets received from the network includes protocol processing of the packets based at least in part on the sequence of headers. Connection oriented protocols typically maintain state to be used in protocol processing at the communicating endpoints. Protocol processing is typically carried out before application payload is delivered to the appropriate destination application or applications. Protocol processing in the context of high-speed networks may use significant resources. It is now common for network adaptors to implement intelligence for the support of host processing, including to partly or completely offload protocol processing. Such adaptors are sometimes referred to as intelligent network adaptors.


A highly desirable capability of an intelligent network adaptor is the capability to place application data directly into pre-determined memory locations. For example, these locations may be in host memory or in memory of another adaptor. Direct data placement can significantly reduce the memory bandwidth used for delivery of data received via network communication. In particular, direct data placement may allow placing data directly into application buffers from the intelligent network adaptor, thereby avoiding the copying of packet payload from operating system network buffers to application buffers. This capability is typically referred to as “zero-copy.” In addition to saving CPU cycles, the omission of the data copy also saves on application end-to-end latency, as “store-and-forward” memory copies introduce a delay in the end-to-end communication of the application data.


Broadly speaking, applications may differ in their interaction with the host operating system's network protocol processing stack interface. This difference may in part be dependent on the operating system that implements the network protocol processing stack software. To achieve zero-copy data transfer to application buffers, an application typically provides a destination buffer into which the adaptor can directly place application payload data of packets received from the network. Conceptually, an application buffer may include one or a chain of memory locations, each described by a memory address and a length. Typically, it is a network adaptor device driver of the host operating system that handles mapping between the application buffer and descriptors of the chain of memory locations.
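As an illustration only, an application buffer described as a chain of memory locations may be modelled as a list of (address, length) descriptors. The names Descriptor and locate are hypothetical, and the addresses used are arbitrary example values rather than values taken from any actual driver.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Descriptor:
    address: int  # start address of one contiguous memory location
    length: int   # number of bytes at that location


def locate(chain: List[Descriptor], offset: int) -> Tuple[int, int]:
    """Map a byte offset in the logical buffer to (descriptor index, address)."""
    if offset < 0:
        raise IndexError("negative offset")
    for index, desc in enumerate(chain):
        if offset < desc.length:
            return index, desc.address + offset
        offset -= desc.length  # skip past this descriptor
    raise IndexError("offset beyond end of buffer")
```

For example, with a chain of two 4096-byte locations, logical offset 5000 falls 904 bytes into the second location.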


Benefits of zero-copy may depend on the size of the buffer provided by the application and on how quickly the application wishes to get the data. For example, there may be a tradeoff between the cost of copying packet payload (e.g., from an operating system buffer to an application buffer) and overhead cost associated with mapping an application buffer into a chain of memory descriptors (for zero copy). The tradeoff involves latency: the data may either wait in the adaptor memory while the mapping is being created, or be received through the operating system buffers while memory maps are simultaneously set up to enable the zero-copy receive of data directly to the application buffers. The mapping typically involves communicating with the host operating system to reserve (referred to as pinning) memory in the host machine that is to receive the zero-copy application data.


The mapping process further includes communicating the mapping to the network adaptor. A standard Ethernet frame carries 1500 B of data, and the mapping is created when the first frame arrives. If the overall transfer size is only on the order of, say, 8000 B, and data is being received for a single connection, then it is likely that the overhead of zero-copy will be greater than that of receiving the packets through operating system buffers. When multiple connections are receiving data simultaneously, zero-copy is more efficient even for smaller transfer sizes because, while the mapping is being created for one connection, data may be received with the zero-copy mechanism for other connections.


Benefits of direct data placement can depend on the application's ability to post buffers and the traffic pattern. Direct data placement to an application buffer is possible if the application has posted a buffer when the application data arrives from the wire and is to be written from the network adaptor to the host. Some application programming interfaces, such as the widely adopted synchronous sockets interface, allow the application to post only one buffer. In such a case, once that buffer is completed and returned to the application, additional incoming application data for the input request cannot be directly placed. Thus, for example, the additional incoming data can be stored in memory of the network adaptor and later directly placed when the application posts the next read, or the incoming application data can be placed (e.g., by DMA) to an operating system buffer. In the latter case, when an application buffer is available (e.g., the application posts another buffer), the application data would then be copied from the operating system buffer to the application buffer.


In general, the amount of data that is placed in an operating system buffer is limited if an application buffer becomes available in a timely manner. In addition, copying from the operating system buffer to the application buffer can be done in parallel with the network adaptor operating to cause additional incoming application data to be placed into the same application buffer (e.g., if the application buffer is posted by an application only after the network adaptor already receives application data for that application). Thus, for example, the application data copied from the operating system buffer to the application buffer may be placed at the beginning of the application buffer while the additional incoming application data is placed into the same application buffer at an offset, to account for the data that was previously placed into the operating system buffer.


An intelligent network adaptor may be provided to offload the processing of the stack of headers of the packets. Furthermore, the intelligent network adaptor may be configured so as to reduce the notification rate according to the transport protocol, with minimal or no application level delays being introduced.


In one example, the copy vs. zero-copy decision may be addressed (e.g., in a driver program that interoperates with or is part of the operating system of the host) by using an adaptive scheme that selectively performs one or the other depending on various criteria. For example, a decision may be based in part on, or indirectly result from, the size of the application buffer. This scheme may be transparent to applications so as to not require any application modification or awareness. We refer to this scheme as the “adaptive copy avoidance” scheme. In addition, data may be directly transferred to application space. Applications can make use of this method for highly efficient, flexible, high-speed, very low latency communication. We refer to this as the “memory mapped receive” scheme.


For the purpose of illustration in this description, we refer to applications that use the Transmission Control Protocol (TCP) to communicate over an Internet Protocol (IP) network. The description herein serves the purpose of illustrating the process of application data delivery over a network, and it is not intended to be construed as limiting the applicability of the described methods and systems to this particular described context.


TCP provides a reliable data stream delivery service. TCP is a connection-oriented protocol according to which a stateful association is established between two communicating end-points such as between an application, executing in a host, and a peer located across a network from the host. In this context, each packet may include a 4-tuple designating a destination IP address, a source IP address, a destination TCP port number and a source TCP port number, to uniquely identify a connection. An intelligent network adaptor configured to offload the processing of the TCP/IP protocol stack from a host is typically referred to as a TCP Offload Engine (TOE).
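For illustration, identifying a connection by its 4-tuple may be sketched as a table lookup. The names FourTuple and ConnectionTable, and the use of a plain dictionary for per-connection state, are assumptions made for this sketch and do not describe how any particular TOE stores connection state.

```python
from typing import Dict, NamedTuple, Optional


class FourTuple(NamedTuple):
    dst_ip: str
    src_ip: str
    dst_port: int
    src_port: int


class ConnectionTable:
    """Hypothetical table of offloaded connections, keyed by 4-tuple."""

    def __init__(self) -> None:
        self._connections: Dict[FourTuple, dict] = {}

    def add(self, key: FourTuple, state: dict) -> None:
        self._connections[key] = state

    def lookup(self, key: FourTuple) -> Optional[dict]:
        # None means the packet is not for a connection handled here.
        return self._connections.get(key)
```

A packet whose 4-tuple is not found in the table would, in this sketch, be treated as not belonging to an offloaded connection.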



FIG. 2 broadly illustrates an example of conventional payload delivery in a host system 201 using TCP/IP to receive data over a network 202 from a peer (not shown) by means of a network adaptor 203. FIG. 2 shows packets 204 that have been delivered to host memory 205 (typically in operating system memory). The headers 206 are processed by the network stack 207, and the payload 208 is copied from its location in host memory 205 to application buffers 209 associated with an application 210 (also in host memory). Payload arrival notifications 211 are sent by the network adaptor 203 to a host processor 212 so that the application 210 can be made aware that application data is “ready” in the application buffers 209.


In contrast to the conventional payload delivery described with respect to FIG. 2, FIG. 3 broadly illustrates payload delivery with notification rate moderation and with adaptive copy avoidance. The intelligent network adaptor 303 receives the packets 304 from the network 302 via a TCP connection, performing protocol processing in some form to process the headers 306 and extract the application payload 308. The application payload is directly placed in host memory 309 according to memory descriptors of a memory map 301 associated with the TCP connection via which the packets 304 are received, available in adaptor memory 300.


The adaptor may moderate the rate of providing payload data arrival notifications by, for example, only generating a payload arrival notification to the host processor 312 if the incoming packet is determined by the adaptor to contain useful application level notifications. Whether an incoming packet contains useful application level notifications may be determined, for example, by processing the application payload in the intelligent network adaptor, from signaling according to the transport protocol in the protocol header, or from the level of data in an application level buffer (e.g., the application level buffer is full or nearly full). With respect to signaling according to the transport protocol, in a TCP/IP context, TCP may carry signaling that may loosely be considered application level signaling in the TCP header control flags, such as the FIN, URG and PSH flags. The FIN flag indicates that the sender/peer has finished sending data. The URG flag indicates that the Urgent pointer is set (indicating that the payload data should reach the host quickly). The PSH flag indicates that a segment should be passed to the application as soon as possible.
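The flag-based and buffer-level criteria above may be sketched as follows, purely as an illustration. The flag bit values are those of the standard TCP header; the function name should_notify and the 0.9 "nearly full" threshold are assumptions chosen for the example.

```python
# Bit positions of the relevant control flags in the TCP header flags byte.
TCP_FIN = 0x01
TCP_PSH = 0x08
TCP_URG = 0x20


def should_notify(tcp_flags: int, buffered: int, capacity: int,
                  nearly_full: float = 0.9) -> bool:
    """Decide whether a placed segment warrants a host notification.

    tcp_flags   -- flags byte from the segment's TCP header
    buffered    -- bytes currently placed in the application level buffer
    capacity    -- size of that buffer in bytes
    nearly_full -- assumed fraction at which the buffer counts as nearly full
    """
    # Transport-level signaling that loosely acts as application signaling.
    if tcp_flags & (TCP_FIN | TCP_PSH | TCP_URG):
        return True
    # Buffer is full or nearly full.
    return buffered >= nearly_full * capacity
```

A segment carrying only the ACK flag into a mostly empty buffer is placed silently in this sketch, while a segment with PSH set triggers a notification.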


In this example, the memory location for direct placement of application payload data may be dependent on the adaptive copy avoidance scheme. According to the scheme, if zero-copy is to be used, the application buffer descriptors have been communicated to the adaptor from the host and, based on this, the intelligent network adaptor operates to directly place payload in the application buffer according to the descriptors. Otherwise, application buffer descriptors have not been communicated to the adaptor, and application payload is placed in a memory location not corresponding to an application buffer, such as, for example, in a memory location corresponding to an operating system network adaptor buffer, for subsequent copying into an application buffer.


We now describe a specific example of data delivery according to an example of the adaptive copy avoidance scheme, with reference to FIG. 3. In particular, we describe conditions under which the application buffer descriptors may be communicated to the adaptor from the host. When an Ethernet packet is received from the network 302 by the adaptor 303, the Ethernet packet is processed by the adaptor 303. If the processing results in identifying a specific TCP connection state within the adaptor 303 (i.e., the packet is for a TCP connection being handled by the adaptor), the packet is further processed within the adaptor according to the TCP/IP protocol. As a result of the processing, it may be determined that the packet contains application payload data, at least a part of which may be acceptable according to the TCP protocol. For application payload data that is acceptable according to the TCP protocol, the intelligent adaptor may proceed to place at least part of the acceptable application payload data into a destination memory 309 of the host—application buffer memory or, for example, operating system memory.


We now particularly discuss operation of the host, with respect to the application, affecting whether buffer descriptors corresponding to application buffer memory are communicated to the intelligent network adaptor (or the intelligent network adaptor is otherwise configured for zero copy with respect to the application). For example, an application may perform input/output (I/O) data transfer operations by presenting a memory buffer to the operating system 307 to/from which data transfer is to occur. One criterion for deciding whether to configure the intelligent network adaptor for performing zero-copy for a data receive I/O operation may be the buffer size passed by the application relative to the size of a copy source buffer that would otherwise be used. As mentioned earlier, if the size of the application buffer is relatively small, the overhead involved in setting up a zero-copy operation can become a significant portion of the operation.


In this case, a copy operation may be more advantageous, particularly when the source buffer of the copy operation (i.e., if the copy operation is to be used) is larger than the size of the application buffer. Then, it may result that data transfer occurs from the adaptor to a copy source buffer in the host memory. The copy source buffer is typically (but need not necessarily be) in memory controlled by the operating system 307. The data in the copy source buffer (written from the adaptor) can be subsequently copied to the application buffer 309. If the size of the application buffer is relatively larger, the I/O operation can be expected to be relatively longer, and it can be worthwhile to expend the overhead to set up a zero copy operation based on the expected efficiencies gained in carrying out the zero copy operation.


Another criterion for determining whether to perform zero copy may be whether the application buffer sizes are large enough relative to the amount of application data being received for zero copy to be “useful” such as, for example, to reduce data transfer overhead (in contrast to increasing zero copy setup overhead). When the application buffer sizes are “sufficiently large” for zero-copy to be useful (for example, there is enough room in the application buffer to accommodate all the data of a particular transfer from the intelligent network adaptor to the application), an indication of the memory locations corresponding to the application buffer is communicated to the intelligent network adaptor. For example, this communication of memory location indications may be initiated by operating system drivers associated with the intelligent network adaptor. The intelligent network adaptor can then place incoming data directly into the application buffer based on the indications of memory locations.
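The two criteria discussed above may be combined, for the sake of illustration, in a small decision function. This is a sketch only: the function name choose_placement, the strict comparisons and the optional expected_bytes parameter are assumptions, and an actual driver may weigh additional factors.

```python
from typing import Optional


def choose_placement(app_buf_size: int, copy_src_buf_size: int,
                     expected_bytes: Optional[int] = None) -> str:
    """Return "zero-copy" or "copy" for a receive I/O operation.

    app_buf_size      -- size of the buffer passed by the application
    copy_src_buf_size -- size of the copy source buffer that would otherwise
                         be used (e.g., an operating system buffer)
    expected_bytes    -- amount of data in the transfer, if known (assumed)
    """
    # Small application buffer: zero-copy setup overhead is likely to dominate.
    if app_buf_size < copy_src_buf_size:
        return "copy"
    # Buffer cannot accommodate all the data of the transfer.
    if expected_bytes is not None and app_buf_size < expected_bytes:
        return "copy"
    return "zero-copy"
```

Because the decision is made below the application interface in this sketch, the application itself need not be modified or made aware of it.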


In some cases, after requesting an input (receive, read) operation, an application waits for the operation to complete and then performs some processing on the received data before proceeding to request another receive operation. This is typically referred to as “synchronous I/O”. Data that arrives during the time the application has no receive request in place may be saved in some memory—either in memory of the intelligent network adaptor or in host memory (e.g., in memory controlled by the operating system). In this case, the operating system buffer described above (the copy source buffer) can be used to save the received data. This helps minimize idle times and delays and allows for a network adaptor design with less memory and less likelihood to need to drop received packets.


When an application makes a receive request and the operating system buffer contains saved data, the data may be copied from the operating system buffer to the application buffer. Furthermore, if the application buffer is large enough to hold all the saved data, the application buffer information may be communicated to the intelligent adaptor. Then, the adaptor can start placing additional received data of the request in the application buffer at an offset based on the data previously placed in the operating system buffer. The operating system can independently and simultaneously copy the saved data into the application buffer. This approach has an advantage that latency of the data transfer may be minimized in that data may be copied to the application buffers via the operating system buffers as soon as the data is requested by the application and, simultaneously, the operating system interoperates with the adaptor to cause the adaptor to be configured with the mapping information so that subsequent data can be placed directly in the application buffers with zero-copy.
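The offset-based placement described above may be sketched as follows, for illustration only. The function name satisfy_receive is hypothetical, and the two activities that may proceed simultaneously (the copy by the operating system and the direct placement by the adaptor) are modelled sequentially here; because they write disjoint regions of the buffer, their order does not affect the result.

```python
from typing import Iterable


def satisfy_receive(app_buf: bytearray, saved: bytes,
                    later_segments: Iterable[bytes]) -> int:
    """Fill app_buf from saved data plus directly placed later segments.

    saved          -- data previously saved in the operating system buffer
    later_segments -- in-order payloads arriving after the buffer was posted
    Returns the number of bytes of app_buf that now hold received data.
    """
    if len(saved) > len(app_buf):
        # This sketch covers only the case where the buffer holds all saved data.
        raise ValueError("application buffer too small for saved data")

    # Adaptor side: direct placement starts at an offset past the saved data.
    offset = len(saved)
    for segment in later_segments:
        if offset + len(segment) > len(app_buf):
            break  # buffer would overflow; remaining data handled elsewhere
        app_buf[offset:offset + len(segment)] = segment
        offset += len(segment)

    # Host side: copy the saved data to the beginning of the same buffer.
    app_buf[:len(saved)] = saved
    return offset
```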


Some applications may be capable of placing multiple receive requests. Such applications can process data received corresponding to one request (and, for example, in one request buffer) while, simultaneously, data received corresponding to another request can be placed into another buffer. This is sometimes referred to as “asynchronous I/O”. Such applications may be capable of ensuring that at least one receive request is in place at all times, and therefore may operate without needing an operating system buffer to handle the “no receive request” case as described above. When the adaptor completes transfer to one application buffer, the host system is notified, and subsequent data can be placed in another application buffer (such as in the following application buffer in sequence). A driver associated with the intelligent network adaptor may control where incoming data is placed. For example, the adaptor may simply pin and map the application buffer memory for direct memory access of application payload data from the intelligent network adaptor to the host.


In accordance with another aspect, an intelligent reduction in notification rate regarding data received from the network and provided from the intelligent network adaptor to an application buffer memory of the host or a copy source buffer of the host may improve host system performance without affecting (or with minimal effect on) communication latency. The notification rate for the data transfer process to the copy source buffer may be moderated by the intelligent network adaptor to be on a greater than per-packet basis to reduce host system processing load to respond to the notifications. Accordingly, data transfer for a connection from the intelligent network adaptor to the host system may be performed “silently” without any data transfer notification to the host system, for example, until such a notification is deemed useful. Useful notifications may correspond to events occurring at the application layer, determined by the intelligent network adaptor through processing the application layer information. Other useful notifications may correspond, for example, to events occurring at a layer lower than the application layer, such as at the transport layer. For example, useful data notifications may correspond to the receipt and detection, by the intelligent network adaptor from the network, of a special TCP control flag, such as the PSH flag, the URG flag, or the FIN flag for the connection. A different type of notification may be sent when a non-graceful teardown of the connection occurs, such as receipt of a valid RST flag or other abortive events.


It is useful in some situations to provide data transfer notifications from the intelligent network adaptor to the host on the basis of elapsed time, such as on a periodic basis, or when the memory buffer in use is filled up, or when a timer elapses since the last data had been silently placed. These measures help to allow the application processing to proceed at a steady rate, and to ensure forward progress when the last data has arrived. The first of these (elapsed time, such as periodic basis) is useful in the cases where the data is arriving slowly, as it can assist the application in getting an opportunity to process the data at a steady rate with acceptable latency. The second (when the memory buffer in use is filled up) assists the application in processing at a steady rate and/or allows the application to process data while new data is arriving (pipelining). The last of these (timer elapses since the last data had been silently placed) applies at least to the “end of data” case, i.e., no more data is forthcoming and a timer ensures that the application makes forward progress in processing the application data.
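The three triggers above (periodic, buffer filled, and idle timer) may be sketched, as an illustration only, by a small moderator object. The class name NotificationModerator and its parameters are hypothetical, and time is passed in explicitly so that the behaviour is deterministic.

```python
class NotificationModerator:
    """Decides when silently placed data should be reported to the host."""

    def __init__(self, buffer_size: int, period: float, idle_timeout: float):
        self.buffer_size = buffer_size    # bytes that count as "filled up"
        self.period = period              # maximum time between notifications
        self.idle_timeout = idle_timeout  # time since last silent placement
        self.pending = 0                  # bytes placed but not yet reported
        self.last_notify = 0.0
        self.last_data = 0.0

    def on_data(self, now: float, nbytes: int) -> bool:
        """Record a silent placement; notify if the buffer has filled up."""
        self.pending += nbytes
        self.last_data = now
        if self.pending >= self.buffer_size:
            return self._notify(now)
        return False

    def on_tick(self, now: float) -> bool:
        """Called periodically; notify on elapsed period or idle timeout."""
        if self.pending == 0:
            return False
        periodic = now - self.last_notify >= self.period
        idle = now - self.last_data >= self.idle_timeout
        if periodic or idle:
            return self._notify(now)
        return False

    def _notify(self, now: float) -> bool:
        self.pending = 0
        self.last_notify = now
        return True
```

The idle timeout covers the "end of data" case in this sketch: if no further data arrives, the pending bytes are still reported once the timer elapses.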


A flow control scheme can be established for use with direct data placement functionality (whether in accordance with the adaptive copy scheme discussed above or otherwise). In an example implementation, a number of data credits are provided to the intelligent network adaptor. The credits may correspond to an amount of data received. When the intelligent network adaptor buffers the data itself (e.g., because the data cannot be delivered on-the-fly to the host for some reason), the receive window being advertised to the peer is shortened (e.g., by the number of bytes that are buffered by the adaptor). When the data is eventually placed on the host, then the advertised receive window is lengthened (e.g., by the number of bytes that are placed). If the data placed on the host cannot be directly placed to an application buffer (e.g., is held in a “common pool” of buffers), then the receive window is accordingly shortened, and when the data is eventually delivered to the application, the receive window is again lengthened.


Conversely, when the application consumes data and an application buffer becomes available, the credits are returned to the intelligent network adaptor, and its credit count is incremented accordingly. If the TCP window flow control is made dependent on this flow control scheme, such as by basing the TCP window size on the number of credits available to the intelligent network adaptor, it is possible to perform end-to-end flow control with the application as terminus, as opposed to with the intelligent network adaptor or operating system buffers as terminus. One benefit of this arrangement may be that less memory may be used in the intelligent network adaptor, since flow control is based on the available memory in the host for the application and is not based on (or, at least, is not based completely on) the available memory in the intelligent network adaptor.
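As a minimal sketch of the credit scheme, assuming one credit per byte and a receive window equal to the available credits, the bookkeeping may look as follows. The class and method names are hypothetical.

```python
class CreditFlowControl:
    """Credit-based receive window with the application as terminus."""

    def __init__(self, initial_credits: int):
        # Credits granted to the adaptor, e.g., the application buffer space.
        self.credits = initial_credits

    def on_data_received(self, nbytes: int) -> None:
        """Spend credits for data received from the peer."""
        if nbytes > self.credits:
            raise ValueError("data exceeds advertised receive window")
        self.credits -= nbytes

    def on_data_consumed(self, nbytes: int) -> None:
        """Host indicates consumption; credits return to the adaptor."""
        self.credits += nbytes

    def receive_window(self) -> int:
        """Window advertised to the peer, based on available credits."""
        return self.credits
```

In this sketch the window advertised to the peer shrinks as data arrives and grows again only when the application consumes the data, rather than when the data merely leaves the adaptor.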


We now move to discuss an example of a method for managing the transfer to application memory space. The method allows high bandwidth and low latency, at low host system load. It may allow applications executing on the host to perform random access to the received data in application memory.


Referring to FIG. 4, in an example implementation of this method, an application 410 running in a host 401 is capable of allocating a portion of its memory space 413 as a buffer 409 for receive operations. The location of the allocated buffer 409 is pinned in host memory 405, memory mapped by the operating system 407. The memory map is communicated to the intelligent network adaptor 403. The intelligent network adaptor 403 stores the memory map 401 in its on-board memory 400. Data received by the intelligent network adaptor from the network, in a connection for the application, are placed into the buffer 409. The notification rate to the host may be moderated, as discussed above for example. When a notification 411 is provided to the host by the intelligent network adaptor 403, the notification may first be received by the operating system and in turn passed to the application. The notifications serve to inform the application of the progress of the receive operation.


The intelligent network adaptor may place received out-of-order data directly in the buffer 409, and therefore perform data re-assembly in the buffer 409 itself, e.g., if the data is received out-of-order in accordance with an applicable transport protocol for the connection. In this case the adaptor keeps track of which data has already been placed in the application buffer, and which data is still to be delivered, and notifies the application when all the data has been placed, or in case of a delivery timer expiring, how much of the in-order data at the start of the buffer has been delivered. Alternatively, out-of-order data re-assembly may be performed in a different memory location (e.g., in adaptor memory), with only in-order data being placed in the buffer 409. In either case, the progress of the receive operation may be indicated in terms of the memory address containing the latest byte received in-order. Payload data of at least some of the packets can, after being reordered spatially, be considered to comprise payload data of an equivalent newly-configured packet. In this case, the peer may be provided an acknowledgement of the equivalent newly-configured packet.
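The bookkeeping described above, in which the adaptor tracks which data has been placed and how much in-order data is present at the start of the buffer, might be sketched as follows. This is a minimal sketch under stated assumptions, not the adaptor's actual implementation: the class name `PlacementTracker` and its methods are hypothetical, and offsets are taken relative to the start of the buffer.

```python
class PlacementTracker:
    """Tracks byte ranges placed in a buffer, possibly out of order."""

    def __init__(self, size: int):
        self.size = size
        self.in_order = 0   # bytes contiguous from the start of the buffer
        self.pending = []   # sorted, disjoint (start, end) ranges not yet in order

    def place(self, offset: int, length: int) -> int:
        """Record a placed range and return the in-order byte count."""
        start, end = offset, offset + length
        if length <= 0 or start < 0 or end > self.size:
            raise ValueError("range falls outside the buffer")
        merged = []
        for s, e in self.pending:
            if e < start or s > end:
                merged.append((s, e))        # no overlap and not adjacent
            else:
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        merged.sort()
        self.pending = merged
        # Absorb every range that now touches the in-order prefix.
        while self.pending and self.pending[0][0] <= self.in_order:
            _, e = self.pending.pop(0)
            self.in_order = max(self.in_order, e)
        return self.in_order

    def complete(self) -> bool:
        """True when all the data has been placed."""
        return self.in_order == self.size
```

With such a tracker, a notification sent on completion would follow `complete()` returning true, and a notification sent on expiry of a delivery timer would report the current value of `in_order`.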


A flow control scheme may be used to control the amount of data provided to the host. For the sake of illustration, we now discuss an example of the flow control in more detail. In one example, byte granularity flow control is performed on an end-to-end basis by assigning a total amount of flow control credit equal to the size of the mapped receive memory. As one example, TCP allows up to 1 GB of memory to be used for flow control. An even larger receive memory can be utilized by dividing the receive memory into smaller sized “windows” and proceeding through the windows in sequence, i.e. by moving forward the window into the buffer on which the adaptor is working as the buffer gets filled by the adaptor. In this manner, it is possible to expose 1 GB of memory at a time, and to move through a large memory area by “sliding” the exposed receive window as the data placement progresses.
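As a sketch of the sliding arrangement, the portion of a large receive region exposed at any time might be computed from the number of bytes already placed. The figure of 1 GB is approximate: with the largest window scale shift of 14, the largest TCP window is 65535 × 2^14 = 1,073,725,440 bytes. The function name `exposed_window` and its parameters are hypothetical.

```python
MAX_TCP_WINDOW = 0xFFFF << 14  # 1,073,725,440 bytes, just under 1 GB


def exposed_window(region_size: int, placed: int,
                   window_limit: int = MAX_TCP_WINDOW) -> tuple:
    """Return (start, length) of the portion of the region currently exposed.

    Assumption: the window begins at the first byte not yet placed and
    slides forward as placement progresses.
    """
    if not 0 <= placed <= region_size:
        raise ValueError("placed byte count falls outside the region")
    return placed, min(window_limit, region_size - placed)
```

For example, with a region of 10 bytes and a window limit of 4 bytes, the exposed portion is (0, 4) before any data is placed and (3, 4) after three bytes have been placed.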


For each amount of data placed by the adaptor, a corresponding number of units of credit are consumed by the adaptor. The consumed credits are subsequently reflected in the flow control window (by contraction of the window) advertised to the sending peer. Conversely, when the application processes a number of bytes, a corresponding number of units of credits can be returned to the adaptor. The returned credits are subsequently reflected in the flow control window (by expansion of the window) advertised to the sending peer. This scheme allows flow control to be performed directly in terms of receiving application space.
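The credit accounting just described might be sketched as follows, assuming byte-granularity credits and a total credit equal to the size of the mapped receive memory. The class name `CreditAccount` and its method names are hypothetical.

```python
class CreditAccount:
    """Byte-granularity flow control credits held by the adaptor."""

    def __init__(self, buffer_size: int):
        self.total = buffer_size       # total credit equals mapped memory size
        self.available = buffer_size   # credits currently held by the adaptor

    def consume(self, nbytes: int) -> int:
        """Adaptor placed nbytes; returns the contracted window to advertise."""
        if nbytes < 0 or nbytes > self.available:
            raise ValueError("placement exceeds available credits")
        self.available -= nbytes
        return self.available

    def give_back(self, nbytes: int) -> int:
        """Application processed nbytes; returns the expanded window."""
        if nbytes < 0 or self.available + nbytes > self.total:
            raise ValueError("more credits returned than were consumed")
        self.available += nbytes
        return self.available
```

In this sketch the value returned by each call is the window that would subsequently be advertised to the sending peer.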


Note that when the placement reaches the end of the mapped memory region, the adaptor can resume placing at the beginning of the memory region, if flow control allows. It is therefore possible to visualize the operation as a ring buffer 413 with a producer pointer 414 corresponding to the latest byte placed in-order by the adaptor, and a consumer pointer 415 corresponding to the last byte consumed by the application. The region 416 in between the consumer pointer and the producer pointer contains received data bytes that can be accessed in random order. The application moves the consumer pointer forward to a new location along the ring when work has been completed on (i.e. consumed) the bytes that fall behind the new pointer location. The number of credits in possession of the adaptor determines the number of bytes that the adaptor is allowed to place on the ring in advance of the producer pointer. A goal of proper flow control is to prevent the producer pointer from crossing over the consumer pointer. In some examples, the producer pointer is not moved every time data is placed. Rather, the producer pointer may be moved when a notification of progress is received, and the notification of progress may be moderated by the intelligent network adaptor to minimize the communication between the intelligent network adaptor and the host. Likewise, in returning credits to the intelligent network adaptor, credits may be accumulated and sent to the intelligent network adaptor as a group, which can minimize the number of update messages being communicated.
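The ring operation, including the grouped return of credits, might be sketched as follows. This is an illustrative model only: the class name `ReceiveRing`, the `credit_batch` threshold, and the use of monotonically increasing byte counts as pointers are assumptions, and the moderation of progress notifications is omitted for brevity.

```python
class ReceiveRing:
    """Ring buffer with producer and consumer pointers and batched credits."""

    def __init__(self, size: int, credit_batch: int):
        self.buf = bytearray(size)
        self.size = size
        self.produced = 0          # total bytes placed in order by the adaptor
        self.consumed = 0          # total bytes consumed by the application
        self.credits = size        # credits currently held by the adaptor
        self.pending_credits = 0   # credits accumulated but not yet returned
        self.credit_batch = credit_batch

    def place(self, data: bytes) -> None:
        """Adaptor side: place in-order data ahead of the producer pointer."""
        if len(data) > self.credits:
            raise BufferError("not enough credits; producer would pass consumer")
        for i, byte in enumerate(data):
            self.buf[(self.produced + i) % self.size] = byte
        self.produced += len(data)
        self.credits -= len(data)

    def readable(self) -> int:
        """Bytes between the consumer pointer and the producer pointer."""
        return self.produced - self.consumed

    def consume(self, n: int) -> bytes:
        """Application side: advance the consumer pointer by n bytes."""
        if n < 0 or n > self.readable():
            raise ValueError("cannot consume more than has been placed")
        out = bytes(self.buf[(self.consumed + i) % self.size] for i in range(n))
        self.consumed += n
        self.pending_credits += n
        # Return credits as a group to reduce the number of update messages.
        if self.pending_credits >= self.credit_batch:
            self.credits += self.pending_credits
            self.pending_credits = 0
        return out
```

In this sketch the sum of `credits`, `pending_credits` and `readable()` always equals the ring size, which is what keeps the producer pointer from crossing over the consumer pointer.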


It is possible to obtain very low latency operation using the memory mapped scheme. In this approach, an application or other host software may interpret a change in the contents of a particular memory location as an indication that new in-order data has become available in the receive buffer. In one example, the change in the contents of the particular memory location may be serialized with respect to the arrived data. In one example, the memory location could be ahead of the last byte of arrived data on the ring buffer, and the data placement by the intelligent network adaptor is guaranteed to be in-order, i.e. without “skips”. By polling such a memory location, the application is informed of the arrival of new data with very low delay.
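The polling approach might be modelled as follows. The sketch assumes that the polled location is a separate progress word holding the count of in-order bytes placed, and that this word is updated only after the corresponding data has been written, so that the change is serialized with respect to the arrived data. The names `PolledBuffer` and `poll_for_data` are hypothetical. The sketch models only the logic; in a real system the ordering would depend on the memory and bus ordering guarantees of the platform.

```python
import time


class PolledBuffer:
    """Receive buffer with a progress word that the application polls."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.progress = 0  # polled location: in-order bytes placed so far

    def adaptor_place(self, payload: bytes) -> None:
        """Adaptor side: write the data first, then update the progress word."""
        start = self.progress
        if start + len(payload) > len(self.data):
            raise BufferError("payload does not fit in the buffer")
        self.data[start:start + len(payload)] = payload
        self.progress = start + len(payload)


def poll_for_data(buf: PolledBuffer, seen: int, timeout: float = 1.0):
    """Spin on the progress word; return (new_bytes, new_seen)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        now = buf.progress
        if now != seen:
            return bytes(buf.data[seen:now]), now
    return b"", seen
```

An application would typically call `poll_for_data` in a loop, passing back the returned position each time, while the adaptor side runs concurrently.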


By intelligently handling transfer of data between an intelligent network adaptor and a host, efficiency and efficacy of peer to host communication can be substantially increased.

Claims
  • 1. A method of operating an intelligent network adaptor that couples a host to a network, the host executing an application configured to receive data packets, including application data, from a peer via the network and the intelligent network adaptor using a stateful connection between the host and the peer according to a connection-oriented protocol, wherein the intelligent network adaptor performs protocol processing of the connection, the method comprising: by the intelligent network adaptor, performing protocol processing for the connection with the peer to, at least in part, obtain the application data from the data packets received from the peer, including indicating a receive window to the peer; placing application data, received from the peer via the stateful connection, from the intelligent network adaptor directly to host memory application buffer associated with the application without the application data being first provided from the intelligent network adaptor to host memory buffer associated with a host operating system and not specifically with the application; and causing the indicated receive window to be increased based on the application buffer becoming available due to the application consuming application data from the application buffer.
  • 2. The method of claim 1, further comprising: causing the indicated receive window to be reduced based on the application data being placed in the host memory application buffer by the intelligent network adaptor.
  • 3. The method of claim 1, further comprising: by the intelligent network adaptor, generating the receive window indication, including maintaining an indication of receive window credits based on an amount of the application data placed from the intelligent network adaptor to host memory application buffer and further based on the application data in the host memory application buffer being consumed by the application.
  • 4. The method of claim 1, further comprising: placing application data, received from the peer via the stateful connection, from the intelligent network adaptor to the host memory buffer associated with the operating system and not specifically with the application and, based thereon, causing the indicated receive window to be reduced.
  • 5. A method of operating an intelligent network adaptor that couples a host to a network, the host executing an application configured to receive data packets, including application data from a peer via the network and the intelligent network adaptor using a stateful connection between the host and the peer according to a connection-oriented protocol, wherein the intelligent network adaptor performs protocol processing of the connection, the method comprising: by the intelligent network adaptor, performing protocol processing for the connection with the peer to, at least in part, obtain the application data from the data packets received from the peer, including providing a receive window to the peer; placing application data, received from the peer via the stateful connection, from the intelligent network adaptor directly to host memory application buffer associated with the application without the application data being first provided from the intelligent network adaptor to host memory buffer associated with a host operating system and not specifically with the application; and receiving, from the host, an indication that the application has consumed application data from the host memory application buffer associated with the application; and generating the receive window based at least in part on the received indications of consumption of the application data from the host memory application buffer, wherein the host memory application buffer is larger than an amount of memory allowed by the connection-oriented protocol for flow control; and the host is configured to expose to the interface adaptor, for use with the connection at any particular time, a portion of the host memory application buffer that is no larger than the amount of memory allowed by the connection-oriented protocol for flow control.
  • 6. The method of claim 5, wherein: the host maintains the host memory application buffer as a ring structure.
  • 7. The method of claim 5, wherein: the host maintains the host memory application buffer as a plurality of overlapping windows into the host memory application buffer, each window no larger than the amount of memory allowed by the connection-oriented protocol for flow control; and the host selectively exposes to the interface adaptor one of the windows at any particular time.
  • 8. The method of claim 7, wherein: maintaining the host memory application buffer as a plurality of overlapping windows includes maintaining the host memory application buffer as a ring structure, including maintaining producer and consumer pointers into a currently-exposed window, including advancing the producer pointer based on in-order application data being placed in the host memory application buffer and advancing the consumer pointer based on application data being consumed by the application from the host memory application buffer.
  • 9. The method of claim 8, wherein: maintaining the host memory application buffer as a plurality of overlapping windows includes, as the consumer pointer is advanced based on consumption of application data by the application, releasing credits corresponding to the connection from the host to the intelligent network adaptor; and the intelligent network adaptor generates the receive window for the connection based at least in part on the released credits.
  • 10. The method of claim 9, wherein: releasing credits corresponding to the connection from the host to the intelligent network adaptor includes accumulating credits corresponding to the connection and bunching the release of accumulated credits from the host to the intelligent network adaptor.
  • 11. The method of claim 8, wherein: advancing the producer pointer based on in-order application data being placed in the host memory application buffer includes the intelligent network adaptor accumulating notifications to the host of application data being placed in the host memory application buffer to moderate the notifications being provided to the host.
  • 12. The method of claim 7, wherein: maintaining the producer and consumer pointers into the currently-exposed window includes constraining the producer pointer from being advanced only as far as the one-window's worth of memory away from the consumer pointer.
  • 13. The method of claim 8, wherein: maintaining the host memory application buffer as a plurality of overlapping windows includes, as the producer pointer is advanced based on placing of application data by the intelligent network adaptor, decrementing receive windows credits corresponding to the connection; and the intelligent network adaptor generates the receive window based at least in part on receive window credits corresponding to the connection.
  • 14. The method of claim 7, wherein: the application data placing includes placing the application data, in time, substantially in the order received by the intelligent network adaptor, reordered spatially in the host memory application buffer from the order received by the intelligent network adaptor, to account for temporally out-of-order receipt by the intelligent network adaptor.