Method to implement an L4-L7 switch using split connections and an offloading NIC

Information

  • Patent Grant
  • 8139482
  • Patent Number
    8,139,482
  • Date Filed
    Friday, September 25, 2009
  • Date Issued
    Tuesday, March 20, 2012
Abstract
A method of operating intelligent network interface circuitry includes the network interface circuitry coupling a core processor to a network to facilitate communication over the network between the core processor and at least a first peer and a second peer. A first connection connects to the first peer and a second connection connects to the second peer. The network interface circuitry receives data packets from the first peer via the network on the first connection, according to a first particular protocol. The network interface circuitry processes the received data, including associating, with the second connection, data that is at least a portion of the data packets received on the first connection, such that the data received by the intelligent network interface circuitry on the first connection is switched to be outgoing from the intelligent network interface circuitry on the second connection, according to a second particular protocol.
Description
TECHNICAL FIELD

The present invention is in the field of intelligent network interface circuitry (NIC) (e.g., network interface cards and/or controllers) connectable to a core processor and, more particularly, relates to an intelligent NIC that implements a protocol proxy in conjunction with protocol offload functionality.


BACKGROUND

Network protocols may be modeled as a stack of protocol layers, from layer 1 to layer 7. For example, the IP protocol is modeled to be at layer 3, the TCP protocol at layer 4, and various applications at layer 7. The switching of network traffic using layer 4-7 information is well known.


A layer-4 switch, also sometimes referred to as a layer-4 load balancer, uses the four-tuple information (source and destination IP addresses, and source and destination port numbers) carried in a TCP/IP or UDP/IP packet to make a switching decision: for example, switching an incoming Web server request packet to one of the available server computers based on a hash of the four-tuple information. As a result, processing load may be distributed across the available pool of servers; this distribution of processing load is often called “load balancing.”
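
The hash-based selection can be sketched in a few lines of Python. This is a minimal illustration, not the switch's actual logic: the `FourTuple` type, the SHA-256 hash and the modulo mapping onto the server list are assumptions chosen for clarity (Python's built-in `hash()` is avoided because it is randomized per process for strings).

```python
import hashlib
from typing import NamedTuple, Sequence


class FourTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def pick_server(conn: FourTuple, servers: Sequence[str]) -> str:
    """Map a connection's four-tuple to one server in the pool.

    All packets of one connection carry the same four-tuple, so they hash to
    the same server, while different connections spread across the pool.
    """
    key = f"{conn.src_ip}|{conn.dst_ip}|{conn.src_port}|{conn.dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


pool = ["server-a", "server-b", "server-c"]
request = FourTuple("198.51.100.7", "203.0.113.10", 51512, 80)
print(pick_server(request, pool))
```

A plain modulo remaps most connections when a server is added or removed; a deployment that changes its pool often might prefer consistent hashing, which is a different technique from the one shown here.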


A layer-4 Performance Enhancing Proxy (PEP) is a layer-4 switch that performs further processing of layer-4 information with the goal of improving the performance in specific networking environments. One type of proxy is a TCP proxy that splits a TCP connection and performs the TCP protocol processing on each part. In some environments, connection splitting is beneficial, particularly when used in conjunction with environment specific enhancements.


A typical use of a layer-4 PEP is to connect networks that have different characteristics, for example, to connect a LAN/MAN/WAN Ethernet network to a Wireless LAN (WLAN) or a satellite network. The required features of a standard-conforming PEP are described in RFC3135 and include, as a base requirement, the ability to switch network traffic from a first TCP connection to a second TCP connection, together with support in the TCP protocol implementation for the different network types.


A PEP typically employs a fully featured, high-performance TCP stack that goes beyond the standard features of a high-performance TCP implementation as defined by RFC793, RFC1122, RFC2525, RFC2988, RFC2414, RFC1323, RFC2581, and RFC2474. Such additional features may include stretch ACK (ACK moderation) with per-connection configurable inter-ACK spacing for asymmetric channels. Other useful features include: byte counting, rather than ACK counting, to improve the sender's behavior when the receiver is using stretch ACKs; RTT-based send pacing to reduce burstiness over long-distance paths and/or paths with limited buffering capacity; advanced congestion control schemes designed for long-distance or high-error-rate links (such as High-Speed TCP and rate-halving); and the ability to perform rate control, rather than standard window-based congestion control, over links with relatively high error rates (for example, wireless links).
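
To see why byte counting matters when the receiver sends stretch ACKs, the following Python sketch compares slow-start window growth under the two policies. The model is deliberately simplified (fixed-size segments, no losses, one window sent per RTT), the per-ACK cap of two segments follows the limit suggested for slow start in RFC 3465 (Appropriate Byte Counting), and the function and parameter names are hypothetical.

```python
MSS = 1460  # bytes per full-sized segment (an illustrative value)


def slow_start_cwnd(rounds: int, ack_every: int, byte_counting: bool,
                    limit: int = 2) -> int:
    """Return the congestion window, in bytes, after `rounds` RTTs of slow start.

    The receiver sends one (stretch) ACK per `ack_every` full segments; any
    leftover segments in a round are ignored for simplicity. ACK counting
    grows cwnd by one MSS per ACK, while byte counting grows it by the bytes
    each ACK covers, capped at `limit` * MSS per ACK.
    """
    cwnd = 2 * MSS
    for _ in range(rounds):
        acks = (cwnd // MSS) // ack_every
        per_ack = min(ack_every, limit) * MSS if byte_counting else MSS
        cwnd += acks * per_ack
    return cwnd


for mode in (False, True):
    print("byte counting" if mode else "ACK counting",
          slow_start_cwnd(5, ack_every=2, byte_counting=mode) // MSS, "segments")
```

With one ACK per two segments, the window in this model reaches 13 segments after five RTTs under ACK counting but 64 segments under byte counting, the same growth an ACK-every-segment receiver would produce.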


A layer-7 switch uses information at layers above layer 4 to make a switching decision. For example, it is common practice to distribute databases across the available servers and then to direct (also referred to as “to switch”) the incoming request packets to the appropriate server based on the data requested. For example, a database that stores street address map information might be distributed among several servers according to geographical location. A web server street-map-information request, in this case, is processed to determine the requested street address and, based on the requested street address, the request is switched to the appropriate server (i.e., the server that holds the street-map information for the requested address). The address information in this case is contained in layer-7 information, such as an HTTP or XML request, that is encapsulated within the layer-4 TCP payload, and the processing includes first processing the TCP payload and then processing the layer-7 information within the TCP payload.
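
A layer-7 routing step of this kind might be sketched in Python as follows. Everything specific here is hypothetical: the `/map?zip=...` request format, the partitioning of the map database by the first digit of a US ZIP code, and the server names.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical partitioning of the street-map database: each server holds the
# data for a range of ZIP-code first digits (which roughly track region).
SHARDS = [
    (range(0, 4), "map-east"),
    (range(4, 7), "map-central"),
    (range(7, 10), "map-west"),
]


def route_map_request(request: bytes) -> str:
    """Choose the back-end server for one complete HTTP request.

    Assumes the request has already been reassembled from the TCP byte
    stream (TCP termination precedes this layer-7 step).
    """
    request_line = request.split(b"\r\n", 1)[0].decode("ascii")
    _method, target, _version = request_line.split(" ")
    zip_code = parse_qs(urlsplit(target).query)["zip"][0]
    first_digit = int(zip_code[0])
    for digits, server in SHARDS:
        if first_digit in digits:
            return server
    raise LookupError(f"no server holds ZIP {zip_code}")


print(route_map_request(b"GET /map?zip=94105 HTTP/1.1\r\nHost: maps.example\r\n\r\n"))
```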


Another example of a layer-7 switch is an iSCSI storage switch that examines the iSCSI header and the SCSI Command Descriptor Block (CDB) before deciding where to send the SCSI command. The switching decision in this case may be based on a switching table that associates SCSI storage blocks with a storage server and with a storage controller within that storage server. In this case, the switch examines the iSCSI header and CDB information received on a TCP connection and, based on the storage switching table, switches the command to the appropriate storage node. The processing may include first computing a cyclic redundancy check (CRC) on the iSCSI header and/or data. The iSCSI switch in this example may be either centralized or distributed and may form part of the storage controllers.
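
The first step of such a switching decision, locating the block range a command addresses, might look like the following Python sketch. It assumes the 48-byte iSCSI Basic Header Segment has already been reassembled and any negotiated digest verified, and it handles only READ(10) and WRITE(10) commands. The byte offsets follow the iSCSI specification (RFC 3720) and the SCSI block commands; the function and type names are illustrative. The resulting logical block address would then be looked up in the storage switching table.

```python
import struct
from typing import NamedTuple

ISCSI_SCSI_COMMAND = 0x01        # iSCSI opcode of a SCSI Command PDU
READ_10, WRITE_10 = 0x28, 0x2A   # SCSI operation codes handled here


class BlockRequest(NamedTuple):
    op: str
    lba: int      # first logical block addressed by the command
    blocks: int   # number of blocks transferred


def parse_scsi_command(bhs: bytes) -> BlockRequest:
    """Extract the block range from a 48-byte iSCSI Basic Header Segment."""
    if len(bhs) < 48:
        raise ValueError("incomplete Basic Header Segment")
    if bhs[0] & 0x3F != ISCSI_SCSI_COMMAND:     # low 6 bits hold the opcode
        raise ValueError("not a SCSI Command PDU")
    cdb = bhs[32:48]                            # the CDB occupies bytes 32-47
    if cdb[0] not in (READ_10, WRITE_10):
        raise ValueError(f"unsupported CDB opcode {cdb[0]:#04x}")
    (lba,) = struct.unpack_from(">I", cdb, 2)     # CDB bytes 2-5, big-endian
    (blocks,) = struct.unpack_from(">H", cdb, 7)  # CDB bytes 7-8
    return BlockRequest("read" if cdb[0] == READ_10 else "write", lba, blocks)
```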


A protocol proxy may further be used in the role of a protocol “gateway”. For the sake of illustration, an example layer 4 protocol gateway would receive data encapsulated in UDP payload on one “connection” (an association between two processes described by the 4-tuple source and destination IP addresses, and source and destination UDP port numbers) and forward at least a part of that data encapsulated in TCP payload on another connection. This allows the use of each protocol in the appropriate environment, and takes advantage of the benefits of the protocol without requiring it to be used end-to-end. In this specific example, the benefits of UDP, e.g. simplicity and low overhead, are obtained on the first connection (which could be over a reliable local area network), whereas the benefits of TCP, e.g. reliability and congestion control, are obtained on the second connection (which could be over the Internet at large).
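
Because TCP delivers a byte stream without message boundaries, a UDP-to-TCP gateway of this kind needs some way to preserve datagram boundaries on the TCP side. One simple option, assumed here and not specified in the text, is a 2-byte length prefix per datagram. The Python sketch below shows both directions of the translation; all names are hypothetical.

```python
import struct


def udp_to_tcp(datagram: bytes) -> bytes:
    """Frame one UDP payload for the TCP side with a 2-byte length prefix."""
    if len(datagram) > 0xFFFF:
        raise ValueError("datagram too large for a 16-bit length prefix")
    return struct.pack(">H", len(datagram)) + datagram


class TcpToUdp:
    """Recover datagram boundaries from TCP data arriving in arbitrary chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Buffer `chunk` and return every datagram that is now complete."""
        self._buf += chunk
        datagrams = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from(">H", self._buf, 0)
            if len(self._buf) < 2 + length:
                break                        # rest of this frame not yet here
            datagrams.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return datagrams


stream = udp_to_tcp(b"hello") + udp_to_tcp(b"") + udp_to_tcp(b"world!")
rx = TcpToUdp()
received = []
for i in range(0, len(stream), 3):           # TCP may split the stream anywhere
    received.extend(rx.feed(stream[i:i + 3]))
```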


SUMMARY

A method of operating intelligent network interface circuitry includes the network interface circuitry coupling a core processor to a network to facilitate communication over the network between the core processor and at least a first peer and a second peer. A first connection connects to the first peer and a second connection connects to the second peer. The network interface circuitry receives data packets from the first peer via the network on the first connection, according to a first particular protocol. The network interface circuitry processes the received data, including associating, with the second connection, data that is at least a portion of the data packets received on the first connection, such that the data received by the intelligent network interface circuitry on the first connection is switched to be outgoing from the intelligent network interface circuitry on the second connection, according to a second particular protocol.





BRIEF DESCRIPTION OF FIGURES


FIGS. 1a to 1c illustrate how TCP packets may be reordered in transit.



FIG. 2a illustrates an example configuration in which an L4-L7 switch connects two peers, and FIG. 2b illustrates an example configuration in which an L4-L7 switch may implement a one-to-many and many-to-one relationship between connections.



FIG. 3a is a block diagram illustrating an L4-L7 switching device in accordance with an example in which a core processor is separate from a TCP protocol offload engine (TOE), and FIG. 3b is a block diagram illustrating an L4-L7 switching device in accordance with an example in which the core processor is integrated (on the same chip) with the TOE.



FIG. 4 is a block diagram illustrating a flow processor architecture in which the L4-L7 functionality may be accomplished.





DETAILED DESCRIPTION

In accordance with an aspect, layer-7 switch functionality is synergistically combined with that of a Performance Enhancing Proxy (PEP) into a combined L4-L7 switching device.


We first point out that, in general, the TCP protocol payload is not guaranteed to arrive at its destination in the same order as the send order. As is well-known, this may result from TCP packets being lost in the network and subsequently being resent. In addition, packets may also be re-ordered en route from source to destination. This reordering is now discussed with reference to FIG. 1a, FIG. 1b and FIG. 1c.


We now turn to FIG. 1a, which illustrates an example of four TCP packets in send order: packet 0, packet 1, packet 2 and, finally, packet 3. FIG. 1b shows the reordering effect of packet 1 initially being lost in transit and subsequently being resent, which leads to the arrival order of packets 0, 2, 3, and finally the retransmission of packet 1. FIG. 1c then shows the effect of packets 1 and 2 being reordered in transit from the source to the destination, which leads to the arrival order of packets 0, 2, 1, and 3. It follows from these considerations that layer-7 requests, such as HTTP web server requests that are embedded (or encapsulated) within a TCP payload, must be processed after the TCP receive processing is completed (also referred to as TCP termination). Even in the absence of reordering, a layer-7 request may span two or more TCP packets. An HTTP request might, for example, start close to the end of packet 0 in FIG. 1a and continue in packet 1. In this case, too, the TCP stream of packets is processed first, before the layer-7 requests are processed.
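
The requirement that TCP termination precede layer-7 processing can be illustrated with a minimal reassembly buffer in Python. This is a simplified sketch, not the TOE's receive path: it assumes non-overlapping segments and ignores sequence-number wraparound, and the class name is hypothetical. The usage replays the FIG. 1b arrival order with a request that spans all four packets.

```python
class ReassemblyBuffer:
    """Release TCP payload to layer-7 processing strictly in sequence order.

    Simplified: segments are assumed not to overlap, and sequence-number
    wraparound is ignored.
    """

    def __init__(self, initial_seq: int = 0) -> None:
        self.next_seq = initial_seq           # next byte layer 7 expects
        self.pending: dict[int, bytes] = {}   # out-of-order segments by seq

    def receive(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment and return the bytes that are now in order."""
        if seq >= self.next_seq:              # drop already-delivered resends
            self.pending[seq] = payload
        delivered = bytearray()
        while self.next_seq in self.pending:
            data = self.pending.pop(self.next_seq)
            delivered += data
            self.next_seq += len(data)
        return bytes(delivered)


# FIG. 1b arrival order: packet 1 is lost and its retransmission arrives last.
request = b"HEAD /a HTTP/1.1\r\nHost: x\r\n\r\n"
packets = [(seq, request[seq:seq + 8]) for seq in range(0, len(request), 8)]
rb = ReassemblyBuffer()
released = [rb.receive(*packets[i]) for i in (0, 2, 3, 1)]
```

Only the in-order output would be handed to the HTTP parser, which then looks for the end of the request headers in that stream rather than in any individual packet.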


In addition to the functionality discussed above, a layer-4 switch may implement Quality of Service (QoS) and traffic management functionality. The QoS and traffic management features can be used to pace packets for selected connections such that the packets are evenly distributed on the wire between the switch and the peers. Provisioning the sender rate per class of connections, or per connection within a class, enforces service level guarantees and can prevent buffer overflow in network devices on the connection path or in receivers that are slower than the sender. It is also useful to classify network traffic into at least latency-sensitive and data-mover classes, and to give the transmission and delivery of latency-sensitive traffic priority over data-mover traffic. This can be used, for example, to prioritize the delivery of latency-sensitive voice traffic over data-mover Web traffic in a converged services network.
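
The pacing and prioritization described above can be illustrated with a small discrete-time Python sketch. Each traffic class is paced at its provisioned rate, and among the classes currently allowed to send, the most latency-sensitive one goes first. The strict-priority policy, the single shared link and all names are assumptions for illustration; the text does not specify the scheduler's internals.

```python
from dataclasses import dataclass, field


@dataclass
class TrafficClass:
    """Queued packets of one class, released no faster than `rate_bps`."""
    priority: int                 # lower value = more latency sensitive
    rate_bps: float               # provisioned sender rate for the class
    packets: list[int] = field(default_factory=list)  # sizes in bytes, FIFO
    next_time: float = 0.0        # earliest time the next packet may leave


def schedule(classes: list[TrafficClass], link_bps: float) -> list[tuple[float, int]]:
    """Return (departure_time, priority) for each packet, in sending order."""
    now, sent = 0.0, []
    while any(c.packets for c in classes):
        eligible = [c for c in classes if c.packets and c.next_time <= now]
        if not eligible:              # idle until the next class may send
            now = min(c.next_time for c in classes if c.packets)
            continue
        c = min(eligible, key=lambda k: k.priority)   # strict priority
        size_bits = c.packets.pop(0) * 8
        sent.append((now, c.priority))
        c.next_time = now + size_bits / c.rate_bps    # pacing gap for the class
        now += size_bits / link_bps                   # wire serialization time
    return sent


voice = TrafficClass(priority=0, rate_bps=1e6, packets=[125] * 3)
bulk = TrafficClass(priority=1, rate_bps=8e6, packets=[1000] * 3)
order = schedule([bulk, voice], link_bps=1e9)
```

In this example both classes are paced to one packet per millisecond, so the voice packets leave at roughly 0, 1 and 2 ms instead of back to back, and whenever both classes are eligible the voice packet is sent first.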


One additional function that a layer-4 switch with TCP offload may perform is the processing of payload. It is thus possible to offload expensive per-byte processing such as, but not limited to, compression and decompression, encryption and decryption, and the computation and checking of application-level data integrity codes.
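
As an example of such per-byte work, iSCSI protects its headers and data with CRC-32C digests. The Python sketch below computes CRC-32C one bit at a time, which makes the per-byte cost explicit; it is a reference illustration only, since hardware and table-driven software implementations process many bits per step.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78).

    Passing a previous result as `crc` continues the computation, so a
    digest can be accumulated over data that arrives in pieces.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))
```

The standard check value of CRC-32C for the ASCII string "123456789" is 0xE3069283, which is a convenient self-test for any implementation.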


Another useful function for a layer-4 switch is protocol translation, so that each protocol is used in the environment where its benefits are desirable. It is possible, for example, to perform TCP to UDP translation and vice versa, in order to use TCP in environments where reliability and congestion control are required, and UDP in, say, reliable environments where simplicity is more desirable.


We now turn to FIG. 2a, which illustrates an example switch setup and is referred to in the remainder of this discussion. In the FIG. 2a example, an L4-L7 switch is connected with peer1 via TCP connection1, which is also referred to as the TCP connection with identifier tid1. Similarly, the L4-L7 switch is connected with peer2 via TCP connection2, which is also referred to using the identifier tid2. Both connection1 and connection2 are full-duplex connections, such that TCP payload packets can flow in either direction between the L4-L7 switch and peer1 or peer2, and acknowledgment packets flow in the direction opposite to that of the payload packets. The example L4-L7 switching action includes switching incoming traffic (to the L4-L7 switch) on connection1 to outgoing traffic on connection2 and, conversely, switching incoming traffic on connection2 to outgoing traffic on connection1.


As used in the present description and in the claims appended hereto (specifically, not necessarily including the “related applications” listed at the beginning of this description), the term “connection” refers to an association of data with particular source and destination indications. The term “connection” is not meant to require or imply a particular method or protocol for communicating the data from the source to the destination. Thus, for example, even an association in which data is transmitted by UDP, traditionally referred to as a “connectionless” protocol (since no connection state is maintained), is covered by the term “connection” as used herein.


The association between connection1 and connection2 is itself the result of a procedure, which depends on the application of interest (layer-7 information). With respect to one example application, Network Address Translation, the association is established at connection initiation time and remains in effect for the lifetime of the connections.


Independently, while the FIG. 2a example illustrates a one-to-one association between two connections, other associations are possible and useful, such as one-to-many and many-to-one. In applications such as the storage switch described in the Background, the association is dynamic and one-to-many, and may change for every protocol data unit (PDU), as now discussed with reference to FIG. 2b. In the FIG. 2b example, the L4-L7 switch is connected to an iSCSI initiator via connection1, and to three different iSCSI storage controllers via connection2A, connection2B, and connection2C. An iSCSI initiator request in this example is switched to connection2A, connection2B, or connection2C, depending on the location of the stored data that is being requested. In one example, the L4-L7 switch includes a table that relates the storage blocks (in which the data is stored) to the different storage controllers. The requested storage block is located using the table, and the request is directed to the controller that contains the located storage block. As an example of a many-to-one relation between the connections in an L4-L7 switch, in the storage controller reply direction the reply data arrives via connection2A, connection2B, or connection2C (many), and the replies all go to the iSCSI initiator (one) via connection1.
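
The table lookup and the two directions of the association might be sketched in Python as follows. The block ranges, the sorted-list representation searched with `bisect`, and the function names are hypothetical. A request whose blocks span two controllers would additionally need to be split, which this sketch does not attempt.

```python
import bisect

# Hypothetical switching table: the controller at index i holds the blocks
# from FIRST_BLOCK[i] up to (but not including) FIRST_BLOCK[i + 1]; the last
# controller holds everything from its first block upward.
FIRST_BLOCK = [0, 1_000_000, 2_500_000]
CONTROLLER_CONNECTION = ["connection2A", "connection2B", "connection2C"]
INITIATOR_CONNECTION = "connection1"


def request_connection(lba: int) -> str:
    """One-to-many: choose the controller connection for a requested block."""
    if lba < 0:
        raise ValueError("negative block address")
    return CONTROLLER_CONNECTION[bisect.bisect_right(FIRST_BLOCK, lba) - 1]


def reply_connection(controller_connection: str) -> str:
    """Many-to-one: every controller's reply goes back to the initiator."""
    if controller_connection not in CONTROLLER_CONNECTION:
        raise ValueError(f"unknown connection {controller_connection}")
    return INITIATOR_CONNECTION
```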


We now turn to FIG. 3a, which illustrates a layer 4-7 switching device 570 based on a TCP Protocol Offload Engine (TOE) 530, which, in this example, has two 10 Gigabit Ethernet ports 540 and 550. The TOE also has a memory system 560, which typically contains a pool of equally sized send buffer pages (TX pages 561) and a pool of equally sized receive buffer pages (RX pages 562). A send buffer for a particular offloaded connection typically includes a collection of TX pages, and a receive buffer for a particular offloaded connection typically includes a collection of RX pages. The pages are typically managed by a memory manager, which keeps a list of free pages, and they are typically accessed through page tables that are associated with each connection. The TX pages and the RX pages are shown in FIG. 3a as stored in off-chip memory, but these pages can in general be stored in on-chip memory and/or off-chip memory and/or in memory that is part of the core processor.
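
The page-based buffer management described above can be sketched as a free list plus per-connection page tables. This Python version is illustrative only: page numbering, the reuse order of free pages and the method names are assumptions, and in the TOE the memory manager is part of the hardware.

```python
class PagePool:
    """Equally sized buffer pages shared by all offloaded connections."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size
        self.free = list(range(num_pages))          # memory manager's free list
        self.page_table: dict[int, list[int]] = {}  # connection tid -> its pages

    def allocate(self, tid: int, nbytes: int) -> list[int]:
        """Append enough free pages to the connection's buffer for `nbytes`."""
        needed = -(-nbytes // self.page_size)       # ceiling division
        if needed > len(self.free):
            raise MemoryError("page pool exhausted")
        pages = [self.free.pop() for _ in range(needed)]
        self.page_table.setdefault(tid, []).extend(pages)
        return pages

    def release(self, tid: int, count: int) -> None:
        """Return the connection's `count` oldest pages to the free list."""
        pages = self.page_table.get(tid, [])
        self.free.extend(pages[:count])
        del pages[:count]


pool = PagePool(num_pages=8, page_size=4096)
first = pool.allocate(tid=1, nbytes=10_000)   # needs 3 pages
second = pool.allocate(tid=2, nbytes=4096)    # needs 1 page
```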


In some examples, the Core Processor and the TOE are integrated on the same chip, as is shown in FIG. 3b. The TOE 530 is typically connected to the Core Processor 500 via a physical or logical link 520. An example of a physical link is a peripheral bus such as the PCI bus, or a processor bus such as the Hyper-Transport bus, and an example of a logical link is a memory request/response bus within a memory controller of the Core Processor 500.


The Core Processor 500 also includes a memory subsystem 510 that can store (among other things) Direct Memory Access (DMA) transmit buffers 513 containing data that is to be DMA read by the TOE 530 (or DMA written by the Core Processor 500) and subsequently sent as egress network packets to one of the 10GE interfaces 540 or 550. The memory subsystem also contains DMA receive buffers 514, that are DMA written by the TOE 530 (or DMA read by the Core Processor 500) via the link 520. The memory subsystem also holds send commands 511, also referred to as DMA gather lists, that list the locations within the TX-buffers 513 of data that is to be DMA read by the TOE 530. The memory subsystem also holds responses 512, also referred to as completion events, that are DMA written by the TOE 530 indicating progress in processing the send commands 511 and also describing the location and length of the data that has been DMA written by the TOE 530 to the core processor memory 510.


The L4-L7 switching function, the iSCSI storage switch, and the TCP Proxy function 570 are implemented using two offloaded connections: connection1, which connects the TCP Proxy 570 with peer1, and connection2, which connects the L4-L7 switch 570 with peer2 (referring again to FIG. 2a). The two connections can be connected to the TOE 530 via the 10GE interface 540 or the 10GE interface 550. In one mode of operation, the receive data for connection1 from one of the 10GE ports is TCP processed by the TOE 530 and subsequently DMA-ed to the core processor RX-buffer 514, and a record containing the connection1 tag and the location and length of the written data is written to the response buffer 512. Then a gather list is created for connection2 that contains the location of the data just DMA-written for connection1. The data has thus been effectively moved to the core processor TX-buffer 513 for connection2, and is subsequently DMA read by the TOE 530 from the core processor send buffer 513 to the TOE send buffer 561. The receive data for connection2 is likewise DMA-ed to the core processor RX-buffer 514. A response entry is written to the response area, and then a gather list is created for connection1, which effectively moves the data to the core processor TX-buffer 513 for connection1. The data is subsequently DMA read by the TOE 530 to the transmit buffer 561.


The L4-L7 switch 570 is also responsible for the processing that accomplishes switching between connection1 and connection2, for flow control of the received data by managing the receive window size, and for flow control of the transmission of data in the TOE transmit buffers 561. For an aggregate network bandwidth of 10 Gigabits per second (Gbps), the Core Processor 500 memory must provide at least 20 Gbps of bandwidth: 10 Gbps for the DMA data movement into the RX buffer 514 and another 10 Gbps for the DMA data movement from the TX-buffer 513 to the TOE send buffer 561.


A more efficient operating mode of the L4-L7 switch moves the data directly from the receive buffer 562 for connection1 (connection2) to the transmit buffer 561 for connection2 (connection1); this is referred to as the zero-copy MOVE-option. An even more efficient operating mode commingles the receive buffer for connection1 with the send buffer for connection2, and the receive buffer for connection2 with the send buffer for connection1; this is referred to as the zero-copy SHARE-option. “Zero-copy” here refers to the number of times that the data uses the Core Processor Memory 510 interface, which in both options is zero.


The zero-copy MOVE-option has an advantage of allowing the editing of the L5-L7 headers as they pass from connection1 to connection2 (or from connection2 to connection1). An advantage of the zero-copy SHARE-option is that it requires just one half the memory bandwidth in the TOE for the switching operation. That is, the zero-copy MOVE-option uses one write and one read to the RX-pages buffer 562, and another write and read to and from the TX-pages buffer 561. By contrast, the zero-copy SHARE-option just writes the received data once to the receive buffer of the first connection and reads the data once when sent from the second connection to the second peer.
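
The bandwidth comparison can be made concrete with a little arithmetic, shown here in Python. The access counts are taken from the description above: two crossings of the core memory interface in the core-mediated mode, and one write plus one read per TOE buffer in the two zero-copy options. TOE buffer traffic for the core-mediated mode is not described and is therefore not modeled; the mode labels are hypothetical.

```python
# Memory accesses per switched byte, taken from the description of each mode.
CORE_MEMORY_CROSSINGS = {
    "via-core": 2,  # DMA write into RX-buffer 514, DMA read from TX-buffer 513
    "MOVE": 0,      # zero-copy: data never crosses the core memory interface
    "SHARE": 0,
}
TOE_BUFFER_ACCESSES = {
    "MOVE": 4,      # write + read of RX pages 562, then write + read of TX pages 561
    "SHARE": 2,     # one write into the shared buffer, one read when sent
}


def required_gbps(mode: str, switched_gbps: float) -> dict[str, float]:
    """Memory bandwidth needed to switch `switched_gbps` of traffic in `mode`."""
    need = {"core": CORE_MEMORY_CROSSINGS[mode] * switched_gbps}
    if mode in TOE_BUFFER_ACCESSES:
        need["toe"] = TOE_BUFFER_ACCESSES[mode] * switched_gbps
    return need


for mode in ("via-core", "MOVE", "SHARE"):
    print(mode, required_gbps(mode, 10.0))
```

At 10 Gbps this gives 20 Gbps of core memory traffic for the core-mediated mode, and 40 Gbps versus 20 Gbps of TOE buffer traffic for the MOVE and SHARE options, respectively.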


With reference to FIG. 4, we now describe an example flow processor architecture of the interface device 100 that is capable of implementing an L4-L7 switch using the zero-copy MOVE-option. An arbiter 102 arbitrates among various signals, such as headers of control messages from a core processor (104a), data packets from the network (104b), transmission modulation event tokens (104c), receive modulation event tokens (104d), and Protocol Data Unit (PDU) feedback read responses (104e). The transmission modulation event tokens are associated with transmission traffic management functionality, and the receive modulation event tokens with receive traffic management functionality. The PDU feedback read responses 104e contain the first eight bytes of a PDU read from a per-connection receive buffer or per-connection send buffer; these bytes are used to determine the header length and the payload length of a particular PDU when messages sent to the core processor should be PDU aligned, or when the egress TCP segments should be PDU aligned.
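
For iSCSI, for example, the first eight bytes of a PDU are enough to find both lengths: byte 4 of the Basic Header Segment holds TotalAHSLength (in 4-byte words), and bytes 5 to 7 hold the 24-bit DataSegmentLength, per RFC 3720. The Python sketch below shows that calculation as an illustration of the idea, not as the flow processor's logic; the function name is hypothetical, and digests are assumed to be disabled.

```python
def iscsi_pdu_lengths(first8: bytes) -> tuple[int, int]:
    """Return (header_len, payload_len) of an iSCSI PDU from its first 8 bytes.

    Assumes header and data digests are disabled; if they were negotiated,
    a 4-byte digest would also follow the header and, when present, the
    data segment.
    """
    if len(first8) < 8:
        raise ValueError("need the first 8 bytes of the PDU")
    total_ahs_words = first8[4]                            # TotalAHSLength
    data_segment_len = int.from_bytes(first8[5:8], "big")  # DataSegmentLength
    header_len = 48 + 4 * total_ahs_words                  # BHS + additional headers
    payload_len = (data_segment_len + 3) & ~3              # padded to 4-byte multiple
    return header_len, payload_len


print(iscsi_pdu_lengths(bytes([0x01, 0x80, 0, 0, 2, 0x00, 0x02, 0x01])))
```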


It is noted that the arbiter 102 is a feature of the particular flow processor architecture of the FIG. 4 circuitry and typically has only an indirect effect on the layer 4-7 switch function.


When the arbiter 102 allows an ingress Ethernet packet into the processing pipeline, the database lookup block 108 of the protocol processing block 107 locates the state for an offloaded protocol, such as TCP. A packet is identified by the header, or headers, that it contains. As an example, Ethernet packets contain at least a layer-2 Ethernet header; when the Ethernet packet encapsulates an IP packet, it also contains a layer-3 IP header; and when the IP packet encapsulates a layer-4 TCP (or UDP) protocol, the packet also contains a TCP (UDP) header. For a TCP packet, a 4-tuple consisting of the source and destination IP addresses and the source and destination TCP (UDP) port numbers is said to uniquely identify a point-to-point connection that uses the protocol. For offloaded connections, the lookup minimally considers the 4-tuple information, and it can optionally include one or more other components to facilitate functions such as server virtualization, Virtual LAN (VLAN) functionality, and per-packet filtering and rewrite.


The lookup block 108 typically operates to match the protocol header, and optionally one or more other components as discussed above, to an internal identification (“tid,” used by the interface device and the core processor) corresponding to a particular protocol or filtering rule Control Block (CB). In the FIG. 4 example, the lookup database is implemented with a TCAM memory, which allows looking up the location of a CB in pipelined fashion, with one tid result being returned from the TCAM every clock cycle after a pipeline startup delay. In place of the TCAM, other structures may be used, such as hashing or a search tree, or a combination of these methods, to implement the lookup function.
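
A software analogue of the lookup, using hashing in place of a TCAM as the text allows, might look like the Python sketch below. Exact four-tuple matches for offloaded connections go through a hash table, and an ordered list of wildcard patterns mimics the first-match priority of TCAM entries used for filtering rules. The class and field names, and the use of `None` as a wildcard, are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    vlan: Optional[int] = None   # optional extra lookup component


class LookupTable:
    """Exact-match hash for offloaded connections, then ordered wildcard filters."""

    def __init__(self) -> None:
        self.connections: dict[FlowKey, int] = {}   # 4-tuple (+VLAN) -> tid
        self.filters: list[tuple[tuple, int]] = []  # (pattern, tid), highest priority first

    def add_connection(self, key: FlowKey, tid: int) -> None:
        self.connections[key] = tid

    def add_filter(self, pattern: tuple, tid: int) -> None:
        """`pattern` has one entry per FlowKey field; None means "don't care"."""
        self.filters.append((pattern, tid))

    def lookup(self, key: FlowKey) -> Optional[int]:
        tid = self.connections.get(key)
        if tid is not None:
            return tid
        for pattern, rule_tid in self.filters:     # first match wins
            if all(p is None or p == k for p, k in zip(pattern, key)):
                return rule_tid
        return None


table = LookupTable()
conn = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80)
table.add_connection(conn, tid=7)
table.add_filter((None, "10.0.0.2", None, 443, None), tid=100)
```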


The lookup block 108 then provides the tid, received from the TCAM 110, to connection manager circuitry 112, which manages the connection state and attributes. In the FIG. 4 example, the connection state and attributes are in a Control Block (CB) 114. The connection manager 112 operates in concert with the payload command manager 116 to generate and separately provide ingress payload commands E_PCMD to an ingress payload manager block 118a, and egress payload commands C_PCMD to an egress payload manager block 118b. We note that TCP is a full-duplex protocol; as such, an ingress packet on a connection can carry both payload data and acknowledgements for previously sent egress packets. In this case, an E_PCMD might, for example, write the ingress payload to the core processor, while the acknowledgement contained in the ingress packet can enable further sending of egress payload, and a C_PCMD might then be issued to read payload from a per-connection send buffer to form an egress packet. The core processor, or core for short, refers to a host computer connected to the NIC, and/or an on-chip processor or processor on the NIC card.


In particular, for offloaded connections, the connection manager provides the tid to the CB 114, and the CB 114 provides the current connection state and attributes for the connection (i.e., the connection to which the tid corresponds) to the connection manager 112. Based on the current connection state and attributes provided by the CB 114, the connection manager 112 determines that the tid corresponds to an offloaded connection and how to appropriately modify the connection state, and provides to the payload command manager 116 an indication of the modification to the connection state. Based on the indication of the modification, the payload command manager 116 issues, for example, an ingress message header to the ingress form packet block 120a.


The payload command manager 116 also issues one or more appropriate payload commands to the ingress payload manager block 118a to, for example, cause data to be written to the core processor or, when the data is not ready to be written to core, the payload command manager creates an Rx modulation event, which causes traffic management functionality to schedule later delivery to the core processor. For a TCP connection, the message to send payload to the core processor is, in one example, a CPL_RX_DATA message, indicating that the payload is to be written into an anonymous free-list buffer. The message can also indicate that the payload is to be directly placed in a specific location in the core memory. Furthermore, in an L4-L7 switching application, payload may be encapsulated in a CPL_RX2TX_DATA message indicating that the data is to be written to the send buffer for a particular connection.


The PM_TX 118b egress payload manager includes a send buffer that is organized as a pool of pages shared among the various offloaded connections. The core allocates pages in the send buffer to particular connections, and a CPL_TX_DATA_ACK message is sent back from the flow processor to the core processor. The core processor uses the CPL_TX_DATA_ACK message to determine when a page (or pages) is freed for reuse (by the same or a different connection). This typically occurs when the data payload stored in the pages has been acknowledged by the peer via TCP. The CPL_TX_DATA_ACK message contains the tid identifier, to enable determining which connection is freeing page(s). The core can thereby use the information contained in this message to adjust its information regarding the current size of the send buffer allocated to a particular connection.


When receive flow control is enabled for a particular connection, the CPL_RX_DATA_ACK message that is sent by the core to the connection manager is used by the connection manager to manage the size of the receive window for the individual connections. The receive window is initialized, at connection creation time, to a particular value that indicates the number of bytes that the peer is allowed to send to the connection. When a payload is sent to the core processor for a flow controlled connection, the size of the receive window for the connection is decremented by the size of the sent payload. The CPL_RX_DATA_ACK message, which includes a byte count parameter, is then used to increase the receive window size by the specified byte count to open up the receive window for a particular connection.
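
The window bookkeeping can be sketched as below. Only the message name CPL_RX_DATA_ACK and its byte-count semantics come from the text; the class and method names are hypothetical, and a complete implementation would also have to advertise the updated window to the peer in outgoing TCP segments.

```python
class ReceiveWindow:
    """Receive-window accounting for one flow-controlled connection."""

    def __init__(self, initial_bytes: int) -> None:
        self.available = initial_bytes     # bytes the peer may still send

    def on_payload_to_core(self, nbytes: int) -> None:
        """Payload handed to the core shrinks the window."""
        if nbytes > self.available:
            raise ValueError("peer sent beyond the advertised window")
        self.available -= nbytes

    def on_rx_data_ack(self, byte_count: int) -> None:
        """A CPL_RX_DATA_ACK from the core reopens the window by byte_count."""
        self.available += byte_count


window = ReceiveWindow(initial_bytes=65536)
window.on_payload_to_core(40000)   # window shrinks to 25536 bytes
window.on_rx_data_ack(30000)       # core has consumed data; window reopens
```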


For offloaded connections, the connection manager 112 writes the modified connection state and attributes back into the CB 114. The read, modify and write of the connection state and attributes is done in an atomic operation. Here, atomic refers to the property that a read of the CB always returns the most recent state of the particular CB, even though the pipeline might be processing multiple messages simultaneously, that are associated with the same CB.


There are two form packet blocks: an ingress form packet block 120a and an egress form packet block 120b. The egress form packet block 120b combines headers for the various layers (e.g., Ethernet, IP, and TCP) with the corresponding payload from the egress payload block 118b into an Ethernet packet for transmission to the wire. The ingress form packet block 120a combines a CPL message header, such as the CPL_RX_DATA or CPL_RX2TX_DATA header, with the ingress payload from the ingress payload block PM_RX 118a, and typically sends the message to the core, for example, in the case of a CPL_RX_DATA message.


For a CPL_RX2TX_DATA message, the RX2TX de-multiplexer block 121 processes the message, such that the header is re-written as a CPL_TX_DATA message 123. The header is injected into the arbiter 102 as a simulated egress CPL message from the core, and the CPL_RX2TX_DATA payload is injected by the RX2TX arbiter 122 as simulated egress payload into the egress PM_TX 118b payload manager. The ingress payload is thus moved from an ingress payload buffer for one connection to an egress payload buffer for another connection. The CPL_RX2TX_DATA header contains the tid for the egress connection that is to send the data payload, and this tid value is stored as part of the CB for the ingress connection.


We now discuss how the L4-L7 switching action is carried out in one operating mode. When a TCP connection setup request is received from peer1, the static L4-L7 switching is implemented by opening a first connection connection1 to peer1 and a second connection connection2 to peer2. The core is involved in the management of the receive window of connection1 and the send window of connection2 and, similarly, the receive window of connection2 and the send window of connection1. Also, in a static mapping mode of operation, the tid of connection2 is stored within the CB state of connection1, to allow filling in the tid field of the CPL_RX2TX_DATA message that sends the ingress payload from connection1 to connection2. Similarly, the tid of connection1 is stored within the CB state of connection2 to allow formulating the CPL_RX2TX_DATA message that sends ingress payload from connection2 to connection1.
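
The static mapping and the RX2TX header rewrite can be modelled in a few lines of Python. The message names CPL_RX2TX_DATA and CPL_TX_DATA come from the text, but the field layout and function names below are hypothetical; the actual message formats are not given here.

```python
from dataclasses import dataclass


@dataclass
class ControlBlock:
    tid: int
    peer_tid: int   # tid of the connection this one is switched to


@dataclass
class CplMessage:
    opcode: str     # message name, e.g. "CPL_RX2TX_DATA" or "CPL_TX_DATA"
    tid: int
    payload: bytes


def open_switched_pair(cbs: dict[int, ControlBlock], tid1: int, tid2: int) -> None:
    """Static mapping: each connection's CB records the other's tid."""
    cbs[tid1] = ControlBlock(tid1, peer_tid=tid2)
    cbs[tid2] = ControlBlock(tid2, peer_tid=tid1)


def ingress_to_rx2tx(cbs: dict[int, ControlBlock], tid: int, payload: bytes) -> CplMessage:
    """Address ingress payload on `tid` to the egress connection stored in its CB."""
    return CplMessage("CPL_RX2TX_DATA", cbs[tid].peer_tid, payload)


def rx2tx_demux(msg: CplMessage) -> CplMessage:
    """Rewrite the header so it looks like a send request from the core."""
    if msg.opcode != "CPL_RX2TX_DATA":
        raise ValueError("not an RX2TX message")
    return CplMessage("CPL_TX_DATA", msg.tid, msg.payload)


cbs: dict[int, ControlBlock] = {}
open_switched_pair(cbs, 1, 2)                    # connection1 <-> connection2
outgoing = rx2tx_demux(ingress_to_rx2tx(cbs, 1, b"abc"))
```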


In another operating mode, a dynamic L4-L7 switching capability is implemented by viewing the TCP byte stream as a sequence of application layer protocol data units (PDUs), each including a PDU header and a PDU payload. The flow processor delivers ingress data in the core direction in two phases. In the first phase, the PDU-header phase, a specified number of header bytes is delivered to the core. In the second phase, the PDU-payload phase, a specified number of payload bytes is delivered to the core or to another connection. The header phase may be repeated more than once for a particular PDU. For example, for iSCSI PDUs that have an auxiliary header, the first header phase would be used to determine the size of the auxiliary header, and the second header phase would then deliver the auxiliary header. The payload size is typically determined by further examining the header bytes. This determination may be done by the core processor or by the protocol processing block in the TOE.


The header delivery phase has a per-connection configurable option: either the receive buffer is adjusted by the number of bytes sent to the core, or only a copy of the header is delivered to the core and the receive buffer is preserved as is. The first option may be used, for example, when the core might edit the header information before forwarding it. In that case, the modified header is written to the send buffer of the switched connection, and the tid of the destination connection is written into the ingress connection's CB, before the PDU payload is forwarded to the destination connection. The copy option is more efficient when the core does not modify the header; in that case the flow processor is instructed to forward a specified number of bytes to connection2. After the specified number of bytes has been forwarded, ingress processing switches back to the header phase.
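
The two header delivery options can be contrasted in a small model. This is a hypothetical sketch: `RxBuffer`, `forward` and `switch_pdu` are illustrative names, the `edit` callback stands in for the core rewriting a header, and the byte buffers are simplified stand-ins for the connection receive and send buffers.

```python
from typing import Callable, Optional


class RxBuffer:
    """Hypothetical receive buffer of the ingress connection."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = bytearray(data)

    def deliver_header(self, n: int, mode: str) -> bytes:
        """Give the core the first n bytes; "consume" removes them, "copy" does not."""
        if mode not in ("consume", "copy"):
            raise ValueError(f"unknown header delivery mode: {mode}")
        header = bytes(self.data[:n])
        if mode == "consume":
            del self.data[:n]  # receive buffer adjusted by the bytes sent to the core
        return header


def forward(rx: RxBuffer, n: int, tx: bytearray) -> None:
    """Flow processor forwards n bytes from the receive buffer to the send buffer."""
    tx.extend(rx.data[:n])
    del rx.data[:n]


def switch_pdu(rx: RxBuffer, tx: bytearray, hdr_len: int, payload_len: int,
               mode: str, edit: Optional[Callable[[bytes], bytes]] = None) -> None:
    """Switch one PDU from the ingress receive buffer to the destination send buffer."""
    if mode == "copy" and edit is not None:
        raise ValueError("copy mode forwards the header unmodified")
    header = rx.deliver_header(hdr_len, mode)  # the core sees the header in both modes
    if mode == "consume":
        # The core writes its (possibly edited) header to the destination send
        # buffer, then the PDU payload is forwarded behind it.
        tx.extend(edit(header) if edit else header)
        forward(rx, payload_len, tx)
    else:
        # Header is still in the receive buffer: forward header and payload as-is.
        forward(rx, hdr_len + payload_len, tx)
```

The copy path moves the header only once, from receive buffer to send buffer, which is why it is the cheaper choice when the core only needs to inspect the header.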

Claims
  • 1. A method of operating intelligent network interface circuitry, wherein the network interface circuitry couples a core processor to a network to facilitate communication over the network between the core processor and at least a first peer, wherein a first connection connects the core processor to the first peer, the method comprising: by the network interface circuitry, receiving data packets from the first peer via the network on the first connection, according to a first particular protocol; and processing the received data packets, wherein each of a plurality of connections couples the core processor to facilitate communication over the network between the core processor and a respective separate one of a plurality of peers, other than the first peer, the processing of the received data packets including selecting one of the plurality of connections as a second connection and associating, with the second connection, data that is at least a portion of the data packets received on the first connection, such that the data received by the intelligent network interface circuitry on the first connection is switched to be outgoing from the intelligent network interface circuitry on the second connection, according to a second particular protocol.
  • 2. The method of claim 1, wherein: the network interface circuitry includes a pipelined processor circuitry configured to process data received by the network interface circuitry from the core for transmission to a peer via one of the connections and also to process data received by the network interface circuitry via one of the connections for receipt by the core; and associating, with the second connection, data that is at least a portion of the data packets received on the first connection includes generating data by the core processor, to be delivered on the second connection.
  • 3. The method of claim 2, wherein: the pipelined processor circuitry includes ingress form packet circuitry configured to form ingress packets, from data received from the network, to provide to the core; egress form packet circuitry configured to form egress packets, from data received from the core, to provide to the network; intercept and redirect circuitry to selectively intercept packets from the ingress form packet circuitry, formed by the ingress form packet circuitry based on data received on the first connection, and to provide a message to the pipelined processor circuitry simulating a message from the core instructing the pipelined processor circuitry to provide the data of the ingress packets, from the ingress form packet circuitry based on data received on the first connection, to be redirected to the egress form packet circuitry to form egress packets to be delivered on the network on the second connection.
  • 4. The method of claim 3, wherein: selecting one of the plurality of connections as the second connection includes storing an identification of the second connection in the network interface circuitry in a control block associated with the first connection.
  • 5. The method of claim 1, wherein: associating, with the second connection, data that is at least a portion of the data packets received on the first connection includes placing the data in a transmit buffer associated with the second connection.
  • 6. The method of claim 5, wherein: the data is placed in the transmit buffer associated with the second connection without storing the data in a receive buffer associated with the first connection.
  • 7. The method of claim 5, wherein: placing the data in a transmit buffer associated with the second connection includes appending the data to data already in the transmit buffer associated with the second connection.
  • 8. The method of claim 1, wherein: the first particular protocol operates at no higher than layer 4, and processing the received data includes processing the packets at higher than layer 4, wherein the associating is based on a result of processing the layers up to higher than layer 4.
  • 9. The method of claim 1, wherein: the data received from the first peer via the network on the first connection is payload data included in data packets received from the first peer associated with the first connection; the first connection and the second connection are full duplex, and the method further comprises, by the intelligent network interface circuitry receiving data packets from the second peer on the second connection; associating, with the first connection, the data packets received on the second connection, such that the data incoming to the intelligent network interface circuitry on the second connection is switched to be outgoing from the intelligent network interface circuitry on the first connection.
  • 10. The method of claim 1, wherein: the first particular protocol is the same as the second particular protocol.
  • 11. The method of claim 1, wherein: associating the received data with the second connection is a result of a static configuration.
  • 12. The method of claim 1, wherein: selecting one of the plurality of connections as the second connection includes the network interface circuitry providing the at least a portion of the data packet received on the first connection to the core; and the network interface circuitry receiving an indication from the core of a determination of which of the plurality of connections is the second connection.
  • 13. The method of claim 12, wherein: the network interface circuitry providing the at least a portion of the data packet received on the first connection to the core includes delineating at least one protocol data unit in the data received on the first connection; and the at least a portion of the data packet received on the first connection is at least a portion of the at least one protocol data unit delineated by the network interface circuitry.
  • 14. The method of claim 1, wherein: the network interface circuitry selecting one of the plurality of connections as the second connection includes, by the network interface circuitry, processing the at least a portion of the data packet received on the first connection in view of control information associated with the connections.
  • 15. The method of claim 14, wherein: selecting one of the plurality of connections as the second connection is performed by a core that is a processor on the network interface circuitry.
  • 16. The method of claim 1, wherein: the portion of the data packets processed for associating the received data with the second connection includes a portion of the data packets associated with layer 5 to 7 packet headers.
  • 17. The method of claim 16, wherein: the portion of the data packets associated with layer 5 to 7 packet headers includes iSCSI packet headers.
  • 18. The method of claim 17, wherein: the plurality of separate peers to which the plurality of connections connect are storage controllers.
  • 19. The method of claim 1, wherein: the processing of at least a portion of data packets received on the first connection selecting one of the plurality of connections as the second connection is on a per Protocol Data Unit basis in at least one of layers 5-7.
  • 20. The method of claim 19, wherein: the portion of the data packets associated with layer 5 to 7 packet headers includes iSCSI packet headers.
  • 21. The method of claim 1, wherein: data is provided to the second connection according to a desired data rate transmission characteristic characterizing the second connection.
  • 22. The method of claim 21, wherein providing the data to the second connection according to the desired data rate transmission characteristic includes: managing modulation event tokens, including receiving and providing modulation event tokens; processing modulation events; deciding whether to transmit the received packets to the second connection in association with modulation event processing; transmitting the received packets out to the network based on the deciding step; and based on a result of the modulation events processing step, causing modulation event tokens to be fed back for receipt by the modulation event tokens managing step.
  • 23. The method of claim 1, further comprising: performing supplemental processing on the data provided on the second connection, not associated with switching the data.
  • 24. The method of claim 23, wherein the supplemental processing includes verifying the integrity of the received data.
  • 25. The method of claim 24, including: inserting, into the received data, a result of computing the integrity of the received data.
  • 26. The method of claim 24, wherein verifying the integrity of the received data includes at least one of Cyclic Redundancy Check and checksum.
  • 27. The method of claim 23, wherein the supplemental processing includes at least one of encryption/decryption and compression/decompression.
  • 28. The method of claim 1, wherein: the first protocol is TCP and the second protocol is UDP.
  • 29. The method of claim 1, wherein: the first protocol is UDP and the second protocol is TCP.
  • 30. The method of claim 1, wherein: the first protocol and the second protocol are variants of the same protocol with environment specific optimizations.
  • 31. The method of claim 30, wherein: the first protocol and the second protocol are variants of the TCP protocol with environment specific optimizations.
  • 32. The method of claim 30, wherein: one of the environments is a wireless network.
  • 33. The method of claim 30, wherein: one of the environments is a long distance network.
  • 34. The method of claim 1, wherein: the first protocol and the second protocol are variants of the same protocol configured differently.
  • 35. A method of operating intelligent network interface circuitry, wherein the network interface circuitry couples a core processor to a network to facilitate communication over the network between the core processor and at least a first peer, wherein a first connection connects the core processor to the first peer, the method comprising: by the network interface circuitry, receiving data packets from the first peer via the network on the first connection, according to a first particular protocol, the received data packets including an indication of the core processor as a destination of the received data packets according to the first particular protocol; and processing the received data packets, wherein each of a plurality of connections couples the core processor to facilitate communication over the network between the core processor and a respective separate one of a plurality of peers, other than the first peer, the processing of the received data packets including selecting one of the plurality of connections as a second connection and associating, with the second connection, data that is at least a portion of the data packets received on the first connection, such that the data received by the intelligent network interface circuitry on the first connection is switched to be outgoing from the intelligent network interface circuitry on the second connection, according to a second particular protocol.
CROSS REFERENCE TO RELATED APPLICATIONS

The subject application is a Continuation of U.S. application Ser. No. 11/356,850, filed Feb. 17, 2006, and entitled “Method to Implement an L4-L7 Switch Using Split Connections and an Offloading Nic” and now U.S. Pat. No. 7,616,563, which is a Continuation-in-Part of U.S. application Ser. No. 11/330,898, filed Jan. 12, 2006 and entitled “Virtualizing the Operation of Intelligent Network Interface Circuitry” and now U.S. Pat. No. 7,660,306, which is a Continuation-in-Part of U.S. patent application Ser. No. 11/313,003, filed Dec. 19, 2005 and entitled “A Method for Traffic Scheduling in Intelligent Network Interface Circuitry” and now U.S. Pat. No. 7,660,264, which is a Continuation-in-Part of U.S. patent application Ser. No. 11/282,933, filed Nov. 18, 2005 and entitled “A Method for UDP Transmit Protocol Offload Processing with Traffic Management”, and now U.S. Pat. No. 7,715,436, which is a Continuation-in-Part of U.S. patent application Ser. No. 11/217,661, filed Aug. 31, 2005 and entitled “Protocol Offload Transmit Traffic Management” and now U.S. Pat. No. 7,724,658, all of which are incorporated herein by reference for all purposes.

Continuations (1)
Number Date Country
Parent 11356850 Feb 2006 US
Child 12567581 US
Continuation in Parts (4)
Number Date Country
Parent 11330898 Jan 2006 US
Child 11356850 US
Parent 11313003 Dec 2005 US
Child 11330898 US
Parent 11282933 Nov 2005 US
Child 11313003 US
Parent 11217661 Aug 2005 US
Child 11282933 US