The present invention relates to networking, and more particularly to upper level network protocols [e.g. the Internet small computer system interface (iSCSI), etc.].
The Internet small computer system interface (iSCSI) protocol is an Internet protocol (IP)-based storage networking standard for linking data storage facilities, developed by the Internet Engineering Task Force (IETF). By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances.
The iSCSI protocol is among the many technologies expected to help bring about rapid development of the storage area network (SAN) market, by increasing the capabilities and performance of storage data transmission. Because of the ubiquity of IP networks, iSCSI can be used to transmit data over local area networks (LANs), wide area networks (WANs), or the Internet and can enable location-independent data storage and retrieval.
Prior art
In use, once communication across the network 116 is established using a connection or socket, the transport offload engine 104 receives packet data [e.g. iSCSI protocol data units (PDUs), etc.]. Once received, the transport offload engine 104 stores the data contained in the PDUs in a TOE buffer 112, in order to provide time to generate a data available message 117 and send the message to the host processor 102. The foregoing operation of the transport offload engine 104 may be governed by control logic 114 of the transport offload engine 104.
In response to a data available message 117, the host processor 102 generates a data list 106 [e.g. a scatter-gather list (SGL), memory descriptor list (MDL), etc.] that describes the location(s) in application memory 110 where the incoming data is ultimately to be stored. As shown, to accomplish this, the data list 106 may include at least one memory start address where the incoming data in each PDU is to be stored, with each start address followed by the length of a region in the application memory 110.
In use, the host processor 102 generates and associates the data list 106 with a socket (also known as a connection) associated with the received PDUs that prompted the corresponding data available message(s) 117. The incoming data contained in the PDUs is then copied from the TOE buffer 112 to the application memory 110 using the locations described by the data list 106 corresponding to that socket.
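By way of illustration only, the following C sketch shows one way such a data list and its use might look; the names sgl_entry and place_pdu_data, and the flat copy loop, are assumptions for illustration rather than any particular host implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one data-list (SGL/MDL) entry: a start address
 * in application memory followed by the length of that region. */
struct sgl_entry {
    void   *addr; /* where incoming PDU data is to be stored */
    size_t  len;  /* size of this region in application memory */
};

/* Copy n bytes of received PDU data from the TOE buffer into the
 * application-memory regions described by the data list associated with
 * the socket. Returns the number of bytes actually placed. */
static size_t place_pdu_data(const uint8_t *toe_buf, size_t n,
                             const struct sgl_entry *sgl, size_t entries)
{
    size_t copied = 0;
    for (size_t i = 0; i < entries && copied < n; i++) {
        size_t chunk = n - copied;
        if (chunk > sgl[i].len)
            chunk = sgl[i].len;
        memcpy(sgl[i].addr, toe_buf + copied, chunk);
        copied += chunk;
    }
    return copied;
}
```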
To date, the transport offload engine 104 has been utilized for offloading various lower level protocol operations. On the other hand, various operations associated with upper level network protocols (e.g. iSCSI, etc.) have traditionally been carried out utilizing the host processor 102. For example, cyclical redundancy checking (CRC) and other upper level network protocol operations are typically performed by the host processor 102. By way of background, CRC provides a check value designed to catch most transmission errors. In use, a decoder calculates a CRC for received data and compares it to a CRC appended to the data; a mismatch indicates that the data was corrupted in transit.
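For background only, the following C sketch illustrates the decoder-side check just described. It uses the CRC-32C polynomial that iSCSI digests employ, in a simple bitwise form; an engine would typically use a table-driven or hardware implementation instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise (reflected) CRC-32C over a buffer. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Decoder-side check: recompute the CRC over the received data and
 * compare it with the digest appended by the sender. */
static bool crc_ok(const uint8_t *data, size_t len, uint32_t received_digest)
{
    return crc32c(data, len) == received_digest;
}
```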
Unfortunately, utilizing the host processor 102 for such upper level network protocol operations detrimentally affects performance of the overall system 100.
There is thus a need for overcoming these and/or other problems associated with the prior art.
A system, method and associated data structure are provided for offloading upper protocol layer operations. In use, data is communicated over a network utilizing a plurality of protocols associated with a plurality of protocol layers, where the protocol layers include a network layer. Further, processing associated with the communicating is offloaded, at least in part. Such offloaded processing involves at least one protocol associated with at least one of the protocol layers situated at or above the network layer. Still yet, such offloading is performed statelessly and avoids use of state information associated with a memory descriptor list. Further still, the offloaded processing includes an insertion of markers into a data stream including the communicated data. The markers are distinct from cyclical redundancy checking (CRC) values and are inserted, as the communicated data is being transmitted, after the CRC values are calculated. A framing mechanism is thereby provided which may be utilized to identify a subsequent data unit in the data stream.
Coupled to the network 202 are a local host 204 and a remote host 206 which are capable of communicating over the network 202. In the context of the present description, such hosts 204, 206 may include a web server, storage device or server, desktop computer, lap-top computer, hand-held computer, printer or any other type of hardware/software. It should be noted that each of the foregoing components as well as any other unillustrated devices may be interconnected by way of one or more networks.
For example, the architecture 300 may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, a set-top box, a network adapter, a router, a network system, a storage system, an application-specific system, or any other desired system associated with the network 202.
As shown, the architecture 300 includes a plurality of components coupled via a bus 302. Included is at least one processor 304 for processing data. While the processor 304 may take any form, it may, in one embodiment, take the form of a central processing unit (CPU), a host processor, a chipset (i.e. a group of integrated circuits designed to work together and sold as a unit for performing related functions, etc.), or any other desired processing device(s) capable of processing data.
Further included is processor system memory 306 which resides in communication with the processor 304 for storing the data. Such processor system memory 306 may take the form of on-board or off-board random access memory (RAM), a hard disk drive, a removable storage drive (e.g. a floppy disk drive, a magnetic tape drive, a compact disk drive, etc.), and/or any other type of desired memory capable of storing data. While not shown, the architecture 300 may further be equipped with a display, various input/output (I/O) devices, etc.
In use, programs, or control logic algorithms, may optionally be stored in the processor system memory 306. Such programs, when executed, enable the architecture 300 to perform various functions. Of course, the architecture 300 may simply be hardwired.
Further shown is a transport offload engine 312 in communication with the processor 304 and the network (e.g. the network 202 described above).
While a single bus 302 is shown to provide communication among the foregoing components, it should be understood that any number of buses (or other communication mechanisms) may be used to provide communication among the components. Just by way of example, an additional bus may be used to provide communication between the processor 304 and the processor system memory 306.
During operation, the transport offload engine 312 may be used for offloading upper protocol layer operations from the processor 304. Specifically, data may be communicated over a network utilizing a plurality of protocols associated with a plurality of protocol layers. Further, such protocol layers may include a network layer, one of the various layers of the open systems interconnection (OSI) model.
By way of background, the OSI model includes the following seven layers of Table 1. Note that the layers toward the top of the list in Table 1 are considered upper level protocol layers, while the layers toward the bottom of the list in Table 1 are considered lower level protocol layers.

TABLE 1

7. Application layer
6. Presentation layer
5. Session layer
4. Transport layer
3. Network layer
2. Data link layer
1. Physical layer
Further in use, processing associated with the communicating may be offloaded, at least in part. Such offloaded processing may involve at least one protocol associated with at least one of the protocol layers situated at or above the network layer.
Just by way of example, in one embodiment, the offloaded processing may involve an Internet small computer system interface (iSCSI) protocol. Further, the offloaded processing may include integrity checking such as cyclical redundancy checking (CRC), whereby the offloaded processing includes inserting a digest into the communicated data. More information regarding such exemplary embodiment of offloaded processing, as well as others, will be set forth hereinafter in greater detail during reference to subsequent figures.
In another embodiment, the offloaded processing may include direct data placement (DDP). Still yet, the offloaded processing may involve a remote direct memory access (RDMA) protocol. Even still, the offloaded processing may involve an insertion of markers into a data stream including the communicated data. While various examples of offloaded processing have been mentioned, it should be strongly noted that absolutely any processing may be offloaded which involves at least one protocol associated with at least one of the protocol layers situated at or above the aforementioned network layer (e.g. layers 3-7 of the OSI model).
It should further be noted that the aforementioned offloading is performed statelessly. In the context of the present description, stateless refers to the fact that there is substantially no use of any information regarding previous interactions during the course of at least a portion (e.g. a portion, all, etc.) of the aforementioned offloading of the upper level protocol operations. In one embodiment, for example, use of any state information associated with a data list [e.g. a scatter-gather list (SGL), memory descriptor list (MDL), etc.; see the data list 106 described above] may be avoided during the offloading.
More illustrative information will now be set forth regarding various optional architectures and features with which the foregoing offloading technique may or may not be implemented, as desired. It should be strongly noted that the following information is set forth for illustrative purposes and should not be construed as limiting in any manner. Any of the following features may be optionally incorporated with or without the exclusion of other features described.
For example, one exemplary data structure will now be set forth.
As shown, the data structure 400 may take the form of an iSCSI protocol data unit (PDU). Such data structure 400 may include a header section 402 and a data section 404. Each of such sections 402, 404 may have a CRC digest 406 optionally appended thereto.
In one embodiment, the header section 402 may include 48 bytes in addition to any extended header information (with a maximum of 1020 bytes). Further, the data section 404 may include up to 16 Mbytes, and the CRC digest 406 may include 4 bytes. In use, the CRC digest 406 may be conditionally appended to the various sections of the data structure 400 by a transport offload engine (e.g. the transport offload engine 312 described above).
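As a sketch only, the sizes just described might be captured as follows in C; this is a view of the layout, not a faithful iSCSI basic header segment definition, and the struct and field names are assumptions.

```c
#include <stdint.h>

#define BASIC_HEADER_LEN   48          /* un-extended header section        */
#define MAX_EXT_HEADER_LEN 1020        /* extended header information, max  */
#define DIGEST_LEN         4           /* CRC digest, when present          */
#define MAX_DATA_LEN       (16u << 20) /* data section, up to 16 Mbytes     */

/* Hypothetical view of one PDU; the digests are optional and only present
 * when negotiated for the connection. */
struct iscsi_pdu_view {
    const uint8_t *header;        /* BASIC_HEADER_LEN (+ extension) bytes */
    uint32_t       header_digest; /* optional, follows the header section */
    const uint8_t *data;          /* up to MAX_DATA_LEN bytes             */
    uint32_t       data_digest;   /* optional, follows the data section   */
};
```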
In one embodiment, the present method 500 may be implemented in the context of a transport offload engine (e.g. the transport offload engine 312 described above).
As shown, it is first determined whether CRC processing is enabled. Note decision 502. If so, a CRC is calculated over the data being transmitted.
In one embodiment, such CRC may be carried out by applying a 16- or 32-bit polynomial to data that is to be transmitted and appending the resulting CRC value in a digest (e.g. in the manner shown in the data structure 400 described above).
In one embodiment, the CRC digest may be conditionally inserted with the transmitted data based on a predetermined bit. For example, an enable bit may be used during decision 502 to specify whether or not to insert the CRC digest at the end of a sending operation.
Next, it is determined in decision 506 whether the processing is at the end of a transmission frame. If so, the CRC digest is appended. Note operation 508. Thereafter, a CRC counter is reset in operation 510, so that operation may continue for subsequent data to be transmitted.
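The transmit-side flow of decisions 502/506 and operations 508/510 might be sketched as follows in C. The running-CRC state, the emit() stand-in for handing bytes to the MAC, and the digest byte order are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Running CRC across send calls; 0xFFFFFFFF is the reset value. */
struct tx_crc_state {
    uint32_t crc;        /* the "CRC counter" reset in operation 510 */
    bool     crc_enable; /* the enable bit consulted in decision 502 */
};

/* Continue a running (reflected) CRC-32C over n more bytes. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc;
}

/* Stand-in for handing bytes to the MAC for transmission. */
static void emit(const uint8_t *p, size_t n)
{
    fwrite(p, 1, n, stdout);
}

static void tx_send(struct tx_crc_state *s, const uint8_t *data, size_t n,
                    bool end_of_frame)
{
    emit(data, n);
    if (!s->crc_enable)                     /* decision 502: enabled?     */
        return;
    s->crc = crc32c_update(s->crc, data, n);
    if (end_of_frame) {                     /* decision 506: end of frame */
        uint32_t d = s->crc ^ 0xFFFFFFFFu;  /* finalize the digest value  */
        uint8_t digest[4] = {               /* byte order assumed         */
            (uint8_t)d, (uint8_t)(d >> 8),
            (uint8_t)(d >> 16), (uint8_t)(d >> 24),
        };
        emit(digest, sizeof digest);        /* operation 508: append      */
        s->crc = 0xFFFFFFFFu;               /* operation 510: reset       */
    }
}
```

Keeping the CRC as running state across calls is what lets a section be sent in several pieces, with the digest emitted only once at the end of the frame.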
It should be noted that retransmission of any data may use the same technique. Further, such retransmission may require that, if the retransmitted data lies in the middle of a PDU section (e.g. the sections 402, 404 described above), data from the start of that section be fetched so that the CRC may be re-calculated from the beginning of the section, even though only a portion of the section is retransmitted.
As shown, it is first determined whether the offloading of CRC is enabled. Note decision 602. As mentioned previously, an un-extended header section has 48 bytes (e.g. the header section 402 described above); with a 4-byte CRC digest appended, such a header section occupies 52 bytes.
To this end, if it is determined in decision 602 that offloading of CRC is not enabled, only 48 bytes are posted, or allocated. See operation 604. Further, the CRC digest that is appended, if any, may be removed. Of course, if an extended header is received, additional memory may be posted.
On the other hand, if it is determined in decision 602 that offloading of CRC is enabled, 52 bytes are posted. Note operation 606. Thus, after operation 606, at least a portion of the CRC processing may be offloaded utilizing a transport offload engine (e.g. the transport offload engine 312 described above).
First, it is determined in decision 608 whether the CRC digest is contained in the last 4 bytes of the posted buffer, or whether the CRC digest is contained in the 4 bytes following the posted buffer (in which case the CRC digest is removed from the data stream). In either case, the CRC digest is retrieved accordingly in respective operations 610 and 612. In one embodiment, a first bit may be used to indicate to the transport offload engine the appropriate location of the CRC digest.
Next, it is determined in operation 614 whether the CRC value should be reset. If so, the CRC value may be reset in operation 616. In one embodiment, a second bit may be used to indicate whether the CRC value should be reset before storing data in the posted buffer. Finally, the data is placed in the posted buffer in operation 618.
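One possible C rendering of decisions 602/608/614 and operations 604/606/610/612/616/618 follows; the function names and the two indicator-bit parameters (digest_in_buf, reset_crc) are assumptions modeling the bits described above, not a documented interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN    48
#define DIGEST_LEN 4

/* decision 602 and operations 604/606: how much buffer to post for a
 * header section, depending on whether CRC offload is enabled. */
static size_t header_post_len(bool crc_offload_enabled)
{
    return crc_offload_enabled ? HDR_LEN + DIGEST_LEN   /* 52 bytes */
                               : HDR_LEN;               /* 48 bytes */
}

/* decisions 608/614 and operations 610/612/616/618, as one routine. */
static void place_with_digest(uint8_t *posted_buf, size_t posted_len,
                              const uint8_t *stream,
                              bool digest_in_buf, bool reset_crc,
                              uint32_t *running_crc, uint32_t *digest_out)
{
    if (reset_crc)                          /* decision 614 -> op 616 */
        *running_crc = 0xFFFFFFFFu;

    memcpy(posted_buf, stream, posted_len); /* operation 618 */

    if (digest_in_buf)                      /* decision 608 -> op 610 */
        memcpy(digest_out, posted_buf + posted_len - DIGEST_LEN,
               DIGEST_LEN);
    else                                    /* op 612: digest follows the
                                               posted buffer and is simply
                                               not copied into it, i.e. it
                                               is removed from the stream */
        memcpy(digest_out, stream + posted_len, DIGEST_LEN);
}
```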
In one embodiment, the foregoing method 600 may operate in two states, synchronized or unsynchronized. In a synchronized state, the method 600 operates in an orderly fashion, checking the CRC digests of the header and data sections, and splitting data up into header sections and data sections. On the other hand, in an unsynchronized state, the method 600 places data into legacy buffers. An unsynchronized state may be caused by incoming data arriving out-of-order. Some transport offload engines expect data to arrive in sequence. However, once an out-of-sequence condition occurs, the pre-posted buffers may be flushed and the data may be placed into legacy buffers. The connection can also become unsynchronized if there are no buffers pre-posted to the connection, and a buffer is requested, but the processor does not respond within a certain timeframe.
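For purposes of illustration only, the two-state behavior just described might be sketched as follows in C; the state names, helper stubs, and trigger flags are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* The two receive states described above; names are assumptions. */
enum rx_state { RX_SYNCHRONIZED, RX_UNSYNCHRONIZED };

struct rx_conn {
    enum rx_state state;
    bool          has_preposted_buffers;
};

/* Stubs standing in for buffer management on the engine. */
static void flush_preposted_buffers(struct rx_conn *c)
{
    c->has_preposted_buffers = false;
}

static void place_into_legacy_buffers(struct rx_conn *c,
                                      const void *data, size_t n)
{
    (void)c; (void)data; (void)n; /* hand data to the legacy buffer path */
}

/* Per-segment handling: out-of-sequence data, or a buffer request that
 * the processor fails to answer in time, drops the connection into the
 * unsynchronized state; data then goes to legacy buffers. */
static void rx_segment(struct rx_conn *c, const void *data, size_t n,
                       bool in_sequence, bool buffer_request_timed_out)
{
    if (c->state == RX_SYNCHRONIZED &&
        (!in_sequence || buffer_request_timed_out)) {
        flush_preposted_buffers(c);
        c->state = RX_UNSYNCHRONIZED;
    }

    if (c->state == RX_UNSYNCHRONIZED) {
        place_into_legacy_buffers(c, data, n);
        return;
    }
    /* synchronized path: split into header/data sections and check the
     * CRC digests, as described above */
}
```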
As shown, a header section of a first PDU H1 is received and inserted into a pre-posted 48-byte header buffer, after which it is indicated to a processor. Subsequently, a data section of the first PDU D1 is received and inserted into a pre-posted buffer, after which the processor interrogates the associated header section H1 and posts down an appropriate buffer. Thereafter, operation continues similarly for subsequent PDUs.
As shown, a first data stream 802 is shown without markers [e.g. fixed interval markers (FIMs), etc.]. Further, a second data stream 804 is shown with markers inserted. Finally, a third data stream 806 is shown with the associated markers removed.
Use of such markers provides a framing mechanism that iSCSI may utilize to periodically point to a subsequent PDU in a TCP data stream. The markers are primarily used by a transport offload engine (e.g. the transport offload engine 312 described above) when receiving data, in order to locate the header section of a subsequent PDU (e.g. during error recovery, as discussed below).
The interval with which the markers are inserted may be negotiated, but fixed after negotiation. On an outgoing data stream, the markers are inserted as the data is being transmitted, after any CRC values are calculated. When the data is received, the markers are removed before any CRC calculation or data processing.
A marker includes two pointers indicating an offset to the header section of the next PDU. Each marker may be 8 bytes in length, containing two 32-bit offsets. The two 32-bit offsets are copies of one another, each indicating how many bytes to skip in the data stream in order to find the header section of the next PDU. Two copies of the pointer are used so that a marker spanning a TCP segment boundary still has at least one intact copy within a single TCP segment.
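As a sketch, the 8-byte marker layout just described could be represented as follows in C (field names assumed):

```c
#include <stdint.h>

/* An 8-byte fixed interval marker: two identical 32-bit offsets, each
 * giving the number of bytes to skip to reach the next PDU header. The
 * duplicate ensures that a marker straddling a TCP segment boundary
 * leaves at least one intact copy in a single segment. */
struct iscsi_marker {
    uint32_t next_pdu_offset;      /* bytes until the next header section */
    uint32_t next_pdu_offset_copy; /* duplicate of the field above        */
};
```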
In one embodiment, the markers may be inserted into the data stream at 2^n-byte intervals. Fixing the markers to a 2^n interval may reduce the complexity of marker insertion by allowing markers to be inserted, deleted, and retransmitted with ease, even in out-of-order packets. In one embodiment, a different marker interval may further be used for each direction of flow.
With knowledge of the initial sequence number of a TCP connection on which markers are to operate (using the 2^n interval), only two pieces of information are required, namely a marker mask and the aforementioned marker offset. With these two pieces of information, any packet can be analyzed to determine the offset into that packet where a marker exists (or needs to be inserted).
The initial sequence number may be used to generate the marker offset. Further, the marker mask may be generated from the selected 2^n interval. The minimum marker interval, in one embodiment, may be 128 bytes, while the maximum interval may be 512K bytes.
To generate a marker in a stateless manner, the sequence number of the current TCP segment being generated can be made available, along with the number of bytes until the header section of the next PDU, and the marker offset and mask. To remove markers, the starting sequence number of the incoming TCP segment should be known, along with the marker offset and mask.
If the current sequence number (CSN) of an octet being processed satisfies the following formula (see Formula #1), the octet lies in the marker region where the marker unit needs to place or remove a marker octet.
((CSN & Marker Mask) >= Marker Offset) AND ((CSN & Marker Mask) <= Marker Offset + 8)    (Formula #1)
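One reading of this scheme, as a C sketch: with a 2^n marker interval, the mask selects the position of a sequence number within its interval, and the offset is the phase contributed by the connection's initial sequence number (ISN). The derivations of the mask and offset below are assumptions consistent with the text, not a normative definition, and interval wrap-around is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask for a power-of-two interval, e.g. 512 -> 0x1FF. */
static uint32_t marker_mask(uint32_t interval_pow2)
{
    return interval_pow2 - 1;
}

/* Phase of the marker grid within the interval, derived from the ISN
 * (assumed derivation). */
static uint32_t marker_offset(uint32_t isn, uint32_t mask)
{
    return isn & mask;
}

/* Formula #1: does the octet at TCP sequence number csn fall in the
 * region where a marker octet must be placed (transmit) or removed
 * (receive)? */
static bool in_marker_region(uint32_t csn, uint32_t mask, uint32_t offset)
{
    uint32_t pos = csn & mask;
    return pos >= offset && pos <= offset + 8;
}
```

Because the test depends only on the segment's sequence numbers plus the fixed mask and offset, it can be evaluated per octet without any per-PDU history, which is what makes the insertion and removal stateless.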
To this end, stateless marker insertion may be provided during transmission. While markers may be of little benefit upon receipt in some embodiments, such markers may be used to determine where a next PDU starts when a header section is encountered that has both a defective CRC value and a defective length field. A marker, in such an embodiment, may allow easier error recovery by providing the location of the next PDU.
If the markers are used during receipt, the marker mask and offset may be programmed into a connection table database. The incoming TCP sequence number of the offloaded connection may then be used to extract the marker, removing it from any data indicated to the processor. The marker itself may be indicated in a receive descriptor.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
The present application is a continuation-in-part of application Ser. Nos. 10/742,352 and 10/741,681 (now U.S. Pat. No. 7,260,631), each filed Dec. 19, 2003, which are each incorporated herein by reference in their entirety for all purposes. The present application further claims priority from a provisional application filed Oct. 11, 2005, bearing application Ser. No. 60/725,947, which is also incorporated herein by reference in its entirety for all purposes.
Number | Name | Date | Kind |
---|---|---|---|
5355453 | Row et al. | Oct 1994 | A |
5937169 | Connery et al. | Aug 1999 | A |
6034963 | Minami et al. | Mar 2000 | A |
6141705 | Anand et al. | Oct 2000 | A |
6226680 | Boucher et al. | May 2001 | B1 |
6247060 | Boucher et al. | Jun 2001 | B1 |
6310897 | Watanabe et al. | Oct 2001 | B1 |
6330659 | Poff et al. | Dec 2001 | B1 |
6427171 | Craft et al. | Jul 2002 | B1 |
6647528 | Collette et al. | Nov 2003 | B1 |
6683883 | Czeiger et al. | Jan 2004 | B1 |
6765901 | Johnson et al. | Jul 2004 | B1 |
6978457 | Johl et al. | Dec 2005 | B1 |
6983357 | Poff et al. | Jan 2006 | B2 |
6993644 | Anand et al. | Jan 2006 | B2 |
6996070 | Starr et al. | Feb 2006 | B2 |
7039717 | Johnson | May 2006 | B2 |
7089326 | Boucher et al. | Aug 2006 | B2 |
7124205 | Craft et al. | Oct 2006 | B2 |
7200716 | Aiello | Apr 2007 | B1 |
7206872 | Chen | Apr 2007 | B2 |
7249306 | Chen | Jul 2007 | B2 |
7260631 | Johnson et al. | Aug 2007 | B1 |
7287092 | Sharp | Oct 2007 | B2 |
7287101 | Futral et al. | Oct 2007 | B2 |
7346701 | Elzur et al. | Mar 2008 | B2 |
7363382 | Bakke et al. | Apr 2008 | B1 |
7379475 | Minami et al. | May 2008 | B2 |
7480312 | Ossman | Jan 2009 | B2 |
7535913 | Minami et al. | May 2009 | B2 |
7584260 | Craft et al. | Sep 2009 | B2 |
7620726 | Craft et al. | Nov 2009 | B2 |
7627001 | Craft et al. | Dec 2009 | B2 |
7761608 | Rohde et al. | Jul 2010 | B2 |
7844743 | Craft et al. | Nov 2010 | B2 |
7921240 | Zur et al. | Apr 2011 | B2 |
8065439 | Johnson et al. | Nov 2011 | B1 |
8180928 | Elzur et al. | May 2012 | B2 |
20010033568 | Spooner | Oct 2001 | A1 |
20020031123 | Watanabe et al. | Mar 2002 | A1 |
20020161919 | Boucher et al. | Oct 2002 | A1 |
20030023933 | Duncan | Jan 2003 | A1 |
20030058870 | Mizrachi et al. | Mar 2003 | A1 |
20030058938 | Francois et al. | Mar 2003 | A1 |
20030165160 | Minami et al. | Sep 2003 | A1 |
20040030770 | Pandya | Feb 2004 | A1 |
20040037319 | Pandya | Feb 2004 | A1 |
20040042487 | Ossman | Mar 2004 | A1 |
20040044798 | Elzur et al. | Mar 2004 | A1 |
20040054814 | McDaniel | Mar 2004 | A1 |
20040062267 | Minami et al. | Apr 2004 | A1 |
20040073716 | Boom et al. | Apr 2004 | A1 |
20040078462 | Philbrick et al. | Apr 2004 | A1 |
20040093411 | Elzur et al. | May 2004 | A1 |
20040125806 | Barzilai et al. | Jul 2004 | A1 |
20040243723 | Davis et al. | Dec 2004 | A1 |
20040267967 | Sarangam et al. | Dec 2004 | A1 |
20050021874 | Georgiou et al. | Jan 2005 | A1 |
20050076287 | Mantong | Apr 2005 | A1 |
20050138180 | Minami et al. | Jun 2005 | A1 |
20050149632 | Minami et al. | Jul 2005 | A1 |
20060015651 | Freimuth et al. | Jan 2006 | A1 |
20060015655 | Zur et al. | Jan 2006 | A1 |
20060047904 | Rohde et al. | Mar 2006 | A1 |
20060098653 | Adams et al. | May 2006 | A1 |
20070062245 | Fuller et al. | Mar 2007 | A1 |
Entry |
---|
Bob Russell; iSCSI: Towards a more effective PDU format; Mar. 15, 2001; http://www.pdl.cmu.edu/mailinglists/ips/mail/msg03752.html. |
Microsoft; Microsoft Windows Scalable Networking Initiative; Apr. 13, 2004; http://74.125.47.132/search?q=cache:jxy-ZOwFMQMJ:download.microsoft.com/download/5/b/5/5b5bec17-ea71-4653-9539-204a672f11cf/scale.doc+stateless+offload&cd=1&hl=en&ct=clnk&gl=us&client=firefox-a. |
M. Krueger et al., “Small Computer Systems Interface protocol over the Internet (iSCSI) Requirements and Design Considerations” The Internet Society (2002). |
D. Sheinwald et al., “Internet Protocol Small Computer System Interface (iSCSI) Cyclic Redundancy Check (CRC)/Checksum Considerations” The Internet Society (2002). |
J. Satran et al., Internet Small Computer Systems Interface (iSCSI) The Internet Society (2003). |
M. Bakke et al., “Internet Small Computer Systems Interface (iSCSI) Naming and Discovery” The Internet Society (2004). |
Non-Final Office Action from U.S. Appl. No. 11/625,137, dated Jul. 22, 2008. |
Non-Final Office Action from U.S. Appl. No. 11/625,137, dated Feb. 3, 2010. |
Final Office Action from U.S. Appl. No. 11/625,137, dated Feb. 18, 2009. |
Final Office Action from U.S. Appl. No. 11/625,137, dated Jul. 26, 2010. |
Non-Final Office Action from U.S. Appl. No. 11/625,137, dated Mar. 31, 2008. |
Notice of Allowance from U.S. Appl. No. 10/742,352, dated Aug. 22, 2011. |
Number | Date | Country | |
---|---|---|---|
60725947 | Oct 2005 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10742352 | Dec 2003 | US |
Child | 11407322 | US | |
Parent | 10741681 | Dec 2003 | US |
Child | 10742352 | US |