The present invention relates generally to network communications, and more particularly, to an apparatus and method for performing fast Fibre Channel write operations over relatively high latency networks.
With the increasing popularity of Internet commerce and network centric computing, businesses and other organizations are becoming more and more reliant on information. To handle all of this data, storage area networks or SANs have become very popular. A SAN typically includes a number of storage devices, a plurality of Hosts, and a number of Switches arranged in a Switching Fabric that connects the storage devices and the Hosts.
Most SANs rely on the Fibre Channel protocol for communication within the Fabric. For a detailed explanation of the Fibre Channel protocol and Fibre Channel Switching Fabrics and Services, see the Fibre Channel Framing and Signaling Standard, Rev 1.90, International Committee for Information Technology Standards (INCITS), Apr. 9, 2003, and the Fibre Channel Switch Fabric—2, Rev. 5.4, INCITS, Jun. 26, 2001, and the Fibre Channel Generic Services—3, Rev. 7.01, INCITS, Nov. 28, 2000, all incorporated by reference herein for all purposes.
The infrastructure of many networks often includes multiple types of link level transports. For example, the communication network of an international corporation may have local SANs in its New York, Silicon Valley and Tokyo offices. However, since maintaining a SAN across long distances is expensive, the organization may rely on the Internet Protocol (IP) over another inter-SAN link, such as Gigabit Ethernet, SONET, ATM, wave division multiplexing, etc., to connect the SANs.
Within a typical SAN with Fibre Channel Inter-Switch Links (ISLs), the access time between a Host and a storage device (i.e., a target) is typically very fast. The speed of a Fibre Channel link is such that the performance or access time across multiple switches is close to the ideal case, i.e., the case in which the Host and the target device are attached to the same switch. In other words, even if multiple Switches need to be spanned to complete the access, the individual Switches are so fast that the latency is typically very small. In a write operation, for example, packets of data can be transferred across the switches of the SAN without significant delay because the latency across the ISLs is very small.
In situations with a high latency inter-SAN link, however, the access time of a write operation between a Host in one SAN and a storage device in a remote SAN deteriorates. The latency may result from the speed of the link, the distance between the Host and target, congestion on the inter-SAN link, etc. For example, when IP is used to connect two Fibre Channel SANs, the latency across the IP portion of the network is typically high relative to an access within the SANs.
With a SCSI write command, the Host issues a write (Wr) command defining a certain amount of data to be written. The command travels across the network, from switch to switch, until it reaches the target. In reply, the target responds with a Xfer ready command, which defines the amount of data the target can accept. When the Host receives the Xfer ready command, it transfers the data to be written in units of up to the maximum transfer unit (MTU) of the network. In most Fibre Channel SANs, the MTU is approximately 2K bytes per transfer. Thus, if the amount of data to be written is 8K bytes, a total of four transfers are required. Once all four data transfers have been received, the target generates a status success command. If the Host does not receive the status command within a predetermined period of time, it assumes that a problem with the write operation occurred, and it may subsequently issue another write command.
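The exchange just described can be sketched as an ordered list of frames. This is a minimal illustration, not part of the original disclosure: it assumes the target's single Xfer ready grants the whole write and that each data frame carries at most one MTU of data (2048 bytes by default, matching the approximately 2K figure above). The function names are hypothetical.

```python
import math
from typing import List


def data_frame_count(write_size: int, mtu: int = 2048) -> int:
    """Number of data frames needed to carry write_size bytes at the given MTU."""
    if write_size <= 0 or mtu <= 0:
        raise ValueError("write_size and mtu must be positive")
    return math.ceil(write_size / mtu)


def conventional_write_sequence(write_size: int, mtu: int = 2048) -> List[str]:
    """Frames of an unaccelerated SCSI write, in order ("H" = Host, "T" = target).

    Assumes the target's single Xfer ready grants the entire write.
    """
    data_frame_count(write_size, mtu)  # raises ValueError on bad arguments
    frames = [f"H->T Wr({write_size})", f"T->H Xfer_rdy({write_size})"]
    remaining = write_size
    while remaining > 0:
        chunk = min(mtu, remaining)  # the last frame may be shorter than the MTU
        frames.append(f"H->T Data({chunk})")
        remaining -= chunk
    frames.append("T->H Status(success)")
    return frames


if __name__ == "__main__":
    for frame in conventional_write_sequence(8 * 1024):
        print(frame)
```

For the 8K example, the sketch yields one Wr, one Xfer ready, four 2048-byte Data frames and one Status frame.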
The time required to complete a SCSI write operation can be significant over a high latency inter-SAN network. Because the inter-SAN network is slow, a significant amount of time may elapse between the time the initial Wr command is issued and the time the Xfer ready is received by the Host. During this time the Host is idle; it must wait for the Xfer ready before it can transfer the data to the target. The target is likewise idle until it receives the data from the initiating Host. In other words, the initiating Host is idle until it receives the Xfer ready, and the target is idle from the time it issues the Xfer ready until it receives the data.
An apparatus and method for improving the performance of a SCSI write over a relatively high latency network are therefore needed.
To achieve the foregoing and other objectives, and in accordance with the purpose of the present invention, an apparatus and method to improve the performance of a SCSI write over a high latency network are provided. The apparatus includes a first Switch close to the initiator in a first SAN and a second Switch close to the target in a second SAN. In various embodiments, the two Switches are border switches connecting their respective SANs to a relatively high latency network between the two SANs. In addition, the initiator can be either directly or indirectly connected to the first Switch in the first SAN, and the target can likewise be either directly or indirectly connected to the second Switch in the second SAN. During operation, the first Switch sends Transfer Ready (Xfr_rdy) frame(s), sized according to its buffer availability, to the initiating Host in response to a SCSI Write command from the Host directed to the target. The first and second Switches then coordinate with one another by exchanging Transfer Ready commands without the target's knowledge. The second Switch buffers the data received from the Host until the target indicates it is ready to receive the data. Since the Switches send frames to the initiating Host independently of the target, they manipulate the OX_ID and RX_ID fields in the Fibre Channel header of the various commands associated with the SCSI Write. The OX_ID and RX_ID fields are manipulated so that the Switches can trap the commands and keep track of the various commands associated with the SCSI Write.
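To make the benefit concrete, the following back-of-the-envelope model compares the completion time of a conventional write with that of the switch-assisted write summarized above. It is an illustrative sketch, not a result from the disclosure. It assumes one one-way SAN latency on each side, one one-way inter-SAN latency, a serialization time for the data on the inter-SAN link, negligible switch processing time, and (for the fast write) that the first Switch already holds enough buffer credit at the second Switch to forward data immediately. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkModel:
    san_one_way_ms: float  # one-way latency inside either SAN (assumed equal)
    wan_one_way_ms: float  # one-way latency of the inter-SAN network
    serialize_ms: float    # time to clock the write data onto the inter-SAN link


def conventional_write_ms(m: LinkModel) -> float:
    """Wr, Xfer ready, data and status each cross the whole path once."""
    end_to_end = m.san_one_way_ms + m.wan_one_way_ms + m.san_one_way_ms
    wr_at_target = end_to_end
    xfer_rdy_at_host = wr_at_target + end_to_end      # Host is idle until here
    data_at_target = xfer_rdy_at_host + end_to_end + m.serialize_ms
    return data_at_target + end_to_end                # status back at the Host


def fast_write_ms(m: LinkModel) -> float:
    """SW1 proxies the Xfer ready locally; SW2 buffers data for the target."""
    s, w = m.san_one_way_ms, m.wan_one_way_ms
    wr_at_sw1 = s
    xfer_rdy_at_host = wr_at_sw1 + s                  # proxy Xfer ready from SW1
    data_at_sw1 = xfer_rdy_at_host + s
    # Assumption: SW1 already holds buffer credit at SW2, so it forwards at once.
    data_at_sw2 = data_at_sw1 + w + m.serialize_ms
    wr_at_target = wr_at_sw1 + w + s                  # Wr forwarded in parallel
    target_xfer_rdy_at_sw2 = wr_at_target + s
    # SW2 releases the buffered data only after the target's own Xfer ready.
    data_at_target = max(data_at_sw2, target_xfer_rdy_at_sw2) + s
    return data_at_target + s + w + s                 # status back at the Host


if __name__ == "__main__":
    link = LinkModel(san_one_way_ms=1.0, wan_one_way_ms=20.0, serialize_ms=5.0)
    print(conventional_write_ms(link), fast_write_ms(link))
```

With 1 ms of SAN latency, 20 ms of inter-SAN latency and 5 ms of serialization, the model gives 93 ms versus 51 ms. In general the difference is twice the SAN latency plus twice the inter-SAN latency, i.e., one inter-SAN round trip (plus one SAN round trip) is saved because the Host no longer waits for the target's Xfer ready to cross the inter-SAN network.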
The features of the present invention may best be understood by reference to the following description of the presently preferred embodiments together with the accompanying drawings.
Like reference numbers refer to like elements in the figures.
Referring to
The present invention relates to a SCSI write operation that reduces the time required to perform a write operation between the initiating Host H1 and a target storage device such as T1 over a high latency network such as the inter-SAN network 10. The Intelligent Ports (I-ports) of the two switches SW1 and SW2 act as intermediaries between the Host H1 and the storage device T1. The transfer size of a data transfer during a write operation is negotiated before any write operations are performed. Initially, the Host H1 defines the transfer size for a write command, i.e., it specifies the amount of data it wishes to write. The switch SW1 indicates the amount of data it is ready to receive based on (i) the data size specified in the Write command and (ii) the amount of buffer space it has available. The I-port on SW1 responds with a Transfer Ready (Xfer), which indicates the maximum size of a data transfer. The I-port on the switch SW2 similarly receives the Xfer ready, which defines the maximum size of the data transfer. In the aforementioned embodiment, the ports involved are Intelligent Ports (I-Ports) to which the initiator and target are attached. In such a case, the I-port is typically an FC port, sometimes also referred to as an Fx_Port. In an alternative embodiment, the target and the initiating Host are not directly connected to the Switches in question. In such a case, the I-port can be either an IP-port or an I-port.
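The rule for sizing SW1's Transfer Ready, namely the smaller of the size in the Write command and the buffer space available, can be sketched as follows. The class and method names are hypothetical, and the sketch assumes the switch reserves the granted space until the corresponding data has been forwarded; a grant of zero corresponds to the case in which no buffer space is available.

```python
class SwitchBuffer:
    """Hypothetical model of the data buffer behind an I-port (sizes in bytes)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 0:
            raise ValueError("capacity must be non-negative")
        self.free = capacity

    def grant(self, requested: int) -> int:
        """Burst length for a proxy Transfer Ready: min(requested, free buffer).

        The granted space stays reserved until release() is called.
        """
        if requested < 0:
            raise ValueError("requested must be non-negative")
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, nbytes: int) -> None:
        """Return buffer space once the buffered data has been forwarded."""
        self.free += nbytes
```

A switch with 8 KB free that receives a 10,000-byte Write would advertise 8192 bytes and then have nothing left to grant until it releases buffer space.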
In general, after the initial negotiation the fast write operation is performed by the following sequence: (i) when the Host H1 generates a SCSI write command defining the target T1, the I-port of Switch SW1 traps the command; (ii) the switch SW1 forwards the command to the target; (iii) the switch SW1 also issues a Transfer Ready command to the Host H1 on behalf of, or as a proxy for, the target T1; (iv) in response to the received Transfer Ready command, the Host H1 sends data, in the amount indicated by the Transfer Ready command, to the target T1. The data may be sequenced or broken up into frames based on the maximum transfer unit (MTU) of the network; (v) the I-port of the switch SW1 receives the data frames and forwards them to the target T1; (vi) the previous two steps are repeated until all the data is transferred to the target; and (vii) after all the data is transferred, the switch SW1 waits for either a success or an error status command from the target T1. Upon receipt, the switch SW1 forwards the status command back to the Host H1. If the target returns an error command, the I-port makes no attempt to correct the error. It should be noted that in an alternative embodiment, the above sequence can be performed with the order of steps (ii) and (iii) reversed.
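Steps (i) through (vii) can be traced with a small event generator for SW1's I-port. This is a sketch under stated assumptions, not the patented implementation. It assumes the Host sends exactly the amount granted in each proxy Transfer Ready, in MTU-sized frames, and that SW1's buffer is freed after each burst so every grant is bounded by the same buffer size. It follows the ordering in which the Write is forwarded (step (ii)) before the proxy Transfer Ready (step (iii)). Event names are hypothetical.

```python
from typing import List, Tuple


def sw1_proxy_write(write_size: int, sw1_buffer: int,
                    mtu: int = 2048) -> List[Tuple[str, int]]:
    """Return (action, bytes) events seen at SW1's I-port for one fast write."""
    if write_size <= 0 or sw1_buffer <= 0 or mtu <= 0:
        raise ValueError("sizes must be positive (no-buffer handling not modeled)")
    events = [("trap_write", write_size),                 # step (i)
              ("forward_write_to_target", write_size)]    # step (ii)
    remaining = write_size
    while remaining > 0:                                  # step (vi): repeat
        burst = min(remaining, sw1_buffer)                # buffer freed after each burst
        events.append(("proxy_xfer_rdy_to_host", burst))  # step (iii)
        sent = 0
        while sent < burst:                               # steps (iv)-(v)
            frame = min(mtu, burst - sent)                # Host splits data at the MTU
            events.append(("forward_data_frame", frame))
            sent += frame
        remaining -= burst
    events.append(("await_status_from_target", 0))        # step (vii)
    events.append(("forward_status_to_host", 0))
    return events
```

With an 8 KB write and a 4 KB buffer, the trace contains two proxy Transfer Ready events of 4096 bytes each and four 2048-byte data frames.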
The I-port of the second switch SW2 operates in essentially the same way as that of switch SW1, except that it buffers the received data frames until it receives a Transfer Ready command from the target T1. Specifically, the I-port of switch SW2: (i) forwards the SCSI write command received from switch SW1 to the target; (ii) issues a Transfer Ready command to the switch SW1 as a proxy for the target T1; (iii) buffers the data frames received from the switch SW1; (iv) transfers the data frames to the target T1 when a Transfer Ready command is received from the target T1; and (v) after all the data is transferred, waits for either a success or an error status command from the target T1. Upon receipt, the switch SW2 forwards the status command back to switch SW1. If the target returns an error command, the I-port of switch SW2 makes no attempt to correct the error.
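SW2's distinguishing behavior, holding data from SW1 until the target's own Transfer Ready arrives, can be sketched as a small buffer that drains against the burst length the target grants. The class and method names are hypothetical, and the sketch assumes each buffered frame is forwarded whole, so a frame waits until the target's remaining credit covers it.

```python
from collections import deque
from typing import Deque, List


class Sw2Buffer:
    """Hypothetical model of SW2's I-port for a single exchange."""

    def __init__(self) -> None:
        self.pending: Deque[bytes] = deque()  # frames received from SW1, step (iii)
        self.target_credit = 0               # bytes the target has said it will accept
        self.sent_to_target: List[bytes] = []

    def on_data_from_sw1(self, frame: bytes) -> None:
        self.pending.append(frame)
        self._drain()

    def on_xfer_rdy_from_target(self, burst_len: int) -> None:
        self.target_credit += burst_len      # step (iv): the target is now ready
        self._drain()

    def _drain(self) -> None:
        # Forward frames in arrival order while the target's credit covers them.
        while self.pending and len(self.pending[0]) <= self.target_credit:
            frame = self.pending.popleft()
            self.target_credit -= len(frame)
            self.sent_to_target.append(frame)
```

Data that arrives before the target's Transfer Ready simply accumulates in `pending`; if the target's Transfer Ready arrives first, frames pass straight through.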
To identify an FC device, Fibre Channel Identifiers (FCIDs) are used. A transaction between an FC host and a target is referred to as an exchange. In a typical Fibre Channel network there are many Hosts and targets, and each Host may initiate many read and/or write operations. So that the hosts and targets within a network can keep track of the various transactions between them, two fields are available in the Fibre Channel header of all SCSI Command, Data, Response, and Transfer Ready frames. The first field is called the Originator Exchange Identifier, or OX_ID. The second field is called the Responder Exchange Identifier, or RX_ID. The Host relies on the OX_ID to maintain its local state, and the target relies on the RX_ID to maintain its local state. Both the OX_ID and the RX_ID are 16 bits wide.
The OX_ID and RX_ID are typically used by the initiating host and target of a transaction respectively to keep track of the ongoing transactions between the two entities. In general, the switches in a SAN do not keep track of such transactions. With the present invention, however, the switches SW1 and SW2 are acting as intermediaries between the initiating Host and the target T1. The switches SW1 and SW2 therefore also use the OX_ID and RX_ID values to track exchanges between the Host H1 and the target T1.
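One way a switch could keep this per-exchange state is a table keyed by the Host's (S_ID, D_ID, OX_ID) that hands out a locally generated 16-bit RX_ID. The structure and names below are assumptions for illustration; the only protocol facts relied on are that OX_ID and RX_ID are 16-bit fields and that the value FFFF hex denotes an unassigned ID.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

UNASSIGNED = 0xFFFF  # Fibre Channel uses hex FFFF to mean "ID not assigned"

ExchangeKey = Tuple[int, int, int]  # (S_ID, D_ID, OX_ID) of the Host's Write


@dataclass
class Exchange:
    key: ExchangeKey
    local_rx_id: int                # RX_ID the switch gives the Host
    target_rx_id: int = UNASSIGNED  # RX_ID the target assigns, learned later


class ExchangeTable:
    """Hypothetical per-switch table of trapped exchanges."""

    def __init__(self) -> None:
        self._by_key: Dict[ExchangeKey, Exchange] = {}
        self._by_local_rx: Dict[int, Exchange] = {}
        self._next = 0

    def _alloc_rx_id(self) -> int:
        for _ in range(UNASSIGNED):  # 0x0000..0xFFFE are usable values
            candidate = self._next
            self._next = (self._next + 1) % UNASSIGNED
            if candidate not in self._by_local_rx:
                return candidate
        raise RuntimeError("all local RX_IDs are in use")

    def open(self, s_id: int, d_id: int, ox_id: int) -> Exchange:
        """Start tracking a trapped Write, or return the existing entry."""
        key = (s_id, d_id, ox_id)
        if key in self._by_key:
            return self._by_key[key]
        ex = Exchange(key, self._alloc_rx_id())
        self._by_key[key] = ex
        self._by_local_rx[ex.local_rx_id] = ex
        return ex

    def lookup_local_rx(self, rx_id: int) -> Optional[Exchange]:
        """Find the exchange for a frame the Host sent with our local RX_ID."""
        return self._by_local_rx.get(rx_id)

    def close(self, key: ExchangeKey) -> None:
        ex = self._by_key.pop(key, None)
        if ex is not None:
            del self._by_local_rx[ex.local_rx_id]
```

Keying on the full (S_ID, D_ID, OX_ID) tuple rather than on OX_ID alone matters because different Hosts may independently choose the same OX_ID.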
Referring to
Referring to
Referring to
It should also be noted that, because the switches change the original values of OX_ID and RX_ID, the Switches SW1 and SW2 also “trap” Extended Link Service (ELS) frames (state management frames) that carry the original OX_ID and RX_ID in their payload. The initiator H1 and the target T1 use such frames, for example REC (an ELS) and ABTS (strictly a Basic Link Service frame), to query and manage the state of transactions.
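Because the Host and the target each know only the IDs the switches have shown them, an ELS payload that names an exchange must be translated before it is forwarded. A minimal sketch, assuming the switch keeps a dictionary from the RX_ID it issued to the RX_ID the target actually assigned. The payload is modeled abstractly rather than with the byte layout of a real REC frame, and all names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict

UNASSIGNED = 0xFFFF  # Fibre Channel "ID not assigned" value


@dataclass(frozen=True)
class ElsExchangeRef:
    """The exchange identifiers carried in an ELS payload (abstracted)."""
    ox_id: int
    rx_id: int


def translate_els_ref(ref: ElsExchangeRef, rx_map: Dict[int, int]) -> ElsExchangeRef:
    """Swap a switch-issued RX_ID for the target's RX_ID before forwarding.

    OX_ID is left unchanged in this sketch (assumption). Unassigned or unknown
    RX_IDs are passed through untouched.
    """
    if ref.rx_id == UNASSIGNED or ref.rx_id not in rx_map:
        return ref
    return replace(ref, rx_id=rx_map[ref.rx_id])
```

Applying the same function with the inverted dictionary to the reply would restore the Host's view of the exchange.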
Referring to
In an alternative embodiment, it is possible for switch SW2 to grant more buffer space than SW1 requested. Based on the previous example, SW2 could grant 15 MB instead of 10 MB. The remaining unused buffers are then used for subsequent Write commands from the Host H1. For example, consider a second Write command for, say, 1 MB from the Host H1. In this embodiment, SW1 sends an Xfr_Rdy for 1 MB to the Host H1 and sends the command to the target via SW2, as stated in paragraph 0021. When the Host H1 sends data, SW1 immediately starts transferring the data to SW2 instead of waiting for an Xfr_Rdy from SW2. It can do this because SW2 previously granted additional buffers to SW1 via the last Xfr_Rdy command. The basic idea is that, for write commands after the first, data can be transferred from SW1 to SW2 without waiting for a specific Xfr_Rdy from SW2 pertaining to that write.
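The over-grant scheme amounts to SW1 tracking a running credit balance at SW2. The sketch below replays the 15 MB / 10 MB / 1 MB example. The class and method names are hypothetical, "MB" is assumed to mean binary megabytes, and the sketch ignores how SW2 replenishes credit as it drains data to the target.

```python
MB = 1 << 20  # binary megabytes assumed for the text's "MB"


class Sw2Credit:
    """SW1's running count of buffer space SW2 has granted but SW1 has not used."""

    def __init__(self) -> None:
        self.available = 0

    def on_xfr_rdy_from_sw2(self, granted: int) -> None:
        self.available += granted

    def try_send(self, nbytes: int) -> bool:
        """Consume credit and return True if nbytes can go to SW2 without waiting."""
        if nbytes <= self.available:
            self.available -= nbytes
            return True
        return False


if __name__ == "__main__":
    credit = Sw2Credit()
    credit.on_xfr_rdy_from_sw2(15 * MB)  # SW2 grants 15 MB for a 10 MB Write
    assert credit.try_send(10 * MB)      # data for the first Write
    assert credit.try_send(1 * MB)       # second Write: no Xfr_Rdy wait needed
    print(credit.available // MB)        # -> 4
```

If a later write exceeds the remaining balance, `try_send` returns False and SW1 falls back to waiting for a fresh Xfr_Rdy from SW2.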
In various embodiments of the invention, any of several alternatives may be taken when the switch SW1 has no available buffer space. In one embodiment, the Host H1 receives a busy status signal and must retry the write transaction. In a second embodiment, the command is placed in a pending command list; the switch SW1 eventually responds to the write, but only after processing the preceding transactions on the list. In yet another embodiment, the switch SW1 simply forwards the Write command to the target.
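The three alternatives can be captured as a configurable policy. A minimal sketch with hypothetical names; it models only SW1's decision, not the SCSI busy status itself or the later processing of queued commands.

```python
from collections import deque
from enum import Enum
from typing import Deque


class NoBufferPolicy(Enum):
    BUSY = "busy"        # first embodiment: Host gets busy status and retries
    PEND = "pend"        # second embodiment: queue behind earlier transactions
    FORWARD = "forward"  # third embodiment: pass the Write straight to the target


def handle_write(cmd: str, free_buffer: int, policy: NoBufferPolicy,
                 pending: Deque[str]) -> str:
    """Return the action SW1 takes for a Write command (action names hypothetical)."""
    if free_buffer > 0:
        return "proxy_xfer_rdy"
    if policy is NoBufferPolicy.BUSY:
        return "return_busy_status"
    if policy is NoBufferPolicy.PEND:
        pending.append(cmd)  # answered later, in order
        return "queued"
    return "forward_unaccelerated"


if __name__ == "__main__":
    queue: Deque[str] = deque()
    print(handle_write("write-1", 0, NoBufferPolicy.PEND, queue), list(queue))
```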
In yet another embodiment, the switches SW1 and SW2 are configured to set both the Burst Length and the Relative Offset fields in the Transfer Ready frame to zero (0). This enables the other switch and the Host to determine whether a Transfer Ready command was generated by the target switch or by the target itself. The initiating switch and the Host recognize that the target switch issued the Transfer Ready when both fields are set to zero, since the target itself would never set both to zero for a given transaction. If only one or neither of the fields is set to zero, the initiating switch SW1 and the Host recognize that the Transfer Ready was generated by the target.
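The zero/zero marker can be checked directly on the Transfer Ready payload. In the FCP_XFER_RDY payload, the first 32-bit word is the relative offset (FCP_DATA_RO) and the second is the burst length (FCP_BURST_LEN), followed by a reserved word. The helper names are hypothetical, and the sketch covers only the marker test, not how the switch otherwise conveys the amount it is prepared to accept.

```python
import struct

# FCP_XFER_RDY payload: FCP_DATA_RO, FCP_BURST_LEN, reserved (big-endian 32-bit words)
XFER_RDY = struct.Struct(">III")


def build_xfer_rdy(relative_offset: int, burst_len: int) -> bytes:
    return XFER_RDY.pack(relative_offset, burst_len, 0)


def generated_by_switch(payload: bytes) -> bool:
    """True only if both fields are zero, which the target itself never sends."""
    relative_offset, burst_len, _reserved = XFER_RDY.unpack_from(payload)
    return relative_offset == 0 and burst_len == 0
```

A Transfer Ready with a zero offset but a non-zero burst length (the usual first burst from a real target) is correctly classified as target-generated.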
Data frames are occasionally lost in data networks. In various embodiments of the present invention, any one of a number of different buffer credit recovery schemes may be used.
Referring to
When a Write command that specifies a tuple to be trapped is received at the initiating switch SW1, the switch SW1 forwards it to the processor 50. In reply, the processor 50 is responsible for forwarding the original frame to its original destination and for generating a Transfer Ready command to the initiating Host H1. The Transfer Ready command defines a data size determined by the available buffer space at the switch SW1. The processor also defines the locally generated RX_ID, which is used for all subsequent communication between the switch SW1 and the initiating Host H1. When data frames are received from the Host H1 at the I-port of the switch SW1, they are trapped. The processor 50 in turn instructs the switch SW1 to transmit the data frames, up to the negotiated size, without waiting to receive a Transfer Ready command. Any remaining frames are buffered. Similarly, at the I-port of the switch SW2, any data frames associated with this exchange are trapped and buffered. When a Transfer Ready is received from the target T1, the switch SW2 transfers the buffered data.
Transfer Ready frames involving this exchange that are received by either switch SW1 or SW2 are also trapped and forwarded to the processor 50. The target switch SW2 uses the Transfer Ready frame to start the transfer of data to the target. The initiating switch SW1, on the other hand, uses the Transfer Ready command to transmit more data frames toward the target. In either case, the I-ports of both switches SW1 and SW2 modify the RX_IDs.
According to one embodiment, the Fibre Channel cyclic redundancy check (CRC), which covers the Fibre Channel header 20 as well as the payload, is recomputed to protect rewrite operations. The CRC protects the FC payload and FC header from corruption while they traverse the various parts of a Fibre Channel SAN. Because the present invention modifies the RX_ID and OX_ID fields, the CRC must be recomputed so that the rewritten headers remain protected against corruption.
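The header rewrite and CRC recomputation can be sketched on raw bytes. In the 24-byte Fibre Channel frame header, OX_ID occupies bytes 16-17 and RX_ID bytes 18-19. The sketch uses Python's `zlib.crc32`, which implements the same CRC-32 generator polynomial that Fibre Channel uses, as a stand-in; it does not model Fibre Channel's exact bit ordering on the wire, so it illustrates the recompute step rather than producing a wire-accurate CRC. The function name is hypothetical.

```python
import struct
import zlib
from typing import Optional

FC_HEADER_LEN = 24
OX_ID_OFFSET = 16  # bytes 16-17 of the frame header
RX_ID_OFFSET = 18  # bytes 18-19 of the frame header


def rewrite_exchange_ids(header_and_payload: bytes,
                         new_ox_id: Optional[int] = None,
                         new_rx_id: Optional[int] = None) -> bytes:
    """Return header+payload with the IDs replaced and a fresh 4-byte CRC appended."""
    if len(header_and_payload) < FC_HEADER_LEN:
        raise ValueError("frame shorter than a Fibre Channel header")
    frame = bytearray(header_and_payload)
    if new_ox_id is not None:
        struct.pack_into(">H", frame, OX_ID_OFFSET, new_ox_id)
    if new_rx_id is not None:
        struct.pack_into(">H", frame, RX_ID_OFFSET, new_rx_id)
    # Any header change invalidates the old CRC, so recompute it over the
    # rewritten header and payload (stand-in CRC-32, see the lead-in).
    crc = zlib.crc32(bytes(frame)) & 0xFFFFFFFF
    return bytes(frame) + struct.pack(">I", crc)
```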
Although only a few embodiments of the present invention have been described in detail, it should be understood that the present invention may be embodied in many other specific forms without departing from the spirit or scope of the invention. Therefore, the present examples are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein but may be modified within the scope of the appended claims.
Number | Date | Country |
---|---|---|
20050117522 A1 | Jun 2005 | US |