The present application is related to U.S. patent application Ser. No. 11/015,383, titled Tape Acceleration, by Manali Nambiar, Arpakorn Boonkongchuen, Murali Busavaiah, and Stephen Degroote, filed on Dec. 15, 2004, the entirety of which is incorporated by reference herein for all purposes.
1. Field of the Invention
The present invention generally relates to Storage Area Networks (SANs). More specifically, the present invention provides techniques and mechanisms for improving data reads between hosts and end devices connected to SANs.
2. Description of Related Art
Storage Area Networks (SANs) provide an effective mechanism for maintaining and managing large amounts of data. A host can obtain data from a device such as a tape device through a fibre channel fabric having a number of fibre channel switches. However, storage area networks are often limited in geographic scope. Fibre channel fabrics in different geographic areas or separate fibre channel fabrics often have limited ability to interoperate.
Protocols such as Fibre Channel over the Internet Protocol (FCIP) allow devices on different fibre channel fabrics to communicate. FCIP is one protocol that allows the creation of a wide area network (WAN) connecting hosts and tape resources. For example, two separate fibre channel fabrics may be connected through an IP network. A host device on a first fibre channel fabric can read data from a device on a second fibre channel fabric through the IP network. However, reading data over an IP network from a separate fibre channel network can often be inefficient. Round trip times for commands and data can often introduce high latency into various storage applications, such as tape backup restore applications.
Consequently, it is desirable to provide improved techniques for efficiently and effectively reading data from a remote tape device connected to a host through fibre channel switches in a wide area network.
According to the present invention, methods and apparatus are provided for improving reads of a remote tape device by a host through multiple fibre channel switches. A fibre channel switch preemptively sends read requests to a tape device before read requests are received from a host. Flow control, buffer management, and error handling mechanisms are implemented to allow accelerated tape backup restoration while working to prevent buffer overflow and underflow.
In one embodiment, a technique implementing read acceleration is provided. A first read command is received from a host for a first data block on a tape device. The first read command is received at a first fibre channel switch. The first read command is forwarded to the tape device. The first data block is received from the tape device. The first data block is associated with the first read command. A second data block is received from the tape device. The second data block is associated with a second read command. The second data block is received after an anticipatory read of the tape device. The second read command is received from the host for the second data block on the tape device. The second data block is provided to the host before forwarding the second read command to the tape device.
In another embodiment, a fibre channel switch for implementing read acceleration is provided. The fibre channel switch includes an Internet Protocol (IP) interface and a fibre channel interface. The IP interface is configured to receive a first read command and a second read command for a first data block and a second data block on a tape device. The IP interface is configured with a fibre channel over IP tunnel. The fibre channel interface is configured to forward the first read command, to send an anticipatory read command to the tape device before the second read command is received, and to receive the first data block and the second data block from the tape device.
A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which are illustrative of specific embodiments of the present invention.
Reference will now be made in detail to some specific examples of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.
For example, the techniques of the present invention will be described in the context of storage area networks and tape devices. However, it should be noted that the techniques of the present invention can be applied to a variety of different standards and variations to storage area networks and tape devices. Similarly, although a server is described throughout, the techniques apply to a variety of host devices. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
Furthermore, techniques and mechanisms of the present invention will sometimes be described in singular form for clarity. However, it should be noted that some embodiments include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise. For example, a processor is used in a variety of contexts. However, it will be appreciated that multiple processors can also be used while remaining within the scope of the present invention unless otherwise noted. In addition, a block of data is described in several contexts. A block of data may include multiple data blocks or data sets.
Hosts often need to read data from remote tape devices. In one example, the host and the remote tape device are connected over a wide area network (WAN) that introduces substantial latency into communications between the devices. Furthermore, a variety of systems require that a host send a command to read one or more blocks of data only after a response to a prior command has been received. Hence, the latency introduced by the wide area network can make tape backup restore applications and bulk data transfer operations highly inefficient for such systems.
Consequently, the techniques of the present invention provide mechanisms for performing read acceleration or read aheads. In one example, a host is connected to a tape device through at least a host end fibre channel switch and a tape device end fibre channel switch. Any switch located near a host is referred to herein as a host end fibre channel switch. Any switch located near a tape device is referred to herein as a tape device end fibre channel switch. Data is anticipatorily read by a tape device end fibre channel switch from a tape device and buffered at a host end fibre channel switch based on received read commands.
A read ahead window or flow control window is maintained to decrease the likelihood of buffer overflow or underflow at a host end fibre channel switch. The read ahead window is used to control the number of read aheads a tape device end fibre channel switch will perform. In one example, the read ahead window size is dynamic. The host end fibre channel switch sends a control message to the tape device end fibre channel switch to increase the window size if read ahead buffered data is not yet available upon receiving a read command from a host. Error handling and rewind mechanisms are also provided to allow efficient operation in the event that non-read commands or errors are received.
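The read ahead window behavior described above can be summarized with a brief sketch. The following Python fragment is offered only as an informal illustration; the class and method names (TapeEndReadAhead, on_host_read_forwarded, and so on) are assumptions introduced for exposition and do not correspond to any particular implementation.

```python
# Illustrative sketch of the read ahead (flow control) window logic at the
# tape device end fibre channel switch. All names and the in-memory counters
# are assumptions made for exposition only.

class TapeEndReadAhead:
    def __init__(self, window_size=2):
        self.window_size = window_size      # max blocks read ahead of the host
        self.outstanding = 0                # blocks read ahead but not yet requested
        self.next_block = 0                 # next block number to read from tape

    def on_host_read_forwarded(self, block_number):
        """Called when the host end switch forwards a host read command.

        The forwarded command signals that the host has consumed a buffered
        block, so one read ahead slot is freed and another anticipatory read
        may be issued."""
        if self.outstanding > 0:
            self.outstanding -= 1
        self._fill_window()

    def on_increase_window(self, new_size):
        """Called when the host end switch reports that buffered data was not
        ready in time; a larger window allows deeper read ahead."""
        self.window_size = max(self.window_size, new_size)
        self._fill_window()

    def _fill_window(self):
        while self.outstanding < self.window_size:
            self._issue_tape_read(self.next_block)   # anticipatory read
            self.next_block += 1
            self.outstanding += 1

    def _issue_tape_read(self, block_number):
        # Placeholder for sending a SCSI read over the fibre channel interface.
        print(f"anticipatory read issued for block {block_number}")
```

In this sketch, each forwarded host read command acts as an implicit credit: it frees one slot in the window, so no more than window_size blocks of read ahead data are buffered at the host end fibre channel switch at any time.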
In one example, a host 115 sends commands to a tape device 117 through the fibre channel network 151, the Internet protocol network 153, and the fibre channel network 155. According to various embodiments, the fibre channel fabric switch 101 establishes a tunnel through the Internet protocol network 153 with fibre channel fabric switch 103. Fibre channel fabric switches 101 and 103 are referred to herein as tunnel end points.
However, sending commands and data through multiple networks such as fibre channel network 151, an Internet protocol network 153, and a fibre channel network 155 can cause high latency and poor round trip times. A host 115 would not be able to efficiently read data from a tape device 117. Under a standard such as the SCSI (Small Computer Systems Interface) stream command set used for tape devices, only one command can be issued at a time. For each command to complete, the host 115 needs to first receive a response from the tape device 117. For example, in order for a host to issue multiple read commands for multiple blocks of data, the host must wait for the data and status messages from a tape device for the first read command to be received before the host can send a second read command. This wait can cause a delay of at least one round trip time for every command.
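As a purely hypothetical illustration (the figures here are assumptions, not measurements): if the round trip time across the IP network is 50 ms and a restore operation serially reads 10,000 blocks, the one-command-at-a-time pattern contributes at least 10,000 × 50 ms = 500 seconds of waiting on command/response latency alone, regardless of how much bandwidth the link provides.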
In some instances, a fibre channel fabric switch 103 preemptively or anticipatorily sends a read command to a tape device 117 as soon as a status response is returned from a tape device 117 for the previous read. For example, a fibre channel fabric switch 103 can send a read command as soon as it is determined that the host is in streaming mode. Instead of waiting for additional read commands from a host 115, data is anticipatorily read by switch 103 and buffered at a switch 101 so that the switch 101 can rapidly respond to expected subsequent read requests from the host 115.
In one example, flow control problems can occur if a fibre channel fabric switch 103 preemptively sends too much data to switch 101. The additional data would have to be buffered, with a risk of buffer overflow. In this example, it is desirable to limit the number of read commands sent by a switch 103 to prevent buffer overflow. The present invention dynamically adjusts the buffering to account for particular network and device characteristics. Consequently, techniques and mechanisms of the present invention provide flow control that intelligently monitors the amount of data being read.
Similarly, preemptively or anticipatorily reading data from a tape device 117 can cause a fibre channel fabric switch 103 to have already read data that eventually may not be requested. A tape device would then have to be rewound and a switch 101 buffer would have to be flushed. Errors may also be received while doing anticipatory reads for data which has not yet been read by the host. Consequently, techniques and mechanisms of the present invention provide error handling mechanisms to allow anticipatory reads while accounting for possible error scenarios.
The fibre channel switch 203 in turn sends the data and status messages to the host 201. Because the host 201 is accessing the tape device 207 over a network having less than ideal round trip times, reading a block of data can take a non-negligible amount of time. Many storage area networks require that data and status messages be received before a host 201 can issue another read command. Consequently, the read command 221 for another block of data is held until a response to read command 211 is received. Read command 221 is similarly forwarded from host 201 through fibre channel switches 203 and 205 to a tape device 207. The data and status response 223 is returned through switches 205 and 203 to the host 201. The host 201 can then issue another read command 231 for the next block of data to the tape device 207 for data and status messages 233.
Applications such as tape backup restore applications that require a large number of data blocks can suffer performance limitations because of the latency of the IP network. Consequently, the techniques and mechanisms of the present invention provide acceleration of read commands.
According to various embodiments, a host 301 sends a read command 311 to a fibre channel fabric switch 303. According to various embodiments, fibre channel fabric switches 303 and 305 are gateways between fibre channel and IP networks. The fibre channel fabric switches 303 and 305 serve as fibre channel over IP (FCIP) tunneling endpoints. The fibre channel fabric switch 303 forwards the read command to the fibre channel switch 305. The fibre channel switch 305 forwards the command 311 to the tape device. The tape device performs a read on a block of data and returns data along with possibly other messages such as status messages 313. The fibre channel switch 305 forwards the data and status messages 313 to the fibre channel switch 303.
The fibre channel switch 303 in turn sends the data and status messages to the host 301. However, the fibre channel switch 305 determines that read acceleration can be performed and does not wait for subsequent read commands, for example, read command N+1, N+2, or read command N+3 to arrive from a host 301. To perform read acceleration, the fibre channel switch 305 makes anticipatory reads or read aheads of blocks N+1, N+2, and N+3 using read acceleration commands 321, 325, and 329. In one embodiment, the fibre channel switch 305 makes anticipatory reads if flow control is not being enforced. The data and status messages for N+1, N+2, and N+3, respectively 323, 327, and 331, are forwarded from fibre channel switch 305 and buffered at fibre channel switch 303. In one embodiment, the fibre channel switch 305 determines that flow control is being enforced and so does not send out an anticipatory read N+4 345. According to various embodiments, the fibre channel switch 305 keeps track of the number of blocks it has read ahead. In one embodiment, the fibre channel switch 303 keeps the actual block data.
When a read command N+1 341 is received from the host 301 at a fibre channel switch 303, the data is already available and the data N+1 343 is promptly sent to the host 301. The switch 303 also forwards the read command N+1 341 to the fibre channel switch 305. Upon receiving the read command N+1, the fibre channel switch 305 recognizes that the buffer for block N+1 has been cleared, that flow control is not being enforced, and that an additional block can now be read. The fibre channel switch 305 is free to obtain an additional block using read command N+4 345. The data block N+4 347 is forwarded from the fibre channel switch 305 and buffered at switch 303.
When a read command N+2 351 is received from the host 301 at a fibre channel switch 303, the data is already available and the data N+2 353 is promptly sent to the host 301. The switch 303 also forwards the read command N+2 351 to the fibre channel switch 305. Upon receiving the read command N+2, the fibre channel switch 305 recognizes that the buffer for block N+2 has been cleared and that an additional block can now be read. The fibre channel switch 305 is free to obtain an additional block using read command N+5 355. The data block N+5 357 is forwarded from the fibre channel switch 305 and buffered at switch 303.
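A minimal sketch of the corresponding host end behavior follows, again in informal Python and again using made-up names (HostEndBuffer, on_read_ahead_data, and so on); it is intended only to illustrate how buffered read ahead data lets the host end fibre channel switch respond immediately while the forwarded command lets the remote switch slide its window.

```python
# Informal sketch of the host end switch. Read ahead data arriving over the
# FCIP tunnel is held until the host asks for it; serving a host read from
# this buffer avoids a WAN round trip, and forwarding the command signals the
# tape device end switch that another anticipatory read may be issued.

class HostEndBuffer:
    def __init__(self, send_to_host, forward_to_tape_end):
        self.blocks = {}                    # block number -> (data, status)
        self.send_to_host = send_to_host
        self.forward_to_tape_end = forward_to_tape_end

    def on_read_ahead_data(self, block_number, data, status):
        """Buffer data and status forwarded by the tape device end switch."""
        self.blocks[block_number] = (data, status)

    def on_host_read(self, block_number):
        """Answer a host read command, preferably from buffered data."""
        if block_number in self.blocks:
            data, status = self.blocks.pop(block_number)
            self.send_to_host(data, status)   # immediate response, no WAN wait
        # The command is forwarded either way so the remote switch can slide
        # its read ahead window (and, if needed, supply the missing block).
        self.forward_to_tape_end(block_number)
```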
According to various embodiments, a host 401 sends a read command 411 to a fibre channel fabric switch 403. According to various embodiments, fibre channel fabric switches 403 and 405 are gateways between fibre channel and IP networks. The fibre channel fabric switches 403 and 405 serve as fibre channel over IP (FCIP) tunneling endpoints. The fibre channel fabric switch 403 forwards the read command to the fibre channel switch 405. The fibre channel switch 405 forwards the command 411 to the tape device. The tape device performs a read on a block of data and returns data along with possibly other messages such as status messages 413. The fibre channel switch 405 forwards the data and status messages 413 to the fibre channel switch 403.
The fibre channel switch 403 in turn sends the data and status messages to the host 401. However, the fibre channel switch 405 determines that read acceleration can be performed and does not wait for subsequent read commands, for example, read command N+1 or read command N+2 to arrive from a host 401. To perform read acceleration, the fibre channel switch 405 makes anticipatory reads or read aheads of blocks N+1 and N+2 using read acceleration commands 415 and 419. The data and status messages for N+1 and N+2, respectively 417 and 421, are forwarded from fibre channel switch 405 and buffered at fibre channel switch 403. The fibre channel switch 405 keeps track of what blocks it has read ahead. According to various embodiments, the fibre channel switch 405 keeps track of the number of blocks it has read. In one embodiment, the fibre channel switch 403 keeps the actual block data.
When a read command N+1 441 is received from the host 401 at a fibre channel switch 403, the data is already available and the data N+1 443 is promptly sent to the host 401. The switch 403 also forwards the read command N+1 441 to the fibre channel switch 405. Upon receiving the read command N+1, the fibre channel switch 405 recognizes that the buffer for block N+1 has been cleared and that an additional block can now be read. The fibre channel switch 405 is free to obtain an additional block using read command N+3 427. The data block N+3 441 is forwarded from the fibre channel switch 405 and buffered at switch 403. However, before the data block N+3 441 has arrived at a fibre channel switch 403, the switch 403 has already received the read commands for N+2 and N+3, 431 and 435 respectively. The data block N+2 433 was already available. However, the data block N+3 was not yet available, in some cases because of an insufficient buffering window size. Consequently, the fibre channel switch 403 not only forwards the read command N+3 435 but also sends a control message 437 to increase the window size. The fibre channel switch 405 can then increase its read ahead window size to allow for additional buffering.
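The window increase decision just described can be sketched as follows. This is a hypothetical fragment: the helper names (buffered_blocks, send_increase_window, and so on) are invented for illustration, and the actual control message format is not specified here.

```python
# Hypothetical sketch of the host end decision to request a larger read ahead
# window when a host read command arrives before its data has been buffered.

def handle_host_read(block_number, buffered_blocks, send_to_host,
                     forward_to_tape_end, send_increase_window):
    if block_number in buffered_blocks:
        # Read ahead data was ready in time; serve it immediately.
        send_to_host(buffered_blocks.pop(block_number))
    else:
        # The window was too shallow for this round trip time; ask the tape
        # device end switch to deepen its read ahead window.
        send_increase_window()
    # The command is forwarded in either case.
    forward_to_tape_end(block_number)
```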
In some other instances, the fibre channel switch may anticipate read commands inappropriately, and non read commands are received instead. In such a case, anticipatory read commands would have to be rolled back. A tape device would have to be rewound and buffers holding the read ahead data would have to be flushed.
The fibre channel switch 503 in turn sends the data and status messages to the host 501. However, the fibre channel switch 505 determines that read acceleration can be performed and does not wait for subsequent read commands, for example, read command N+1 or read command N+2 to arrive from a host 501. To perform read acceleration, the fibre channel switch 505 makes anticipatory reads or read aheads of blocks N+1 and N+2 using read acceleration commands 515 and 519. The data and status messages for N+1 and N+2, respectively 517 and 521, are forwarded from fibre channel switch 505 and buffered at fibre channel switch 503. The fibre channel switch 505 keeps track of what blocks it has read ahead. According to various embodiments, the fibre channel switch 505 keeps track of the number of blocks it has read. In one embodiment, the fibre channel switch 503 keeps the actual block data.
However, instead of receiving an expected N+1 read command, a non read command 523 is received at fibre channel switch 503. The non read command 523 is forwarded from fibre channel switch 503 to fibre channel switch 505. The fibre channel switch 505 then has to issue a rewind command 525 based on the number of blocks it has read ahead. In one instance, it rewinds the tape by two data blocks with rewind command 525. The fibre channel switch 505 also sends a flush control message 527 to flush data blocks N+1 and N+2. Upon receiving a rewind status message 529, the fibre channel switch forwards the non read command 531 to the tape device and the process proceeds without acceleration.
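This rollback path can be summarized with a short, purely illustrative sketch; the function names below (rewind_tape, send_flush_control, forward_command) are assumptions and do not name any real interface.

```python
# Illustrative sketch of rolling back anticipatory reads when a non read
# command arrives at the tape device end switch.

def handle_non_read_command(command, read_ahead_count, rewind_tape,
                            send_flush_control, forward_command):
    if read_ahead_count > 0:
        # Rewind by the number of blocks read ahead so the tape position
        # matches what the host expects; this call is assumed to return only
        # after the rewind status has been received.
        rewind_tape(blocks=read_ahead_count)
        # Ask the host end switch to flush the now-stale read ahead buffers.
        send_flush_control(blocks=read_ahead_count)
    # Only then is the non read command forwarded; processing continues
    # without acceleration.
    forward_command(command)
```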
The fibre channel switch 603 in turn sends the data and status messages to the host 601. It should be noted that many of the exchange diagrams herein depict particular read commands; however, the techniques of the present invention apply during initial stream processing, mid stream processing, etc. The fibre channel switch 605 determines that read acceleration can be performed and does not wait for subsequent read commands, for example, read command N+1 or read command N+2 to arrive from a host 601. To perform read acceleration, the fibre channel switch 605 makes anticipatory reads or read aheads of blocks N+1 and N+2 using read acceleration commands 615 and 619. Although read acceleration commands are shown herein, the commands themselves may resemble ordinary read commands. The data and status message for N+1 617 is forwarded from fibre channel switch 605 and buffered at fibre channel switch 603. The fibre channel switch 605 keeps track of what blocks it has read ahead. According to various embodiments, the fibre channel switch 605 keeps track of the number of blocks it has read. In one embodiment, the fibre channel switch 603 keeps the actual block data. However, the tape device 607 responds to the read command 619 with an error message N+2 621.
According to various embodiments, a fibre channel switch 605 merely forwards the error message to fibre channel switch 603 without attempting another read command 633. In one example, the fibre channel switch 605 waits for the read command N+2 631 from the host 601. The read command N+1 623 is processed at fibre channel switch 603 and handled with read ahead data N+1 625. In one example, a read command N+2 631 is also processed at fibre channel switch 603 and the error message N+2 637 is sent to the host 601. In this manner an error encountered in an anticipatory read is presented to the host. It should be noted that in some instances, this may not happen and the tape may be rewound. In some other examples, the fibre channel switch can also forward the read command N+2 to trigger a read of block N+2 without acceleration.
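An informal sketch of this error handling at the host end switch follows; as with the earlier fragments, the names (buffered_blocks, buffered_errors, and so on) are hypothetical and chosen only to mirror the exchange described above.

```python
# Hypothetical sketch: an error returned for an anticipatory read is held and
# delivered to the host only when the host actually requests that block, so
# the host sees the same error it would have seen without acceleration.

def answer_host_read(block_number, buffered_blocks, buffered_errors,
                     send_data, send_error, forward_to_tape_end):
    if block_number in buffered_blocks:
        send_data(buffered_blocks.pop(block_number))         # e.g. block N+1
    elif block_number in buffered_errors:
        # The read ahead for this block failed; surface the buffered error.
        send_error(buffered_errors.pop(block_number))         # e.g. block N+2
    else:
        # No read ahead result is available; fall back to forwarding the
        # command so the block is read without acceleration.
        forward_to_tape_end(block_number)
```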
The techniques of the present invention can be implemented on a variety of network devices such as fibre channel switches and routers. In one example, the techniques of the present invention are implemented on the MDS 9000 series of fibre channel switches available from Cisco Systems of San Jose, Calif.
Line cards 803, 805, and 807 can communicate with an active supervisor 811 through interface circuitry 883, 885, and 887 and the backplane 815. According to various embodiments, each line card includes a plurality of ports that can act as either input ports or output ports for communication with external fibre channel network entities 851 and 853. The backplane 815 can provide a communications channel for all traffic between line cards and supervisors. Individual line cards 803 and 807 can also be coupled to external fibre channel network entities 851 and 853 through fibre channel ports 843 and 847.
External fibre channel network entities 851 and 853 can be nodes such as other fibre channel switches, disks, RAIDs, tape libraries, or servers. It should be noted that the switch can support any number of line cards and supervisors. In the embodiment shown, only a single supervisor is connected to the backplane 815 and the single supervisor communicates with many different line cards. The active supervisor 811 may be configured or designed to run a plurality of applications such as routing, domain manager, system manager, and utility applications.
According to one embodiment, the routing application is configured to provide credits to a sender upon recognizing that a frame has been forwarded to a next hop. A utility application can be configured to track the number of buffers and the number of credits used. A domain manager application can be used to assign domains in the fibre channel storage area network. Various supervisor applications may also be configured to provide functionality such as flow control, credit management, and quality of service (QoS) functionality for various fibre channel protocol layers.
According to various embodiments, the switch also includes line cards 875 and 877 with IP interfaces 865 and 867. In one example, the IP port 865 is coupled to an external IP network entity 855. The line cards 875 and 877 can also be coupled to the backplane 815 through interface circuitry 895 and 897.
According to various embodiments, the switch can have a single IP port and a single fibre channel port. In one embodiment, two fibre channel switches used to form an FCIP tunnel each have one fibre channel line card and one IP line card. Each fibre channel line card connects to an external fibre channel network entity and each IP line card connects to a shared IP network.
In addition, although an exemplary switch is described, the above-described embodiments may be implemented in a variety of network devices (e.g., servers) as well as in a variety of mediums. For instance, instructions and data for implementing the above-described invention may be stored on a disk drive, a hard drive, a floppy disk, a server computer, or a remotely networked computer. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
While the invention has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention. For example, embodiments of the present invention may be employed with a variety of network protocols and architectures. It is therefore intended that the invention be interpreted to include all variations and equivalents that fall within the true spirit and scope of the present invention.
Number | Name | Date | Kind |
---|---|---|---|
4423480 | Bauer et al. | Dec 1983 | A |
4428064 | Hempy et al. | Jan 1984 | A |
4435762 | Milligan et al. | Mar 1984 | A |
4932826 | Moy et al. | Jun 1990 | A |
5016277 | Hamilton | May 1991 | A |
5347648 | Stamm et al. | Sep 1994 | A |
5692124 | Holden et al. | Nov 1997 | A |
5758151 | Milligan et al. | May 1998 | A |
5765213 | Ofer | Jun 1998 | A |
5809328 | Nogales et al. | Sep 1998 | A |
5842040 | Hughes et al. | Nov 1998 | A |
5892915 | Duso et al. | Apr 1999 | A |
5930344 | Relyea et al. | Jul 1999 | A |
6026468 | Mase et al. | Feb 2000 | A |
6049546 | Ramakrishnan | Apr 2000 | A |
6070200 | Gates et al. | May 2000 | A |
6141728 | Simionescu et al. | Oct 2000 | A |
6148421 | Hoese et al. | Nov 2000 | A |
6172250 | Lawman et al. | Jan 2001 | B1 |
6178244 | Takeda et al. | Jan 2001 | B1 |
6219728 | Yin | Apr 2001 | B1 |
6317819 | Morton | Nov 2001 | B1 |
6327253 | Frink | Dec 2001 | B1 |
6381665 | Pawlowski | Apr 2002 | B2 |
6449697 | Beardsley et al. | Sep 2002 | B1 |
6507893 | Dawkins et al. | Jan 2003 | B2 |
6570848 | Loughran et al. | May 2003 | B1 |
6625750 | Duso et al. | Sep 2003 | B1 |
6651162 | Levitan et al. | Nov 2003 | B1 |
6658540 | Sicola et al. | Dec 2003 | B1 |
6751758 | Alipui et al. | Jun 2004 | B1 |
6757767 | Kelleher | Jun 2004 | B1 |
6775749 | Mudgett et al. | Aug 2004 | B1 |
6782473 | Park | Aug 2004 | B1 |
6788680 | Perlman et al. | Sep 2004 | B1 |
6791989 | Steinmetz et al. | Sep 2004 | B1 |
6880062 | Ibrahim et al. | Apr 2005 | B1 |
6941429 | Kamvysselis et al. | Sep 2005 | B1 |
7000025 | Wilson | Feb 2006 | B1 |
7065582 | Dwork et al. | Jun 2006 | B1 |
7165180 | Ducharme | Jan 2007 | B1 |
7181578 | Guha et al. | Feb 2007 | B1 |
7219237 | Trimberger | May 2007 | B1 |
7237045 | Beckmann et al. | Jun 2007 | B2 |
7290236 | Flaherty et al. | Oct 2007 | B1 |
7295519 | Sandy et al. | Nov 2007 | B2 |
7397764 | Cherian et al. | Jul 2008 | B2 |
7411958 | Dropps et al. | Aug 2008 | B2 |
7414973 | Hart et al. | Aug 2008 | B2 |
7415574 | Rao et al. | Aug 2008 | B2 |
7436773 | Cunningham | Oct 2008 | B2 |
7472231 | Cihla et al. | Dec 2008 | B1 |
7568067 | Mase et al. | Jul 2009 | B1 |
7583597 | Dropps et al. | Sep 2009 | B2 |
7617365 | Zhang et al. | Nov 2009 | B2 |
20010016878 | Yamanaka | Aug 2001 | A1 |
20020024970 | Amaral et al. | Feb 2002 | A1 |
20020059439 | Arroyo et al. | May 2002 | A1 |
20020169521 | Goodman et al. | Nov 2002 | A1 |
20030021417 | Vasic et al. | Jan 2003 | A1 |
20030065882 | Beeston et al. | Apr 2003 | A1 |
20030093567 | Lolayekar et al. | May 2003 | A1 |
20030185154 | Mullendore et al. | Oct 2003 | A1 |
20040010660 | Konshak et al. | Jan 2004 | A1 |
20040081082 | Moody et al. | Apr 2004 | A1 |
20040088574 | Walter et al. | May 2004 | A1 |
20040148376 | Rangan et al. | Jul 2004 | A1 |
20040153566 | Lalsangi et al. | Aug 2004 | A1 |
20040158668 | Golasky et al. | Aug 2004 | A1 |
20040160903 | Gai et al. | Aug 2004 | A1 |
20040170432 | Reynolds et al. | Sep 2004 | A1 |
20040202073 | Lai et al. | Oct 2004 | A1 |
20050021949 | Izawa et al. | Jan 2005 | A1 |
20050031126 | Edney et al. | Feb 2005 | A1 |
20050114663 | Cornell et al. | May 2005 | A1 |
20050117522 | Basavaiah et al. | Jun 2005 | A1 |
20050144394 | Komarla et al. | Jun 2005 | A1 |
20050192923 | Nakatsuka | Sep 2005 | A1 |
20060039370 | Rosen et al. | Feb 2006 | A1 |
20060059313 | Lange | Mar 2006 | A1 |
20060059336 | Miller et al. | Mar 2006 | A1 |
20060112149 | Kan et al. | May 2006 | A1 |
20060126520 | Nambiar et al. | Jun 2006 | A1 |
20060248278 | Beeston et al. | Nov 2006 | A1 |
20070101134 | Parlan et al. | May 2007 | A1 |