This invention concerns a system for taking a resource offline in a storage network.
A storage area network (SAN) is made up of two primary components: storage systems and a logically isolated network. The storage systems may include disks, tapes, and SAN-management software, all of which must be SAN-capable. The network may include adapters, wiring, bridges, hubs, switches, and directors. Adapters attach servers and peripherals to the wiring in the network. Bridges are used to convert from one protocol to another. Hubs, switches, and directors provide a central connection point and routing capability. Currently, a large number of SANs utilize fibre channel to provide connections for processors and storage systems on the SAN.
FICON (fiber connection) is a high-speed input/output (I/O) interface for mainframe computer connections to storage devices based on the NCITS fibre channel standard (FC-SB-2), and SANs are available in the market that are based on FICON. FICON products use a mapping layer that is based on the ANSI (American National Standards Institute) X3.230-1994 fibre channel—physical and signaling interface standard (FC-PH) that specifies physical signaling, cabling and transmission speeds for fibre channel. Because FICON is based on the industry-standard fibre channel architecture, the fiber infrastructure and fiber directors of a network can be shared between different types of servers. For example, fiber interfaces can readily be switched between FICON and FCP (Fibre Channel Protocol).
Interface errors (IFCCs) in a FICON enabled SAN may be caused by a resource [...] provides mechanisms needed to transfer blocks of data end-to-end. FC4 is the highest level layer in the fibre channel standards set. FC4 defines the mapping between the lower level layers of the fibre channel and Upper Layer Protocols (ULPs) such as the IPI and SCSI command sets, the HIPPI data framing, and other ULPs. As a result of the timeouts discussed above, exchanges will be aborted, and errors will be logged.
A fibre channel has potentially hundreds of open exchanges. As a result, use of the above-described method of taking a resource offline provides a potential for causing hundreds of abnormally terminated exchanges. This is problematic because it may result in a perception of poor quality of the SAN, undue service calls, and can drive a large warranty cost to investigate the cause of such errors.
To avoid the problems discussed above, there is a need to eliminate or reduce the detrimental effects of taking resources offline.
It is an object of the present invention to provide a method, apparatus, and computer program product to reduce the detrimental effects of taking resources offline. For the present invention, a resource is taken offline in a network by quiescing activity to the resource, providing notification that the resource will become unavailable, and taking the resource offline.
In accordance with one or more embodiments of the present invention, errors that occur when a resource is taken offline in a network are eliminated or reduced.
All equipment that is connected to a fibre channel network must contain at least one fibre channel port. The ports are able to send or receive data under the fibre channel protocol. Each port type has its own characteristics, and is required to connect to a limited set of port types on the other end of the connection to create a valid fibre channel configuration. Fibre channel standards define several types of ports. N_PORTs are the simplest ports. N_PORTs are implemented on servers, storage units and similar devices. An N_PORT may only participate in a point-to-point connection with another N_PORT, or with an F_PORT on a switch. F_PORTs are ports used on a fibre channel switch to connect the fibre channel switch to N_PORTs on nodes. Thus, any port on a node device (the device may be, for example, a disk drive or a personal computer (PC)) is an N_PORT. A port on a fabric is an F_PORT.
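The connection rule stated above can be illustrated with a minimal Python sketch. It covers only the two port types described in this paragraph; the names `VALID_LINKS` and `is_valid_link` are hypothetical and are not taken from any fibre channel standard or library.

```python
# Hypothetical helper: which port types may sit at the two ends of a link,
# restricted to the N_PORT and F_PORT rules described in the text above.
VALID_LINKS = {
    frozenset({"N_PORT"}),            # point-to-point: N_PORT <-> N_PORT
    frozenset({"N_PORT", "F_PORT"}),  # node port attached to a switch port
}


def is_valid_link(port_a: str, port_b: str) -> bool:
    """Return True if the two port types form a valid link under those rules."""
    return frozenset({port_a, port_b}) in VALID_LINKS
```

Under these assumptions, a link between two F_PORTs is rejected, because the text describes F_PORTs only as the switch-side partner of an N_PORT.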
A channel is an entity, typically of a host system or computer, that includes an N_PORT and elements that perform functions specified by FC-SB-2 (a mapping protocol that maps a particular upper level protocol instance to FC-PH) to provide access to resources by means of control units or emulated control units. In this configuration an example of a resource is an I/O device. A control unit is a physical or emulated entity that includes at least one N_PORT and elements that adapt the characteristics of one or more I/O devices to enable their attachment to a link interface of a channel. In particular, communication over a fibre channel network occurs between a pair of N_PORTs, and depending upon the configuration, the communicating N_PORTs belong to a channel and a control unit. A resource or device refers to an I/O device such as a direct-access-storage device or a port on an I/O device. Operation of the I/O device is regulated by a control unit that provides logical and buffering capabilities necessary to operate the I/O device.
When a resource in a network is taken offline, a control unit associated with that resource is also taken offline and, as a result, an N_PORT associated with that control unit is also taken offline. For the present invention the resource may be a storage device, the associated control unit may be control unit 300 and the N_PORT may be N_PORT 310 shown in
At step 210, in response to the request to go offline, N_PORT 310 quiesces activity to the resource. N_PORT 310 quiesces activity to the resource by quiescing links to N_PORT 310. Quiescing links to N_PORT 310 includes N_PORT 310 completing all channel programs in progress and returning control unit busy status to new commands or control functions other than a system reset command or a purge path command. The FC-SB-2 (fibre channel single-byte command code sets-2 mapping protocol) documentation contains the appropriate information for quiescing links in clause 5.2 that states: “Control units may quiesce the link by completing all channel programs in progress and by returning control unit busy status to new commands or control functions other than a system reset or purge path”.
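The quiescing behavior quoted above can be sketched as a toy Python model. This is an illustration only, not an FC-SB-2 implementation: the class `QuiescingPort`, the `Status` values, and the command names are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    CONTROL_UNIT_BUSY = "control unit busy"


# Commands still honored while the link is quiesced, per the quoted clause.
EXEMPT_WHILE_QUIESCED = {"system_reset", "purge_path"}


class QuiescingPort:
    """Toy model of a control-unit N_PORT (hypothetical, for illustration)."""

    def __init__(self, in_progress):
        self.in_progress = list(in_progress)  # channel programs already running
        self.quiesced = False

    def quiesce(self):
        """Stop taking new work; let in-progress channel programs complete."""
        self.quiesced = True
        completed = self.in_progress
        self.in_progress = []
        return completed

    def receive(self, command):
        """Answer a new command or control function."""
        if self.quiesced and command not in EXEMPT_WHILE_QUIESCED:
            return Status.CONTROL_UNIT_BUSY
        return Status.ACCEPTED
```

In this sketch a start I/O command received after `quiesce()` gets a busy status, while a system reset or purge path is still accepted.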
At step 215, N_PORT 310 determines whether the quiescent period has elapsed. A quiescent period timer may be utilized in determining whether the quiescent period has elapsed. In accordance with one embodiment of the present invention, the quiescent period is set to be five (5) seconds. In accordance with one such embodiment of the present invention, N_PORT 310 determines whether the quiescent period has elapsed in response to software running on a CPU contained therein, which software can be fabricated routinely by one of ordinary skill in the art without undue experimentation. Alternatively, N_PORT 310 may determine whether the quiescent period has elapsed in response to logic contained therein, which logic can be fabricated routinely by one of ordinary skill in the art without undue experimentation.
If the quiescent period has elapsed, control is transferred to step 220. If the quiescent period has not elapsed, then the system remains in a state of quiescing activity to the resource and determining if the quiescent period has elapsed (step 210 and step 215). During the quiescent period, new start I/O commands received from a host system or computer (i.e., an N_PORT in a channel thereof) are not accepted by N_PORT 310; instead, N_PORT 310 responds to them with a “control unit busy” status. Because “control unit end” status is not presented for these logical paths, the host system or computer will not re-drive the start I/O commands. This holds only for storage control unit N_PORTs that have a method for inhibiting the host system or computer from continuously re-driving commands. A FICON system 390 storage unit running the FICON protocol is an example of a storage control unit with associated N_PORTs that will not continuously re-drive commands. As one can readily appreciate, the embodiments described above for executing step 210 and step 215 of flowchart 200 shown in
In accordance with this embodiment of the present invention, the steps described above with respect to step 210 and step 215 advantageously reduce or eliminate the number of open exchanges between the host system or computer and the resource.
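The loop between step 210 and step 215 can be sketched in Python as a simple deadline check. The five-second value comes from the embodiment above; the function name, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions made so the sketch can be exercised without real delays.

```python
import time

QUIESCENT_PERIOD_S = 5.0  # value from the embodiment described above


def wait_quiescent_period(period_s=QUIESCENT_PERIOD_S, clock=time.monotonic,
                          sleep=time.sleep, poll_s=0.1):
    """Stay quiesced (step 210) until the period has elapsed (step 215).

    Returns the number of polls performed before the period elapsed.
    """
    deadline = clock() + period_s
    polls = 0
    while clock() < deadline:  # step 215: period not yet elapsed
        polls += 1             # step 210: remain quiesced, busy replies continue
        sleep(poll_s)
    return polls
```

A monotonic clock is used here so that wall-clock adjustments cannot shorten or lengthen the period; that is a design choice of the sketch, not something the description specifies.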
At step 220, for each remote N_PORT that has logged into N_PORT 310 using an N_PORT login extended link service (ELS) command (PLOGI ELS command, see Fibre Channel - Single-Byte Command Code Sets-2 Mapping Protocol (FC-SB-2), clause 6.2.2), N_PORT 310 performs a step of initiating an explicit N_PORT logout. An explicit N_PORT logout is accomplished by sending an N_PORT logout extended link service request (LOGO ELS) to the remote N_PORT, and by setting a timer to be utilized in determining whether responses to the explicit N_PORT logouts have been received within a predetermined reply waiting period. For example, in accordance with one embodiment of the present invention, the predetermined reply waiting period is set to be two (2) seconds. LOGO ELS is described in FC-SB-2, clause 6.2.3.
At step 225, N_PORT 310 determines whether the remote N_PORTs have responded to the LOGO ELSs with accept (ACC) ELS replies. After a predetermined fraction of the predetermined reply waiting period (determined by examining the timer), control is transferred to step 230. As one can readily appreciate, the embodiments described above for executing steps 220 and 225 provide means for providing notification that the resource will become unavailable.
At step 230, N_PORT 310 determines whether all remote N_PORTs have responded with ACC ELSs to the LOGO ELSs sent by N_PORT 310, or whether the predetermined reply waiting period has elapsed. If all the remote N_PORTs have so responded, or if the predetermined reply waiting period has elapsed, control is transferred to step 235. In accordance with one such embodiment of the present invention, N_PORT 310 executes step 235 in response to software running on a CPU contained therein, which software can be fabricated routinely by one of ordinary skill in the art without undue experimentation. Further, in accordance with one such embodiment, the timer may be contained within the N_PORT, or the timer may be generated utilizing a CPU in accordance with any one of a number of methods that are well known to those of ordinary skill in the art. In accordance with this embodiment of the present invention, the steps described above with respect to step 220, step 225, and decision step 230 provide the remote N_PORTs with a timely indication that N_PORT 310 (i.e., the N_PORT associated with the resource) is going offline. Advantageously, this eliminates a need for a test initialization state procedure that would otherwise be required in a FICON environment.
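Steps 220 through 230 can be sketched in Python as "send a logout to every logged-in remote port, then wait until all have accepted or the timer expires". The two-second value comes from the embodiment above. The hooks `send_logo` and `poll_acc` are hypothetical transport callbacks, and the fixed polling interval stands in for the "predetermined fraction" of the waiting period; neither is specified by the description.

```python
import time

REPLY_WAIT_S = 2.0  # reply waiting period from the embodiment described above


def notify_logout(remote_ports, send_logo, poll_acc, wait_s=REPLY_WAIT_S,
                  clock=time.monotonic, sleep=time.sleep, poll_s=0.05):
    """Send LOGO to each logged-in remote N_PORT, then wait for ACCs or timeout.

    send_logo(port) transmits one LOGO ELS (hypothetical hook).
    poll_acc() returns the ports whose ACC replies arrived since the last call
    (hypothetical hook).
    Returns the set of remote ports that never answered.
    """
    ports = list(remote_ports)
    for port in ports:             # step 220: one explicit logout per remote port
        send_logo(port)
    pending = set(ports)
    deadline = clock() + wait_s    # step 220: start the reply timer
    while pending and clock() < deadline:  # steps 225 and 230
        pending -= set(poll_acc())
        if pending:
            sleep(poll_s)
    return pending                 # empty set: every remote port answered
```

Returning the unanswered ports rather than raising an error mirrors the flow above, in which control passes to step 235 whether or not every reply arrived.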
At step 235 N_PORT 310 proceeds with procedures to take itself offline. For example, in accordance with one or more embodiments, N_PORT 310 transmits an offline primitive sequence (OLS), drops light, drops power, and so forth. In accordance with one such embodiment of the present invention, the local N_PORT carries out this step in response to software running on a CPU contained therein, which software can be fabricated routinely by one of ordinary skill in the art without undue experimentation. Alternatively, the local N_PORT may carry out this step in response to logic contained therein, which logic can be fabricated routinely by one of ordinary skill in the art without undue experimentation. As one can readily appreciate, the embodiments described above for carrying out step 235 provide a means for taking the resource offline.
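The overall ordering of the flowchart (quiesce, notify, go offline) can be summarized with a short Python sketch. The three callables are hypothetical hooks supplied by the caller; the step numbers are those used in the description, and nothing here models the actual OLS transmission or link hardware.

```python
from enum import Enum


class Step(Enum):
    QUIESCE = 210
    NOTIFY = 220
    OFFLINE = 235


def take_offline(quiesce, notify, go_offline):
    """Run the three phases in flowchart order; return (steps_run, unanswered).

    quiesce() is assumed to block until the quiescent period ends, notify()
    to return the remote ports that never acknowledged the logout, and
    go_offline() to stand in for the final actions (OLS, dropping light, etc.).
    """
    steps = []
    quiesce()
    steps.append(Step.QUIESCE)
    unanswered = notify()   # a reply timeout does not block the shutdown
    steps.append(Step.NOTIFY)
    go_offline()
    steps.append(Step.OFFLINE)
    return steps, unanswered
```

The point of the sketch is only the sequencing: the port goes offline after the notification phase ends, even if some remote ports did not reply in time.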
Because of the receipt of N_PORT LOGO ELSs (and the consequent removal of logical paths), the need for a test initialization state procedure that would otherwise be required in a FICON environment is removed. As a result, the time it takes for initialization state testing in response to the RSCN ELS requests (see the procedure described in the Background of the Invention) is eliminated or reduced. As a consequence, FC2 and FC4 timeouts will be eliminated or reduced, and logged errors will be eliminated or reduced. Advantageously, as a result of utilizing one or more embodiments of the present invention to eliminate or reduce such errors, a perception of poor quality of the SAN, undue service calls, and associated warranty costs can be eliminated or reduced.
In the above description, N_PORT 310 was used as an example of a local N_PORT going offline. Those skilled in the art will recognize that the foregoing description is not limited to N_PORT 310.
Those skilled in the art will recognize that the foregoing description has been presented for the sake of illustration and description only. As such, it is not intended to be exhaustive or to limit the invention to the precise form disclosed. For example, one or more further embodiments of the present invention include a network, for example, and without limitation, a storage area network that utilizes Fibre Channel, that includes software for performing one or more of the above-described embodiments of the present invention, which software can be generated routinely and without undue experimentation by one of ordinary skill in the art in light of the detailed description provided above.
This application claims the benefit of U.S. Provisional Application No. 60/402,376, filed on Aug. 9, 2002, which application is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20040153702 A1 | Aug 2004 | US |
Number | Date | Country | |
---|---|---|---|
60402376 | Aug 2002 | US |