1. Field
The present disclosure relates to intelligent storage systems and methods in which logical storage units (LUNs) are managed for use by host systems that perform data storage input/output (I/O) operations on the LUNs. More particularly, the present disclosure pertains to intelligent storage systems that support active-passive configurations using redundant communication paths from each host system to each LUN.
2. Description of the Prior Art
By way of background, many intelligent storage systems that support redundant communication paths to the same LUN implement active/passive configurations wherein host systems are allowed to access the LUN on only a single path at any given time. This single path is the active path, whereas the remaining path(s) to the LUN are the passive path(s). Storage systems may additionally allow administrators to define preferred (default) paths and non-preferred (non-default) paths to balance the I/O traffic on the storage system controllers. Initially, a preferred path to a LUN is usually selected to be the LUN's active path.
During storage system operations, a path failure may occur in which a host is no longer able to access a LUN on the active path. If the host detects the path failure, it may send a specific failover command (e.g., a SCSI MODE_SELECT command) to the storage system to request that the non-preferred/passive path be designated as the new active path and that the preferred/active path be designated as the new passive path. The storage system will then perform the failover operation in response to the host's failover request. Alternatively, in lieu of sending a specific failover command, the host may simply send an I/O request to the LUN on the passive path. This I/O request will be failed by the storage system, which will then automatically perform the failover operation.
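The two failover triggers just described may be illustrated by the following sketch. It is purely illustrative: the storage_api object and its method names are assumed stand-ins for whatever command and I/O interfaces a host's multipath driver actually exposes, and do not correspond to any particular product API.

```python
# Illustrative only: storage_api and its methods are hypothetical stand-ins
# for the command and I/O interfaces a real multipath driver would use.

def failover_by_command(storage_api, lun, passive_path):
    # Explicit failover request, e.g. a SCSI MODE_SELECT carrying a
    # vendor-specific mode page asking that the passive path be made
    # the active path for this LUN.
    storage_api.send_failover_command(lun, new_active_path=passive_path)

def failover_by_io(storage_api, lun, passive_path):
    # Implicit failover request: issue an I/O on the passive path. The
    # storage system fails this first request but then performs the
    # failover automatically, so a retry succeeds on the new active path.
    try:
        storage_api.read(lun, path=passive_path, block=0)
    except IOError:
        pass  # expected: the first I/O on the passive path is failed
    return storage_api.read(lun, path=passive_path, block=0)
```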
In either of the above situations, it is possible that other hosts can still reach the LUN on the preferred path even though it has been failed over to passive status. For example, the path failure that led to the failover may have been caused by a hardware or software problem in a communication device or link that affects only a single host rather than the storage system controller that handles I/O to the LUN on behalf of all hosts. Other hosts connected to the same controller may thus be able to communicate with the LUN on the preferred path that has now been placed in passive mode. Insofar as such other hosts will usually be programmed to favor using the preferred path as the active path, one or more of these hosts may initiate a failback operation that restores the paths to their default status in which the preferred path is the active path and the non-preferred path is the passive path. The failback operation may then trigger another failover operation from the original host that performed the failover if the original path failure condition associated with the preferred path is still present. Thus, a repeating cycle of failover/failback operations may be performed to switch between the preferred and non-preferred paths. This path-thrashing activity, which is called the “ping-pong” effect, causes unwanted performance problems.
A method, system and computer program product are provided for avoiding a ping-pong effect on active-passive paths in a storage system managing one or more logical storage units (LUNs). A first path to the LUNs is designated as an active path for use by host systems to access the LUNs for data storage input/output (I/O) operations. A second path to the LUNs is designated as a passive path for use by the host systems to access the LUNs for data storage I/O operations. The first path is also designated as a preferred path for use by the host systems to access the LUNs for data storage I/O operations. In response to a path failure on the first path in which a host system cannot access the LUNs on the first path, a failover operation is performed wherein the second path is designated as the active path to the LUNs and the first path is designated as the passive path to the LUNs. Notwithstanding the failover operation, the designation of the first path as the preferred path to the LUNs is not changed. Subsequent failback operations that attempt to redesignate the first path as the active path to the LUNs due to the first path being the preferred path are conditionally inhibited. In particular, a failback operation initiated by a host system that is not the failover host will fail and only the failover host will be permitted to initiate the failback.
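The behavior summarized above may be captured in a small state sketch, shown below as a minimal illustration in Python. The record fields and function names are invented for the example and are not part of any particular implementation; the sketch simply shows that a failover changes the active path without disturbing the preferred-path designation, and that failback is permitted only to the failover host.

```python
from dataclasses import dataclass

@dataclass
class LunPathState:
    active_path: str          # "first" or "second"
    preferred_path: str       # remains "first" even after a failover
    failover_host: str = ""   # host that initiated the last failover, if any

def failover(state: LunPathState, host: str) -> None:
    # The second path becomes active and the first becomes passive, but
    # the preferred-path designation is deliberately left unchanged.
    state.active_path = "second"
    state.failover_host = host

def try_failback(state: LunPathState, host: str) -> bool:
    # Failback is conditionally inhibited: only the failover host may
    # restore the preferred (first) path to active status.
    if host != state.failover_host:
        return False          # the failback request is failed
    state.active_path = state.preferred_path
    state.failover_host = ""
    return True
```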
The foregoing and other features and advantages will be apparent from the following more particular description of an example embodiment, as illustrated in the accompanying Drawings, in which:
Before describing an example embodiment of the disclosed subject matter, it will be helpful to review the ping-pong phenomenon associated with conventional active-passive storage systems in more detail. Turning now to
Following the failback operation of
Turning now to the remaining drawing figures, wherein like reference numerals represent like elements in all of the several views,
In the interest of simplicity, the storage environment 12 is shown as having a single storage system 18. In an actual distributed data storage environment, there could be any number of additional storage systems and devices of various type and design. Examples include tape library systems, RAID (Redundant Array of Inexpensive Disks) systems, JBOD (Just a Bunch Of Disks) systems, etc. Likewise, there could be any number of host systems in addition to Host 1 and Host 2. It should also be understood that the individual connection components that may be used to implement embodiments of the SAN 20, such as links, switches, routers, hubs, directors, etc., are not shown in
In addition to their connectivity to SAN 20, Host 1 and Host 2 may also communicate with a local area network (LAN) 22 (or alternatively a WAN or other type of network) that comprises one or more data processing clients 20, several of which are identified as client systems 20₁, 20₂ . . . 20ₙ. One or more data sets utilized by the client systems 20 are assumed to reside on the storage system 18. Access to these data sets is provided by Host 1 and Host 2, which act as intermediaries between the storage system 18 and the client systems 20.
There are a variety of computer hardware and software components that may be used to implement the various elements that make up the SAN 20, depending on design preferences. The network interconnection components of the SAN 20 may include any number of switches, directors, hubs, bridges, routers, gateways, etc. Such products are conventionally available from a wide array of vendors. Underlying the SAN design will be the selection of a suitable communication and media technology. Most commonly, a fibre channel architecture built using copper or fiber optical media will provide the physical and low level protocol layers. Higher level protocols, such as SCSI-FCP (Small Computer System Interface-Fibre Channel Protocol), IPI (Intelligent Peripheral Interface), IP (Internet Protocol), FICON (Fiber Optic CONnection), etc., can be mapped onto the fibre channel protocol stack. Selection of the fibre channel architecture will dictate the choice of devices that will be used to implement the interconnection components that comprise the SAN 20, as well as the network interface hardware and software that connect Host 1, Host 2 and storage system 18 to the SAN. Less commonly, other low level network protocols, such as Ethernet, could alternatively be used to implement the SAN 20. It should also be pointed out that although the SAN 20 will typically be implemented using wireline communications media, wireless media may potentially also be used for one or more of the communication links.
Host 1 and Host 2 may be implemented as SAN storage manager servers that offer the usual SAN access interfaces to the client systems 20. They can be built from conventional programmable computer platforms that are configured with the hardware and software resources needed to implement the required storage management functions. Example server platforms include the IBM® zSeries®, Power® systems and System x™ products, each of which provides a hardware and operating system platform set, and which can be programmed with higher level SAN server application software, such as one of the IBM® TotalStorage® DS family of Storage Manager systems.
Host 1 and Host 2 each include a pair of network communication ports 24 (Port A) and 26 (Port B) that provide hardware interfaces to the SAN 20. The physical characteristics of Port A and Port B will depend on the physical infrastructure and communication protocols of the SAN 20. If SAN 20 is a fibre channel network, Port A and Port B of each host may be implemented as conventional fibre channel host bus adapters (HBAs). Although not shown, additional SAN communication ports could be provided in each of Host 1 and Host 2 if desired. Port A and Port B of each host are managed by a multipath driver 28 that may be part of an operating system kernel 30 that includes a file system 32. The operating system kernel 30 will typically support one or more conventional application level programs 34 on behalf of the clients 20 connected to the LAN 22. Examples of such applications include various types of servers, including but not limited to web servers, file servers, database management servers, etc.
The multipath drivers 28 of Host 1 and Host 2 support active-passive mode operations of the storage system 18. Each multipath driver 28 may be implemented to perform conventional multipathing operations such as logging in to the storage system 18, managing the logical paths to the storage system, and presenting a single instance of each storage system LUN to the host file system 32, or to a host logical volume manager (not shown) if the operating system 30 supports logical volume management. As is also conventional, each multipath driver 28 may be implemented to recognize and respond to conditions requiring a storage communication request to be retried, failed, failed over, or failed back.
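As a rough illustration of the retry/failover decision such a driver makes, the sketch below retries transient errors on the active path and falls back to the passive path only after the active path is deemed unusable. The error classes and path-object methods are assumptions made for the example; they are not the interface of any actual multipath driver.

```python
class TransientPathError(Exception):
    """Recoverable error; the request may be retried on the same path."""

class PermanentPathError(Exception):
    """Unrecoverable error; the active path should be abandoned."""

def issue_io(lun, request, active_path, passive_path, max_retries=2):
    # Try the active path first, retrying transient errors a few times.
    for _ in range(max_retries + 1):
        try:
            return active_path.submit(request)
        except TransientPathError:
            continue
        except PermanentPathError:
            break
    # The active path is unusable: request a failover and reissue the I/O
    # on the path that will now become active.
    passive_path.request_failover(lun)
    return passive_path.submit(request)
```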
The storage system 18 may be implemented using any of various intelligent disk array storage system products. By way of example only, the storage system 18 could be implemented using one of the IBM® TotalStorage® DS family of storage servers that utilize RAID technology. In the illustrated embodiment, the storage system 18 comprises an array of disks (not shown) that may be formatted as a RAID, and the RAID may be partitioned into a set of physical storage volumes 36 that may be identified as SCSI LUNs, such as LUN 0, LUN 1, LUN 2, LUN 3 . . . LUN n, LUN n+1. Non-RAID embodiments of the storage system 18 may also be utilized. In that case, each LUN could represent a single disk or a portion of a disk. The storage system 18 includes a pair of controllers 38A (Controller A) and 38B (Controller B) that can both access all of the LUNs 36 in order to manage their data storage input/output (I/O) operations. In other embodiments, additional controllers may be added to the storage system 18 if desired. Controller A and Controller B may be implemented using any suitable type of data processing apparatus that is capable of performing the logic, communication and data caching functions needed to manage the LUNs 36. In the illustrated embodiment, each controller respectively includes a digital processor 40A/40B that is operatively coupled (e.g., via system bus) to a controller memory 42A/42B and to a disk cache memory 44A/44B. A communication link 45 facilitates the transfer of control information and data between Controller A and Controller B.
The processors 40A/40B, the controller memories 42A/42B and the disk caches 44A/44B may be embodied as hardware components of the type commonly found in intelligent disk array storage systems. For example, the processors 40A/40B may be implemented as conventional single-core or multi-core CPU (Central Processing Unit) devices. Although not shown, plural instances of the processors 40A/40B could be provided in each of Controller A and Controller B if desired. Each CPU device embodied by the processors 40A/40B is operable to execute program instruction logic under the control of a software (or firmware) program that may be stored in the controller memory 42A/42B (or elsewhere). The disk cache 44A/44B of each controller 38A/38B is used to cache disk data associated with read/write operations involving the LUNs 36. During active-passive mode operations of the storage system 18, each of Controller A and Controller B will cache disk data for the LUNs to which it is assigned as the primary controller. The controller memory 42A/42B and the disk cache 44A/44B may variously comprise any type of tangible storage medium capable of storing data in computer readable form, including but not limited to, any of various types of random access memory (RAM), various flavors of programmable read-only memory (PROM) (such as flash memory), and other types of primary storage.
The storage system 18 also includes communication ports 46 that provide hardware interfaces to the SAN 20 on behalf of Controller A and Controller B. The physical characteristics of these ports will depend on the physical infrastructure and communication protocols of the SAN 20. A suitable number of ports 46 is provided to support redundant communication wherein Host 1 and Host 2 are each able to communicate with each of Controller A and Controller B. This redundancy is needed to support active-passive mode operation of the storage system 18. In some embodiments, a single port 46 for each of Controller A and Controller B may be all that is needed to support redundant communication, particularly if the SAN 20 implements a network topology. However, in the embodiment of
As discussed in the “Introduction” section above, Controller A and Controller B may share responsibility for managing data storage I/O operations between each of Host 1 and Host 2 and the various LUNs 36. By way of example, Controller A may be the primary controller for all even-numbered LUNs (e.g., LUN 0, LUN 2 . . . LUN n), and the secondary controller for all odd-numbered LUNs (e.g., LUN 1, LUN 3 . . . LUN n+1). Conversely, Controller B may be the primary controller for all odd-numbered LUNs, and the secondary controller for all even-numbered LUNs. Other controller-LUN assignments would also be possible, particularly if additional controllers are added to the storage system 18.
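Expressed in code, this example assignment reduces to a parity check on the LUN number, as in the sketch below. An actual storage system would record controller ownership in configuration metadata rather than deriving it this way; the sketch merely restates the example.

```python
def primary_controller(lun_number: int) -> str:
    # Even-numbered LUNs are owned by Controller A, odd-numbered by Controller B.
    return "Controller A" if lun_number % 2 == 0 else "Controller B"

def secondary_controller(lun_number: int) -> str:
    return "Controller B" if lun_number % 2 == 0 else "Controller A"

assert primary_controller(0) == "Controller A"   # LUN 0, LUN 2, ...
assert primary_controller(1) == "Controller B"   # LUN 1, LUN 3, ...
```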
Relative to Host 1, Port A of Host 1 may be configured to communicate with Port A1 of Controller A, and Port B of Host 1 may be configured to communicate with Port B1 of Controller B. In an example embodiment wherein Controller A is the primary controller for all even-numbered LUNs in storage system 18, Host 1 would use its Port A to access even-numbered LUNs on a preferred/active path that extends through Controller A. Port B of Host 1 would provide a non-preferred/passive path to the even-numbered LUNs that extends through Controller B in the event of a path failure on the preferred/active path. For odd-numbered LUNs wherein Controller B is the primary controller, Host 1 would use its Port B to access all such LUNs on a preferred/active path that extends through Controller B. Port A of Host 1 would provide a non-preferred/passive path to the odd-numbered LUNs that extends through Controller A.
Relative to Host 2, Port A of Host 2 may be configured to communicate with Port A2 of Controller A, and Port B of Host 2 may be configured to communicate with Port B2 of Controller B. In an example embodiment wherein Controller A is the primary controller for all even-numbered LUNs in storage system 18, Host 2 would use its Port A to access even-numbered LUNs on a preferred/active path that extends through Controller A. Port B of Host 2 would provide a non-preferred/passive path to the even-numbered LUNs that extends through Controller B. For odd-numbered LUNs wherein Controller B is the primary controller, Host 2 would use its Port B to access all such LUNs on a preferred/active path that extends through Controller B. Port A of Host 2 would provide a non-preferred/passive path to the odd-numbered LUNs that extends through Controller A.
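The port pairings described in the two preceding paragraphs may be summarized in a small lookup table, as in the illustrative structure below; it simply restates the preferred and non-preferred paths for each host and LUN parity and is not intended as an actual configuration format.

```python
PATH_CONFIG = {
    # (host, LUN parity) -> (host port, controller, controller port)
    ("Host 1", "even"): {"preferred":     ("Port A", "Controller A", "Port A1"),
                         "non-preferred": ("Port B", "Controller B", "Port B1")},
    ("Host 1", "odd"):  {"preferred":     ("Port B", "Controller B", "Port B1"),
                         "non-preferred": ("Port A", "Controller A", "Port A1")},
    ("Host 2", "even"): {"preferred":     ("Port A", "Controller A", "Port A2"),
                         "non-preferred": ("Port B", "Controller B", "Port B2")},
    ("Host 2", "odd"):  {"preferred":     ("Port B", "Controller B", "Port B2"),
                         "non-preferred": ("Port A", "Controller A", "Port A2")},
}

def preferred_path(host: str, lun_number: int):
    parity = "even" if lun_number % 2 == 0 else "odd"
    return PATH_CONFIG[(host, parity)]["preferred"]
```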
The function of the processors 40A/40B is to implement the various operations of the controllers 38A/38B, including their failover and failback operations when the storage system 18 is in the active-passive storage mode. Control programs 48A/48B that may be stored in the controller memories 42A/42B (or elsewhere) respectively execute on the processors 40A/40B to implement the required control logic. As indicated, the logic implemented by the control programs 48A/48B includes failover/failback operations, which may be performed in the manner described below in connection with
As discussed in the “Introduction” section above, the ping-pong effect caused by repeated failover/failback operations following a path failure is detrimental to efficient storage system operations. For example, assume (according to the example above) that Controller A is the primary controller for all even-numbered LUNs in storage system 18. The preferred/active paths from Host 1 and Host 2 to the even-numbered LUNs will be through Controller A and the non-preferred/passive paths will be through Controller B. A path failure on the preferred/active path between Host 1 and Controller A may result in Host 1 initiating a failover operation in which Controller B assumes responsibility for the even-numbered LUNs. The non-preferred paths from Host 1 and Host 2 to Controller B will be made active and the preferred paths will assume passive status. This allows Host 1 to resume communications with all even-numbered LUNs. However, Host 2 will detect that it is communicating with the even-numbered LUNs on a non-preferred path but has the capability of communicating on the preferred path. If storage system 18 were not adapted to deal with the ping-pong effect, it would allow Host 2 to initiate a failback operation that results in the preferred path from Host 1 and Host 2 to Controller A being restored to active status. This would be optimal for Host 2 but would disrupt the communications of Host 1, assuming the failure condition on its preferred/active path to Controller A still exists. Host 1 would thus reinitiate a failover operation, which would be followed by Host 2 reinitiating a failback operation, and so on.
The foregoing ping-pong problem may be solved by programming Controller A and Controller B to enforce conditions on the ability of Host 1 and Host 2 to initiate a failback operation, to track the port status of the host that initiated the failover operation, and to allow the controllers themselves to initiate a failback operation based on such status. In particular, Controller A and Controller B may be programmed to allow a failback operation to be performed only by a host that previously initiated a corresponding failover operation (hereinafter referred to as the “failover host”). For example, if the failover host notices that the path failure has been resolved, it may initiate a failback operation to restore the preferred path to active status. This failback operation satisfies the condition imposed by the controller logic, and will be permitted. Other hosts that have connectivity to both the preferred path and the non-preferred path to a LUN will not be permitted to initiate a failback operation. In some embodiments, such other hosts may be denied the right to initiate a failback operation even if they only have connectivity to a LUN via the preferred path, such that the failback-attempting host is effectively cut off from the LUN. In that situation, it may be more efficient to require the client systems 20 to access the LUN through some other host than to allow ping-ponging.
Controller A and Controller B may be further programmed to monitor the port status of the failover host to determine if it is still online. If all of the ports of the failover host have logged out or otherwise disconnected from the storage system 18, the controller itself may initiate a failback operation. As part of the controller-initiated failback operation, the controller may first check to see if other hosts will be cut off, and if so, may refrain from performing the operation. Alternatively, the controller may proceed with failback without regard to the host(s) being cut off.
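A controller-side sketch of these two rules, combining the failback-permission check with port-status monitoring of the failover host, might look as follows. The host_port_table argument is an assumption modeled on the host port table 50A described below, mapping each host to the set of its ports currently logged in to the preferred-path controller; the function names are illustrative and this is not the actual control program 48A/48B.

```python
def may_fail_back(requesting_host, failover_host):
    # Only the host that initiated the failover may request the failback.
    return requesting_host == failover_host

def controller_should_fail_back(failover_host, host_port_table,
                                strict_connectivity=True):
    # The controller itself may fail back once every port of the failover
    # host has logged out, i.e. the host no longer appears in the table.
    if host_port_table.get(failover_host):
        return False            # failover host still has ports logged in
    if strict_connectivity:
        # Optionally refrain from failing back if doing so would cut off a
        # remaining host that has no port on the preferred-path controller.
        return all(ports for host, ports in host_port_table.items()
                   if host != failover_host)
    return True
```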
The foregoing logic of Controller A and Controller B may be implemented by each controller's respective control program 48A/48B.
In blocks 60 and 62 of
Following block 62 of
If block 64 determines that a failover operation has not been performed, processing returns to block 60 insofar as there would be no possibility of a failback operation being performed in that case. On the other hand, if block 64 determines that a failover operation has been performed, processing proceeds to block 66 and control program 48A tests whether a failback operation has been requested by any host. If not, nothing more needs to be done and processing returns to block 60. As described in the “Introduction” section above, a host may request a failback operation by issuing an appropriate command (such as a SCSI MODE_SELECT command) to Controller A, which is on the preferred path that was placed in a passive state by the previous failover operation. In other embodiments, the host may request a failback operation by attempting to resume use of the preferred path that was made passive by the previous failover operation. In such an embodiment, Controller A would detect such communication and automatically implement the failback operation.
If block 66 determines that a failback operation has been requested, the control program 48A consults state information conventionally maintained by Controller A (such as a log file) to determine in block 68 whether the request came from the failover host that initiated the previous failover operation. If true, this means that the failover host has determined that it is once again able to communicate on the preferred path. Insofar as there is no possibility that a failback to that path will trigger a ping-pong effect, the control program 48A may safely implement the failback operation in block 70. Note, however, that control program 48A may first test that all of the remaining hosts are still able to communicate on the preferred path. This may be determined by checking host port table 50A to ensure that each host has at least one port logged into Controller A.
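The connectivity test mentioned at the end of this step might be expressed as in the following sketch, where host_port_table again stands in for table 50A and maps each host to the set of its ports currently logged in to Controller A. The helper name is illustrative.

```python
def all_hosts_on_preferred_path(host_port_table, exclude=()):
    # host_port_table (cf. table 50A) maps each host to the set of its ports
    # currently logged in to Controller A, the preferred-path controller.
    # Failback is safe for a host only if it still has at least one such port.
    return all(ports for host, ports in host_port_table.items()
               if host not in exclude)

# Example usage with an illustrative table:
table = {"Host 1": {"Port A"}, "Host 2": {"Port A"}}
assert all_hosts_on_preferred_path(table)
table["Host 2"].clear()                      # Host 2 logs out of Controller A
assert not all_hosts_on_preferred_path(table)
```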
If block 68 determines that the failback request was not made by the failover host, the request is denied in block 72. Thereafter, in block 74, the control program 48A checks whether the failover host has gone offline. This may be determined by checking host port table 50A to see if the failover host has any ports logged into Controller A.
If the failover host is determined to be offline in block 74, Controller A may initiate and perform a failback operation, there being no possibility that this will trigger a ping-pong effect insofar as the failover host is no longer present. Again, however, control program 48A may first test that all of the remaining hosts are still able to communicate on the preferred path. In some embodiments, the failback operation may not be implemented unless all remaining hosts are reachable on the preferred path. In other embodiments, failback may proceed despite one or more hosts being unable to communicate on the preferred path. As part of block 74, Controller A may also remove any record of the failover host from its controller memory 42A, so as to allow future failbacks.
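Putting the preceding steps together, a condensed sketch of the failback-related portion of this flow (blocks 64 through 74) is given below. It is an illustrative reconstruction, not the actual control program 48A; the ctrl object and its attributes and methods are assumed for the example, and the optional preferred-path connectivity checks discussed above are noted only in comments.

```python
def failback_logic_step(ctrl):
    # Block 64: nothing to do unless a failover has been performed.
    if not ctrl.failover_performed:
        return
    # Block 66: has any host requested a failback, either by an explicit
    # command (e.g., MODE_SELECT) or by attempting I/O on the passive
    # preferred path?
    requester = ctrl.pending_failback_request      # None, or a host name
    if requester is None:
        return
    if requester == ctrl.failover_host:            # block 68
        # (Optionally verify first that the remaining hosts can still
        # reach the preferred path, as discussed above.)
        ctrl.perform_failback()                    # block 70
        ctrl.clear_failover_state()
        return
    ctrl.deny_failback(requester)                  # block 72
    # Block 74: if every port of the failover host has logged out,
    # Controller A may itself initiate the failback and forget the
    # failover host so that future failbacks are allowed.
    if not ctrl.host_port_table.get(ctrl.failover_host):
        ctrl.perform_failback()
        ctrl.clear_failover_state()
```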
Accordingly, a technique has been disclosed for avoiding a ping-pong effect in active-passive storage. It will be appreciated that the foregoing concepts may be variously embodied in any of a data processing system, a machine implemented method, and a computer program product in which programming logic is provided by one or more machine-usable storage media for use in controlling a data processing system to perform the required functions. Example embodiments of a data processing system and machine implemented method were previously described in connection with
Example data storage media for storing such program instructions are shown by reference numerals 42A/42B (memory) of Controller A and Controller B in
Although various example embodiments have been shown and described, it should be apparent that many variations and alternative embodiments could be implemented in accordance with the disclosure. It is understood, therefore, that the invention is not to be in any way limited except in accordance with the spirit of the appended claims and their equivalents.