This application is a national stage application under 35 U.S.C. §371 of PCT/US2013/023725, filed Jan. 30, 2013.
Host devices are able to access data stored at storage devices. In a network arrangement, access of the storage devices can be managed by controller nodes that are interconnected by a communications fabric to the host devices.
The host devices are able to submit data requests to the controller nodes. In response to the data requests from the host devices, the controller nodes can submit corresponding data requests to the storage devices to access (read or write) data of the corresponding storage devices.
Some embodiments are described with respect to the following figures:
Controller nodes that manage access of storage devices can include storage array controllers or other types of controllers. A controller node is coupled to one or multiple storage devices (e.g. disk-based storage devices, integrated circuit memory devices, etc.). In some arrangements, the storage devices can be part of respective groups of storage devices. For example, the groups of storage devices can include arrays of storage devices.
In some configurations, each controller node manages access of its respective group of storage device(s). In other configurations, each controller node is able to access multiple groups of storage devices.
A controller node can receive a data request from a host device through a path in a communications fabric to host devices. The communications fabric can be a storage area network (e.g. Fibre Channel storage area network) or other type of network. In response to a data request from a host device, a controller node submits a corresponding data request to a group of storage device(s) to read or write data of such group.
The host devices can be server computers, which are coupled to client devices. Client devices can submit data requests to the host devices, which in turn submit corresponding data requests to the controller nodes. In other examples, a host device can itself be a client device that is able to generate data requests (without having to first receive data requests from another client device).
Failures can occur in a network arrangement that has host devices and controller nodes. The failures can occur at various points in the network arrangement, including the controller nodes, the communications fabric, or at other points. A failure can include any of the following: malfunction or fault of hardware equipment, malfunction or fault of machine-readable instructions (software and/or firmware), a failure caused by an attack by malware (e.g. virus, worm, spyware, etc.), or any other condition that prevents normal operation of a storage system that includes the controller nodes, storage devices, and host devices.
Traditionally, to respond to a failure that prevents successful communication over a path between a host device and a controller node, logic in the host device can be used to perform a failover from the failed path to a different path. In some examples, such logic can include a multi-pathing module that is able to selectively perform data communications over any one of multiple paths between a host device and controller nodes. The multi-pathing module can perform load balancing (to balance the data access load across multiple paths), and can also provide failover support to fail over from one path to another path in case of a detected failure.
However, employing logic in a host device to perform failover involves making a change at the host device. When failover is performed at the host device, the host device would mark a path associated with the failure as being unavailable. As a result, the host device would no longer be able to use such path, which can reduce input/output communications capacity and can affect load balancing and/or other tasks performed at the host device.
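The effect of host-side failover on path capacity can be illustrated with a minimal sketch. The class `HostMultipath`, its round-robin policy, and the path names `P1` and `P2` are assumptions made for illustration only; they do not describe any particular multi-pathing product.

```python
class HostMultipath:
    """Hypothetical host-side multi-pathing module (illustration only)."""

    def __init__(self, paths):
        self.available = list(paths)  # paths currently usable by the host
        self._counter = 0

    def pick_path(self):
        # Round-robin load balancing over the paths still marked available.
        if not self.available:
            raise RuntimeError("no usable path to any controller node")
        path = self.available[self._counter % len(self.available)]
        self._counter += 1
        return path

    def mark_failed(self, path):
        # Host-side failover: the failed path is taken out of rotation.
        if path in self.available:
            self.available.remove(path)


mp = HostMultipath(["P1", "P2"])
before = [mp.pick_path() for _ in range(4)]
mp.mark_failed("P1")
after = [mp.pick_path() for _ in range(4)]
print(before)
print(after)
```

In this sketch, requests alternate between the two paths before the failure, and all requests land on the single remaining path afterwards, which is the reduction in capacity and load balancing noted above.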
In accordance with some implementations, failover can be performed (at least in part) at a controller node rather than at a host device. The failover performed at a controller node can be transparent to a host device. In addition, failback can also be performed at a controller node if the failure condition that caused the failover is later resolved. Failback refers to a process of returning data communications to the component that had previously failed (and thus caused a failover) and which has subsequently resumed normal operation (in other words, the failure condition has been resolved). As discussed further below, the failback is a reliable failback that first performs a health check to ensure that the network infrastructure is healthy prior to performing failback.
The network arrangement can include two or more host devices 102, and/or two or more controller nodes 108. The controller nodes 108 and 110 manage access of data in storage device groups 112, 114, where a storage device group can include any group of one or multiple storage devices. In the example of
As further depicted in
As further depicted in
Although
Each port can be assigned a port identifier, which can identify the respective communications adapter. In some examples, a port identifier can be a port world wide name (WWN). In other examples, a port identifier can be another type of identifier.
A logical path can be established between a port identifier (e.g. port WWN) of a port in a host device and a port identifier (e.g. port WWN) of a port in a controller node. Communications between a host device and a controller node can occur through the logical path. A logical path differs from a physical path. A physical path can include a specific set of physical links between a specific host device port and a specific controller node port. However, a logical path is defined by port identifiers. If a port identifier of a controller node port is re-assigned to a different controller node port, the logical path remains the same (since it is defined by port identifiers); however, after the port identifier re-assignment, the logical path provides communications between a different pair of ports.
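The distinction between a logical path and a physical path can be sketched as follows. The names `LogicalPath`, `physical_endpoints`, and the identifier and port strings are hypothetical and chosen only for illustration; the sketch assumes a simple table that records which physical port currently holds each port identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalPath:
    """A logical path is identified only by the two port identifiers."""
    host_port_id: str
    controller_port_id: str


def physical_endpoints(path, assignment):
    # Resolve each port identifier to the physical port that currently holds it.
    return assignment[path.host_port_id], assignment[path.controller_port_id]


# Hypothetical identifiers and port names, for illustration only.
assignment = {"host-wwn": "host port A", "ctrl-wwn": "controller port B"}
path = LogicalPath("host-wwn", "ctrl-wwn")
same_path = LogicalPath("host-wwn", "ctrl-wwn")

before = physical_endpoints(path, assignment)
assignment["ctrl-wwn"] = "controller port C"  # identifier re-assigned
after = physical_endpoints(path, assignment)
print(before, after, path == same_path)
```

The `LogicalPath` value is unchanged by the re-assignment; only the pair of physical ports that it resolves to differs.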
As further depicted in
In other implementations, the failover modules 128 and 130 can be provided outside the respective controller nodes 108 and 110. As described further below, the failover modules 128 and 130 can also perform failback, in case the failure condition that caused a failover is later resolved.
As an example, upon detecting a failure that prevents the controller node port 125 from communicating over the communications fabric 106 with the host device port 121, the failover module 128 in the controller node 108 is able to initiate a failover procedure. In accordance with some implementations, as depicted in
In the example of
In
After re-assignment of WWN1 from the controller node port 125 to the controller node port 127, communications over the logical path P1 (still defined between WWN1 and WWNx) can occur between the host device port 121 and the controller node port 127, as depicted in
More generally, prior to re-assigning WWN1, the logical path P1 is associated with the controller node port 125. However, after re-assigning WWN1, the logical path is associated with the controller node port 127.
If the failure condition at the controller node port 125 is later resolved such that the failure condition no longer exists, the controller node port 125 can be reactivated and WWN1 can be re-assigned from the controller node port 127 back to the controller node port 125. Thus, the failover procedure can further perform failback by re-assigning WWN1 back to the controller node port 125, at which point the logical path P1 is as depicted in
In accordance with some implementations, prior to performing the failback, the controller node port 125 to which the failback is to occur can first be temporarily assigned a probe identifier, which can be another WWN (different from WWN1 and WWN2). The probe identifier can be used for the purpose of checking the health of the network infrastructure (including physical paths and switches) between the controller node port 125 and the host device 102. Checking the health of the network infrastructure avoids a ping-pong failover/failback scenario. In such a scenario, a failover of the logical path P1 first occurs from the controller node port 125 to the controller node port 127, followed by failback from the controller node port 127 back to the controller node port 125, followed by another failover from the controller node port 125 to the controller node port 127 once it is determined that the network infrastructure between the controller node port 125 and the host device 102 is not healthy.
Checking the health of the network infrastructure can include checking to ensure that components (including physical paths and switches) of the network infrastructure are working properly so that communications can occur between the controller node port 125 and the host device port 121. In addition, checking the health of the network infrastructure can also include checking to ensure that there is a valid physical path from the controller node port 125 to the host device port 121. In an example where there are multiple communications fabrics, the controller node port 125 may have been re-connected to a different communications fabric following the initial failover from the controller node port 125 to the controller node port 127. The host device port 121 may not be connected to the different communications fabric, and thus re-assigning the logical path P1 back to the controller node port 125 may result in a situation where communication is not possible between the controller node port 125 and the host device port 121 over the logical path P1.
The failover and failback at the controller nodes 108 and 110 are transparent to the host device 102. In the present discussion, a failover procedure can include both failover and failback. After failover and any subsequent failback, the logical path P1 between WWN1 and WWNx remains visible to the host device 102, which can continue to use the logical path P1 for communications.
In the example of
After failover, the controller node port 127 can potentially communicate over multiple different logical paths, including the logical path P1 between WWNx and WWN1, and another logical path between a host device port identifier and WWN2.
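A controller node port that carries more than one logical path after failover can be modeled as a port holding a set of port identifiers. The following is a minimal sketch under stated assumptions: WWN2 is taken to be the identifier held by the controller node port 127 before failover, and the second path name `P2` and host identifier `WWNy` are hypothetical.

```python
# Port identifiers currently held by each controller node port.
port_ids = {125: {"WWN1"}, 127: {"WWN2"}}

# Logical paths as (host port identifier, controller port identifier).
# "P2" and "WWNy" are hypothetical names for the second path.
logical_paths = {"P1": ("WWNx", "WWN1"), "P2": ("WWNy", "WWN2")}


def reassign(port_ids, wwn, src, dst):
    """Move one port identifier from port `src` to port `dst`."""
    port_ids[src].remove(wwn)  # raises KeyError if src does not hold wwn
    port_ids[dst].add(wwn)


def paths_on_port(port):
    # Names of the logical paths whose controller-side identifier is on `port`.
    return sorted(name for name, (_, ctrl_wwn) in logical_paths.items()
                  if ctrl_wwn in port_ids[port])


reassign(port_ids, "WWN1", src=125, dst=127)
served_by_127 = paths_on_port(127)
served_by_125 = paths_on_port(125)
print(served_by_127, served_by_125)
```

After the re-assignment, port 127 in this sketch serves both P1 and the hypothetical P2, while port 125 serves none.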
In response to detecting the failure, the failover procedure 300 re-assigns (at 304) the first port identifier to a second port in the storage system to cause the logical path to be associated with the second port. The second port can be another port of the same controller node, or alternatively, the second port can be a port of another controller node.
Subsequently, in response to detecting resolution of the failure, the failover procedure 300 assigns (at 306) a probe identifier to the first port. The assignment of the probe identifier to the first port can be a temporary assignment. Resolution of the failure can be detected by the failover module 128 or 130; alternatively, resolution of the failure can be indicated by equipment in the communications fabric 106 to the failover module 128 or 130. Using the probe identifier, the failover procedure 300 checks (at 308) a health of a network infrastructure between the first port and the port of the host device. In response to the checking indicating that the network infrastructure is healthy, the failover procedure 300 assigns (at 310) the first port identifier to the first port to cause failback of the logical path to the first port.
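The tasks at 304, 306, 308, and 310 can be sketched in code as follows. This is a simplified model rather than an actual implementation: the class name `FailoverSketch`, the probe identifier `WWN-PROBE`, and the injected `check_health` callable are assumptions, as is the choice to release the probe identifier whether or not the check succeeds.

```python
class FailoverSketch:
    """Hypothetical model of tasks 304 to 310; illustration only."""

    def __init__(self, port_ids, check_health):
        self.port_ids = port_ids          # port -> set of port identifiers
        self.check_health = check_health  # callable(port, probe_id) -> bool

    def on_failure(self, port_id, first, second):
        # 304: re-assign the first port identifier to the second port.
        self.port_ids[first].discard(port_id)
        self.port_ids[second].add(port_id)

    def on_resolution(self, port_id, first, second, probe_id):
        # 306: temporarily assign the probe identifier to the first port.
        self.port_ids[first].add(probe_id)
        try:
            # 308: check the network infrastructure using the probe identifier.
            healthy = self.check_health(first, probe_id)
        finally:
            # Assumption: the probe identifier is released whatever the result.
            self.port_ids[first].discard(probe_id)
        if healthy:
            # 310: fail back by returning the identifier to the first port.
            self.port_ids[second].discard(port_id)
            self.port_ids[first].add(port_id)
        return healthy


ports = {125: {"WWN1"}, 127: {"WWN2"}}
module = FailoverSketch(ports, check_health=lambda port, probe: False)
module.on_failure("WWN1", first=125, second=127)
stayed = module.on_resolution("WWN1", 125, 127, probe_id="WWN-PROBE")

module.check_health = lambda port, probe: True
returned = module.on_resolution("WWN1", 125, 127, probe_id="WWN-PROBE")
print(stayed, returned, ports)
```

In the first call the health check reports an unhealthy infrastructure, so WWN1 remains on port 127; in the second call the check passes and WWN1 is returned to port 125.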
Checking of the health of the network infrastructure can be accomplished as follows, in accordance with some examples. After the first port has been assigned the probe identifier (which can be a probe WWN), the first port attempts to log in to the communications fabric 106 using the probe identifier. The login is performed with a server in the communications fabric 106. If the login is unsuccessful, that is an indication that the network infrastructure is not healthy, and thus failback of the logical path back to the first port would not be performed.
If login is successful, the failover module 128 (or another entity associated with the first port) can perform a communications test of the network infrastructure. Login of the first port using the probe identifier allows the first port to perform communications over the communications fabric 106. For example, the test can be a loopback test in which test packets can be sent from the first port to the host device port, to obtain a response from the host device port. If a response can be obtained from the host device port in response to the test packets, then the network infrastructure is determined to be healthy. More generally, the test can involve performing a test communication in the communications fabric 106 for ascertaining the health of the communications fabric for communications between the first port and the host device port.
Once the health of the network infrastructure between the first port at the controller node and the host device port has been confirmed, the first port can log the probe identifier out of the communications fabric 106. At this point, the failback performed at 310 can proceed.
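The login, test communication, and logout sequence described above can be sketched as follows. `FakeFabric` is a stand-in written for this illustration and does not correspond to any real Fibre Channel interface; its methods, the probe identifier `WWN-PROBE`, and the decision to log out even when the test fails are assumptions.

```python
class FakeFabric:
    """Stand-in for a communications fabric; not a real fabric API."""

    def __init__(self, accepts_login=True, reachable=()):
        self.accepts_login = accepts_login
        self.reachable = set(reachable)  # host port identifiers that respond
        self.logged_in = set()

    def login(self, port_id):
        if self.accepts_login:
            self.logged_in.add(port_id)
        return self.accepts_login

    def logout(self, port_id):
        self.logged_in.discard(port_id)

    def send_test(self, src_id, dst_id):
        # True if a response to the test packets comes back from dst_id.
        return src_id in self.logged_in and dst_id in self.reachable


def infrastructure_healthy(fabric, probe_id, host_port_id):
    # An unsuccessful login indicates an unhealthy network infrastructure.
    if not fabric.login(probe_id):
        return False
    try:
        # Test communication from the first port to the host device port.
        return fabric.send_test(probe_id, host_port_id)
    finally:
        # Assumption: the probe identifier is logged out in every case.
        fabric.logout(probe_id)


fabric = FakeFabric(reachable={"WWNx"})
result = infrastructure_healthy(fabric, "WWN-PROBE", "WWNx")
print(result, fabric.logged_in)
```

A fabric that rejects the login, or one in which the host device port does not respond, causes `infrastructure_healthy` to return `False`, in which case failback would not be performed.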
In accordance with some implementations, the switch 402 includes a port connection database (or other data structure) 404. The port connection database 404 has multiple entries, where each entry maps a host device port identifier (e.g. port WWN) to a respective host device physical port, and maps a controller node port identifier (e.g. port WWN) to a respective controller node physical port. As part of the failover procedure, the port connection database 404 is updated, based on interaction between the failover module(s) 128 and/or 130 and the switch 402. The respective entry of the port connection database 404 is updated to indicate that the port identifier WWN1 is re-assigned to the physical port 127, rather than physical port 125.
If a failback is subsequently performed in response to resolution of the failure, in which the port identifier WWN1 is assigned back to the physical port 125, then the respective entry of the port connection database 404 can be updated again.
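The updates to the port connection database on failover and failback can be sketched as follows. The dictionary layout is a simplification assumed for illustration (one flat map from port identifier to physical port, with WWN2 assumed to be held by port 127), and the helper names `update_entry` and `forward` are hypothetical.

```python
# Hypothetical database contents: port identifier -> physical port number.
port_connection_db = {"WWNx": 121, "WWN1": 125, "WWN2": 127}


def update_entry(db, port_id, physical_port):
    """Point `port_id` at a new physical port and return the previous one."""
    previous = db[port_id]  # raises KeyError for an unknown identifier
    db[port_id] = physical_port
    return previous


def forward(db, dst_port_id):
    # Physical port to which traffic addressed to dst_port_id is delivered.
    return db[dst_port_id]


history = [forward(port_connection_db, "WWN1")]
update_entry(port_connection_db, "WWN1", 127)  # failover
history.append(forward(port_connection_db, "WWN1"))
update_entry(port_connection_db, "WWN1", 125)  # failback
history.append(forward(port_connection_db, "WWN1"))
print(history)
```

Traffic addressed to WWN1 is delivered to port 125, then to port 127 after failover, and again to port 125 after failback, while the entry for the host identifier WWNx is untouched throughout.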
The processor(s) 504 can be connected to a communications interface 506 (e.g. communications adapter 124 or 126 in
The storage medium (or storage media) 508 can be implemented as one or multiple computer-readable or machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2013/023725 | 1/30/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/120136 | 8/7/2014 | WO | A |
Number | Date | Country | |
---|---|---|---|
20150370668 A1 | Dec 2015 | US |