I. Technical Field
The present invention generally relates to the field of enterprise path management. More particularly, the invention relates to systems and methods for path impact analysis in a storage area network.
II. Background Information
Enterprise storage systems store data in large-scale environments and differ from consumer storage systems in both the size of the environment and the types of technologies that store and manage the data. A large-scale environment that stores data is typically referred to as a storage area network (SAN). SANs are commonly used in enterprise storage systems to transfer data between computer systems and storage devices. A typical SAN provides a communication infrastructure, including physical connections between computer systems, storage devices, and a management layer, which organizes the connections, storage devices, and computer systems.
In a SAN environment, computer systems, typically referred to as hosts, connect to the SAN via one or more host bus adapters. The SAN itself may include thousands of different inter-related logical and physical entities. In the case of a Fibre Channel SAN, these entities, which comprise the connections between hosts and storage devices, may include Fibre Channel host bus adapters, Fibre Channel switches, Fibre Channel routers, and the like. The entities may be physically connected through the use of twisted-pair copper wire, optical fiber, or any other means of signal transmission.
Storage devices may include multiple disk drives that combine to form a disk array. A typical disk array includes a disk array controller, a cache, disk enclosures, and a power supply. Examples of disk arrays include the SYMMETRIX Integrated Cache Disk Array System and the CLARIION Disk Array System, both available from EMC Corporation of Hopkinton, Mass. A disk array controller is a piece of hardware that provides storage services to computer systems that access the disk array. The disk array controller may attach to a number of disk drives that are located in the disk enclosures. For example, the disk drives may be organized into RAID groups for efficient performance. RAID (redundant array of inexpensive disks) is a system that uses multiple disk drives that share or replicate data among the drives. Accordingly, in a RAID system, instead of identifying several different hard drives, an operating system will identify all of the disk drives as if they are a single disk drive.
Disk array controllers connect to a SAN via a port. A port serves as an interface between the disk array controller and other devices, such as the hosts, in the SAN. Each disk array controller typically includes two or more ports. Disk array controllers may communicate with other devices using various protocols, such as the SCSI (Small Computer System Interface) command protocol over a Fibre Channel link to the SAN. In the SCSI command protocol, each device is assigned a unique numerical identifier, which is referred to as a logical unit number (LUN). Further, communication using the SCSI protocol is said to occur between an “initiator” (e.g., a host) and a “target” (e.g., a storage device) via a path. For example, a path may include a host bus adapter, an associated SCSI bus or Fibre Channel cabling, and a single port of a disk array controller.
Paths are managed by path management software. An example of path management software is the EMC POWERPATH system developed by EMC Corporation of Hopkinton, Mass. Path management software is a host-based software solution that is used to manage SANs and, among other things, can detect load imbalances for disk array controllers in a SAN and can select alternate paths through which to route data. In present systems, the path management software selects an alternate path only after determining that a first path has failed. Path failure may result, for example, from the complete or partial failure of components within the SAN. However, in a SAN that may comprise thousands of entities, the path management software is unable to detect the root cause of the path failure. Thus, in selecting an alternate path, the path management software simply avoids all paths that share an end point with the failed path. However, there may be many paths that share no end point with the failed path but nevertheless include the failed entity. Therefore, this method of alternate path selection is inefficient.
In view of the foregoing, what is needed is a system and method capable of utilizing the path configuration information of path management software, the topology of a SAN, and an identified root cause of a path failure in the selection of an alternate path. The obtaining and coordination of path configuration information, topology information, and the detected root cause of a path failure and its impact on data paths may be centralized in one entity, such as a path impact analysis server.
Consistent with an embodiment of the present invention, a method is provided for selecting a path for an I/O in a storage area network. The method may comprise receiving path configuration information for paths associated with a host device connected to the storage area network, a listing of components within the storage area network, and a notification of a component failure within the storage area network. The method may also comprise correlating the received path configuration information, the received listing of components, and the received notification of component failure to determine one or more paths associated with the host device affected by the component failure. The method may further comprise transmitting to the host device an alert for the one or more affected paths.
Consistent with another embodiment of the present invention, a system is provided for selecting a path for an I/O in a storage area network. The system may comprise a host device connected to the storage area network. The system may also comprise a path impact analysis server for receiving path configuration information associated with the host device, a listing of components within the storage area network, and a notification of a component failure within the storage area network, wherein the path impact analysis server correlates the received path configuration information, the received listing of components, and the received notification of component failure to determine one or more paths associated with the host device affected by the component failure and transmits to the host device an alert for the one or more affected paths.
Consistent with yet another embodiment of the present invention, a computer-readable medium is provided which includes program instructions for performing, when executed by a processor, a method for selecting a path for an I/O in a storage area network. The method may comprise receiving path configuration information associated with a host device connected to the storage area network, a listing of components within the storage area network, and a notification of a component failure within the storage area network. The method may also comprise correlating the received path configuration information, the received listing of components, and the received notification of component failure to determine one or more paths associated with the host device affected by the component failure. The method may further comprise transmitting to the host device an alert for the one or more affected paths.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention or embodiments thereof, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various embodiments and aspects of the present invention. In the drawings:
The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the following description to refer to the same or similar parts. While several exemplary embodiments and features of the invention are described herein, modifications, adaptations and other implementations are possible, without departing from the spirit and scope of the invention. For example, substitutions, additions or modifications may be made to the components illustrated in the drawings, and the exemplary methods described herein may be modified by substituting, reordering, or adding steps to the disclosed methods. Accordingly, the following detailed description does not limit the invention. Instead, the proper scope of the invention is defined by the appended claims.
The methods and apparatus of the present invention are intended for use in storage area networks (SANs) that include data storage systems, such as the SYMMETRIX Integrated Cache Disk Array System or the CLARIION Disk Array System available from EMC Corporation of Hopkinton, Mass. and those provided by vendors other than EMC Corporation.
The methods and apparatus of the present invention may take the form, at least partially, of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, random access or read-only memory, or any other machine-readable storage medium, including transmission medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the present invention may be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to specific logic circuits. The program code (software-based logic) for carrying out the method is embodied as part of the system described below.
As illustrated in
Storage area network (SAN) 110 provides communications between the various entities in system 100, such as the SAN management entities 120-140, host 160, and storage device 170. SAN 110 may be a shared, public, or private network and may encompass a wide area or a local area. SAN 110 may be implemented through any suitable combination of wired and/or wireless communication networks. Furthermore, SAN 110 may include a local area network (LAN), a wide area network (WAN), an intranet, or the Internet.
Next, for each path nexus reported by path management server 155, the path impact analysis server 130 may obtain a listing of components within the SAN 110 from the topology server 120 (step 220). Moreover, the path impact analysis server 130 may obtain notifications regarding the outage of components within the SAN 110 from the root cause analysis server 140 (step 230). The root cause analysis server 140 may pinpoint component failures on the basis of SAN management alerts from the SAN components or their proxy agents.
Subsequent to obtaining the information from the path management server 155, the topology server 120, and the root cause analysis server 140, the path impact analysis server 130 correlates the failed component(s) of the SAN 110 with all known path nexuses and determines the impact of the failure/degradation of a component on one or more paths of interest (step 240). Finally, the path impact analysis server 130 may send alerts for each affected host 160 containing references to the affected paths. The steps of
As used herein, the term “path failure” refers to the inability of the path to transmit an I/O from an initiator to a target. The term “path failure” may also refer to any measurable degradation in transmission metrics of the path. For example, transmission metrics of the path may include mean queue length, mean response time, mean throughput, and the like. The degree of degradation in a transmission metric needed to constitute a path failure may vary, and may be defined by an administrator of the SAN. Moreover, a failure of a component in a path may include the complete failure of the component or a partial failure of the component, such as the failure of a port of the component.
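By way of illustration only, the definition of path failure given above may be sketched in Python as follows. The class names, field names, units, and threshold values are hypothetical and are not taken from any particular path management product; they merely show how administrator-defined limits on the named transmission metrics might be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    """Transmission metrics of one path (units are assumptions)."""
    mean_queue_length: float
    mean_response_time_ms: float
    mean_throughput_mbps: float


@dataclass(frozen=True)
class DegradationPolicy:
    """Administrator-defined limits (hypothetical names and values)."""
    max_queue_length: float
    max_response_time_ms: float
    min_throughput_mbps: float


def is_path_failed(metrics, policy):
    """Return True if the path cannot carry I/O or any metric breaches its limit.

    Passing metrics=None models a path that cannot transmit I/O at all.
    """
    if metrics is None:
        return True
    return (
        metrics.mean_queue_length > policy.max_queue_length
        or metrics.mean_response_time_ms > policy.max_response_time_ms
        or metrics.mean_throughput_mbps < policy.min_throughput_mbps
    )


policy = DegradationPolicy(
    max_queue_length=32, max_response_time_ms=20.0, min_throughput_mbps=100.0
)
healthy = PathMetrics(4.0, 3.5, 380.0)
degraded = PathMetrics(4.0, 45.0, 380.0)  # response time exceeds the limit
```

In this sketch a single breached limit is sufficient to treat the path as failed; an administrator could equally require several metrics to degrade together.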
The paths (i.e., the physical and logical paths between the host 160 and the storage device 170) are referenced in the path management software 165. Based upon the number of FC switches 111-114, the number of ports on each FC switch, the number of HBAs 166 on each host 160, and the number of ports on the storage device 170, there may be hundreds of paths. These multiple paths allow the path management software 165 to switch to an alternate path in the event of a path failure.
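To illustrate how the number of paths grows with the number of HBAs, switches, and storage ports, the following Python sketch enumerates every loop-free route through a small fabric. The wiring shown is an assumption made for illustration and is not the topology of any figure; the node names and the function enumerate_paths are likewise hypothetical.

```python
def enumerate_paths(links, hbas, array_ports):
    """Return every loop-free route from any HBA to any storage port.

    links maps a node name to the nodes it can forward to.
    """
    paths = []

    def walk(node, visited):
        if node in array_ports:
            # A storage port terminates the path.
            paths.append(tuple(visited))
            return
        for nxt in links.get(node, ()):
            if nxt not in visited:
                walk(nxt, visited + [nxt])

    for hba in hbas:
        walk(hba, [hba])
    return paths


# Assumed wiring: two HBAs, four FC switches in two tiers, two storage ports.
links = {
    "hba_1": ["fc_switch_111"],
    "hba_2": ["fc_switch_112"],
    "fc_switch_111": ["fc_switch_113", "fc_switch_114"],
    "fc_switch_112": ["fc_switch_113", "fc_switch_114"],
    "fc_switch_113": ["port_A", "port_B"],
    "fc_switch_114": ["port_A", "port_B"],
}
paths = enumerate_paths(links, ["hba_1", "hba_2"], {"port_A", "port_B"})
print(len(paths))  # 2 HBAs x 2 second-tier switches x 2 storage ports = 8
```

Even this small arrangement yields eight paths; adding HBAs, switch ports, or storage ports multiplies the count accordingly.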
The multiple paths may be retrieved from the path management software 165 through the initiation of a “get” function by the path impact analysis server 130. The multiple paths may be logically grouped and transmitted to the path impact analysis server 130 in response to receiving the “get” function. For example, a logical group may include a path redundancy group, which represents the general attributes and properties of the associated multiple data paths.
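A minimal sketch of such a “get” exchange is given below in Python. The class and attribute names are hypothetical, and the choice to group paths by host and storage device is an assumption; the description above requires only that the paths be logically grouped, for example into path redundancy groups.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Path:
    host: str
    hba: str
    array_port: str
    storage_device: str


@dataclass
class PathRedundancyGroup:
    """Paths that are interchangeable for one host and one storage device."""
    host: str
    storage_device: str
    paths: list = field(default_factory=list)


class PathManagementSoftware:
    """Stand-in for the host-based software answering a "get" request."""

    def __init__(self, paths):
        self._paths = list(paths)

    def get(self):
        # Group paths by (host, storage device); this key is an assumption.
        groups = {}
        for p in self._paths:
            key = (p.host, p.storage_device)
            if key not in groups:
                groups[key] = PathRedundancyGroup(p.host, p.storage_device)
            groups[key].paths.append(p)
        return list(groups.values())


pms = PathManagementSoftware([
    Path("host_160", "hba_1", "port_A", "storage_170"),
    Path("host_160", "hba_2", "port_B", "storage_170"),
    Path("host_160", "hba_1", "port_C", "storage_171"),
])
groups = pms.get()
```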
In addition to receiving the multiple data paths of the path management software 165, the path impact analysis server 130 may also receive the topology of the SAN from the topology server 120. That is, for each data path reported by the path management software 165, the path impact analysis server obtains and correlates a listing of the SAN components that are used to create and enable the data path from the topology server 120. The topology server 120 may be capable of retrieving both the physical topology and logical topology of the SAN 110. The physical topology of the SAN 110 may include the location of the FC switches 111-114 and the layout of the signal transmission connections between the FC switches 111-114. The logical topology of the SAN 110 may include the mapping of the apparent connections that may be made as I/O communications travel along the multiple data paths of the SAN 110.
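The correlation of data paths with their constituent SAN components may be sketched in Python as follows. This is an illustrative simplification under stated assumptions: each path is represented as a set of component identifiers, the identifiers and the function affected_paths are hypothetical, and a path is treated as affected whenever it traverses at least one failed component.

```python
from collections import defaultdict


def affected_paths(path_components, failed_components):
    """Map each host to the paths that traverse at least one failed component.

    path_components: {(host, path_id): set of component identifiers}
    failed_components: set of component identifiers reported as failed
    """
    alerts = defaultdict(list)
    for host, path_id in sorted(path_components):
        if path_components[(host, path_id)] & failed_components:
            alerts[host].append(path_id)
    return dict(alerts)


# Component listing per path, as might be obtained from a topology server.
path_components = {
    ("host_160", "p1"): {"hba_1", "fc_switch_111", "fc_switch_113", "port_A"},
    ("host_160", "p2"): {"hba_1", "fc_switch_111", "fc_switch_114", "port_B"},
    ("host_160", "p3"): {"hba_2", "fc_switch_112", "fc_switch_114", "port_A"},
}
alerts = affected_paths(path_components, {"fc_switch_114"})
print(alerts)  # {'host_160': ['p2', 'p3']}
```

In this example, paths p2 and p3 share no end point, yet both are identified as affected because both traverse the failed switch.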
Topology server 120 may store the topology of the SAN 110 in a configuration table. The configuration table may include the components of the SAN 110, wherein the components may be identified by world-wide name (WWN), product identification, serial number, and/or any other identifier. The configuration table may further store status and state information about a host 160, such as whether the host 160 has established paths to a particular storage device 170. For example, the configuration table may indicate, for each host, all paths that the host has established and the storage device for each path.
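One possible in-memory form of such a configuration table is sketched below in Python. The class names, method names, and sample identifier values are hypothetical and are shown only to make the described contents concrete: components keyed by WWN, and, for each host, the established paths and the storage device reached by each path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    wwn: str
    product_id: str
    serial_number: str


class ConfigurationTable:
    def __init__(self):
        self._components = {}  # WWN -> Component
        self._host_paths = {}  # host -> {path_id: storage device}

    def add_component(self, component):
        self._components[component.wwn] = component

    def record_path(self, host, path_id, storage_device):
        self._host_paths.setdefault(host, {})[path_id] = storage_device

    def lookup(self, wwn):
        """Return the component with this WWN, or None if it is not listed."""
        return self._components.get(wwn)

    def paths_for(self, host):
        """Return a copy of {path_id: storage device} for the host."""
        return dict(self._host_paths.get(host, {}))

    def has_path_to(self, host, storage_device):
        return storage_device in self._host_paths.get(host, {}).values()


table = ConfigurationTable()
# Made-up identifier values, for illustration only.
table.add_component(Component("wwn-example-0001", "fc-switch", "SN-0001"))
table.record_path("host_160", "p1", "storage_170")
table.record_path("host_160", "p2", "storage_170")
```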
A determination of a root cause in the event of a port failure, for example, port 1 of FC switch 114, is illustrated in
A correlation table stored in the root cause analysis server 140 may be used to compare path failure notifications received from the path impact analysis server 130 to measured elements of cause stored in the correlation table in order to determine the most likely root cause of one or more of the observed symptoms or events. Other symptoms or events of a path may also be reported to the root cause analysis server 140 by the path management software 165 and/or other related SAN components. The correlation table may include, for example, specific measured elements to determine a specific event, such as a switch port failure.
In another embodiment, EMC Smarts Codebook Correlation Technology may be utilized by the root cause analysis server 140 as a correlation technique. In this embodiment, a codebook is provided that includes a mapping between each of a plurality of groups of possible path symptoms and one of a plurality of likely exceptional events (e.g., problems) in the system. When a path failure notification is received from the path impact analysis server 130, a mismatch measurement is determined between each of the plurality of groups of possible path failures in the mapping and the one or more known path failures, while disregarding path failures in the groups of possible path failures not determined to be known. One or more of the plurality of likely problems is selected corresponding to one of the plurality of groups having the smallest mismatch measure. As indicated above, other symptoms or events of a path may also be reported to the root cause analysis server 140 by the path management software 165 and/or other related SAN components.
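The smallest-mismatch selection described above may be illustrated with the following simplified Python sketch. It is not the EMC Smarts implementation; the mismatch measure used here (a count of known symptoms whose state disagrees with the codebook entry), the symptom and problem names, and the function names are assumptions chosen for illustration. Symptoms whose state has not been determined are simply omitted from the comparison.

```python
def mismatch(expected_symptoms, known):
    """Count known symptoms whose state disagrees with the codebook entry.

    expected_symptoms: set of symptoms the problem is expected to cause
    known: {symptom: bool} for symptoms whose state has been determined;
           symptoms absent from this dict are unknown and are disregarded.
    """
    return sum(
        1
        for symptom, present in known.items()
        if present != (symptom in expected_symptoms)
    )


def most_likely_problems(codebook, known):
    """Return the problem(s) whose symptom group has the smallest mismatch."""
    if not codebook:
        return []
    scores = {p: mismatch(symptoms, known) for p, symptoms in codebook.items()}
    best = min(scores.values())
    return sorted(p for p, s in scores.items() if s == best)


# Hypothetical codebook: problem -> symptoms it is expected to produce.
codebook = {
    "fc_switch_114_port_1_failure": {"p2_down", "p3_down"},
    "hba_1_failure": {"p1_down", "p2_down"},
    "port_A_failure": {"p1_down", "p3_down"},
}
known = {"p1_down": False, "p2_down": True, "p3_down": True}
print(most_likely_problems(codebook, known))
```

With the observations shown, the switch port entry matches every known symptom (mismatch of zero), while the other two entries each disagree on two symptoms, so the switch port failure is selected.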
The path management software 165 may use the notification sent by the path impact analysis server 130 to select a specific alternate path to avoid the failed component. Moreover, when a component failure/degradation has been fixed, the root cause analysis server 140 may notify the path impact analysis server 130. Subsequently, the path impact analysis server 130 may release the alert for paths that were previously affected by that component. The path management software 165 may then add the paths as candidates for future selection. As a result of receiving alerts and the release of alerts from the path impact analysis server 130, the path management software 165 has the knowledge of when to test a path for availability and hence, the efficiency of the path management software 165 is improved.
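The handling of alerts and releases on the host side may be sketched in Python as follows. The class PathSelector and its methods are hypothetical, and selecting the first remaining candidate in a fixed preference order is an assumption; an actual path management product may apply load balancing or other policies among the candidates.

```python
class PathSelector:
    """Tracks which paths are under alert and picks among the remainder."""

    def __init__(self, paths):
        self._paths = list(paths)  # assumed preference order
        self._alerted = set()

    def on_alert(self, path_ids):
        """Exclude paths reported as affected by a component failure."""
        self._alerted.update(path_ids)

    def on_release(self, path_ids):
        """Restore paths as candidates once the component has been fixed."""
        self._alerted.difference_update(path_ids)

    def candidates(self):
        return [p for p in self._paths if p not in self._alerted]

    def select(self):
        """Return the first candidate path, or None if none remain."""
        candidates = self.candidates()
        return candidates[0] if candidates else None


selector = PathSelector(["p1", "p2", "p3"])
selector.on_alert(["p1", "p2"])
during_outage = selector.select()  # only p3 remains a candidate
selector.on_release(["p1"])
after_repair = selector.select()   # p1 is a candidate again
```

Because alerted paths are excluded until released, the host need not repeatedly test them for availability while the underlying component remains failed.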
The foregoing description has been presented for purposes of illustration. It is not exhaustive and does not limit the invention to the precise forms or embodiments disclosed. Modifications and adaptations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments of the invention. For example, the described implementations include software, but systems and methods consistent with the present invention may be implemented as a combination of hardware and software or in hardware alone. Examples of hardware include computing or processing systems, including personal computers, servers, laptops, mainframes, micro-processors and the like. Additionally, although aspects of the invention are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on other types of computer-readable media, such as secondary storage devices, for example, hard disks, floppy disks, or CD-ROM, the Internet or other propagation medium, or other forms of RAM or ROM.
Computer programs based on the written description and methods of this invention are within the skill of an experienced developer. The various programs or program modules can be created using any of the techniques known to one skilled in the art or can be designed in connection with existing software. For example, program sections or program modules can be designed in or by means of Java, C++, HTML, XML, or HTML with included Java applets. One or more of such software sections or modules can be integrated into a computer system or browser software.
Moreover, while illustrative embodiments of the invention have been described herein, the scope of the invention includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations and/or alterations as would be appreciated by those in the art based on the present disclosure. The limitations in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application, which examples are to be construed as non-exclusive. Further, the steps of the disclosed methods may be modified in any manner, including by reordering steps and/or inserting or deleting steps, without departing from the principles of the invention. It is intended, therefore, that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims and their full scope of equivalents.
This patent application is a continuation of and claims priority to application Ser. No. 11/819,612, filed Jun. 28, 2007 and entitled “STORAGE ARRAY NETWORK PATH IMPACT ANALYSIS SERVER FOR PATH SELECTION IN A HOST-BASED I/O MULTI-PATH SYSTEM,” which is incorporated herein by reference in its entirety.
Number | Date | Country |
---|---|---|
20120233494 A1 | Sep 2012 | US |
 | Number | Date | Country |
---|---|---|---|
Parent | 11819612 | Jun 2007 | US |
Child | 13476072 | | US |