The invention relates generally to computer networks, and more specifically to apparatus and methods for modeling and analyzing Storage Area Networks.
Storage Area Networks (SANs) have considerably increased the ability of servers to add large amounts of storage capability without incurring significant expense or service disruption for re-configuration. However, the ability to analyze SAN performance and/or availability has been limited by the models that have been employed. The lack of a systematic model of behavior specifically suited for the SAN objects and relationships limits several important forms of analysis. For example, it is difficult to determine the impact of failures in SAN components on the SAN, on the overall system and/or on the applications. Another example is determining the root-cause problems that produce symptoms in the SAN, in the overall system and/or in the applications.
Hence, there is a need in the industry for a method and system for analyzing and modeling Storage Area Networks to determine root-cause failures and impacts of such failures.
A method and apparatus for logically representing and performing an analysis on a Storage Area Network (SAN) are disclosed. The method comprises the steps of representing selected ones of a plurality of components and the relationships among the components associated with the SAN, providing a mapping between a plurality of events and a plurality of observable events occurring among the components, wherein the mapping is represented as a value associating each event with each observable event, and performing the system analysis based on the mapping of events and observable events. In another aspect of the invention, a method and apparatus are disclosed for representing and performing an analysis on a SAN, wherein the SAN is included in a larger system logically represented as a plurality of domains. In this aspect of the invention, the method comprises the steps of representing selected ones of a plurality of components and the relationships among the components, wherein at least one of the plurality of components is associated with at least two of the domains, providing a mapping between a plurality of events and a plurality of observable events occurring among the components, wherein the mapping is represented as a value associating each event with each observable event, and performing the system analysis based on the mapping of events and observable events.
It is to be understood that these drawings are solely for purposes of illustrating the concepts of the invention and are not intended as a definition of the limits of the invention. The embodiments shown in the figures herein and described in the accompanying detailed description are to be used as illustrative embodiments and should not be construed as the only manner of practicing the invention. Also, the same reference numerals, possibly supplemented with reference characters where appropriate, have been used to identify similar elements.
Returning to
Extents 340, more specifically, are units of allocation of disks, memory, etc., and represent a generalization of the traditional storage block concept. A volume is composed of extents 340 and is used to create a virtual space for the file system. For example, references to drives C:, D:, E:, etc. may be associated with logical volume labels within, for example, the MICROSOFT WINDOWS operating system. Microsoft and Windows are registered trademarks of Microsoft Corporation, Redmond, Wash., USA.
The storage pool 320 is representative of a plurality of extents 340 and used for administrative purposes. In this case, when allocation of a volume is desired, the storage pool manager selects a plurality of extents 340 and designates selected extents 340 as a volume 330. Thus, the file system 240 (
Intersection points or intersection associations between domains may further be determined. For example, file server 130 represents an intersection point between domains 210 and 230, as previously noted, and between domains 410 and 420. Similarly, host 315 represents an intersection between domains 410 and 420. Knowledge of intersection points is advantageous because an error or fault in a domain that impacts an intersection point may generate failures and/or error messages in other domains. That is, intersection points function as conduits for events across intersecting domains. For example, an error in disk 150 affects extent 340, which in turn affects volume 330, which further affects file system 240. Hence, errors in file system 240 may generate errors or detectable events in application domain 230, as application 235 may use a file serviced by file system 240. Similarly, a failure in disk 150 may affect file server 130 if file server 130 hosts a file system that allocates volumes that use disk 150, and may further create problems or detectable events in applications accessing disk 150.
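The propagation of events through intersection points described above can be illustrated with a minimal sketch. The component names mirror the reference numerals in the text, but the domain labels `domain_a` and `domain_b`, the membership of each domain, and the helper functions `intersection_points` and `impacted` are assumptions introduced only for illustration; they are not part of the disclosed model.

```python
from collections import deque

# Dependency chain taken from the example in the text:
# a fault in the key may affect each component listed as its value.
AFFECTS = {
    "Disk150": ["Extent340"],
    "Extent340": ["Volume330"],
    "Volume330": ["FileSystem240"],
    "FileSystem240": ["Application235"],
}

# Hypothetical domain membership (assumed for illustration only).
DOMAINS = {
    "domain_a": {"Disk150", "Extent340", "Volume330", "FileSystem240", "FileServer130"},
    "domain_b": {"FileServer130", "Application235"},
}


def intersection_points(domains):
    """Return the components that belong to two or more domains."""
    counts = {}
    for members in domains.values():
        for component in members:
            counts[component] = counts.get(component, 0) + 1
    return {c for c, n in counts.items() if n >= 2}


def impacted(start, affects):
    """Breadth-first walk: every component reachable from a fault, in discovery order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in affects.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

Under these assumptions, `intersection_points(DOMAINS)` identifies the file server as the only component shared by both domains, and `impacted("Disk150", AFFECTS)` lists the extent, volume, file system and application that a disk fault may reach.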
Abstract model 1010 is known to represent a managed system 1012 containing selected ones of the physical network components 1030, e.g., nodes, routers, computer systems, disk drives, etc., and/or logical network components 1050, e.g., software, application software, ports, disk drive designation, etc. Those network elements or components that are selected for representation in the model are referred to as managed components. The representation of the managed components includes aspects or properties of the component. The relationships between the managed components, as they have been shown in
Further shown is object class ICIM_PhysicalElement 1030 that includes object class Physical Package 1032, which represents physical components such as physical storage disk 150. Object class ICIM_LogicalDevice includes object class StorageExtent 1042, which represents Extent 340 and Extent 340 is in communication with StorageVolume 330.
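As a rough illustration of the class relationships just described, the following sketch models a physical package and a storage extent as subclasses of more general classes. The Python class names are stand-ins chosen to echo the object classes named in the text; the base class `ManagedComponent`, its `relations` field and the `relate` method are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedComponent:
    """Hypothetical base class: a named component with named relationships."""
    name: str
    # Maps a relationship name to the list of related components.
    relations: dict = field(default_factory=dict)

    def relate(self, relationship, other):
        """Record that this component has the given relationship to another component."""
        self.relations.setdefault(relationship, []).append(other)


class PhysicalElement(ManagedComponent):
    """Stand-in for the physical element object class."""


class PhysicalPackage(PhysicalElement):
    """Stand-in for a physical component such as a storage disk."""


class LogicalDevice(ManagedComponent):
    """Stand-in for the logical device object class."""


class StorageExtent(LogicalDevice):
    """Stand-in for a storage extent."""


disk = PhysicalPackage("Disk150")
extent = StorageExtent("Extent340")
# An extent is realized by the physical disk that holds it.
extent.relate("RealizedBy", disk)
```

Keeping relationships as named links on each component, rather than as fixed attributes, is one possible design choice; it allows new relationship types to be added without changing the class definitions.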
With respect to the model of Storage Area Networks described herein, a root-cause determination or an impact analysis may be performed by a correlation function, similar to that disclosed in the aforementioned commonly-owned U.S. patents and U.S. patent application.
As an example of root-cause analysis, consider a failure occurring in Extent 340. A failure or problem in Extent 340 may create detectable events or symptoms in File System 240, as File System 240 can no longer access data mapped into Extent 340. The failure may further create a detectable event or symptom in Application 235 when Application 235 makes a request to obtain data from File System 240. In some aspects, although a failure may occur, a symptom may or may not be generated indicating that a component, e.g., Extent 340, is experiencing failures. The root-cause correlation must be powerful enough to deal both with scenarios in which symptoms are generated indicating the condition of Extent 340 and with cases in which such symptoms are not generated. In both situations, the root-cause correlation diagnoses the Extent as the root cause. A root-cause analysis of the SAN, similar to that described in the aforementioned U.S. patents and patent application, determines from the exemplary causality matrix shown herein, and from the symptoms observed in the managed system, the most likely root cause of the problem. In this case, the symptoms or observable events are further associated with the components associated with at least two domains, i.e., an intersection point or an association.
As a second example, consider the failure of Storage Disk 150. A problem in Storage Disk 150 may cause symptoms as if all Extents in the storage disk were failing simultaneously. A problem in Storage Disk 150 may cause symptoms in File System 240, as File System 240 will not be able to access its data stored in Extent 340, which is part of Storage Disk 150. Similarly, it may cause symptoms in Application 235, as Application 235 will fail to access, through File System 240, data stored in Extent 340, which is part of Storage Disk 150. Similarly, a problem in the Storage Disk may or may not cause symptoms in the Extents 340 that have a “RealizedBy” relationship with the failing Storage Disk. In addition, a problem in the Storage Disk may or may not cause symptoms on the Storage Disk itself.
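The two examples above can be illustrated with a small sketch of a causality mapping and a nearest-match correlation. The symptom names, the 0/1 values in `CAUSALITY`, and the use of a simple mismatch count to pick the closest problem are assumptions made for illustration; the correlation function of the referenced patents may differ.

```python
# Observable events (symptoms) considered in this sketch; the names are hypothetical.
SYMPTOMS = [
    "FileSystem240_error",
    "Application235_error",
    "Extent340_error",
    "Disk150_error",
]

# Assumed causality mapping: for each problem, the symptoms it is expected to cause.
# A value of 1 associates the problem with the symptom; a missing entry means 0.
CAUSALITY = {
    "Extent340_failure": {
        "FileSystem240_error": 1,
        "Application235_error": 1,
        "Extent340_error": 1,
    },
    "Disk150_failure": {
        "FileSystem240_error": 1,
        "Application235_error": 1,
        "Extent340_error": 1,
        "Disk150_error": 1,
    },
}


def mismatch(signature, observed):
    """Count symptoms on which the expected signature and the observation disagree."""
    return sum(1 for s in SYMPTOMS if bool(signature.get(s, 0)) != (s in observed))


def most_likely_root_cause(observed):
    """Return the problem whose expected symptoms are closest to those observed."""
    return min(CAUSALITY, key=lambda problem: mismatch(CAUSALITY[problem], observed))
```

With these assumed values, observing only the file system and application errors still selects `Extent340_failure`, since the absent extent symptom costs one mismatch against the extent signature but two against the disk signature. Adding the disk symptom to the observation shifts the choice to `Disk150_failure`, even when the extent symptom is not generated.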
Input/output devices 1302, processors 1303 and memories 1304 may communicate over a communication medium 1325. Communication medium 1325 may represent, for example, a bus, a communication network, one or more internal connections of a circuit, circuit card or other apparatus, as well as portions and combinations of these and other communication media. Input data from the client devices 1301 is processed in accordance with one or more programs that may be stored in memories 1304 and executed by processors 1303. Memories 1304 may be any magnetic, optical or semiconductor medium that is loadable and retains information either permanently, e.g., PROM, or non-permanently, e.g., RAM. Processors 1303 may be any means, such as a general purpose or special purpose computing system, e.g., a laptop computer, desktop computer, server or handheld computer, or may be a hardware configuration, such as a dedicated logic circuit or integrated circuit. Processors 1303 may also be Programmable Array Logic (PAL), or Application Specific Integrated Circuit (ASIC), etc., which may be “programmed” to include software instructions or code that provides a known output in response to known inputs. In one aspect, hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention. The elements illustrated herein may also be implemented as discrete hardware elements that are operable to perform the operations shown using coded logical operations or by executing hardware executable code.
In one aspect, the processes shown herein may be represented by computer readable code stored on a computer readable medium. The code may also be stored in the memory 1304. The code may be read or downloaded from a memory medium 1383, an I/O device 1385 or magnetic or optical media, such as a floppy disk, a CD-ROM or a DVD, 1387 and then stored in memory 1304, or may be downloaded over one or more of the illustrated networks. As would be appreciated, the code may be processor-dependent or processor-independent. JAVA is an example of processor-independent code. JAVA is a trademark of Sun Microsystems, Inc., Santa Clara, Calif., USA.
Information from device 1301 received by I/O device 1302, after processing in accordance with one or more software programs operable to perform the functions illustrated herein, may also be transmitted over network 1380 to one or more output devices represented as display 1385, reporting device 1390 or second processing system 1395.
As one skilled in the art would recognize, the term computer or computer system may represent one or more processing units in communication with one or more memory units and other devices, e.g., peripherals, connected electronically to and communicating with the at least one processing unit. Furthermore, the devices may be electronically connected to the one or more processing units via internal busses, e.g., ISA bus, microchannel bus, PCI bus, PCMCIA bus, etc., or one or more internal connections of a circuit, circuit card or other device, as well as portions and combinations of these and other communication media or an external network, e.g., the Internet and Intranet.
While there have been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It would be recognized that the invention is not limited by the model discussed, and used as an example, or the specific proposed modeling approach described herein. For example, it would be recognized that the method described herein may be used to perform a system analysis that may include: fault detection, fault monitoring, performance, congestion, connectivity, interface failure, node failure, link failure, routing protocol error, routing control errors, and root-cause analysis.
It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated.
This application is a continuation-in-part of, and claims the benefit, pursuant to 35 USC 120, of the earlier filing date of co-pending U.S. patent application Ser. No. 10/813,842, entitled “Method and Apparatus for Multi-Realm System Modeling,” filed Mar. 31, 2004, the contents of which are incorporated by reference herein, and further claims the benefit, pursuant to 35 USC 119(e), of the earlier filing date of U.S. Provisional Patent Application Ser. No. 60/647,107, entitled “Method and Apparatus for Analyzing and Problem Reporting in Storage Area Networks,” filed on Jan. 26, 2005, the contents of which are incorporated by reference herein. This application is related to co-pending U.S. patent application Ser. No. 11/077,932, entitled “Apparatus and Method for Event Correlation and Problem Reporting,” which is a continuation of U.S. Pat. No. 6,868,367, filed on Mar. 27, 2003, which is a continuation of U.S. patent application Ser. No. 09/809,769 filed on Mar. 16, 2001, now abandoned, which is a continuation of U.S. Pat. No. 6,249,755, filed on Jul. 15, 1997, which is a continuation of U.S. Pat. No. 5,661,668, filed on Jul. 12, 1996, which is a continuation of application Ser. No. 08/465,754, filed on Jun. 6, 1995, now abandoned, which is a continuation of U.S. Pat. No. 5,528,516, filed on May 25, 1994, the contents of which are incorporated by reference herein.
Number | Date | Country |
---|---|---|
60647107 | Jan 2005 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 10813842 | Mar 2004 | US |
Child | 11176982 | Jul 2005 | US |