The disclosure relates generally to a business solution, and more particularly to analyzing a state changing event of a component of a business solution to determine the root cause of the problem and its impact on the business solution.
In a typical business solution, a large number of information technology (IT) resources are combined and interact with one another to support one or more business processes. The resources may be network devices, servers, applications, etc. In a large scale deployment of a business solution, the resources and business processes may have a large number of dependencies among one another, such that a problem in one resource may affect other resources and business processes that directly and/or indirectly depend on it. The problem can thus spread across the system and produce a large number of other problems. As such, the success of such a complex business solution depends on how accurately and quickly the real cause of the problems is determined and solved. That is, identifying the root cause of the problems is required to manage the system efficiently.
A first aspect of the invention is directed to a method for analyzing a state changing event, the method comprising: detecting a state changing event of a first resource; tracing a dependence link beginning at the first resource to a resource that the first resource depends on until finding a second resource having a state changing event that is not dependent on any resource with a state changing event; and identifying the state changing event of the second resource as a root cause incident for analysis.
A second aspect of the invention is directed to a system for analyzing a state changing event, comprising: means for detecting a state changing event of a first resource; means for tracing a dependence link beginning at the first resource to a resource that the first resource depends on until finding a second resource having a state changing event that is not dependent on any resource with a state changing event; and means for identifying the state changing event of the second resource as a root cause incident for analysis.
A third aspect of the invention is directed to a computer program product for analyzing a state changing event, the computer program product comprising: computer usable program code which, when executed by a computer system, enables the computer system to: receive data of a detected state changing event of a first resource; trace a dependence link beginning at the first resource to a resource that the first resource depends on until finding a second resource having a state changing event that is not dependent on any resource with a state changing event; and identify the state changing event of the second resource as a root cause incident for analysis.
Other aspects and features of the present invention, as defined solely by the claims, will become apparent to those ordinarily skilled in the art upon review of the following non-limiting detailed description of the invention in conjunction with the accompanying figures.
The embodiments of this disclosure will be described in detail, with reference to the following figures, wherein:
It is noted that the drawings of the disclosure are not to scale. The drawings are intended to depict only typical aspects of the disclosure, and therefore should not be considered as limiting the scope of the invention. In the drawings, like numbering represents like elements among the drawings.
The following detailed description of embodiments refers to the accompanying drawings, which illustrate specific embodiments of the invention. Other embodiments having different structures and operations do not depart from the scope of the present invention.
Referring to
As shown in
Computing device 104 is shown including a memory 112, a processor (PU) 114, an input/output (I/O) interface 116, and a bus 118. Further, computing device 104 is shown in communication with an external I/O device/resource 120 and a storage system 122. In general, processor 114 executes computer program code, such as event analysis system 132, that is stored in memory 112 and/or storage system 122. While executing computer program code, processor 114 can read and/or write data to/from memory 112, storage system 122, and/or I/O interface 116. Bus 118 provides a communications link between each of the components in computing device 104. I/O interface 116 can comprise any device that enables a user to interact with computing device 104 or any device that enables computing device 104 to communicate with one or more other computing devices. External I/O device/resource 120 can be coupled to the system either directly or through I/O interface 116.
In any event, computing device 104 can comprise any general purpose computing article of manufacture capable of executing computer program code installed thereon. However, it is understood that computing device 104 and event analysis system 132 are only representative of various possible equivalent computing devices that may perform the various processes of the disclosure. To this extent, in other embodiments, computing device 104 can comprise any specific purpose computing article of manufacture comprising hardware and/or computer program code for performing specific functions, any computing article of manufacture that comprises a combination of specific purpose and general purpose hardware/software, or the like. In each case, the program code and hardware can be created using standard programming and engineering techniques, respectively.
Similarly, computer infrastructure 102 is only illustrative of various types of computer infrastructures for implementing the invention. For example, in an embodiment, computer infrastructure 102 comprises two or more computing devices that communicate over any type of wired and/or wireless communications link, such as a network, a shared memory, or the like, to perform the various processes of the disclosure. When the communications link comprises a network, the network can comprise any combination of one or more types of networks (e.g., the Internet, a wide area network, a local area network, a virtual private network, etc.). Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters. Regardless, communications between the computing devices may utilize any combination of various types of transmission techniques.
Event analysis system 132 includes a data collecting unit 140; an operation controller 142; a root cause determination unit 144; an incident establishing unit 146; a previous incident deleting unit 148; an impact analysis unit 150 including a combiner 151; a database querying unit 152; and other system components 158. Other system components 158 may include any now known or later developed parts of event analysis system 132 not individually delineated herein, but understood by those skilled in the art.
According to an embodiment, computer infrastructure 102 and event analysis system 132 may be used to implement, inter alia, analysis unit 14 and relationship database 20 of system 10 (
Inputs to computer infrastructure 102, e.g., through external I/O device/resource 120 and/or I/O interface 116, may include information communicated from event monitoring unit 12 regarding a detected event. Outputs from computer infrastructure 102, e.g., through external I/O device/resource 120 and/or I/O interface 116, may include results of the root cause determination and business impact assessment that are communicated to, e.g., impact solving unit 22 (
An embodiment of the operation of event analysis system 132 is shown in the flow diagram of
In process S2, root cause determination unit 144 traces a dependence link beginning at the resource 34 having the ‘triggering event’, e.g., resource 34b, to an inferior resource 34, until finding a resource 34 that has an event and is not dependent on any resource 34 with an event. The event of the found resource 34 is referred to as an ‘initial root cause’. Note that the triggering event may be found as the initial root cause. According to an embodiment, root cause determination unit 144 coordinates with database querying unit 152 to query relationship database 20 to trace the dependence link(s). It should be appreciated that multiple ‘initial root causes’ may be found in process S2. For example, in the case that resource 34a has a ‘triggering event’, it may be found that resources 34b, 34c and 34d all have events, and in the case that resources 34e and 34f have no events, the events on resources 34b, 34c and 34d will all be identified as ‘initial root causes’ to the ‘triggering event’ of resource 34a. Here, for illustrative purposes, it is assumed that resource 34b itself is found as the ‘initial root cause’. That is, tracing the dependence link from resource 34b to inferior resource 34e, root cause determination unit 144 finds that resource 34e does not have an event.
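For illustration only, the tracing of process S2 may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of root cause determination unit 144: the function name `find_initial_root_causes`, the in-memory `depends_on` mapping (standing in for queries to relationship database 20), and the dependency topology used in the example are all hypothetical. The sketch also assumes that "not dependent on any resource with an event" refers to direct dependence links.

```python
from typing import Dict, List, Set


def find_initial_root_causes(
    trigger: str,
    depends_on: Dict[str, List[str]],
    has_event: Set[str],
) -> List[str]:
    """Trace dependence links from the triggering resource toward inferior
    resources and return every resource whose event is not explained by an
    event on a resource it directly depends on (hypothetical helper)."""
    roots: List[str] = []
    visited: Set[str] = set()
    stack = [trigger]  # the triggering resource is assumed to have an event
    while stack:
        resource = stack.pop()
        if resource in visited:
            continue  # guards against revisiting shared or cyclic links
        visited.add(resource)
        # Inferior resources (those this resource depends on) that have events.
        inferior_with_event = [
            r for r in depends_on.get(resource, []) if r in has_event
        ]
        if inferior_with_event:
            # Keep tracing downward; this event may be a consequence.
            stack.extend(inferior_with_event)
        else:
            # No inferior resource has an event: an 'initial root cause'.
            roots.append(resource)
    return sorted(roots)


# Hypothetical topology, assumed for illustration only.
DEPENDS_ON = {
    "34a": ["34b", "34c", "34d"],
    "34b": ["34e"],
    "34c": ["34e", "34f"],
    "34d": ["34f"],
}

if __name__ == "__main__":
    events = {"34a", "34b", "34c", "34d"}
    print(find_initial_root_causes("34a", DEPENDS_ON, events))
```

With the assumed topology, a triggering event on resource 34a with events on 34b, 34c and 34d (and none on 34e and 34f) yields all three as initial root causes, and a triggering event on 34b alone yields 34b itself, consistent with the examples given above.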
In process S3, operation controller 142 determines whether the ‘initial root cause’ is the ‘triggering event’ itself. If the ‘initial root cause’ is not the ‘triggering event’, operation controller 142 controls the operation to process S7, where incident establishing unit 146 identifies the ‘initial root cause’ as a ‘root cause incident’. If the ‘initial root cause’ is the ‘triggering event’, here, e.g., resource 34b, operation controller 142 controls the operation to process S4.
In process S4, previous incident deleting unit 148 traces a dependence link beginning at the resource with the ‘triggering event’, here resource 34b, to a superior resource 34 (i.e., a resource 34 that depends on resource 34b) that has a state changing event. The event of the ‘superior resource’ 34 is referred to as a ‘superior event’ for illustrative purposes. Here, for illustrative purposes, it is assumed that resource 34a has been found as having a ‘superior event’.
In process S5, operation controller 142 determines whether there is a ‘superior resource’ 34 having a ‘superior event’. If there is such a ‘superior resource’, operation controller 142 controls the operation to process S6. In process S6, incident establishing unit 146 identifies the triggering event, here the event of resource 34b, as a root cause incident, and previous incident deleting unit 148 deletes a root cause incident, if any, previously established for the ‘superior event’. If no such ‘superior resource’ 34 is found, operation controller 142 updates a counter and determines whether the counter value reaches a threshold in process S8. If the counter value does not reach the threshold, operation controller 142 controls the operation to pause for a preset period of time in process S9, and then to return to process S2 to trace an ‘initial root cause’ again. If the counter value reaches the threshold, operation controller 142 controls the operation to process S6.
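The control flow of processes S2 through S9 may be sketched as follows, purely as an illustration. All names (`handle_trigger`, `get_events`, `incidents`, the default `threshold` and `pause_seconds` values) are hypothetical and do not appear in the disclosure. The sketch assumes that the set of resources having events is re-read on each pass, that superior resources are found through direct dependence links, and that the threshold is reached when the counter equals it.

```python
import time
from typing import Callable, Dict, List, Set


def initial_root_causes(trigger: str, depends_on: Dict[str, List[str]],
                        events: Set[str]) -> List[str]:
    """Process S2 (simplified): follow links to inferior resources with events."""
    roots, visited, stack = [], set(), [trigger]
    while stack:
        res = stack.pop()
        if res in visited:
            continue
        visited.add(res)
        below = [r for r in depends_on.get(res, []) if r in events]
        if below:
            stack.extend(below)
        else:
            roots.append(res)
    return sorted(roots)


def superiors_with_event(resource: str, depends_on: Dict[str, List[str]],
                         events: Set[str]) -> List[str]:
    """Process S4: resources that depend on `resource` and have an event."""
    return sorted(r for r, deps in depends_on.items()
                  if resource in deps and r in events)


def handle_trigger(trigger: str, depends_on: Dict[str, List[str]],
                   get_events: Callable[[], Set[str]], incidents: Set[str],
                   threshold: int = 3, pause_seconds: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Return the resources whose events are established as root cause incidents."""
    counter = 0
    while True:
        events = get_events()
        roots = initial_root_causes(trigger, depends_on, events)   # S2
        if roots != [trigger]:                                      # S3
            incidents.update(roots)                                 # S7
            return roots
        superiors = superiors_with_event(trigger, depends_on, events)  # S4
        if superiors:                                               # S5
            incidents.add(trigger)                                  # S6
            incidents.difference_update(superiors)  # delete earlier incidents
            return [trigger]
        counter += 1                                                # S8
        if counter >= threshold:
            incidents.add(trigger)                                  # S6
            return [trigger]
        sleep(pause_seconds)                                        # S9


if __name__ == "__main__":
    graph = {"34a": ["34b"], "34b": ["34e"]}  # hypothetical topology
    established = {"34a"}  # incident previously established for the superior event
    print(handle_trigger("34b", graph, lambda: {"34a", "34b"}, established))
    print(established)
```

In this sketch the pause of process S9 gives a superior event time to be reported before the triggering event is established as a root cause incident on its own. Passing `sleep` as a parameter is a design choice of the sketch so that the pause can be replaced during testing.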
In process S10, impact analysis unit 150 analyzes an impact of the root cause incident by tracing a dependence link beginning at the resource 34, here 34b, having the root cause incident to a business process 32 depending on the resource 34, here 34b. Impact analysis unit 150 may coordinate with database querying unit 152 to implement the tracing via relationship database 20. After the dependence link(s) from the resource 34 having the root cause incident to business processes 32 have been identified, impact analysis unit 150 may analyze the potential impact of the root cause incident following the identified dependence link(s). For example, with respect to
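The upward tracing of process S10 may be sketched, for illustration only, as a breadth-first traversal over reversed dependence links. The function name `impacted_business_processes`, the in-memory `depends_on` mapping (standing in for relationship database 20), and the example topology are hypothetical. The sketch covers only the identification of dependent business processes, not how impact analysis unit 150 or combiner 151 assesses the impact.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_business_processes(
    root_resource: str,
    depends_on: Dict[str, List[str]],
    business_processes: Set[str],
) -> List[str]:
    """Return the business processes that directly or indirectly depend on
    the resource having the root cause incident (hypothetical helper)."""
    # Reverse the dependence links: inferior resource -> its superiors.
    dependents: Dict[str, List[str]] = {}
    for node, inferiors in depends_on.items():
        for inferior in inferiors:
            dependents.setdefault(inferior, []).append(node)

    impacted: Set[str] = set()
    seen = {root_resource}
    queue = deque([root_resource])
    while queue:
        node = queue.popleft()
        for superior in dependents.get(node, []):
            if superior in seen:
                continue  # each node is visited once, even on shared paths
            seen.add(superior)
            if superior in business_processes:
                impacted.add(superior)
            queue.append(superior)
    return sorted(impacted)


# Hypothetical topology, assumed for illustration only.
DEPENDS_ON = {
    "32a": ["34a"],
    "32b": ["34c"],
    "34a": ["34b", "34c"],
    "34b": ["34e"],
}
BUSINESS_PROCESSES = {"32a", "32b"}

if __name__ == "__main__":
    print(impacted_business_processes("34b", DEPENDS_ON, BUSINESS_PROCESSES))
```

Under the assumed topology, a root cause incident on resource 34b reaches business process 32a through resource 34a, whereas an incident on resource 34c reaches both 32a and 32b.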
While shown and described herein as a method and system for analyzing a state changing event, it is understood that the disclosure further provides various alternative embodiments. For example, in an embodiment, the invention provides a program product stored on a computer-readable medium, which when executed, enables a computer infrastructure to analyze a state changing event to determine the root cause of the problem and its impact on the business solution. To this extent, the computer-readable medium includes program code, such as event analysis system 132 (
As used herein, it is understood that the terms “program code” and “computer program code” are synonymous and mean any expression, in any language, code or notation, of a set of instructions that cause a computing device having an information processing capability to perform a particular function either directly or after any combination of the following: (a) conversion to another language, code or notation; (b) reproduction in a different material form; and/or (c) decompression. To this extent, program code can be embodied as one or more types of program products, such as an application/software program, component software/a library of functions, an operating system, a basic I/O system/driver for a particular computing and/or I/O device, and the like. Further, it is understood that the terms “component” and “system” are synonymous as used herein and represent any combination of hardware and/or software capable of performing some function(s).
The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art appreciate that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiments shown and that the invention has other applications in other environments. This application is intended to cover any adaptations or variations of the present invention. The following claims are in no way intended to limit the scope of the invention to the specific embodiments described herein.