The present invention relates to problems associated with self-healing in autonomic computer systems, and particularly, the problem of fast and efficient real-time diagnosis in large-scale distributed systems.
Herebelow, numerals presented in square brackets—[ ]—are keyed to the list of references found towards the close of the present disclosure.
In the context of the field of the invention just set forth, conventional techniques (e.g., the codebook approach of Kliger et al. [1] and the probabilistic-inference-with-active-probing approach of Rish et al. [2]) typically employ a central event-correlation or inference engine that retains system information and analyzes incoming events. However, as the size of a system increases, both the frequency of events and the computational complexity of inference increase dramatically. A centralized, single-engine diagnostic approach quickly becomes intractable, and alternative approaches are needed. For example, there has previously been implemented a diagnostic system called RAIL (Real-time Active Inference and Learning) [2] that uses probabilistic real-time inference and relies on IBM's EPP (End-to-end Probing Platform) [3] tool to obtain system measurements called probes. Problems with RAIL have been noted when it is applied to larger systems, such as significant portions of an intranet. Accordingly, a need has been recognized for effectively addressing such problems.
Broadly contemplated herein, in accordance with at least one presently preferred embodiment of the present invention, is a “divide-and-conquer” approach to diagnostic tasks (such as those described heretofore) that uses parallel (i.e., multi-thread) and distributed (i.e., multi-machine) architectures. As such, the diagnostic task is preferably divided into subtasks that are distributed to multiple diagnostic engines, which collaborate with each other in order to reach a final diagnosis.
Each diagnostic engine is preferably responsible for some subset of system components (its “region”) and performs the diagnosis using all available observations about these components. When the regions do not intersect, the diagnostic task is trivially parallelized. In general, however, different regions may have common components, and thus the conclusions made by one diagnostic engine may contain useful information for another engine; information exchange between the engines may improve their diagnostic accuracy. To address this issue, there is further proposed herein a distributed diagnostic approach based on probabilistic belief propagation (BP) [4] and its generalizations [5]. This approach yields a naturally parallelizable message-passing algorithm for distributed probabilistic diagnosis that eliminates the computational bottleneck associated with a central monitoring server, and it also improves the robustness of monitoring and diagnosis by avoiding the single point of failure that such a server represents. Also proposed herein is a generic architecture that supports BP and allows communication between diagnostic engines that run in parallel, either on the same machine or on different machines (depending on the scale of diagnosis).
For a better understanding of the present invention, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings, and the scope of the invention will be pointed out in the appended claims.
Although a general approach is broadly contemplated herein, which can be applied to a very wide variety of prospective environments, the disclosure now turns to a specific example of a “probing” approach to problem diagnosis [2,3]. A “probe”, as may be broadly understood for the discussion herein, is an end-to-end transaction (e.g., ping, webpage access, database query, an e-commerce transaction, etc.) sent through the system for the purposes of monitoring and testing. Usually, probes are sent from one or more probing stations (designated machines) and ‘go through’ multiple system components, including both hardware (e.g., routers and servers) and software components (e.g., databases and various applications).
Formally, one may consider a set X={X1, . . . , Xn} of system components, a set T={T1, . . . , Tm} of tests (probes), and an m×n dependency matrix [dij] where the columns correspond to the components, the rows correspond to the probes, and dij=1 if executing probe i involves component j, and 0 otherwise. For example,
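Because the dependency matrix is simply a 0/1 table with one row per probe and one column per component, it can be held as a list of rows. The minimal Python sketch below is illustrative only: the matrix values and the helper names `components_of` and `probes_through` are hypothetical and are not taken from the drawings.

```python
# Hypothetical 3-probe x 6-component dependency matrix (illustrative values only).
COMPONENTS = ["X1", "X2", "X3", "X4", "X5", "X6"]
PROBES = ["T1", "T2", "T3"]
D = [
    [1, 1, 0, 0, 1, 0],  # T1 passes through X1, X2, X5
    [0, 0, 1, 0, 0, 1],  # T2 passes through X3, X6
    [0, 1, 1, 1, 0, 0],  # T3 passes through X2, X3, X4
]

def components_of(i, d=D):
    """Row i: indices j with d[i][j] == 1, i.e., the components probe i exercises."""
    return [j for j, v in enumerate(d[i]) if v == 1]

def probes_through(j, d=D):
    """Column j: indices i of the probes whose execution involves component j."""
    return [i for i, row in enumerate(d) if row[j] == 1]

for i, name in enumerate(PROBES):
    print(name, [COMPONENTS[j] for j in components_of(i)])
```

Reading a row gives the components a probe depends on; reading a column gives the probes that can reveal a fault in a given component.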
In the presence of noise, different prior fault probabilities, and multiple failures, one may preferably apply a probabilistic approach to diagnosis that uses the convenient framework of Bayesian networks. The dependency matrix can be mapped to a two-layer Bayesian network [4] in which the states of the components correspond to upper-layer variables and the probes correspond to lower-layer variables, whose parents are the components influencing the probe's outcome, as specified by the 1 entries in the corresponding row of the dependency matrix. For example,
$$P(X,T)=\prod_{i=1}^{n}P(X_i)\prod_{j=1}^{m}P(T_j\mid pa(T_j))$$
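where P(Xi) is the prior distribution of Xi, and pa(Tj) denotes the set of parents of probe Tj, i.e., the components whose entries in the corresponding row of the dependency matrix equal 1.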
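To make the factorization concrete, the following Python sketch evaluates P(X,T) for given component states and probe outcomes. The disclosure does not specify the conditional distributions P(Tj|pa(Tj)); this sketch assumes a noisy-OR model (each faulty parent independently makes the probe fail with probability `q`, plus a small `leak` probability), and the names `joint`, `p_probe_fails`, `q` and `leak` are hypothetical.

```python
from math import prod

def p_probe_fails(x, parents, q=0.9, leak=0.01):
    """Assumed noisy-OR CPT: P(T = 1 | pa(T)), where T = 1 means the probe failed."""
    # The probe succeeds only if the leak does not fire and no faulty parent breaks it.
    p_ok = (1.0 - leak) * prod(1.0 - q for k in parents if x[k] == 1)
    return 1.0 - p_ok

def joint(x, t, d, priors, q=0.9, leak=0.01):
    """P(X=x, T=t) = prod_i P(X_i = x_i) * prod_j P(T_j = t_j | pa(T_j)).

    x[i] = 1 means component i is faulty; priors[i] = P(X_i = 1).
    d is the m x n dependency matrix; row j lists the parents of probe j.
    """
    p = prod(pr if xi == 1 else 1.0 - pr for xi, pr in zip(x, priors))
    for row, tj in zip(d, t):
        parents = [k for k, dk in enumerate(row) if dk == 1]
        pf = p_probe_fails(x, parents, q, leak)
        p *= pf if tj == 1 else 1.0 - pf
    return p

# Usage: two components, two probes (hypothetical values).
d = [[1, 1],   # probe T1 goes through X1 and X2
     [0, 1]]   # probe T2 goes through X2 only
print(joint((0, 1), (1, 1), d, priors=[0.1, 0.2]))
```

Because every factor is a proper (conditional) distribution, summing `joint` over all 2^n component states and 2^m probe outcomes yields 1.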
Given the probe outcomes, diagnosis consists of finding the most likely combination of faults that “explains” the observed probe outcomes. Unfortunately, solving this problem exactly can be computationally expensive or even practically infeasible, since exact inference in Bayesian networks is known to be NP-hard. Thus, in accordance with at least one presently preferred embodiment of the present invention, a “belief propagation” algorithm is preferably employed as an approximation tool. Preferably, this tool can also easily be parallelized and thus implemented in a distributed fashion (especially desirable if one prefers to off-load a central management server).
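A naive exact method makes the cost visible: finding the most likely fault combination by enumeration examines all 2^n component-state assignments, which is feasible only for a handful of components. The Python sketch below assumes a noisy-OR probe model for illustration; the matrix, priors, parameters and function names are hypothetical.

```python
from itertools import product
from math import prod

def joint(x, t, d, priors, q=0.9, leak=0.01):
    """P(x, t) under an assumed noisy-OR model; x_k = 1 faulty, t_j = 1 failed probe."""
    p = prod(pr if xk else 1.0 - pr for xk, pr in zip(x, priors))
    for row, tj in zip(d, t):
        # Probe succeeds unless the leak fires or some faulty parent breaks it.
        p_ok = (1.0 - leak) * prod(1.0 - q for dk, xk in zip(row, x) if dk and xk)
        p *= (1.0 - p_ok) if tj else p_ok
    return p

def most_likely_faults(t, d, priors):
    """Exact most-probable explanation by enumeration: visits all 2**n assignments."""
    n = len(priors)
    return max(product((0, 1), repeat=n), key=lambda x: joint(x, t, d, priors))

# Hypothetical 3-probe, 6-component matrix; T1 and T3 failed, T2 succeeded.
D = [[1, 1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1],
     [0, 1, 1, 1, 0, 0]]
print(most_likely_faults((1, 0, 1), D, [0.05] * 6))  # (0, 1, 0, 0, 0, 0)
```

Here a single fault in X2 (the only component shared by both failed probes and absent from the successful one) is the best explanation; doubling the number of components squares the search space, which is why an approximate method is preferred.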
Belief propagation (BP), in essence, may be thought of as a simple linear-time message-passing algorithm that is provably correct on polytrees (i.e., Bayesian networks with no undirected cycles) and that can be used as an approximation on general networks. Preferably, belief propagation passes probabilistic messages between the nodes and can be iterated until convergence (which is guaranteed for polytrees but not for general networks); otherwise, it can be stopped after a given number of iterations. The algorithm computes approximate beliefs P(Xi|T) for each component node Xi.
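The message-passing scheme can be sketched as the sum-product algorithm on a factor graph over binary component states. In this minimal Python sketch (an illustration, not the RAIL implementation), each factor is a scope of variable indices plus a potential function: unary factors hold the priors P(Xi) and probe factors hold the likelihoods P(Tj = observed | pa(Tj)). Messages are updated in parallel for a fixed number of iterations, matching the option of stopping after a given number of iterations. The function name `loopy_bp`, the `iters` parameter and the noisy-OR probe model in the usage are assumptions.

```python
from itertools import product
from math import prod

def loopy_bp(n_vars, factors, iters=20):
    """Sum-product belief propagation over binary variables.

    factors: list of (scope, fn); scope is a tuple of variable indices and
    fn(assignment) returns a non-negative potential for that assignment.
    Returns approximate marginals [P(X_v=0), P(X_v=1)] for every variable v.
    """
    nbrs = {v: [a for a, (scope, _) in enumerate(factors) if v in scope]
            for v in range(n_vars)}
    edges = [(v, a) for v in range(n_vars) for a in nbrs[v]]
    v2f = {e: [1.0, 1.0] for e in edges}   # variable -> factor messages
    f2v = {e: [1.0, 1.0] for e in edges}   # factor -> variable messages

    def normalize(m):
        s = sum(m)
        return [x / s for x in m]

    for _ in range(iters):
        # Factor -> variable: sum out every other variable in the factor's scope.
        new_f2v = {}
        for a, (scope, fn) in enumerate(factors):
            for pos, v in enumerate(scope):
                msg = [0.0, 0.0]
                for assign in product((0, 1), repeat=len(scope)):
                    incoming = prod(v2f[(u, a)][assign[k]]
                                    for k, u in enumerate(scope) if k != pos)
                    msg[assign[pos]] += fn(assign) * incoming
                new_f2v[(v, a)] = normalize(msg)
        f2v = new_f2v
        # Variable -> factor: product of messages from all other neighbouring factors.
        for v, a in edges:
            v2f[(v, a)] = normalize(
                [prod(f2v[(v, b)][x] for b in nbrs[v] if b != a) for x in (0, 1)])

    # Belief at each variable: normalized product of all incoming factor messages.
    return [normalize([prod(f2v[(v, a)][x] for a in nbrs[v]) for x in (0, 1)])
            for v in range(n_vars)]

# Usage with an assumed noisy-OR probe model (hypothetical matrix and numbers).
D = [[1, 1, 0], [0, 1, 1]]   # T1 uses X1, X2; T2 uses X2, X3
t_obs = [1, 0]               # T1 failed, T2 succeeded
priors = [0.05, 0.05, 0.05]

def prior_factor(p):
    return lambda a: p if a[0] else 1.0 - p

def probe_factor(observed, q=0.9, leak=0.01):
    def fn(a):
        p_ok = (1.0 - leak) * prod(1.0 - q for xk in a if xk)
        return (1.0 - p_ok) if observed else p_ok
    return fn

factors = [((i,), prior_factor(p)) for i, p in enumerate(priors)]
factors += [(tuple(k for k, dk in enumerate(row) if dk), probe_factor(tj))
            for row, tj in zip(D, t_obs)]
print([round(b[1], 3) for b in loopy_bp(3, factors)])  # approx. P(X_i faulty | T)
```

The factor graph in this usage has no cycles, so the beliefs coincide with the exact posteriors; on graphs with cycles the same code yields the approximation described above.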
By way of a simple (and non-restrictive) example, one may consider a network in which several nodes are designated diagnostic nodes (called RAILs, i.e., real-time active inference and learning engine nodes), each with associated EPP (End-to-end Probing Platform) software; this is schematically illustrated in
Preferably, iterative belief propagation works by sending messages between nodes and updating probabilities (also called beliefs) at every node, as shown in
Here, RAIL1 receives probes T1 and T2 and therefore diagnoses nodes {X1, X2, X3, X5, X6}, while RAIL2 receives probe T3 and diagnoses nodes {X2, X3, X4}. Thus, the subsets of nodes intersect because the probes intersect (which is quite common, especially when a probe set is optimized so that a minimal number of probes covers the system), and therefore the beliefs obtained by different diagnostic engines about the shared nodes must be combined. Such combination can be brought about naturally by applying belief propagation in a distributed way, so that each RAIL is responsible for keeping and updating the messages related to its nodes. Clearly, all factor nodes in the corresponding factor graph that involve a RAIL's nodes will belong to that RAIL as well.
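The bookkeeping behind this distributed scheme can be sketched briefly: each RAIL's region is the union of the components covered by the probes it receives, and the components lying in more than one region are those whose beliefs must be reconciled through BP messages between engines. The Python sketch below mirrors the RAIL1/RAIL2 example; the split of components between T1 and T2 is an assumption (only their union is given above), the rule that a probe's factor lives on the RAIL receiving that probe is one simple choice, and the function names are hypothetical.

```python
from collections import Counter

# Assumed probe coverage; the text gives only the union of T1 and T2.
PROBE_SCOPE = {
    "T1": {"X1", "X2", "X5"},
    "T2": {"X3", "X6"},
    "T3": {"X2", "X3", "X4"},
}
RAIL_PROBES = {"RAIL1": ["T1", "T2"], "RAIL2": ["T3"]}

def regions(rail_probes, probe_scope):
    """A RAIL's region: all components touched by the probes it receives."""
    return {r: set().union(*(probe_scope[t] for t in ts))
            for r, ts in rail_probes.items()}

def shared_components(region_map):
    """Components in two or more regions; their beliefs must be combined across RAILs."""
    counts = Counter(c for reg in region_map.values() for c in reg)
    return {c for c, n in counts.items() if n > 1}

def factor_owner(rail_probes):
    """Assumed ownership rule: a probe's factor lives on the RAIL that receives it."""
    return {t: r for r, ts in rail_probes.items() for t in ts}

regs = regions(RAIL_PROBES, PROBE_SCOPE)
print(sorted(regs["RAIL1"]), sorted(regs["RAIL2"]))
print(sorted(shared_components(regs)))  # ['X2', 'X3']
print(factor_owner(RAIL_PROBES))
```

Only messages about the shared components need to cross engine boundaries; everything else can be computed locally by each RAIL.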
Preferably, a publish-subscribe system architecture (generally, a hierarchical one) will be employed for message exchange between different diagnostic/monitoring nodes (peers, also called RAILs above) through higher-level “councilors”, using “message patterns” that describe which messages each RAIL should send and where, and which messages it expects to receive from its peers.
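A minimal in-process sketch of such routing is shown below in Python. The class name `Councilor`, its methods, and the choice of one topic per shared component are assumptions made for illustration; a real deployment would exchange messages over the network, and the message patterns would be derived from the overlap of RAIL regions.

```python
from collections import defaultdict

class Councilor:
    """Hypothetical higher-level broker routing BP messages between RAIL peers."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # topic (component id) -> RAIL ids
        self.inboxes = defaultdict(list)      # RAIL id -> [(sender, topic, payload)]

    def register_pattern(self, rail, topics):
        """Message pattern: the topics (shared components) a RAIL sends and expects."""
        for topic in topics:
            self.subscribers[topic].add(rail)

    def unregister(self, rail):
        """Drop a departed or failed RAIL from every pattern."""
        for subs in self.subscribers.values():
            subs.discard(rail)
        self.inboxes.pop(rail, None)

    def publish(self, sender, topic, payload):
        """Deliver a message about `topic` to every other RAIL subscribed to it."""
        for rail in sorted(self.subscribers[topic] - {sender}):
            self.inboxes[rail].append((sender, topic, payload))

    def collect(self, rail):
        """Return and clear the messages waiting for `rail`."""
        return self.inboxes.pop(rail, [])

# Usage: RAIL1 and RAIL2 share components X2 and X3.
c = Councilor()
c.register_pattern("RAIL1", ["X2", "X3"])
c.register_pattern("RAIL2", ["X2", "X3"])
c.publish("RAIL1", "X2", [0.3, 0.7])   # RAIL1's current message about X2
print(c.collect("RAIL2"))              # [('RAIL1', 'X2', [0.3, 0.7])]
print(c.collect("RAIL1"))              # []
```

Routing by component keeps each RAIL unaware of its peers' identities: it only declares which shared components it talks about, and the councilor handles delivery.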
Preferably, the system topology (as shown in
Preferably, dynamic message patterns are also supported in order to handle changes in the system, such as nodes leaving and joining, both in the system under control and in the diagnostic infrastructure (e.g., addition of new RAIL engines, or unexpected failure of such an engine).
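One way to support such dynamic patterns is to recompute them from the current assignment of probes to RAILs whenever a RAIL joins or leaves. The Python sketch below does this under stated assumptions: the reassignment policies (a surviving RAIL adopting a failed engine's probes, and a new engine taking over specified probes), the extra probe T4, and all names are hypothetical.

```python
def message_patterns(rail_probes, probe_scope):
    """For each RAIL, the components it shares with at least one peer."""
    regions = {r: set().union(*(probe_scope[t] for t in ts))
               for r, ts in rail_probes.items()}
    return {r: reg & set().union(*(o for q, o in regions.items() if q != r))
            for r, reg in regions.items()}

def handle_failure(rail_probes, failed, takeover):
    """Hypothetical policy: a surviving RAIL adopts the failed RAIL's probes."""
    updated = {r: list(ts) for r, ts in rail_probes.items() if r != failed}
    updated[takeover].extend(rail_probes[failed])
    return updated

def handle_join(rail_probes, new_rail, probes):
    """Hypothetical policy: a new RAIL takes over the given probes from their holders."""
    updated = {r: [t for t in ts if t not in probes] for r, ts in rail_probes.items()}
    updated[new_rail] = list(probes)
    return updated

# Hypothetical coverage, including an extra probe T4.
SCOPE = {"T1": {"X1", "X2", "X5"}, "T2": {"X3", "X6"},
         "T3": {"X2", "X3", "X4"}, "T4": {"X4", "X7"}}
RAILS = {"RAIL1": ["T1", "T2"], "RAIL2": ["T3"], "RAIL3": ["T4"]}

after_failure = handle_failure(RAILS, failed="RAIL2", takeover="RAIL3")
after_join = handle_join(after_failure, "RAIL4", ["T4"])
for state in (RAILS, after_failure, after_join):
    print({r: sorted(c) for r, c in message_patterns(state, SCOPE).items()})
```

Recomputing the patterns from scratch is simple and always consistent; an incremental update would be a natural refinement for large deployments.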
It is to be understood that the present invention, in accordance with at least one presently preferred embodiment, includes elements that may be implemented on at least one general-purpose computer running suitable software programs. These may also be implemented on at least one Integrated Circuit or part of at least one Integrated Circuit. Thus, it is to be understood that the invention may be implemented in hardware, software, or a combination of both.
If not otherwise stated herein, it is to be assumed that all patents, patent applications, patent publications and other publications (including web-based publications) mentioned and cited herein are hereby fully incorporated by reference herein as if set forth in their entirety herein.
Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention.