IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
1. Technical Field
This invention generally relates to computer system health monitoring. More particularly, this invention relates to a system and method for continuous health monitoring.
2. Description of Background
As system functions become more complex, the requirements for complete system health reporting grow proportionally. Every network and module added to a system introduces one more verification point that must be checked, and numerous dependencies exist between the modules. Furthermore, a user may demand a health report almost instantaneously. Performing health checks in a manner that ensures usability, correctness, and completeness has proven nearly impossible.
System checkout functions have been used throughout early tape products. However, these functions executed an exhaustive check on each user request. Furthermore, the numerous modular checks were performed one by one, some of them lasting several minutes. Although previous implementations provided a complete health report of a system, their execution time made them unusable in practice.
A system for continuous health monitoring includes a computer system including a locking mechanism configured to allow multiple health point checks to be accessed simultaneously, a plurality of component health point checks configured to monitor at least one component of the system and configured to store health monitoring statistics in the computer system, and a scheduler configured to periodically enable the plurality of component health point checks based on one of a user request and a predefined amount of time.
A method for continuous health monitoring includes initiating a plurality of component health checks of a computer system, logging component health check change history in a storage system of the computer system, logging output of the plurality of component health checks, and continuously updating the plurality of component health checks.
Additional features and advantages are realized through the techniques of the exemplary embodiments described herein. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the detailed description and to the drawings.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains an exemplary embodiment, together with advantages and features, by way of example with reference to the drawings.
According to an exemplary embodiment, a method is provided that significantly increases the availability of health statistics for systems. This increased availability reduces the overall time spent waiting for health statistics reports and may increase the usability of complex systems.
According to example embodiments, a pluggable architecture is provided to give real-time health statistics of a distributed system. The system is able to integrate existing modular health checks that may require intermittent polling with newer health checks that can update health statistics in real time. The real-time health process consists of a persistent store for the health statistics, a set of tools for updating those statistics, a daemon that runs and coordinates the checks, and a display environment that can generate health status reports in a cross-platform format. The architecture maintains the health status on a set of distributed machines by alerting the remote systems of changes as they occur. Once the initial framework is integrated, existing modular health checks are easily incorporated and new modular health checks can be installed relatively quickly.
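The idea of alerting remote systems only when a health status changes can be illustrated as a simple observer. This is a minimal sketch under assumed names (`HealthStatusPublisher`, `subscribe`, and `update` are hypothetical), with the remote machines modelled as in-process callbacks; an actual distributed deployment would deliver these alerts over the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical callback type: a remote system is told (health_point, new_status).
Listener = Callable[[str, str], None]


@dataclass
class HealthStatusPublisher:
    """Holds local health points and alerts subscribed remote systems on change."""

    points: Dict[str, str] = field(default_factory=dict)
    listeners: List[Listener] = field(default_factory=list)

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def update(self, point: str, status: str) -> bool:
        """Store a new status; alert listeners only if the status changed."""
        if self.points.get(point) == status:
            return False  # unchanged: no alert needs to be sent
        self.points[point] = status
        for listener in self.listeners:
            listener(point, status)
        return True


# Usage: the "remote system" here is just a list that records alerts.
alerts: List[tuple] = []
publisher = HealthStatusPublisher()
publisher.subscribe(lambda point, status: alerts.append((point, status)))
publisher.update("drive0", "OK")
publisher.update("drive0", "OK")  # same status, so no alert is sent
publisher.update("drive0", "FAILED")
print(alerts)  # [('drive0', 'OK'), ('drive0', 'FAILED')]
```

Sending only changes keeps traffic between machines proportional to how often health actually changes rather than to how often checks run.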
Turning to
The system 100 further includes computer storage 104. Storage 104 may be a backend storage system, such as a database or file system of the computer system, or, alternatively, a remote server or storage system, such as a remote computer system. Storage 104 supports a locking mechanism that allows multiple health point updates to occur simultaneously without corrupting vital health statistics.
The system 100 further includes scheduler 105. As used herein, scheduler 105 may be similar to the daemon described above; accordingly, in example embodiments, the terms scheduler, scheduling daemon, and daemon may be used interchangeably.
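A locking mechanism of this kind might look like the following sketch. It assumes an in-memory store (the class `HealthPointStore` and its methods are hypothetical) and uses one lock per health point, so that updates to different points can proceed at the same time while concurrent updates to the same point are serialized and no increment is lost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class HealthPointStore:
    """Hypothetical in-memory stand-in for storage 104.

    Each health point has its own lock, so updates to different points can
    proceed at the same time while updates to the same point are serialized.
    """

    def __init__(self) -> None:
        self._registry_lock = threading.Lock()  # guards creation of new points
        self._locks: Dict[str, threading.Lock] = {}
        self._points: Dict[str, Dict[str, int]] = {}

    def _lock_for(self, point: str) -> threading.Lock:
        with self._registry_lock:
            if point not in self._locks:
                self._locks[point] = threading.Lock()
                self._points[point] = {"runs": 0, "failures": 0}
            return self._locks[point]

    def record(self, point: str, healthy: bool) -> None:
        with self._lock_for(point):
            stats = self._points[point]
            stats["runs"] += 1
            if not healthy:
                stats["failures"] += 1

    def snapshot(self) -> Dict[str, Dict[str, int]]:
        with self._registry_lock:
            entries = [(name, self._locks[name]) for name in self._points]
        result = {}
        for name, lock in entries:
            with lock:  # copy each point while no update is half-applied
                result[name] = dict(self._points[name])
        return result


# Usage: 1000 concurrent updates spread over two health points.
store = HealthPointStore()
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(1000):
        pool.submit(store.record, "fan" if i % 2 else "psu", i % 10 != 0)
print(store.snapshot())
```

A database-backed store would typically rely on row-level or record-level locking for the same effect.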
Turning back to
In addition to scheduling checks, the scheduler 105 may allow a user to manually execute a health check. Manual execution may be useful, for example, after service personnel repair a failed component. If a user manually executes a component check, the same conflicts described above must still be verified.
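A scheduler that supports both interval-based and manual execution could be sketched as below. The conflict rules themselves are not spelled out here, so this sketch assumes a single illustrative rule: a check may not start while another run of the same check is still in progress. `HealthScheduler`, `tick`, and `run_now` are hypothetical names.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Set


class HealthScheduler:
    """Hypothetical sketch of scheduler 105 (periodic plus manual execution).

    Assumption: the only conflict checked here is "this check is already
    running"; it stands in for whatever conflict rules a real system applies.
    """

    def __init__(self) -> None:
        self._checks: Dict[str, Callable[[], None]] = {}
        self._intervals: Dict[str, float] = {}
        self._last_run: Dict[str, float] = {}
        self._running: Set[str] = set()
        self._lock = threading.Lock()

    def register(self, name: str, check: Callable[[], None], interval_s: float) -> None:
        self._checks[name] = check
        self._intervals[name] = interval_s
        self._last_run[name] = float("-inf")  # never run yet, so due immediately

    def _try_run(self, name: str, now: float) -> bool:
        with self._lock:
            if name in self._running:  # conflict: refuse to start a second copy
                return False
            self._running.add(name)
        try:
            self._checks[name]()
            self._last_run[name] = now
        finally:
            with self._lock:
                self._running.discard(name)
        return True

    def tick(self, now: Optional[float] = None) -> List[str]:
        """Called from the daemon loop: run every check whose interval has elapsed."""
        now = time.monotonic() if now is None else now
        due = [n for n in self._checks if now - self._last_run[n] >= self._intervals[n]]
        return [n for n in due if self._try_run(n, now)]

    def run_now(self, name: str, now: Optional[float] = None) -> bool:
        """Manual execution, subject to the same conflict check."""
        return self._try_run(name, time.monotonic() if now is None else now)


calls: List[str] = []
sched = HealthScheduler()
sched.register("fan", lambda: calls.append("fan"), interval_s=60)
print(sched.tick(now=0))   # ['fan']  (first run is always due)
print(sched.tick(now=30))  # []       (interval not yet elapsed)
print(sched.tick(now=60))  # ['fan']
```

Passing `now` explicitly keeps the sketch deterministic; a running daemon would simply call `tick()` after each wait interval.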
The system 100 further includes a plurality of component health checks. For example, the system, as illustrated, includes a plurality of existing modular checks 106 and a plurality of new modular checks 107. The plurality of existing modular checks 106 may be checks existing at system start-up, and/or may be scheduled to run at allotted time intervals. The plurality of new modular checks 107 may be checks inserted after system start-up in the modular system and/or may be run based on events (i.e., event driven checks). The component health checks may be responsible for actually verifying the status of various components in the system, and reporting the status using the health point storage mechanism (e.g., storage 104).
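Event-driven checks that are plugged in after start-up can be pictured as handlers keyed by event type. In this sketch the class `EventDrivenChecks`, the event names, and the temperature threshold are all hypothetical; the `report` callback stands in for writing a status through the health point storage mechanism.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# Hypothetical check signature: inspect an event payload, return a status string.
Check = Callable[[dict], str]


class EventDrivenChecks:
    """Sketch of new modular checks 107: plugged in at run time, run on events."""

    def __init__(self, report: Callable[[str, str], None]) -> None:
        self._by_event: DefaultDict[str, List[Tuple[str, Check]]] = defaultdict(list)
        self._report = report  # e.g. writes to the health point store

    def plug_in(self, event_type: str, point: str, check: Check) -> None:
        """Add a check after start-up without restarting the monitor."""
        self._by_event[event_type].append((point, check))

    def on_event(self, event_type: str, payload: dict) -> None:
        # .get avoids creating empty entries for events nobody listens to.
        for point, check in self._by_event.get(event_type, []):
            self._report(point, check(payload))


health: Dict[str, str] = {}
checks = EventDrivenChecks(report=lambda point, status: health.__setitem__(point, status))
checks.plug_in("temperature", "cpu0", lambda e: "OK" if e["celsius"] < 85 else "HOT")
checks.on_event("temperature", {"celsius": 70})
checks.on_event("temperature", {"celsius": 91})
checks.on_event("door_opened", {})  # no check subscribed, nothing reported
print(health)  # {'cpu0': 'HOT'}
```

Existing polled checks would instead be registered with an interval-based scheduler; both kinds report through the same storage path.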
All component health checks may manage one or more health points using the health point storage mechanism. In addition, each component health check may log details about each individual health point check run. Log files may be archived using a standardized mechanism and may be used by service or support personnel to assist in diagnosing problems with a system. The storage mechanism may be part of the computer system being monitored or part of a remote computer system, as described above. Hereinafter, a method of health monitoring is described with reference to
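Per-check run logging with archiving can be approximated with Python's standard `logging` package. This is a sketch, not the archiving scheme itself: it assumes one log file per component health check and uses `logging.handlers.RotatingFileHandler` as a stand-in for the standardized archiving mechanism. The helper names, file layout, and size limits are hypothetical.

```python
import logging
import logging.handlers
import os
import tempfile


def make_check_logger(check_name: str, log_dir: str) -> logging.Logger:
    """Return a logger that writes one log file per component health check.

    RotatingFileHandler stands in for the archiving mechanism: once the file
    exceeds maxBytes it is rolled over to <check_name>.log.1, .2, ...
    (at most backupCount old files are kept).
    """
    logger = logging.getLogger(f"health.{check_name}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep check logs out of the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.handlers.RotatingFileHandler(
            os.path.join(log_dir, f"{check_name}.log"),
            maxBytes=64 * 1024,
            backupCount=5,
        )
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_run(logger: logging.Logger, point: str, status: str, detail: str) -> None:
    """Record one individual health point check run."""
    level = logging.INFO if status == "OK" else logging.WARNING
    logger.log(level, "point=%s status=%s detail=%s", point, status, detail)


log_dir = tempfile.mkdtemp()
fan_log = make_check_logger("fan_check", log_dir)
log_run(fan_log, "fan1", "OK", "rpm=4200")
log_run(fan_log, "fan2", "FAILED", "rpm=0")
with open(os.path.join(log_dir, "fan_check.log")) as f:
    contents = f.read()
```

Logging failed runs at a higher level lets service personnel filter the archive for problems quickly.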
Turning to
If health checks are initiated, the method 200 includes logging change history (block 206), logging health check output (block 207), updating health points (block 208), and logging daemon output (block 209), substantially in parallel. Alternatively, the method 200 may perform blocks 206, 207, 208, and 209 in any other parallel and/or sequential combination. Upon completion of the health checks (terminal block 210), the method may return to the wait interval loop (blocks 203-204) or terminate the health checks until the system restarts the daemon or a user initiates the health checks again.
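The flow of method 200 can be sketched as a wait loop that, when checks are due, runs the per-cycle tasks concurrently and waits for all of them before looping again. The function names and the `checks_due` predicate are hypothetical, and the mapping to block numbers is only approximate; `max_cycles` exists solely to bound the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_health_cycle(tasks: List[Callable[[], None]]) -> None:
    """Run the per-cycle tasks (blocks 206-209) concurrently.

    Returning from this function corresponds to terminal block 210.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(task) for task in tasks]
        for future in futures:
            future.result()  # re-raise any exception raised by a task


def daemon_loop(tasks: List[Callable[[], None]], wait_s: float,
                checks_due: Callable[[], bool], max_cycles: int) -> int:
    """Wait-interval loop (blocks 203-204); returns the number of cycles run.

    max_cycles only bounds this demo; a real daemon would loop until stopped.
    """
    cycles_run = 0
    for _ in range(max_cycles):
        time.sleep(wait_s)
        if checks_due():
            run_health_cycle(tasks)
            cycles_run += 1
    return cycles_run


done: List[str] = []
steps = ["change_history", "check_output", "health_points", "daemon_output"]
tasks = [lambda s=s: done.append(s) for s in steps]
ran = daemon_loop(tasks, wait_s=0.01, checks_due=lambda: True, max_cycles=2)
```

Running the four tasks in a pool matches the "substantially in parallel" option; passing a single-worker pool, or calling the tasks in order, gives the sequential alternative.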
System health may be reported to an end-user and/or service user via several different interfaces (e.g., text-based interfaces, web interfaces, etc.). Turning to
The method further includes reading a cached file at block 303. The cached file may be stored in a storage area (e.g., storage 104). The cached file may include health statistic logs reflecting health check results from a plurality of health checks, descriptions of health checks, and/or other vital health check information. The results may have been stored from a plurality of instances of a health monitoring method as described with reference to
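Reporting from a cached file, rather than executing every check on each request, might look like this sketch. The JSON layout, file name, and helper names are assumptions for illustration. Writers replace the cache with `os.replace`, which on POSIX systems renames atomically, so a reader never sees a partially written file.

```python
import json
import os
import tempfile
import time
from typing import Dict, List


def write_cache(path: str, points: Dict[str, dict]) -> None:
    """Health checks publish their latest results to the cached file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"updated": time.time(), "points": points}, f)
    os.replace(tmp_path, path)  # swap in the new file in one step


def report_from_cache(path: str) -> List[str]:
    """Build a report from the cache instead of executing every check."""
    with open(path) as f:
        cache = json.load(f)
    return [
        f"{name}: {info['status']} ({info['description']})"
        for name, info in sorted(cache["points"].items())
    ]


cache_path = os.path.join(tempfile.mkdtemp(), "health_cache.json")
write_cache(cache_path, {
    "fan1": {"status": "OK", "description": "Cooling fan speed check"},
    "drive0": {"status": "FAILED", "description": "Tape drive self-test"},
})
for line in report_from_cache(cache_path):
    print(line)
```

Because the report only reads the file, its cost stays roughly constant no matter how many or how slow the underlying checks are.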
As shown in
According to at least one example embodiment, the health check information is formatted into a platform-independent format that may be accessed by, for example, a webpage, a user terminal, a user interface, or a command line interface. Examples of a platform-independent format include Extensible Markup Language (XML) and similar formats that allow health information to be accessed from multiple computing platforms after formatting.
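As an illustration of the XML option, the standard library's `xml.etree.ElementTree` can serialize health points into a document that any platform can parse. The element and attribute names (`healthReport`, `healthPoint`, `status`) are hypothetical; no particular schema is defined here.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def health_to_xml(points: Dict[str, dict]) -> str:
    """Serialize health points into a platform-independent XML document."""
    root = ET.Element("healthReport")
    for name, info in sorted(points.items()):
        point = ET.SubElement(root, "healthPoint", name=name, status=info["status"])
        ET.SubElement(point, "description").text = info["description"]
    return ET.tostring(root, encoding="unicode")


xml_text = health_to_xml({
    "fan1": {"status": "OK", "description": "Cooling fan speed check"},
    "drive0": {"status": "FAILED", "description": "Tape drive self-test"},
})
print(xml_text)
```

A web page, terminal tool, or command line interface can then consume the same string; in Python, `ET.fromstring(xml_text)` parses it back.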
The health reporting mechanism may also be responsible for combining health points into virtual health objects. Virtual health objects may be used in order to combine several individual health points into a single “virtual” component. For example, a virtual object of a car may include health points of the tires, engine, transmission, etc.
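The car example can be expressed as a virtual health object that rolls up its member health points. The roll-up rule used here (the virtual component is only as healthy as its least healthy member) and the three status levels are assumptions; other aggregation policies are equally possible.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical status levels, ordered from healthiest to least healthy.
SEVERITY = {"OK": 0, "DEGRADED": 1, "FAILED": 2}


@dataclass
class VirtualHealthObject:
    """Combines several individual health points into one virtual component."""

    name: str
    members: List[str] = field(default_factory=list)

    def status(self, points: Dict[str, str]) -> str:
        # Assumed roll-up rule: the virtual component is only as healthy as its
        # least healthy member. An object with no members reports UNKNOWN.
        statuses = [points[member] for member in self.members]
        return max(statuses, key=SEVERITY.__getitem__, default="UNKNOWN")


car = VirtualHealthObject("car", ["tires", "engine", "transmission"])
points = {"tires": "OK", "engine": "DEGRADED", "transmission": "OK"}
print(car.status(points))  # DEGRADED
```

A worst-member rule is conservative: a single failing tire marks the whole car as failed, which suits alerting, whereas a dashboard might prefer a weighted or majority rule.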
The health check storage and reporting mechanisms described hereinbefore may be extendable to a distributed system environment. For example,
According to
Furthermore, according to an exemplary embodiment, the methodologies described hereinbefore may be implemented by a computer system or apparatus. For example,
The computer program product may include a computer-readable medium having computer program logic or code portions embodied thereon for enabling a processor (e.g., 502) of a computer apparatus (e.g., 500) to perform one or more functions in accordance with one or more of the example methodologies described above. The computer program logic may thus cause the processor to perform one or more of the example methodologies, or one or more functions of a given methodology described herein.
The computer-readable storage medium may be a built-in medium installed inside a computer main body or a removable medium arranged so that it can be separated from the computer main body. Examples of the built-in medium include, but are not limited to, RAMs, ROMs, flash memories, and hard disks. Examples of a removable medium may include, but are not limited to, optical storage media such as CD-ROMs and DVDs; magneto-optical storage media such as MOs; magnetic storage media such as floppy disks (trademark), cassette tapes, and removable hard disks; media with a built-in rewriteable non-volatile memory such as memory cards; and media with a built-in ROM, such as ROM cassettes.
Further, such programs, when recorded on computer-readable storage media, may be readily stored and distributed. The storage medium, as it is read by a computer, may enable the method(s) disclosed herein, in accordance with an exemplary embodiment of the present invention.
With an exemplary embodiment of the present invention having thus been described, it will be obvious that the same may be varied in many ways. The description of the invention hereinbefore uses this example, including the best mode, to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims. Such variations are not to be regarded as a departure from the spirit and scope of the present invention, and all such modifications are intended to be included within the scope of the present invention as stated in the following claims.