This application is based upon and claims the benefit of priority from prior Japanese Patent Application No. 2007-159283, filed Jun. 15, 2007, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to an apparatus and a method for analyzing a fault which may occur in an information system and the management work needed upon the occurrence of the fault.
2. Description of the Related Art
The growing complexity of information systems has raised the problem of increased management cost. In view of this, research is underway on self-management systems, which aim to reduce the management cost of an information system by causing the information system to manage itself (see, for example, “What is Autonomic Computing?”, SoftBank Publishing, ISBN: 4797330376).
The self-management system realizes automation of the management work required for a fault based on, for example, information on a fault detection method and a repairing method (hereinafter referred to as “operation knowledge”). The operation knowledge is arranged in the self-management system and requires maintenance and control. In the case where the configuration of a system involved undergoes a change or a trouble-shooting method not included in the initial operation knowledge is found, for example, the operation knowledge is required to be corrected.
As a method of maintenance and control of the operation knowledge in the self-management system, a distributed management type self-healing technique has been conceived (see, for example, JP-A 2007-128185 (KOKAI) and “A self-healing technique based on encapsulated operation knowledge”, Proceedings of the 3rd IEEE International Conference on Autonomic Computing (ICAC'06), June 2006). In this technique, the operation knowledge is expressed for each component making up a constituent element of the information system, in a form independent of any specific system configuration or other components. As a result, the modularity of the operation knowledge for each component is improved and its maintenance is facilitated.
A system using the distributed management type self-healing technique (hereinafter referred to as “the distributed management type self-repairing system”) is constructed by combining the operation knowledge for each component, and thereby performs the self-healing operation. However, the operation knowledge for the components, which is managed separately and independently for each component, may change. In such a case, the self-healing operation of the distributed management type self-repairing system constructed by combining the operation knowledge can no longer be grasped. Specifically, the distributed management type self-repairing system is unable to verify the legitimacy of the self-healing operation, and therefore cannot be used for a system requiring high reliability.
In order to verify that the self-repairing operation is legitimate, it is necessary to grasp a fault that may occur in the information system to be analyzed and to confirm that the appropriate management work is related to each fault. Specifically, the analysis of a fault and the management work regarding the fault in the information system to be analyzed is required.
In one method of analyzing a fault that may occur in the information system, an application of the analysis method based on a state transition model may be considered (see, for example, “Concurrency State Models & Java (registered trademark) Programming, Jeff Magee and Jeff Kramer, John Wiley & Sons Inc.”).
The conventional system analysis method based on the state transition model, however, does not distinguish between automatic transitions caused by execution of the functions of the information system and manual transitions caused by operations performed by a human system manager or the like. As a result, the relation between a fault and the management work required for the fault is not clear.
Also, in an information system capable of various processes and manual operations, the state transition model to be analyzed becomes so large and complicated that fault analysis may take a long time, and the storage unit must have a large capacity at a correspondingly high cost.
According to an aspect of the present invention, there is provided a system analysis apparatus for an information system to be analyzed, comprising: a storage unit which stores state information indicating a plurality of states of the system identified by state identifiers, and transition information including state transition information indicating a transition between said plurality of states and classification information indicating whether the transition is an automatic one or a manual one; an analysis unit configured to acquire restriction information indicating a restriction to be satisfied in the case where the information system is normal and to specify an anomalous state of the information system failing to satisfy the restriction based on the restriction information acquired and the transition information; and an output unit configured to retrieve the transition from the anomalous state specified by the analysis unit to the normal state, based on the state information and the transition information, to generate a management work with reference to the retrieval result, and to output management work information indicating the generated management work as related to the anomalous state.
First, with reference to
In
The transition from the anomalous state shown in
(1) The server module 203 on the computer 200 is restarted.
(2) The ineffective connections 106 held on the computer 100 are erased.
A receiver 31 in a management unit 2202, on the other hand, receives a transit instruction from an external source through a network. The transit instruction is defined as the information indicating that a predetermined function is set in a predetermined state, e.g. the information that the server module 203 is set in the activated state. Also, the receiver 31 specifically receives the information from the computer 100 dependent on the computer 200.
An operational target determining unit 36 specifies a function indicated in the transit instruction received by the receiver 31. In the process, the information contained in configuration information 41 is referred to. The configuration information 41 contains function information. The function information is defined as the information for relating the server module 203 installed in the computer 200 to the function realized by the server module 203.
A state determining unit 35 specifies the present state corresponding to the function specified by the operational target determining unit 36. In the process, the information contained in operation information 2204 is referred to. The operation information 2204 contains the state information of each function. The state information is defined as the information indicating the process state, such as the standby or activation of each function. The information indicating the present state of each function is also held therein.
An operation determining unit 34 specifies the process for transition from the present state specified by the state determining unit 35 to the state indicated in the transit instruction. In the process, the information contained in the operation information 2204 is referred to. An operation execution unit 33 executes the process specified by the operation determining unit 34 and causes the transition of the function indicated in the transit instruction to the state indicated in the transit instruction.
A transmitter 32, upon completion of the process of the operation execution unit 33, transmits a completion notice indicating that the transition to the state indicated in the transit instruction has been completed, to the transit instruction source.
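The flow through the units 36, 35, 34, 33 and 32 described above can be pictured with the following minimal Python sketch. The dictionary layouts, the key names (`function`, `state`, `present`, `processes`), the function name `handle_transit_instruction` and the sample function name `serve` are assumptions made for illustration only; the description does not specify any data format.

```python
def handle_transit_instruction(instruction, configuration, operation_info):
    """Process one transit instruction and return a completion notice."""
    # Operational target determining unit 36: resolve the function named in
    # the instruction to the module that realizes it (configuration information).
    function = instruction["function"]
    module = configuration[function]

    # State determining unit 35: look up the present state of that function.
    present = operation_info["present"][function]
    target = instruction["state"]

    if present != target:
        # Operation determining unit 34: pick the process that moves the
        # function from its present state to the requested state.
        process = operation_info["processes"][(present, target)]
        # Operation execution unit 33: run the process and record the new state.
        process(module)
        operation_info["present"][function] = target

    # Transmitter 32: completion notice for the source of the instruction.
    return {"function": function, "state": target, "completed": True}


log = []  # records what the hypothetical process did
configuration = {"serve": "server module 203"}
operation_info = {
    "present": {"serve": "standby"},
    "processes": {
        ("standby", "activated"): lambda module: log.append("start " + module),
    },
}
notice = handle_transit_instruction(
    {"function": "serve", "state": "activated"}, configuration, operation_info
)
```

In this sketch the completion notice is simply the return value; in the apparatus described above it would be sent over the network to the source of the transit instruction.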
A distributed management type self-repairing system (management apparatus) 2201 of the computer 100 is constructed similarly to the distributed management type self-repairing system (management apparatus) 2202 of the computer 200 described above.
The distributed management type self-repairing systems 2201, 2202 realize the self-reparation based on the operation information 2203, 2204. In order to secure the proper self-repairing operation in the distributed management type self-repairing systems 2201, 2202, a fault and the management work involved in repairing the particular fault are required to be properly described in the operation information 2203, 2204.
The confirmation of propriety of the operation information 2203, 2204 involves the following two requirements:
(a) A fault that may occur in the system shown in
(b) The method of detecting the fault in (a) and the management work in (a) shall be contained in the operation information 2203, 2204.
The present embodiment relates to (a) above and permits the system manager, etc. to definitely recognize that the construction of the information system shown in
In
Next, the information stored in the module state transition information storage unit 320 will be explained. The module state transition information storage unit 320 stores three types of information, including, (1) module information, (2) state information and (3) transition information.
As an example of the state information, the state information specified by the identifier S1, i.e., the state information of the server module 203, is shown in
As an example of the transition information, the transition information specified by the identifier T1, i.e. the transition information of the server module 203 is shown in
In
Next, with reference to
The module information contained in the module state transition information 320 can be identified by the module name. As shown in
Upon complete input of the initial state, etc. shown in
As described above, this embodiment assumes that the connection between the server module 203 and the client module 103 is designated as the configuration information, that the initial state “activated” (state “2” in
Next, the anomalous state analysis (S902 in
According to this embodiment, the state information and the transition information are read for each of the server module 203 and the client module 103. Then, based on the state information and the transition information, the state transition model for each module configured of only the automatic transition (hereinafter referred to as “the automatic state transition model”) is generated (S1203). Specifically, from the data on the transition information that have been read, a state transition model including the transition with “auto=O” and the state before or after the particular transition is generated. According to this embodiment, a state transition model 1301 shown in
Next, a system state transition model is generated by synthesizing the automatic state transition models 1301, 1401 for each module generated in step S1203 (S1204). The state transition models are synthesized taking the transition name appearing on all the plurality of module state transition models into consideration. Incidentally, for the details of the synthesis, refer to “Concurrency State Models & Java (registered trademark) Programming, Jeff Magee and Jeff Kramer, John Wiley & Sons Inc.”.
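As a rough illustration of such a synthesis, the Python sketch below composes two module models so that a transition name appearing in both models fires in both modules at once, while any other transition moves only its own module. The two example models are hypothetical: apart from the transition name “conn” and the state labels 1 to 4 and a to d, all transition names (`req`, `close`, `abort`, `halt`) are assumptions, since the actual models 1301 and 1401 are given only in the drawings.

```python
from collections import deque


def compose(model_a, init_a, model_b, init_b):
    """Return (states, transitions) of the combined system reachable from the initial pair."""
    shared = {lbl for (_, lbl, _) in model_a} & {lbl for (_, lbl, _) in model_b}
    start = (init_a, init_b)
    states = {start}
    transitions = set()
    queue = deque([start])
    while queue:
        sa, sb = queue.popleft()
        moves = []
        for (src, lbl, dst) in model_a:
            if src != sa:
                continue
            if lbl in shared:
                # Shared transition: both modules must be able to take it.
                for (src_b, lbl_b, dst_b) in model_b:
                    if src_b == sb and lbl_b == lbl:
                        moves.append((lbl, (dst, dst_b)))
            else:
                moves.append((lbl, (dst, sb)))
        for (src, lbl, dst) in model_b:
            if src == sb and lbl not in shared:
                moves.append((lbl, (sa, dst)))
        for lbl, nxt in moves:
            transitions.add(((sa, sb), lbl, nxt))
            if nxt not in states:
                states.add(nxt)
                queue.append(nxt)
    return states, transitions


# Hypothetical automatic state transition models (source state, name, target state).
server_auto = [(2, "conn", 3), (3, "req", 4), (4, "close", 2),
               (2, "halt", 1), (3, "abort", 1), (4, "abort", 1)]
client_auto = [("a", "conn", "b"), ("b", "req", "c"), ("c", "close", "a"),
               ("b", "abort", "d"), ("c", "abort", "d")]

system_states, system_transitions = compose(server_auto, 2, client_auto, "a")
```

With these assumed inputs, only the states reachable from the initial pair (2, a) are generated, which keeps the combined model smaller than the full product of the two state sets.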
The synthesis of the state transition models according to this embodiment is shown in
Next, the transition from the state (2, a) will be explained. The transition appearing in both the state transition model 1301 of the server module 203 and the state transition model 1401 of the client module 103 at the same time is regarded as the transition shared by these two state transition models. Assuming that transition “conn” occurs with the system state of (2, a), for example, the transition “conn” is assumed to occur in both the server module 203 and the client module 103. In this case, therefore, the system state transition occurs from (2, a) to (3, b). Based on this assumption, the state transition model 1301 of the server module 203 and the state transition model 1401 of the client module 103 are synthesized. Then, the system state transition model 1501 is generated as shown in
Next, the state on the system state transition model 1501 thus generated is classified into the normal state and the anomalous state in accordance with the restriction information received from the analyzer terminal 31 (S1205). In the process, the state satisfying the restriction designated by the restriction information is regarded as a normal state, and the state failing to satisfy the same restriction as an anomalous state. The restriction information indicates the restriction to be satisfied in the case where the information system to be analyzed is normal.
According to this embodiment, the safety guarantee is designated as the restriction condition, as described above. In the state transition model, a state in which safety is guaranteed is taken to be “a state from which a transition to another state exists”. Strictly speaking, this differs from the formal definition of safety; it is nevertheless used in this embodiment for simplicity.
Referring to the system state transition model 1501, it is understood that the states (2, a), (3, b) and (4, c) represent normal states having the transition to another state, while the states (1, a) and (1, d) are anomalous states.
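The classification rule used here (a state is normal if it has a transition to another state, and anomalous otherwise) can be sketched in Python as follows. The function name `classify_states` and the transition names other than “conn” are assumptions for illustration; the five states are those named in this embodiment.

```python
def classify_states(states, transitions):
    """Label each state 'normal' if some transition leaves it, else 'anomalous'."""
    has_exit = {src for (src, _label, _dst) in transitions}
    return {
        state: ("normal" if state in has_exit else "anomalous")
        for state in states
    }


states = [(2, "a"), (3, "b"), (4, "c"), (1, "a"), (1, "d")]
# Hypothetical automatic transitions of the system state transition model.
transitions = [
    ((2, "a"), "conn", (3, "b")),
    ((3, "b"), "req", (4, "c")),
    ((4, "c"), "close", (2, "a")),
    ((2, "a"), "halt", (1, "a")),
    ((3, "b"), "abort", (1, "d")),
    ((4, "c"), "abort", (1, "d")),
]

classification = classify_states(states, transitions)
anomalous = sorted(s for s, kind in classification.items() if kind == "anomalous")
```

A different restriction would replace the `has_exit` test with another predicate over the states; the rest of the sketch would stay the same.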
From this classification result, state classification information is generated in step S1205. The state classification information according to this embodiment is shown in
The anomalous state analysis unit 321 finally transmits the state classification information 160 thus generated to the management work analysis unit 322 (S1206).
Next, the management work analysis (S903 in
Next, the state classification information 160 received from the anomalous state analysis unit 321 is checked for any state determined as an anomalous state (S1702). In the absence of any anomalous state in the state classification information 160, the management work information is registered as the one having no anomaly (S1708).
In the presence of an anomalous state in the state classification information 160, on the other hand, the process of steps S1703 to S1706 is repeated for each anomalous state. According to this embodiment, the process of steps S1703 to S1706 is executed for the states (1, a) and (1, d).
Now, the process of steps S1703 to S1706 for the state (1, d) will be explained.
First, in step S1703, the state information and the transition information of each module contained in the configuration information are read from the module state transition information storage unit 320. Based on this information, a state transition model consisting only of manual transitions (hereinafter referred to as “the manual state transition model”) is generated for each module. Specifically, from the transition information that has been read, a state transition model is generated that includes each transition whose item “manual” is “O”, together with the states before and after that transition.
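The selection of transitions by classification can be sketched in Python as shown below. The record layout (`name`, `source`, `target`, `auto`, `manual`), the function `build_model` and the sample transitions are assumptions for illustration; the stored format of the transition information is shown only in the drawings. The same function covers the automatic state transition model by passing `"auto"` instead of `"manual"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    source: str
    target: str
    auto: bool    # True corresponds to the item "auto" being "O"
    manual: bool  # True corresponds to the item "manual" being "O"


def build_model(transitions, kind):
    """Keep only transitions flagged as `kind` ('auto' or 'manual') and the states they touch."""
    selected = [t for t in transitions if getattr(t, kind)]
    states = {t.source for t in selected} | {t.target for t in selected}
    return states, selected


# Hypothetical transition information for the server module 203.
server_transitions = [
    Transition("conn", "2", "3", auto=True, manual=False),
    Transition("halt", "2", "1", auto=True, manual=False),
    Transition("restart", "1", "2", auto=False, manual=True),
]

manual_states, manual_model = build_model(server_transitions, "manual")
auto_states, auto_model = build_model(server_transitions, "auto")
```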
Next, the manual state transition models 1801, 1901 for each module generated in step S1703 are synthesized thereby to generate a management state transition model (S1704).
Consider the state (1, d) as the initial state. A management state model 2001 shown in
Then, the management state model 2001 generated is accessed, and the transition to the normal state from the intended anomalous state is retrieved (S1705). Now, referring to state determining information shown in
Next, referring to the transition retrieval result, management work information is generated. The management work information is stored with the anomalous state related to the management work for the anomalous state. Management work information 250 finally registered in this embodiment is shown in
Upon completion of the process of steps S1703 to S1706 against all the anomalous states included in the state classification information 160, the management work information 250 generated is transmitted to the analyzer terminal 31 (S1707). The analyzer terminal 31, upon reception of the management work information 250, displays the management work information 250 in an appropriate form.
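The retrieval in step S1705 and the registration of the result for each anomalous state can be sketched in Python as a breadth-first search over the management state model. The function name `find_management_work`, the transition names `restart` and `clear`, and the intermediate state (2, d) are assumptions for illustration; they are chosen to mirror the two management steps (1) and (2) listed earlier, not taken from the drawings.

```python
from collections import deque


def find_management_work(transitions, start, normal_states):
    """Return the shortest list of manual transition names from start to a normal state."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state in normal_states:
            return path
        for (src, label, dst) in transitions:
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [label]))
    return None  # no manual transition sequence reaches a normal state


# Hypothetical management state model built from manual transitions only.
management_model = [
    ((1, "d"), "restart", (2, "d")),
    ((1, "d"), "clear", (1, "a")),
    ((2, "d"), "clear", (2, "a")),
    ((1, "a"), "restart", (2, "a")),
]
normal_states = {(2, "a"), (3, "b"), (4, "c")}
anomalous_states = [(1, "a"), (1, "d")]

# Management work information: each anomalous state related to its management work.
management_work = {
    state: find_management_work(management_model, state, normal_states)
    for state in anomalous_states
}
```

Breadth-first search is one possible retrieval strategy; it returns a shortest sequence of manual transitions, and it returns `None` when no sequence exists, which would indicate an anomalous state that no registered management work can repair.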
According to the embodiment explained above, the anomalous state (i.e. “fault”) and the management work required for the anomalous state can be presented to the user by relating them to each other. As a result, the user can grasp an anomalous state which may occur in the information system managed by the user and the management work required upon occurrence of the anomalous state, before the particular anomalous state actually occurs. Thus, the maintenance and management of the information system are facilitated.
Also, though not explained in this embodiment, the storage unit preferably stores information relating a module state to the operation required for identifying that state, and information relating a transition name to the actual management work corresponding to that transition. Retrieval means for retrieving these types of information is preferably arranged at the analyzer terminal 31 or the like. This further facilitates the maintenance and management of the information system.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2007-159283 | Jun 2007 | JP | national |