The present invention relates to a causes ordering estimation device and the like that enable estimation of causes of phenomena occurring in a system.
PTLs 1 to 5 disclose techniques relating to systems that manage prediction models for enabling prediction of availability. Prediction models include various kinds of information such as mathematical models for calculating, examining, or analyzing availability, calculating formulas, parameters, and configurations and behavior of systems. These systems estimate an operation rate of the entire system, for example, on the basis of the predicted availability.
PTL 1 discloses a method for predicting, in a computer included in the system, an operation rate of the entire system on the basis of properties, such as a failure rate and time required for restoring a failure, and monitoring information relating to a failure that occurs when the system is in operation.
According to a method disclosed in PTL 2, a fault tree that is a tool for analyzing a failure state is first composed on the basis of configuration information relating to software included in a system or hardware included in a system. According to the method a failure rate (a failure degree) is further calculated on the basis of the fault tree, to determine whether the calculated failure rate is less than a reference value or not.
According to a method disclosed in PTL 3, information relating to functionality, configuration, security, performance, and the like, and availability for an application program or an application service are stored as metadata upon installation of them. According to the method, configuration management, fault detection, diagnosis, recovery, and the like after the installation are further analyzed based on the stored metadata.
According to a method disclosed in PTL 4, every time a malfunction occurs in a providing service (a failure occurs), a period during a continuation of the failure and the number of users who cannot use the service due to the failure are stored. According to the method, a ratio of the failure period within a certain period, a ratio of users who cannot use the service due to the failure among expected users who use the service, an operation rate or the like are further estimated based on the stored period and the stored number of users.
For hardware, methods for analyzing availability of the hardware by using a mathematical model, such as a fault tree, on the basis of characteristics of components of the hardware, are widely known.
For software, methods for analyzing availability in accordance with a mathematical model, such as a stochastic Petri network and a stochastic reward network, are known. In such a model, transition among system states is described and the system is simulated based on the described model. Availability of the system is analyzed by reproducing the way that the state transits in the simulation.
PTL 5 discloses a simulator system being capable of evaluating availability relating to a computer system. The simulator system includes a client simulator and an evaluation unit. The client simulator transmits a signal to each client device in the computer system and measures a response time that elapses until the client device replies in response to the transmitted signal. The evaluation unit estimates influence that a failure occurring in the client device exerts on the response time on the basis of the response time measured for each client device.
PTL 1: Japanese Patent Application Laid-Open Publication No. 2008-532170
PTL 2: Japanese Patent Application Laid-Open Publication No. 2006-127464
PTL 3: Japanese Patent Application Laid-Open Publication No. 2007-509404
PTL 4: Japanese Patent Application Laid-Open Publication No. 2005-080104
PTL 5: Japanese Patent Application Laid-Open Publication No. 2007-122416
However, even when availability is analyzed by using devices disclosed in PTLs 1 to 5, an administrator managing a data center experiences difficulty in predicting a component (a factor, or a module) that causes a certain phenomenon (for example, a complaint received from a user). This is because, even when the administrator analyzes availability of the data center with such devices, the administrator cannot quantitatively associate the phenomenon occurring in the data center with a module included in the data center.
For example, even when the administrator calculates availability relating to the data center in accordance with a stochastic Petri network, the administrator cannot identify, on the basis of a complaint relating to the data center, the module that causes the complaint.
Thus, the main objective of the present invention is to provide a causes ordering estimation device and the like that enable estimation of a module that causes a phenomenon (an incident) occurring in a system.
In order to achieve the aforementioned object, a causes ordering estimation device according to an aspect of the present invention includes:
ordering means that determines an order of modules in accordance with a degree of similarity between a numeric value relating to a service provided by using a system and a degree of influence representing a magnitude of influence that the module in the system exerts on the service.
In addition, a causes ordering estimation method according to another aspect of the present invention includes:
determining an order of modules in accordance with a degree of similarity between a numeric value relating to a service provided by using a system and a degree of influence representing a magnitude of influence that the module in the system exerts on the service.
Furthermore, the object is also realized by a causes ordering estimation program, and a computer-readable recording medium which records the program.
The causes ordering estimation device and the like according to the present invention enable estimation of a module that causes a phenomenon occurring in a system.
To facilitate understanding of the present invention, terms used in this description will be explained first.
Availability refers to a ratio of a period during which a user can use a service to a certain period. Availability may be used synonymously with an operation rate.
For example, if a service is not available for one minute per day on average, the availability is approximately 99.93% (= 1 − 1 ÷ (24 × 60)).
Availability is calculated based on the mean time between failures (MTBF) and the mean time to repair (MTTR), the latter being the time required for recovering from a failure (a breakdown).
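As a rough illustration of the two calculations above, the following Python sketch computes availability from an observed downtime and from the mean time between failures and the mean time to repair. The function names are hypothetical, and the formula MTBF ÷ (MTBF + MTTR) is a commonly used steady-state approximation assumed here; the description itself does not fix a formula.

```python
def availability_from_downtime(downtime: float, period: float) -> float:
    """Ratio of the period during which the service can be used."""
    if period <= 0 or not 0 <= downtime <= period:
        raise ValueError("downtime must lie within a positive period")
    return 1.0 - downtime / period


def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Assumed steady-state formula: MTBF / (MTBF + MTTR)."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("mtbf must be positive and mttr non-negative")
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    # One minute of downtime per day, expressed in minutes.
    print(f"{availability_from_downtime(1, 24 * 60) * 100:.2f}%")
    # Hypothetical values: 99 hours between failures, 1 hour to repair.
    print(f"{availability_from_mtbf(99, 1) * 100:.2f}%")
```

Both functions return a ratio between 0 and 1; multiplying by 100 gives the percentage used in the example above.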
For example, availability is calculated based on a stochastic Petri network (stochastic reward network) formed by combining the state transitions exemplified in
For example, suppose an information system exhibits state transitions as exemplified in
In an example illustrated in
Further, as illustrated in
Likewise, as illustrated in
The above-described virtual server VM1 is not allocated to a hypervisor but is allocated to a user, and further allows the user to access itself (that is, a user VM). A hypervisor, which is accessible only by an administrator who is managing the data center, is a control program that controls the virtual server VM1.
An arrow between the two states in the examples of
The state transition of the virtual server VM1 depends on the state of the physical server PS1. For example, the physical server PS1 performs processing relating to the virtual server VM1. When the physical server PS1 is in the state of being not in operation, the virtual server VM1 is also in the state of being not in operation.
Thus, when the physical server PS1 stops, the virtual server VM1 transits from the state of being in operation to the state of being not in operation with transition rate 1 (arrow 302). Further, when the physical server PS1 is in the state of being in operation, the virtual server VM1 transits from the state of being in operation to the state of being not in operation with transition rate λVM1 (arrow 302).
The transition rate may be, for example, a probability indicating the likelihood of transition from the state of being in operation to the state of being not in operation. Similarly, the transition rate can also be defined as a probability for transition from the state of being not in operation to the state of being in operation.
In the examples illustrated in
For example, the availability of an information system and the availability of the components in the information system are analyzed by a simulation of the states of the components constituting the information system on the basis of a stochastic Petri network. In the cases of the examples of
The method of calculating availability is not limited to the above examples.
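The dependency between the physical server PS1 and the virtual server VM1 described above can be sketched as a small discrete-time simulation. This is only an illustrative simplification, not the stochastic Petri network analysis itself: the transition rates are treated as per-step probabilities, the parameter names (`lam_ps`, `mu_ps`, `lam_vm`, `mu_vm`) are hypothetical, and it is assumed that the virtual server can recover only while the physical server is in operation.

```python
import random


def simulate_vm_availability(lam_ps: float, mu_ps: float,
                             lam_vm: float, mu_vm: float,
                             steps: int = 100_000, seed: int = 0) -> float:
    """Estimate the fraction of steps in which the virtual server is in operation."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    ps_up, vm_up = True, True
    vm_up_steps = 0
    for _ in range(steps):
        # Physical server: fails with probability lam_ps, recovers with mu_ps.
        if ps_up:
            if rng.random() < lam_ps:
                ps_up = False
        elif rng.random() < mu_ps:
            ps_up = True

        if not ps_up:
            # The virtual server stops whenever its physical server is stopped
            # (transition rate 1 in the description).
            vm_up = False
        elif vm_up:
            # Physical server in operation: VM fails with probability lam_vm.
            if rng.random() < lam_vm:
                vm_up = False
        elif rng.random() < mu_vm:
            # Assumption: recovery is possible only while the server is up.
            vm_up = True

        vm_up_steps += vm_up
    return vm_up_steps / steps


if __name__ == "__main__":
    # Hypothetical per-step probabilities.
    print(simulate_vm_availability(0.001, 0.1, 0.005, 0.2))
```

The returned ratio plays the role of the availability of the virtual server; a real analysis would derive it from the full model rather than from this two-module sketch.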
The administrator of the data center analyzes the availability relating to the data center by generating a stochastic Petri network on the basis of the characteristics of the infrastructure of the data center (a server infrastructure) and in accordance with an operation procedure relating to the data center. As such, the method for predicting availability depends on, for example, the operation procedure relating to the data center.
The following will describe the details of the example embodiments of the present invention with reference to the drawings.
The components of a causes ordering estimation device 101 according to a first example embodiment of the present invention and the processing performed by the causes ordering estimation device 101 will be described in detail with reference to
The causes ordering estimation device 101 according to the first example embodiment includes an ordering unit 102.
First, the causes ordering estimation device 101 receives, for example, numerical information 501 as exemplified in
The numerical information 501 is information generated by a combination of one or more numeric values. For example, in the example illustrated in
For convenience of description, it is assumed in the following description that the larger the above-described numeric values are, the lower the quality of the system is.
The degree of influence is, for example, a stop time rate of a system, which indicates the ratio of time (duration) during which the system could not provide the service to users over the past year. That is, in
As illustrated in
In the example illustrated in
In the example illustrated in
The numerical information 501 may also include the number of times users have cancelled the services over the past year, actual values representing the actual magnitudes of influence exerted on the services, and other values. The numerical information 501 is not limited to the above-described examples.
Next, the ordering unit 102 determines an order of the modules in the system (that is, ranks the modules) on the basis of the received numerical information 501 (step S101).
For example, the ordering unit 102 reads the module influence information as exemplified in
The module influence information may be input from outside or generated by the causes ordering estimation device 101, as will be described later. In the first example embodiment, modules are elements that constitute (are included in) a system and represent functional units implemented by software, hardware, or a combination thereof.
Referring to the example illustrated in
Of the module influence information exemplified in
As such, in the module influence information exemplified in
These degrees of influence may be actual values indicating the magnitudes of influence that the modules exert on the services or may be values calculated by the causes ordering estimation device 101, as will be described later. The degrees of influence may be 0 or positive values in the same way as the numerical information 501. The larger the value of a degree of influence is, the greater the influence that the module exerts on the service is.
For example, the ordering unit 102 reads the degrees of influence relating to a module from the module influence information. That is, the ordering unit 102 reads a three-dimensional vector including the degrees of influence on the services SV1 to SV3 as shown in the first row of the module influence information exemplified in
The vector shown as the numerical information 501 is, for example, a three-dimensional vector including the degrees of influence exemplified in
Next, the ordering unit 102 calculates an inner product of the normalized vectors (i.e., the normalized first and second vectors). In such a case, the inner product indicates the degree of similarity representing how much the vectors are relevant to each other.
In the first example embodiment, the degree of similarity is defined as the cosine of an angle between vectors, that is, an inner product calculated for normalized vectors. If the angle is 0° (degree), the degree of similarity is 1. If the angle is 90° (degrees), the degree of similarity is 0. As such, the vectors are more relevant as the degree of similarity is closer to 1.
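The degree of similarity defined above can be sketched in Python as follows. The function names are hypothetical; the code simply normalizes both vectors to unit length and takes their inner product, which is the cosine of the angle between them.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("a zero vector cannot be normalized")
    return [x / norm for x in vector]


def cosine_similarity(first: Sequence[float], second: Sequence[float]) -> float:
    """Inner product of the normalized first and second vectors."""
    if len(first) != len(second):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(normalize(first), normalize(second)))


if __name__ == "__main__":
    print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
    print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # orthogonal
```

Because the degrees of influence and the numerical information 501 are 0 or positive, the similarity computed this way stays between 0 and 1.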
Although the above description assumes that the ordering unit 102 reads the degrees of influence relating to the physical server PS1, the ordering unit 102 performs similar processing for other modules, such as the physical server PS2 and the virtual server VM3. Further, it is assumed in the above-described example that the vector is three-dimensional, but the vector is not necessarily three-dimensional.
The ordering unit 102 calculates a first vector for each module specified by the module influence information and calculates the degree of similarity between the calculated first vector and the second vector.
Next, the ordering unit 102 determines an order of the modules in the system on the basis of the calculated degrees of similarity. The ordering unit 102 assigns a higher order to a module with a larger degree of similarity. As such, the higher the likelihood of a module causing the numerical information 501 is, the higher the degree of similarity becomes and, thus, the higher the calculated order becomes.
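A minimal sketch of this ordering step is shown below. The module names follow the description, but the degrees of influence and the numerical information used in the example are hypothetical values chosen only for illustration, and `order_modules` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(first: Sequence[float], second: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(first, second))
    norms = math.sqrt(sum(x * x for x in first)) * math.sqrt(sum(y * y for y in second))
    if norms == 0:
        raise ValueError("zero vectors have no direction")
    return dot / norms


def order_modules(module_influence: Dict[str, Sequence[float]],
                  numerical_information: Sequence[float]) -> List[Tuple[int, str, float]]:
    """Return (order, module, degree of similarity), most similar first."""
    similarities = {
        module: cosine_similarity(influence, numerical_information)
        for module, influence in module_influence.items()
    }
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [(order, module, sim) for order, (module, sim) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Hypothetical degrees of influence on the services SV1 to SV3.
    influence = {"PS1": [3.0, 1.0, 0.0], "VM1": [0.0, 2.0, 2.0], "VM3": [1.0, 1.0, 4.0]}
    # Hypothetical numerical information (for example, complaints per service).
    for order, module, sim in order_modules(influence, [1.0, 1.0, 5.0]):
        print(order, module, round(sim, 2))
```

Ties are left in the order produced by the sort; how equal degrees of similarity should be ranked is not specified in the description.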
While the ordering unit 102 estimates the degree of similarity by calculating an inner product as in the above-described example, the ordering unit 102 may estimate the degree of similarity by calculating a distance. That is, the ordering unit 102 may calculate, as the degree of similarity, a distance between the normalized numerical information 501 and the normalized degrees of influence. For example, the distance is expressed by Eqn. 1 as the size of a difference vector between the vectors.
|(Normalized degrees of influence)−(Normalized numerical information 501)| (Eqn. 1)
(where | | indicates the size.)
For example, the size may be a geometric distance, a Manhattan distance, a generalized Mahalanobis distance, or the like. In such a case, the degree of similarity is larger as the distance is shorter. That is, the order becomes higher with a shorter distance.
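The distance-based alternative of Eqn. 1 can be sketched as follows. The sketch interprets the geometric distance as the Euclidean distance, which is an assumption, and it omits the generalized Mahalanobis distance because that would require a covariance matrix that the description does not provide. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("a zero vector cannot be normalized")
    return [x / norm for x in vector]


def difference_size(influence: Sequence[float],
                    numerical_information: Sequence[float],
                    kind: str = "euclidean") -> float:
    """Size of (normalized degrees of influence) - (normalized numerical information)."""
    diff = [x - y for x, y in zip(normalize(influence), normalize(numerical_information))]
    if kind == "euclidean":
        return math.sqrt(sum(d * d for d in diff))
    if kind == "manhattan":
        return sum(abs(d) for d in diff)
    raise ValueError(f"unsupported distance: {kind}")


if __name__ == "__main__":
    # A shorter distance corresponds to a larger degree of similarity.
    print(difference_size([1, 2, 3], [2, 4, 6]))
    print(difference_size([1, 0], [0, 1], kind="manhattan"))
```

When ordering by this value, the modules would be sorted in ascending order of distance rather than in descending order of similarity.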
Alternatively, the ordering unit 102 may generate the calculated order as the ordering information as exemplified in
In the ordering information, a module is associated with a degree of similarity and an order. For example, the physical server PS1 is associated with 0.33 and 5. This represents that the degree of similarity calculated by the ordering unit 102 for the physical server PS1 is 0.33. Further, five in the ordering information represents that the order of the physical server PS1 is fifth when the degrees of similarity calculated for the modules of the analysis target system are arranged in a descending order.
For example, the virtual server VM1 is associated with 0.54 and 4. This indicates that the degree of similarity calculated by the ordering unit 102 for the virtual server VM1 is 0.54. Further, four in the ordering information indicates that the order of the virtual server VM1 is fourth when the degrees of similarity calculated for the modules of the analysis target system are arranged in a descending order.
In the example illustrated in
For example, the administrator can select a module causing the numerical information 501 in the analysis target system on the basis of the order calculated by the ordering unit 102. Thus, in the case of the example illustrated in
If the numerical information 501 represents the number of complaints exemplified in
If the numerical information 501 represents maintenance counts exemplified in
The following will describe the effect of the causes ordering estimation device 101 according to the first example embodiment.
The causes ordering estimation device 101 according to the first example embodiment can estimate a module that causes the numerical information 501 in an analysis target system.
This is because the causes ordering estimation device 101 generates the ordering information where the modules are ordered based on the degrees of similarity between the numerical information 501 and the module influence information of the modules. In such a case, the administrator of the system can estimate, for example, a module that may cause the numerical information 501 by selecting a module of a higher order on the basis of the ordering information.
On the other hand, the devices disclosed in PTLs 1 to 5 cannot estimate a module that may cause the numerical information 501 in an analysis target system. This is because these devices do not have a function of analyzing modules that influence the numerical information 501.
The following will describe a second example embodiment of the present invention on the basis of the above-described first example embodiment.
The following description will mainly describe the features of the second example embodiment. The same components as the above-described first example embodiment will be appended with the same reference numerals to omit redundant descriptions.
The components of a causes ordering estimation device 201 according to the second example embodiment and the processing performed by the causes ordering estimation device 201 will be described with reference to
The causes ordering estimation device 201 according to the second example embodiment includes a similarity calculation unit 202 and an ordering unit 203.
First, the similarity calculation unit 202 generates module influence information on the basis of, for example, a recovery rate (a recovery degree) that indicates how easily an application program (hereinafter, referred to as an “application”) involved in a service can be recovered from a failure (step S201). The processing of step S201 will be described later. The ordering unit 203 receives the numerical information 501. The ordering unit 203 calculates the degrees of similarity, as illustrated in the first example embodiment, on the basis of the numerical information 501 and the module influence information generated by the similarity calculation unit 202 (step S202). Next, the ordering unit 203 determines an order of the modules in the analysis target system on the basis of the degrees of similarity (step S203).
First, relation information, module information, and service information that are referred to in the processing of step S201 will be described in detail with reference to
In the relation information exemplified in
For example, in the relation information exemplified in
The relation information may take a form of a relational database table or a file in a text format. The relation information is updated, for example, when a module is added to the analysis target system or a module is deleted from the system. The relation information is also updated when a relationship between the modules is updated.
The modules may further include virtual servers, network routers, applications, or the like, as well as physical servers. The modules in the relation information are associated with identifiers (for example, virtual server identifiers, network router identifiers, application identifiers, and the like) that can uniquely identify the individual modules.
The module information exemplified in
The recovery rate and the failure rate take values in the range of 0 to 1, and the closer the value is to 1, the higher the possibility of recovery or failure is.
For example, when a new module is added to an analysis target system, the new module may be added to the module information. Further, when a module is deleted from an analysis target system, the module may be deleted from the module information. Further, when a recovery rate and the like of a module are updated, the module information may be updated.
Further, in the service information exemplified in
When a new service is introduced into an analysis target system, the new service may be added to the service information. Further, when a service is deleted from an analysis target system, the service may be deleted from the service information. Further, when a service is changed, the service information may be updated.
The administrator may set the relation information exemplified in
The system relating to the relation information exemplified in
In the example illustrated in
Referring to
For example, if the first module includes the second module, the first module is associated with the second module in the relation information exemplified in
When generating module influence information, the similarity calculation unit 202 calculates the degree of influence that represents the likelihood of a specific module influencing a specific service.
Next, with reference to
First, the similarity calculation unit 202 reads modules that are associated with one another in the relation information (step S301). For example, the similarity calculation unit 202 reads the virtual servers VM3 and VM4 by reading modules associated with the physical server PS2 from the relation information exemplified in
Next, the similarity calculation unit 202 specifies, for example, in the service information, a service associated with a module that represents an application among the read modules (step S302). For example, the similarity calculation unit 202 specifies services SV1 and SV3 by reading services associated with the read application AP4 from the service information exemplified in
Next, the similarity calculation unit 202 specifies modules associated with the service specified at step S302 from among the modules in the system providing the service (step S303). In this processing, the similarity calculation unit 202 specifies a service and the modules required for implementing the service in the system. If there are any other modules that mediate between the service and the modules, the similarity calculation unit 202 also specifies the mediating modules in this processing.
While, in the above-described example, the similarity calculation unit 202 reads relation information and, then, reads service information, the similarity calculation unit 202 may not necessarily follow such a processing flow.
For example, in the relation information as exemplified in
For example, the similarity calculation unit 202 reads the applications AP1 and AP4 by reading modules associated with the service SV1 from the service information exemplified in
Further, the similarity calculation unit 202 reads the virtual server VM4 by reading a module associated with the read application AP4 from the relation information exemplified in
In the service information exemplified in
In such a case, the similarity calculation unit 202 specifies the association between the service SV1 and the physical server PS1 and further specifies the modules mediating between the service SV1 and the physical server PS1, namely the application AP1 and the virtual server VM1. Likewise, the similarity calculation unit 202 specifies the association between the service SV1 and the physical server PS2 and further specifies the modules mediating between the service SV1 and the physical server PS2, namely the application AP4 and the virtual server VM4.
As described above, the similarity calculation unit 202 specifies a service and modules associated with the service (target modules, such as, the physical server PS1, the virtual server VM1, and the like) on the basis of the relation information and service information. Further, the similarity calculation unit 202 specifies a module mediating between the service and the modules.
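The tracing of a service back to its modules can be sketched as below. The dictionaries hold only the associations mentioned in the text (the physical server PS2 with the virtual servers VM3 and VM4, the service SV1 with the applications AP1 and AP4, and so on); their layout, the names `RELATION`, `SERVICE_INFO`, and `trace_service`, and the assumption that each module is included in at most one other module are illustrative choices, not the format of the actual relation information.

```python
from typing import Dict, List

# Relation information: a module mapped to the modules it includes.
RELATION: Dict[str, List[str]] = {
    "PS1": ["VM1"],
    "PS2": ["VM3", "VM4"],
    "VM1": ["AP1"],
    "VM4": ["AP4"],
}

# Service information: a service mapped to the applications involved in it.
SERVICE_INFO: Dict[str, List[str]] = {
    "SV1": ["AP1", "AP4"],
    "SV3": ["AP4"],
}


def trace_service(service: str,
                  relation: Dict[str, List[str]],
                  service_info: Dict[str, List[str]]) -> Dict[str, List[List[str]]]:
    """Map each module behind a service to its lists of mediating modules."""
    # Invert the relation so that each module points to the module including it.
    parent = {child: owner for owner, children in relation.items() for child in children}
    result: Dict[str, List[List[str]]] = {}
    for application in service_info[service]:
        chain = [application]
        while chain[-1] in parent:  # climb from application to physical server
            chain.append(parent[chain[-1]])
        for index, module in enumerate(chain[1:], start=1):
            # Mediating modules, listed from the module side toward the service.
            result.setdefault(module, []).append(list(reversed(chain[:index])))
    return result


if __name__ == "__main__":
    for module, mediators in trace_service("SV1", RELATION, SERVICE_INFO).items():
        print(module, mediators)
```

Each target module is returned with one list of mediating modules per application through which it reaches the service, which is the input needed for the degree-of-influence calculation that follows.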
Next, when a specific service and a specific module are associated with one another, the similarity calculation unit 202 calculates a degree of influence in accordance with, for example, the recovery rates of the modules mediating between the specific service and the specific module (step S304). The similarity calculation unit 202 calculates the degree of influence exerted by the specific module on the specific service, for example, by calculating the sum of the inverse numbers (reciprocals) of the recovery rates of the specific module and of the modules mediating between the specific service and the specific module.
For example, when the specific module is a physical server, the similarity calculation unit 202 calculates the degree of influence that a physical server PSi (i.e., the specific module) exerts on an application APk in accordance with Eqn. 2, where i, k are natural numbers.
Degree of influence(PSi→APk)=1÷μPSi+1÷μVMj+1÷μAPk (Eqn. 2)
(where μPSi represents a recovery rate of a physical server PSi; μVMj represents a recovery rate of a virtual server VMj; μAPk represents a recovery rate of an application APk; and j represents a natural number.)
Further, when a specific module is a virtual server, the similarity calculation unit 202 calculates the degree of influence that the virtual server VMi exerts on the application APk in accordance with Eqn. 3.
Degree of influence(VMi→APk)=1÷μVMi+1÷μAPk (Eqn. 3)
(where μVMi represents a recovery rate of the virtual server VMi; and μAPk represents a recovery rate of the application APk.)
Next, to calculate the degree of influence that a specific module exerts on a service SVm, the similarity calculation unit 202 calculates the sum of the degrees of influence that the specific module exerts on applications involved in the service SVm. For example, the similarity calculation unit 202 calculates the degree of influence that the physical server PSi exerts on the service SVm in accordance with Eqn. 4, where m is a natural number.
Degree of influence(PSi→SVm)=Σdegree of influence(PSi→APk) (Eqn. 4)
(where Σ indicates the sum of the degrees of influence on APk involved in SVm)
The similarity calculation unit 202 calculates the degree of influence that the virtual server VMi exerts on the service SVm in accordance with Eqn. 5.
Degree of influence(VMi→SVm)=Σdegree of influence(VMi→APk) (Eqn. 5)
(where Σ indicates the sum of the degrees of influence on APk involved in SVm)
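Eqns. 2 to 5 can be sketched in Python as follows. The recovery rates are hypothetical values chosen so that the arithmetic is easy to follow, and the function names are hypothetical as well. A chain lists the specific module followed by the modules between it and the application, ending with the application itself.

```python
from typing import Dict, List, Sequence

# Hypothetical recovery rates (values in the range of 0 to 1).
RECOVERY_RATE: Dict[str, float] = {
    "PS1": 0.5, "VM1": 0.25, "AP1": 0.5,
    "VM2": 0.5, "AP2": 0.25,
}


def influence_on_application(chain: Sequence[str],
                             recovery_rate: Dict[str, float]) -> float:
    """Eqn. 2 / Eqn. 3: sum of the reciprocals of the recovery rates in the chain."""
    return sum(1.0 / recovery_rate[module] for module in chain)


def influence_on_service(chains: List[Sequence[str]],
                         recovery_rate: Dict[str, float]) -> float:
    """Eqn. 4 / Eqn. 5: sum over the applications involved in the service."""
    return sum(influence_on_application(chain, recovery_rate) for chain in chains)


if __name__ == "__main__":
    # Eqn. 2: PS1 -> AP1 via VM1.
    print(influence_on_application(["PS1", "VM1", "AP1"], RECOVERY_RATE))
    # Eqn. 3: VM1 -> AP1.
    print(influence_on_application(["VM1", "AP1"], RECOVERY_RATE))
    # Eqn. 4: PS1 -> a service involving AP1 and AP2 (hypothetical layout).
    print(influence_on_service([["PS1", "VM1", "AP1"], ["PS1", "VM2", "AP2"]],
                               RECOVERY_RATE))
```

With these hypothetical rates, the first call evaluates 1 ÷ 0.5 + 1 ÷ 0.25 + 1 ÷ 0.5, so a module with a low recovery rate contributes a large term to the degree of influence.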
While the similarity calculation unit 202 calculates the degree of influence on the basis of recovery rates in the above-described example, the degree of influence may be calculated on the basis of failure rates, or the failure rates and recovery rates by performing processing that is similar to the calculation of the degree of influence on the basis of the recovery rates. For example, the similarity calculation unit 202 may calculate the degree of influence by using the inverse numbers of failure rates instead of the above-described inverse numbers of the recovery rates. Alternatively, the similarity calculation unit 202 may calculate the degree of influence by using the inverse number of a harmonic mean of recovery rates and failure rates instead of the above-described inverse numbers of the recovery rates.
The similarity calculation unit 202 may use average time interval between timings of failures of modules, average recovery time, the count of failures, the count of successful recoveries from occurring failures, or the like, instead of the inverse numbers of recovery rates. That is, the method for calculating the degree of influence by the similarity calculation unit 202 is not limited to the above-described examples.
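The alternatives mentioned above can be sketched as a single per-module term that replaces the reciprocal of the recovery rate. The function name and the `mode` parameter are hypothetical, and taking the harmonic mean of one module's recovery rate and failure rate is only one possible reading of the description.

```python
def influence_term(recovery_rate: float, failure_rate: float,
                   mode: str = "recovery") -> float:
    """Per-module term summed when calculating a degree of influence."""
    if recovery_rate <= 0 or failure_rate <= 0:
        raise ValueError("rates must be positive to take reciprocals")
    if mode == "recovery":
        return 1.0 / recovery_rate
    if mode == "failure":
        return 1.0 / failure_rate
    if mode == "harmonic":
        # Harmonic mean of the two rates: 2ab / (a + b).
        harmonic_mean = (2.0 * recovery_rate * failure_rate
                         / (recovery_rate + failure_rate))
        return 1.0 / harmonic_mean
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    # Hypothetical rates for one module.
    for mode in ("recovery", "failure", "harmonic"):
        print(mode, influence_term(0.5, 0.25, mode))
```

Summing this term over the modules between a specific module and an application gives the counterpart of Eqn. 2 or Eqn. 3 for the chosen variant.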
Next, the similarity calculation unit 202 may calculate the module influence information exemplified in
The following will describe the effect of the causes ordering estimation device 201 according to the second example embodiment.
In addition to the effect of the first example embodiment, the causes ordering estimation device 201 of the second example embodiment can easily calculate the degree of influence that a module exerts on a service on the basis of a recovery rate and the like.
This is because of the following reasons 1 and 2. That is,
(Reason 1) the components of the causes ordering estimation device 201 according to the second example embodiment include the components of the causes ordering estimation device 101 according to the first example embodiment; and
(Reason 2) the similarity calculation unit 202 calculates the degree of influence with a small number of calculations, such as the inverse number of a recovery rate, as described above.
The causes ordering estimation device according to the above-described example embodiments may be also used to facilitate management of modules for improving availability of services in an information system, such as a cloud data center that is managed by using a mathematical model. For example, when planning elimination of a module with a high risk of causing a failure, the administrator may select a module on the basis of the order of modules calculated by the causes ordering estimation device.
A configuration example of hardware resources that realize a causes ordering estimation device in the above-described example embodiments of the present invention using a single calculation processing apparatus (an information processing apparatus or a computer) will be described. However, the causes ordering estimation device may be realized using at least two physically or functionally separate calculation processing apparatuses. Further, the causes ordering estimation device may be realized as a dedicated apparatus.
The non-transitory recording medium 24 is, for example, a computer-readable medium such as a Compact Disc, a Digital Versatile Disc, a Blu-ray Disc (registered trademark), a Universal Serial Bus (USB) memory, or a Solid State Drive. The non-transitory recording medium 24 allows a related program to be held and carried without power supply. The non-transitory recording medium 24 is not limited to the above-described media. Further, a related program can be carried via a communication network by way of the communication I/F 27 instead of the non-transitory recording medium 24.
In other words, the CPU 21 copies, on the memory 22, a software program (a computer program: hereinafter, referred to simply as a “program”) stored by the disc 23 when executing the program and executes arithmetic processing. The CPU 21 reads data necessary for program execution from the memory 22. When display is needed, the CPU 21 displays an output result on the output apparatus 26. When a program is input from the outside, the CPU 21 reads the program from the input apparatus 25. The CPU 21 interprets and executes a causes ordering estimation program present on the memory 22 corresponding to a function (processing) indicated by each unit illustrated in
In other words, in such a case, it is conceivable that the present invention can also be made using the causes ordering estimation program. Further, it is conceivable that the present invention can also be made using a computer-readable, non-transitory recording medium storing the causes ordering estimation program.
The present invention has been described using the above-described example embodiments as example cases. However, the present invention is not limited to the above-described example embodiments. In other words, the present invention is applicable with various aspects that can be understood by those skilled in the art without departing from the scope of the present invention.
This application is based upon and claims the benefit of priority from Japanese patent application No. 2014-114464, filed on Jun. 3, 2014, the disclosure of which is incorporated herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2014-114464 | Jun 2014 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2015/002769 | 6/2/2015 | WO | 00 |