The present invention relates to an operations management apparatus, an operations management method and a program thereof, and in particular, relates to an operations management apparatus, an operations management method and a program thereof which monitor a correlation between types of system performance values (metrics).
An example of an operations management system, which detects a fault of a system by generating a system model from time series information on system performance and using the generated system model, is disclosed in patent literature 1.
According to the operations management system which is disclosed in the patent literature 1, based on measured values of various types of performance values (a plurality of metrics) on the system, a correlation regarding each pair of monitored metrics is detected, and a correlation model is generated. Then, the operations management system judges periodically, by use of the generated correlation model, whether correlation destruction is caused in the measured values of inputted metrics, and detects a fault of the system and a cause of the fault.
Moreover, patent literatures 2 and 3 disclose operations management systems which estimate a value of a metric by use of a correlation model generated as described in patent literature 1, and estimate a bottleneck of a system.
[Patent Literature 1] Japanese Patent Application Laid-Open No. 2009-199533
[Patent Literature 2] Japanese Patent Application Laid-Open No. 2009-199534
[Patent Literature 3] Japanese Patent Application Laid-Open No. 2010-237910
In the operations management systems described in the patent literatures 1 to 3, the existence of a correlation is judged for every pair of metrics among the plurality of monitored metrics, and the correlation model is generated from the results. For this reason, these systems have the problem that regenerating the correlation model takes a long time when a monitored metric is added due to a change in the system configuration or the monitoring policy.
An object of the present invention is to solve the problem through providing an operations management apparatus, an operations management method and a program thereof which can update a correlation model quickly in the case that monitored metrics are changed.
An operations management apparatus according to an exemplary aspect of the invention includes correlation model storing means for storing a first correlation model including a correlation detected for a pair of metrics in first plural metrics, and correlation model updating means for, in the case that a metric is added, judging existence of a correlation for each of pairs of metrics obtained by excluding the pair of metrics in first plural metrics from pairs of metrics in second plural metrics including the added metric and the first plural metrics, and generating a second correlation model by adding the detected correlation to the first correlation model.
An operations management method according to an exemplary aspect of the invention includes: storing a first correlation model including a correlation detected for a pair of metrics in first plural metrics; in the case that a metric is added, judging existence of a correlation for each of pairs of metrics obtained by excluding the pair of metrics in first plural metrics from pairs of metrics in second plural metrics including the added metric and the first plural metrics; and generating a second correlation model by adding the detected correlation to the first correlation model.
A computer readable storage medium according to an exemplary aspect of the invention records thereon a program causing a computer to perform a method including: storing a first correlation model including a correlation detected for a pair of metrics in first plural metrics; in the case that a metric is added, judging existence of a correlation for each of pairs of metrics obtained by excluding the pair of metrics in first plural metrics from pairs of metrics in second plural metrics including the added metric and the first plural metrics; and generating a second correlation model by adding the detected correlation to the first correlation model.
An advantageous effect of the present invention is that it is possible to update a correlation model quickly in the case that monitored metrics are changed.
Next, a first exemplary embodiment of the present invention will be described.
First, a configuration according to the first exemplary embodiment of the present invention will be described.
With reference to
The operations management apparatus 100 generates a correlation model on the basis of performance information collected from the monitored apparatuses 200, and carries out a fault analysis on the monitored apparatuses 200 by use of the generated correlation model.
The monitored apparatus 200 is a component of a system which provides a user with a service. For example, a Web server, an application server (AP server), a database server (DB server) and the like are exemplified as the monitored apparatus 200.
Each of
Each of the monitored apparatuses 200 measures performance values of plural items at periodic intervals and sends the measured data (measured values) to the operations management apparatus 100. Here, for example, a CPU (Central Processing Unit) usage rate (hereinafter, denoted as CPU), a memory consumption (hereinafter, denoted as MEM), a disk consumption (hereinafter, denoted as DSK) or the like is measured as an item of the performance value.
Here, a set of the monitored apparatus 200 and the item of the performance value is defined as a type of the performance value (metric), and a set of plural metric values measured at the same time is defined as performance information.
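These two definitions can be sketched as plain data types. In this minimal sketch, a metric is an (apparatus, item) pair, and performance information maps metrics to values measured at the same time. The names `Metric`, `PerformanceInfo` and `make_performance_info` are hypothetical. The "SV1.CPU" naming follows the notation used in the operation example of this embodiment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metric:
    """A metric: a pair of a monitored apparatus and a performance item (hypothetical type)."""
    apparatus: str  # e.g. "SV1"
    item: str       # e.g. "CPU", "MEM", "DSK"

    def __str__(self) -> str:
        return f"{self.apparatus}.{self.item}"


# Performance information: values of plural metrics measured at the same time.
PerformanceInfo = Dict[Metric, float]


def make_performance_info(raw: Dict[str, float]) -> PerformanceInfo:
    """Convert {"SV1.CPU": 35.0, ...} into {Metric("SV1", "CPU"): 35.0, ...}."""
    info: PerformanceInfo = {}
    for name, value in raw.items():
        apparatus, item = name.split(".", 1)
        info[Metric(apparatus, item)] = value
    return info


# Usage with invented values.
sample = make_performance_info({"SV1.CPU": 35.0, "SV2.MEM": 512.0})
```

The frozen dataclass is hashable, so metrics can serve as dictionary keys and as members of metric pairs.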
The operations management apparatus 100 includes a performance information collecting unit 101, a correlation model generating unit 102, a correlation model updating unit 103, a fault analyzing unit 104, a performance information storing unit 111, a correlation model storing unit 112, a metric information storing unit 113 and an additional metric information storing unit 114.
The performance information collecting unit 101 collects performance information from the monitored apparatuses 200 and makes the performance information storing unit 111 store a sequential change of the performance information as sequential performance information 121.
Each of
The correlation model generating unit 102 generates a correlation model related to a plurality of monitored metrics on the basis of the sequential performance information 121. Here, for every pair of metrics in the plurality of monitored metrics, the correlation model generating unit 102 determines the coefficients of a predetermined approximate formula (correlation function or conversion function). The formula approximates the relation between the two metrics included in the pair, and the coefficients are determined on the basis of the sequential performance information 121 collected for a predetermined period of time. In other words, the unit determines the correlation function. The coefficients of the correlation function are determined by a system identifying process applied to the sequences of measured values of the two metrics, as described in the patent literatures 1 and 2. The correlation model generating unit 102 calculates a weight of the correlation function on the basis of the conversion error of the correlation function with respect to the measured values, as described in the patent literatures 1 and 2. Here, the weight becomes smaller, for example, as the average value of the conversion error becomes larger. Then, in the case that the weight is equal to or greater than a predetermined value, the correlation model generating unit 102 judges that the correlation between the two metrics related to the correlation function is effective (that is, a correlation between the two metrics exists). Here, the set of effective correlations among the monitored metrics is defined as the correlation model.
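The following Python sketch illustrates this judgment. It assumes a linear approximate formula y = a·x + b fitted by ordinary least squares. This fit is a stand-in for the system identifying process of the patent literatures, not a reproduction of it. The weight formula 1 / (1 + mean absolute conversion error) and the default threshold of 0.5 are also hypothetical. They are chosen only so that the weight shrinks as the conversion error grows.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CorrelationFunction:
    a: float       # slope of the assumed linear formula y = a*x + b
    b: float       # intercept of the assumed linear formula
    weight: float  # decreases as the mean conversion error grows


def fit_correlation(x: Sequence[float], y: Sequence[float]) -> CorrelationFunction:
    """Fit y ~ a*x + b by ordinary least squares and weight the fit (hypothetical weighting)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    # Conversion error: measured y versus the value converted from x.
    mean_err = sum(abs(yi - (a * xi + b)) for xi, yi in zip(x, y)) / n
    return CorrelationFunction(a, b, 1.0 / (1.0 + mean_err))


def judge_correlation(x: Sequence[float], y: Sequence[float],
                      threshold: float = 0.5) -> Optional[CorrelationFunction]:
    """Return the fitted function if the correlation is effective, otherwise None."""
    f = fit_correlation(x, y)
    return f if f.weight >= threshold else None
```

Perfectly linear data such as x = [10, 20, 30, 40] and y = [21, 41, 61, 81] give a = 2, b = 1 and a weight of 1.0, so the correlation is judged effective. An alternating series such as y = [10, -10, 10, -10] gives a weight of about 0.11 and is rejected.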
Note that, while the correlation model generating unit 102 judges the existence of the correlation on the basis of the conversion error of the correlation function, another method may be used. For example, the correlation model generating unit 102 may make the judgment on the basis of a variance between the two metrics or the like.
The correlation model storing unit 112 stores correlation model information 122 which indicates the correlation model generated by the correlation model generating unit 102.
Each of
In the case that a monitored metric is added due to a change in the system configuration or the like, the correlation model updating unit 103 updates the correlation model.
As mentioned above, a correlation model is generated by detecting the existence of a correlation between two metrics out of a plurality of monitored metrics. That is, it is generated by detecting whether the increase or decrease pattern of one of the two metrics has a part in common with the increase or decrease pattern of the other. In the first exemplary embodiment of the present invention, it is assumed that the increase or decrease pattern of a metric depends mainly on the logic of an application. In this case, the increase or decrease pattern of the metric value can be expected not to change as long as the logic of the application is not changed. Suppose that no correlation exists between two metrics, and a monitored metric is then added due to a change in the system configuration that does not change the logic of the application. In that case, the two metrics can be expected to remain uncorrelated.
In the first exemplary embodiment of the present invention, assume that a correlation model (first correlation model) has been stored in the correlation model storing unit 112. This model includes correlations each detected for a pair of metrics in a plurality of monitored metrics (first plural metrics). When a monitored metric is then added, the plurality of monitored metrics including the added metric forms the second plural metrics. Among these, the correlation model updating unit 103 does not detect a correlation again for any pair of metrics (already-judged pair) in the first plural metrics, for which existence of a correlation has already been judged. Instead, the correlation model updating unit 103 determines the coefficient of the correlation function and detects existence of a correlation for every pair of metrics except the already-judged pairs, similarly to the correlation model generating unit 102. That is, the correlation model updating unit 103 detects existence of a correlation for every pair of metrics among the added metrics. It also does so for every pair consisting of an added metric and a monitored metric other than the added metrics. Then, the correlation model updating unit 103 updates the correlation model (generates a second correlation model) by adding the detected correlations to the correlation model.
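The set of pairs judged on an update can be written directly as a set difference. It consists of all pairs in the second plural metrics minus the already-judged pairs within the first plural metrics. The following minimal sketch assumes metrics are identified by strings. The function name `pairs_to_judge` and the added server "SV4" are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Pair = FrozenSet[str]  # an unordered pair of metric names


def pairs_to_judge(first_metrics: Iterable[str], added_metrics: Iterable[str]) -> Set[Pair]:
    """Pairs whose correlation must still be judged after metrics are added."""
    first = set(first_metrics)
    second = first | set(added_metrics)  # second plural metrics
    all_pairs = {frozenset(p) for p in combinations(sorted(second), 2)}
    judged_pairs = {frozenset(p) for p in combinations(sorted(first), 2)}
    return all_pairs - judged_pairs  # exclude already-judged pairs


first = ["SV1.CPU", "SV1.DSK", "SV2.CPU", "SV2.MEM", "SV3.CPU", "SV3.MEM", "SV3.DSK"]
added = ["SV4.CPU", "SV4.MEM", "SV4.DSK"]  # hypothetical metrics of an added server
todo = pairs_to_judge(first, added)
print(len(todo))  # 24
```

Only the pairs that involve at least one added metric remain, so no pair among the seven previously monitored metrics is judged again.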
The metric information storing unit 113 stores metric information 123 indicating the metric for which existence of a correlation has been judged in the correlation model generating process and the correlation model updating process.
Each of
The additional metric information storing unit 114 stores additional metric information 124 indicating a metric which is added as a metric to be monitored.
The fault analyzing unit 104 detects a system fault and specifies its cause by detecting destruction of a correlation included in the correlation model, as described in patent literature 1. For this detection, it uses newly input performance information and the correlation model stored in the correlation model storing unit 112.
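The details of the fault analysis follow patent literature 1. Purely as an illustration, the sketch below assumes that each correlation is stored as a linear function y = a·x + b. It also assumes that a correlation counts as destroyed when its conversion error on newly input performance information exceeds a fixed tolerance. Ranking metrics by the number of broken correlations they take part in is a simplified, hypothetical way of locating the cause. It is not the scoring used in patent literature 1. The coefficients and values in the usage lines are invented.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Correlation(NamedTuple):
    x: str      # input metric, e.g. "SV2.CPU"
    y: str      # output metric, e.g. "SV3.CPU"
    a: float    # assumed linear correlation function y = a*x + b
    b: float


def broken_correlations(model: List[Correlation], info: Dict[str, float],
                        tolerance: float) -> List[Correlation]:
    """Correlations whose conversion error on the new data exceeds the tolerance."""
    broken = []
    for c in model:
        if c.x in info and c.y in info:
            predicted = c.a * info[c.x] + c.b
            if abs(info[c.y] - predicted) > tolerance:
                broken.append(c)
    return broken


def suspect_metrics(broken: List[Correlation]) -> List[Tuple[str, int]]:
    """Rank metrics by how many broken correlations they take part in."""
    counts: Counter = Counter()
    for c in broken:
        counts[c.x] += 1
        counts[c.y] += 1
    return counts.most_common()


model = [Correlation("SV1.CPU", "SV2.CPU", 1.0, 0.0),
         Correlation("SV2.CPU", "SV3.CPU", 0.5, 0.0),
         Correlation("SV3.CPU", "SV3.MEM", 2.0, 0.0)]
info = {"SV1.CPU": 40.0, "SV2.CPU": 90.0, "SV3.CPU": 20.0, "SV3.MEM": 40.0}
ranking = suspect_metrics(broken_correlations(model, info, tolerance=5.0))
print(ranking[0])  # ('SV2.CPU', 2)
```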
Here, the operations management apparatus 100 may be a computer which includes a CPU and a storage medium storing a program, and which operates under control based on the program. Moreover, the performance information storing unit 111, the correlation model storing unit 112, the metric information storing unit 113 and the additional metric information storing unit 114 may be separate from each other or may be included in one storage medium.
Next, an operation carried out by the operations management apparatus 100 according to the first exemplary embodiment of the present invention will be described. Here, the operation by the operations management apparatus 100 will be described through exemplifying a case that the system configuration, which includes one DB server, one AP server and one Web server as shown in
First, the correlation model generating unit 102 of the operations management apparatus 100 detects existence of a correlation for each pair of metrics in a plurality of monitored metrics, on the basis of the sequential performance information 121 stored in the performance information storing unit 111 (Step S101). It then makes the correlation model storing unit 112 store a correlation model, which includes the detected correlations, as the correlation model information 122 (Step S102). Here, the monitored metrics are designated by the manager or the like. So is the period of time over which the sequential performance information 121 used for generating the correlation model is collected.
For example, the correlation model generating unit 102 detects existence of a correlation for the monitored metrics (SV1.CPU, SV1.DSK, SV2.CPU, SV2.MEM, SV3.CPU, SV3.MEM and SV3.DSK), on the basis of the sequential performance information 121 shown in
Then, the correlation model generating unit 102 makes the correlation model storing unit 112 store the correlation model information 122 shown in
Next, the correlation model generating unit 102 generates the metric information 123 including the metric (monitored metric) for which existence of a correlation has been judged in the correlation model generating process (Step S101), and makes the metric information storing unit 113 store the generated metric information 123 (Step S103).
For example, the correlation model generating unit 102 generates the metric information 123 shown in
Next, a metric may be added as a metric to be monitored due to a change in the system configuration or the like, and set in the additional metric information 124. The correlation model updating unit 103 then detects existence of a correlation for each pair in the metrics set in the additional metric information 124, on the basis of the sequential performance information 121 stored in the performance information storing unit 111 (Step S104). Here, the period of time over which the sequential performance information 121 used for updating the correlation model is collected is designated by the manager or the like.
For example, in the case that the system configuration is changed as shown in
Furthermore, on the basis of the sequential performance information 121 stored in the performance information storing unit 111, the correlation model updating unit 103 detects existence of a correlation for each pair of a metric set in the additional metric information 124 and a metric set in the metric information 123, out of the monitored metrics (Step S105).
By carrying out Steps S104 and S105 in this way, the newly detected correlations are added to the correlation model (that is, the correlations of the correlation model are updated).
Next, the correlation model updating unit 103 makes the correlation model storing unit 112 store the correlation model including the newly-detected correlation, as the correlation model information 122 (Step S106).
For example, the correlation model updating unit 103 makes the correlation model storing unit 112 store the correlation model information 122 shown in
Next, the correlation model updating unit 103 adds the added metric to the metric information 123 (updates the metric information 123), makes the metric information storing unit 113 store the metric information 123, and initializes the additional metric information 124 (Step S107).
For example, the correlation model updating unit 103 updates the metric information 123 as shown in
Note that, while it is unnecessary to detect existence of a correlation for each pair of the metrics for which existence of a correlation has been judged before the correlation model is updated, it may be preferable that the correlation model updating unit 103 updates the coefficient of the correlation function (updates the correlation function) for each pair of metrics for which the correlation has been detected out of the already-judged pairs (Step S108). Moreover, it may be preferable that the correlation model updating unit 103 judges existence of the correlation again on the basis of the updated correlation function.
For example, the correlation model updating unit 103 may update the coefficient of the correlation function related to each of the correlations between SV1.CPU and SV2.CPU, SV2.CPU and SV3.CPU, SV2.CPU and SV3.MEM, and SV3.CPU and SV3.MEM.
Afterward, Steps S104 to S108 are repeated every time a monitored metric is newly added.
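Steps S104 to S107 can be sketched as one in-place update of three pieces of state: the correlation model, the metric information 123 and the additional metric information 124. This is a hypothetical Python sketch. The model is a dictionary keyed by unordered metric pairs, and the coefficients are an assumed (a, b) tuple. `judge` is a caller-supplied function that stands in for the correlation judgment on the stored sequential performance information. Step S108 is omitted.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Pair = FrozenSet[str]
Coeffs = Tuple[float, float]  # hypothetical (a, b) of a linear correlation function


def update_correlation_model(
    model: Dict[Pair, Coeffs],
    metric_info: List[str],
    additional_metric_info: List[str],
    judge: Callable[[str, str], Optional[Coeffs]],
) -> None:
    """Apply Steps S104 to S107 in place (sketch)."""
    # Drop duplicates and metrics that have already been judged.
    added = [m for m in dict.fromkeys(additional_metric_info) if m not in metric_info]
    # Step S104: pairs among the added metrics.
    candidates = list(combinations(added, 2))
    # Step S105: pairs of an added metric and an already-judged metric.
    candidates += [(a, m) for a in added for m in metric_info]
    for m1, m2 in candidates:
        coeffs = judge(m1, m2)
        if coeffs is not None:
            # Step S106: add the newly detected correlation to the model.
            model[frozenset((m1, m2))] = coeffs
    # Step S107: record the added metrics as judged and initialize the additions.
    metric_info.extend(added)
    additional_metric_info.clear()
```

The existing entries of `model` are never passed to `judge`. This is where the saving comes from when a metric is added.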
With that, the operation according to the first exemplary embodiment of the present invention is completed.
Next, a characteristic configuration of the first exemplary embodiment of the present invention will be described.
Referring to
Here, the correlation model storing unit 112 stores a first correlation model including a correlation detected for a pair of metrics in first plural metrics.
The correlation model updating unit 103, in the case that a metric is added, judges existence of a correlation for each of pairs of metrics obtained by excluding the pair of metrics in first plural metrics from pairs of metrics in second plural metrics including the added metric and the first plural metrics, and generates a second correlation model by adding the detected correlation to the first correlation model.
According to the first exemplary embodiment of the present invention, it is possible to update a correlation model quickly in the case that monitored metrics are changed. The reason is as follows. In the case that a monitored metric is added, the correlation model updating unit 103 considers the pairs of metrics in the plurality of metrics including the added metric. It judges the existence of a correlation only for those pairs for which it has not already been judged. It then adds the detected correlations to the correlation model stored in the correlation model storing unit 112. As a result, the correlation model can be updated quickly, since it is unnecessary to detect the existence of a correlation for all pairs of the monitored metrics whenever a monitored metric is added.
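The saving can be quantified with a short count. Let n be the number of metrics for which existence of a correlation has already been judged, and k the number of added metrics; both symbols are introduced here for illustration. Regenerating the whole model judges N_full pairs. The update judges only the N_update pairs that involve at least one added metric.

```latex
\begin{aligned}
N_{\text{full}}   &= \binom{n+k}{2} = \frac{(n+k)(n+k-1)}{2},\\
N_{\text{update}} &= \binom{n+k}{2} - \binom{n}{2} = \frac{k(k-1)}{2} + nk.
\end{aligned}
```

For instance, with a hypothetical n = 100 and k = 1, the update judges 100 pairs instead of 5050. This counts judgments only; the cost of each judgment depends on the system identifying process used.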
Moreover, according to the first exemplary embodiment of the present invention, in the case that monitored metrics are changed, it is possible to update the state of the correlations detected before the correlation model is updated. The reason is that, in the case that a monitored metric is added, the correlation model updating unit 103 updates the correlation function for each pair of metrics for which a correlation has been detected, out of the pairs of metrics for which existence of a correlation was judged before the correlation model was updated.
While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.
For example, the operation has been described using as an example the case where a server forming a redundant configuration is added due to a change in the configuration of the monitored system. However, the present invention is not limited to this case. The same effect can be obtained in the case that a monitored metric is added due to a change in a managing policy with no change in the configuration of the monitored system, since the logic of the application is not changed.
Moreover, the same effect can be obtained in the case that a CPU or memory resource is reinforced in a virtual environment or the like, and a metric related to the reinforced CPU or memory resource is added as a metric to be monitored with no change in the logic of the application.
Furthermore, the same effect can be obtained in the case that a parameter of the application is changed and a metric related to the changed parameter is added with no change in the logic of the application. Examples are an increase in the cache size of the database or in the number of worker threads of the AP server.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2011-011887, filed on Jan. 24, 2011, the disclosure of which is incorporated herein in its entirety by reference.
100 Operations management apparatus
101 Performance information collecting unit
102 Correlation model generating unit
103 Correlation model updating unit
104 Fault analyzing unit
111 Performance information storing unit
112 Correlation model storing unit
113 Metric information storing unit
114 Additional metric information storing unit
121 Sequential performance information
122 Correlation model information
123 Metric information
124 Additional metric information
200 Monitored apparatus
Number | Date | Country | Kind |
---|---|---|---|
2011-011887 | Jan 2011 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/079275 | 12/13/2011 | WO | 00 | 8/17/2012 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/101933 | 8/2/2012 | WO | A |
Number | Date | Country |
---|---|---|
6-131468 | May 1994 | JP |
2001-344259 | Dec 2001 | JP |
2009-199533 | Sep 2009 | JP |
2009-199534 | Sep 2009 | JP |
2010-237910 | Oct 2010 | JP |
Number | Date | Country | |
---|---|---|---|
20130151907 A1 | Jun 2013 | US |