This invention relates to an analysis technique for responses in a computer system.
Along with development of a network service, a system to provide the service becomes complicated and large-scale. A lot of services have come to be provided by combining many servers. In such a system, it becomes very difficult to grasp how the utilization state of the resources of each server influences the response to the user.
Conventionally, the following two methods are known for investigating what proportion of the response time perceived by the user is attributable to the delay in each server of a system having plural servers. Namely, (1) a special identification tag is attached to messages transmitted and received between servers, and the delay is measured by using the tag. (2) Messages transmitted and received between servers are captured by packet capture, and the captured information is analyzed.
However, the method (1) requires changes to the existing system and service, so the introduction of this function is not easy. In addition, the method (2) requires expensive equipment and a large-capacity storage for the packet capture. Furthermore, from the viewpoint of security, the packet capture is not preferable.
In addition, US-2003/0236878-A1 discloses a technique to effectively evaluate, by the limited number of experiment times, the response capability of each application under various utilization states for one or plural applications operating on an information system. More specifically, when the load injection experiment corresponding to various utilization states of the application is carried out plural times, the quantity concerning the utilization state of the application, the quantity concerning the response capability of the application, the quantity concerning the utilization state of the hardware resource and the quantity of the response capability of the hardware resource are obtained, and by creating estimate equations describing the dependence relation between the quantities, the evaluation of the response capability of the application, by using the estimate equations, is enabled. However, this technique needs the “experiment”, and the analysis cannot be carried out while carrying out a regular processing.
Therefore, an object of this invention is to provide a technique for carrying out an analysis concerning the response of a computer system by using information that can be easily obtained from the computer system to be analyzed (hereinafter, to be monitored).
An analysis method according to this invention is an analysis method for carrying out an analysis for responses of a computer system including a plurality of servers. The analysis method includes: obtaining data concerning a CPU utilization ratio of each of the plurality of servers from the computer system, and storing the data concerning the CPU utilization ratio into a CPU utilization ratio storage; obtaining processing history data generated in the computer system, generating data of a request frequency by users of the computer system, and storing the data of the request frequency into a request frequency data storage; and estimating an average delay time in each server by using the CPU utilization ratio of each server, which is stored in the CPU utilization ratio storage, and the request frequency stored in the request frequency data storage, and storing the estimated average delay time into a server delay time storage.
Thus, because the processing is carried out by using data that can be easily obtained such as the CPU utilization ratio and the processing history data, the analysis processing can be carried out while reducing the introduction cost, and without causing any problem on the security.
Furthermore, the aforementioned estimating may include: estimating an average consumed CPU time per one request for each server by using the CPU utilization ratio of each server, which is stored in the CPU utilization ratio storage and the request frequency stored in the request frequency data storage, and storing the average consumed CPU time into a consumed CPU time storage; and estimating an average delay time in each server by using the average consumed CPU time per one request for each server, which is stored in the consumed CPU time storage, and the CPU utilization ratio of each server, which is stored in the CPU utilization ratio storage, and storing the average delay time in each server into a server delay time storage.
In addition, in the aforementioned estimating the average consumed CPU time, the average consumed CPU time per one request for each server may be estimated by carrying out a regression analysis by using the CPU utilization ratio of each server in a predesignated time range and the request frequency. Thus, by limiting to the predesignated time range, it is possible to exclude the time range when the request by the user is not processed so much and to improve the calculation accuracy.
Furthermore, in the aforementioned estimating the average delay time, a pertinent coefficient value representing a relation between the average consumed CPU time per one request for the server and the average delay time in the server may be read out by referring to a matrix storage storing said coefficient values for each predetermined unit of the CPU utilization ratio, which is an element to determine the coefficient value and for each number of CPUs, and the average delay time in the server may be calculated from the coefficient value and the average consumed CPU time per one request for the server. Because the coefficient value is a function of the CPU utilization ratio and the number of CPUs, the coefficient value can be calculated each time. However, because the calculation amount is actually increased, the coefficient values may be stored in the aforementioned matrix storage in order to enhance the processing speed.
In addition, this invention may further include, when the plurality of servers included in the computer system are categorized according to job types to be executed, estimating the average delay time for each category. For example, in a computer system in which layers are defined, the average delay time may be calculated for each layer as the category. This makes it possible, for example, to extract a problem for each job type.
Furthermore, this invention may further include estimating an average delay time for the entire computer system by using the data stored in the server delay time storage, and storing the average delay time for the entire computer system into a system delay time storage.
In addition, this invention may further include: obtaining an average actual measurement value of a response time for a request by a user, and storing the average actual measurement value into an average actual measurement value storage; and estimating a delay time, which occurs in a portion other than the servers, by a difference between the average actual measurement value stored in the average actual measurement value storage and the average delay time of the entire computer system, which is stored in the system delay time storage. When the average actual measurement value is shorter than the average delay time of the entire computer system, that is, when this difference is negative, the estimation is improper for some reason, and it also becomes possible to detect such a case.
Furthermore, this invention may further include: calculating, for each category, a correlation coefficient between a total sum of the average consumed CPU times and the request frequency, determining a confidence degree of the average delay time for each category based on the correlation coefficient, and storing the confidence degree into a confidence degree data storage; and correcting the average delay time for each category based on the confidence degree of the average delay time for each category, which is stored in the confidence degree data storage, and storing the corrected average delay time into a storage device. For example, as for the average delay time whose confidence degree is high, the average delay time is used as it is, and as for the average delay time whose confidence degree is low, the average delay time is largely corrected.
Furthermore, the aforementioned correcting may include: sorting the average delay times in descending order of the confidence degree; accumulating the average delay times for each category in the descending order of the confidence degree, and identifying an order of the confidence degree at which the accumulated average delay time becomes the maximum value less than the delay actual measurement value; and correcting the delay time in a next order of the identified order of the confidence degree to a difference between the delay actual measurement value and a value obtained by accumulating the average delay times for each category in the descending order of the confidence degree up to the identified order of the confidence degree.
In addition, this invention may further include: when the request frequency is experimentally changed, for example, changing the CPU utilization ratio of each server according to the changed request frequency, and storing the changed CPU utilization ratio into the storage device; estimating the average delay time for each server by using the changed CPU utilization ratio for each server, which is stored in the storage device, and storing the estimated average delay time into the storage device; and outputting the average delay time for each server before and after the change, which are stored in the server delay time storage and the storage device, in a comparable manner. It is possible to know how the delay time is changed for the change of the request frequency.
In addition, this invention may further include: when the number of CPUs is experimentally changed, for example, changing the CPU utilization ratio of each server according to the changed number of CPUs, and storing the changed CPU utilization ratio into the storage device; estimating the average delay time in each server by using the changed CPU utilization ratio of each server, which is stored in the storage device, and the changed number of CPUs, and storing the estimated average delay time into the storage device; and outputting the average delay times of each server after and before the change, which are stored in the server delay time storage and the storage device, in a comparable manner. When increasing the number of CPUs, for example, it is possible to try how much the delay time is decreased, and the reasonability of the investment can be judged from the effect.
This invention may further include: when the number of servers is changed, calculating an average consumed CPU time per one request for each server according to the changed number of servers, and storing the calculated average consumed CPU time into the storage device; calculating a CPU utilization ratio for each server after the change by using the number of CPUs and the average consumed CPU time per one request for each server after the change, which is stored in the storage device, and storing the calculated CPU utilization ratio into the storage device; and estimating an average delay time for each server after the change by using the average consumed CPU time per one request for each server after the change, which is stored in the storage device, and the CPU utilization ratio for each server after the change, and storing the estimated average delay time into the storage device. When the number of servers is increased, for example, it is possible to try how much the delay time is decreased, and the reasonability of the investment can be judged from the effect.
Furthermore, this invention may further include estimating an average delay time for each category defined by classifying the plurality of servers in the computer system according to a job type to be executed by using the average delay time for each server after the change, which is stored in the storage device, and the changed number of servers, and storing the estimated average delay time into the storage device.
Incidentally, it is possible to create a program for causing a computer to execute the aforementioned analysis method. The program is stored into a storage medium or a storage device such as a flexible disk, a CD-ROM, a magneto-optical disk, a semiconductor memory, or a hard disk. In addition, the program may be distributed as digital signals over a network in some cases. Incidentally, data under processing is temporarily stored in the storage device such as a computer memory.
[Principle of this Invention]
A. Derivation of a Theoretical Value Xˆ (a Symbol That ˆ is Attached on the Top of X is Also Indicated as “Xˆ”) of an Average Delay Time in a Web System Model
A-1. Modeling of the Delay Time of a Single Server
First, by using
From the expressions (1) to (3), the average stay time T(C, λ, ρ) in the server S satisfies the following relation. Incidentally, α represents a ratio of requests that reach the server S.
A-2. Modeling of the Delay Time in the N-Th Server Layer
Here, by using a delay model in the single server, an average delay time of the requests in a specific single layer of plural layers is calculated.
Different roles such as a Web server used as a front end for the user, an application server for dynamically processing the requests and the like are respectively assigned to each layer.
Then, when the request frequency to the n-th layer server S(n,m) is λ(n,m), the average delay time in the server S(n,m) can be represented by T(C(n,m), λ(n,m), ρ(n,m)). In addition, when the total sum of the requests input into the n-th layer is αnλall, and those are evenly assigned to Mn servers, the following expressions are satisfied.
Because the requests are evenly assigned to each server, the average delay time Wn of all the requests in the n-th layer is an average of the average delay times of all the servers existing in the n-th layer.
Here, Wn is represented by using the expressions (1) to (4) as follows:
Here, in order to simplify the notation, Hn is defined as follows:
A-3. Modeling of the Delay Time in the Entire System
Here, by using the delay model in each layer, the modeling of the delay time in the entire system is carried out. After the servers from the first layer to the n-th layer are used, the number Rn of requests leaving from the system among all the requests is as follows:
Rn=(αn-αn+1)λall(α1=1 , αN+1=0) (8)
In addition, after the servers from the first layer to the n-th layer are used, the average delay Ln of the requests leaving from the system is as follows:
In addition, the following relation is satisfied from the definition.
Ln−Ln−1=Wn (10)
Because the average delay time Xˆ per one request is represented by the product of the delay for requests leaving from the system after the servers from the first layer to the i-th layer are used and a ratio of the requests for all the requests, the average delay time Xˆ is represented as follows:
From the aforementioned results, when considering the average delay time of all the requests, Hn represents the delay, which occurs in each layer, and its total sum Xˆ represents the average delay time in the entire system for all the requests.
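The step from the expressions (8) and (10) to this result can be sketched as follows. This is a reading under stated assumptions, not a reproduction of the omitted expressions: it assumes L0=0, uses α1=1 and αN+1=0 from the expression (8), and assumes that Hn stands for αnWn.

```latex
\begin{aligned}
\hat{X} &= \sum_{n=1}^{N} \frac{R_n}{\lambda_{all}}\, L_n
         = \sum_{n=1}^{N} (\alpha_n - \alpha_{n+1})\, L_n \\
        &= \sum_{n=1}^{N} \alpha_n\, (L_n - L_{n-1}) \;-\; \alpha_{N+1} L_N
         \qquad (L_0 = 0) \\
        &= \sum_{n=1}^{N} \alpha_n W_n
         \qquad (\alpha_{N+1} = 0,\; L_n - L_{n-1} = W_n)
\end{aligned}
```

Under these assumptions each term αnWn is the contribution of the n-th layer to the average delay of all the requests, weighted by the ratio of the requests that reach that layer.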
[Specific Processing]
The delay analysis apparatus 120 is connected with the monitor target system 100, and carries out a processing by using the log data stored in the server log 111a and the CPU utilization ratio. Thus, different from the conventional arts, because there is no need to install any special mechanism into the monitor target system 100, the introduction of the delay analysis apparatus 120 is easy. Furthermore, because not all the packets processed in the monitor target system 100 have to be analyzed, there is no need to use a storage having a large capacity, and security problems do not easily occur. The delay analysis apparatus 120 is connected to an input/output unit 121 such as a display device, mouse, keyboard and the like.
The request frequency obtaining unit 1201 receives the log data from the server log 111a of the monitor target system 100, and stores the log data into the log data storage 1203, and processes the log data stored in the log data storage 1203 to calculate a request frequency (req/sec), and stores the request frequency into the request frequency storage 1204. In addition, the request frequency obtaining unit 1201 processes the log data stored in the log data storage 1203 to calculate an average delay actual measurement value, and stores the average delay actual measurement value into the delay actual measurement value storage 1205. The CPU utilization obtaining unit 1202 obtains data of a CPU utilization ratio from the CPU utilization ratio obtaining unit 112 of the monitor target system 100, and stores the data into the CPU utilization ratio storage 1206.
The CPU time calculator 1208 refers to the request frequency storage 1204, the CPU utilization ratio storage 1206 and the system configuration data storage 1207 to calculate a consumed CPU time per one request, and stores the calculated data into the CPU time storage 1209. The server delay time calculator 1210 refers to the CPU time storage 1209, the G table storage 1211 and the CPU utilization ratio storage 1206 to calculate a delay time for each server, and stores the calculated data into the server delay time storage 1214. Incidentally, the server delay time calculator 1210 may refer to the request frequency storage 1204 and the system configuration data storage 1207 when the G table storage 1211 is not referenced.
Furthermore, the layer delay time calculator 1215 refers to the server delay time storage 1214 and the system configuration data storage 1207 to calculate the delay time for each layer, and stores the calculated data into the layer delay time storage 1216. The system delay time calculator 1217 refers to the layer delay time storage 1216 and the system configuration data storage 1207 to calculate the delay time of the entire system, and stores the calculated data into the system delay time storage 1218. The remaining delay time calculator 1219 refers to the delay actual measurement value storage 1205 and the system delay time storage 1218 to calculate a remaining delay time consumed by other apparatuses other than the servers, and stores the calculated data into the remaining delay time storage 1220.
In addition, the confidence degree calculator 1221 refers to the remaining delay time storage 1220, the system configuration data storage 1207, the delay actual measurement value storage 1205, the request frequency storage 1204, the CPU utilization ratio storage 1206 and the layer delay time storage 1216, and when the remaining delay time consumed by apparatuses other than the servers is less than 0, the confidence degree calculator 1221 calculates a confidence degree for the delay time of each layer, and stores the calculated confidence degree data into the confidence degree storage 1222. The delay time corrector 1223 refers to the layer delay time storage 1216 and the confidence degree storage 1222 to correct the delay time for each layer, and stores data of the corrected delay time into the corrected delay time storage 1224.
The performance prediction processor 1213 carries out a processing by using the CPU utilization ratio storage 1206, the system configuration data storage 1207, the CPU time storage 1209 and the request frequency storage 1204.
Incidentally, the input/output unit 121 can output the data in the respective storages in the delay analysis apparatus 120 to the display device or the like.
Next, the processing content of the system shown in
An example of the log data stored in the log data storage 1203 is shown below.
“192.168.164.108 - - [14/Sep/2004:12:27:50 +0900] "GET /~hoge/SSSS/SSSS_20040816.pdf HTTP/1.1" 200 147067 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" 0.053” (Windows is a registered trademark.)
This is an example of a log recorded in a custom log format of the Apache Web server. Generally, the logs are stored as the server log 111a under a directory /var/log/httpd/ of the Web server included in the monitor target system 100 or the like. The first section “192.168.164.108” represents an IP address of an access source client. The second and third sections are omitted. The fourth section “[14/Sep/2004:12:27:50 +0900]” represents an access time. The fifth section “GET /~hoge/SSSS/SSSS_20040816.pdf HTTP/1.1” represents an access content. The sixth section “200” represents the status (here, normal). The seventh section “147067” represents the number of transmitted and received bytes. The eighth section “-” represents a URL path requested. The ninth section “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)” represents a browser used in the access source client. The tenth section “0.053” represents a time (sec) consumed to handle the request.
Next, the input/output unit 121 accepts setting inputs of a period to be analyzed and a business time range, and stores the setting inputs into a storage device such as a main memory (step S3). The business time range means a time range in which the server consumes little CPU time for processing other than requests from the users. By designating the business time range, it is possible to reduce the estimation error that arises when the server consumes a large amount of CPU time while requests are few, such as at night.
Then, the request frequency obtaining unit 1201 reads out the log data in the designated period to be analyzed and the business time range from the log data storage 1203, counts the requests processed in each one-hour interval, for example, divides the count value by 3600 seconds (=one hour) to calculate the request frequency λ per one second (req/sec), and stores the request frequency into the request frequency storage 1204. In addition, the request frequency obtaining unit 1201 adds up the times consumed to handle all the requests in each one-hour interval, for example, divides the added time by the number of requests to calculate an average delay actual measurement value, and stores the average delay actual measurement value into the delay actual measurement value storage 1205. Furthermore, the CPU utilization obtaining unit 1202 calculates an average CPU utilization ratio ρi(n,m) of each server S(n,m) for each one hour based on data of the CPU utilization ratio stored in the CPU utilization ratio storage 1206, and stores the average CPU utilization ratio ρi(n,m) into the CPU utilization ratio storage 1206 (step S5). When one server has plural CPUs, an average CPU utilization ratio of the plural CPUs is calculated to obtain the CPU utilization ratio of the server. Incidentally, i in the average CPU utilization ratio ρi(n,m) represents the i-th unit time (here, each one hour). In addition, hereinafter, the word “average” may be omitted.
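The hourly aggregation of the step S5 might be sketched as follows. This is only an illustrative sketch: the regular expression assumes the ten-section log layout described above with single spaces between sections, and the names LOG_PATTERN, hourly_statistics and business_hours are hypothetical, not part of the described apparatus.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: client, two omitted sections, [access time], "request",
# status, bytes, "eighth section", "browser", handling time in seconds.
LOG_PATTERN = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "[^"]*" (?P<elapsed>[\d.]+)$'
)


def hourly_statistics(lines, business_hours=range(0, 24)):
    """Return {hour: (request frequency in req/sec, average measured delay in sec)}.

    business_hours is a hypothetical parameter standing in for the
    business time range designated by the user.
    """
    counts = defaultdict(int)
    elapsed_totals = defaultdict(float)
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        stamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        if stamp.hour not in business_hours:
            continue  # outside the designated business time range
        hour = stamp.replace(minute=0, second=0)
        counts[hour] += 1
        elapsed_totals[hour] += float(match["elapsed"])
    # Request frequency: count / 3600 s; average delay: total time / count.
    return {
        hour: (counts[hour] / 3600.0, elapsed_totals[hour] / counts[hour])
        for hour in counts
    }
```

Lines that do not match the assumed layout are skipped rather than rejected, which keeps the sketch tolerant of other log formats mixed into the same file.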
When the processing result until here is summarized, it is as shown in
Next, the CPU time calculator 1208 refers to the request frequency storage 1204, the CPU utilization ratio storage 1206 and the system configuration data storage 1207 to calculate a consumed CPU time per one request, and stores the consumed CPU time into the CPU time storage 1209 (step S7). In order to calculate the delay time, which occurs in each server, it is first necessary to calculate how much CPU time per one request is consumed in each server for the request frequency λi (req/sec) input from the outside to the entire system. However, a problem occurs when the average consumed CPU time per one request is calculated simply by dividing the product of the CPU utilization ratio ρi(n,m) of the server S(n,m) in the unit time i and the number C(n,m) of CPUs by the request frequency λi, that is, as ρi(n,m)·C(n,m)/λi.
That is, a server generally consumes a small amount of CPU time for purposes other than the processing of requests, such as the maintenance of the system. When the request frequency is extremely small, the ratio of such CPU time becomes relatively large, so the consumed CPU time per one request is overestimated and an error may be caused. That is, when, as shown in
In order to solve this problem, it is supposed that the consumed CPU time per one request 1/μ(n,m) is represented as follows:
Then, the consumed CPU time per one request 1/μ(n,m) is calculated by the regression analysis, and the approximation is carried out by the following expression.
As shown in
Incidentally, when the regression calculation is carried out, only data within the business time range designated by the user is used. If all data in the period to be analyzed were used, and batch processing or the like consuming a large CPU time were carried out during the night, when the number of requests is small, the CPU utilization ratio at a small number of requests could be higher than that at a large number of requests. This could cause a large error in the estimation of the consumed CPU time per one request by the regression calculation. As shown in
The aforementioned regression calculation is described in detail. When drawing a straight line like an expression (13) for data (CPU utilization ratio ρ(n,m), the number C(n,m) of CPUs, which is the system configuration data, and the request frequency λi) in the business time range designated by the user among data in the period to be analyzed, the inclination 1/μ(n,m) and an intercept α(n,m) are calculated by the least-square method so that the deviation becomes the least, and stored into the CPU time storage 1209. However, when α(n,m) becomes negative, because the possibility that the inclination is excessively estimated is high, the intercept is set to “0”, and 1/μ(n,m) is calculated by carrying out the regression analysis as the following straight line again.
In addition, when the inclination 1/μ(n,m) becomes negative, it is judged that the average delay time per one request in the server cannot be analyzed, and a code representing that the analysis is impossible is stored into the CPU time storage 1209. When such a code is stored, the average delay time, which occurs in the layer in which the server is included, cannot be analyzed either.
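The regression procedure described above might be sketched as follows. It is a minimal sketch under an assumption: the straight line is taken to be ρ(n,m)·C(n,m) = (1/μ(n,m))·λ + α(n,m), which is inferred from the simple division described earlier and is not a reproduction of the expression (13). The function names are hypothetical, and returning None stands in for the code representing that the analysis is impossible.

```python
def least_squares(xs, ys, through_origin=False):
    """Fit y = slope * x + intercept by the least-square method."""
    if through_origin:
        # Intercept fixed to 0: slope = sum(x*y) / sum(x*x).
        slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return slope, 0.0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the request frequencies are not all equal
    return slope, mean_y - slope * mean_x


def estimate_cpu_time_per_request(request_freqs, utilizations, num_cpus):
    """Return (1/mu, alpha) for one server, or None when it cannot be analyzed.

    request_freqs: request frequency per unit time within the business time range.
    utilizations:  CPU utilization ratio (0..1) of the server per unit time.
    """
    # Assumed regression target: CPU seconds consumed per second of wall time.
    busy = [rho * num_cpus for rho in utilizations]
    slope, intercept = least_squares(request_freqs, busy)
    if intercept < 0:
        # A negative intercept suggests an overestimated inclination,
        # so the regression is repeated with the intercept set to 0.
        slope, intercept = least_squares(request_freqs, busy, through_origin=True)
    if slope < 0:
        return None  # negative inclination: the server cannot be analyzed
    return slope, intercept
```

For instance, with request frequencies of 10, 20 and 30 req/sec, utilization ratios of 0.075, 0.125 and 0.175, and two CPUs, the fit gives an inclination of about 0.01 sec per request and an intercept of about 0.05.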
Returning to the explanation of
However, when ρ=0, G(C, 0)=1.
Here, although 1/μ(n,m) represents the consumed CPU time per one request, this is equal to the average delay time, which occurs when the load is 0%. Then, when the load is ρ, the delay becomes G(C, ρ) times that of the case where the load is 0%.
G(C, ρ) is calculated from the number of CPUs and the CPU utilization ratio of the server, as shown in the expression (3). However, because it takes a relatively long time to calculate the expression (3) each time, when the granularity of the analysis has been determined, it is possible to calculate G(C, ρ) in advance for each number of CPUs and each CPU utilization ratio of the server. For example, when a granularity of 1% is sufficient for the CPU utilization ratio and the assumed number of CPUs per one server is equal to or less than 50, G(C, ρ) is calculated in advance for each CPU utilization ratio from 0 to 99% (1% interval) and each number of CPUs in the server from 1 to 50, and the values are stored in the G table storage 1211 as a 100×50 matrix. Then, by obtaining the number of CPUs from the system configuration data storage 1207 and the CPU utilization ratio from the CPU utilization ratio storage 1206, a value of G(C, ρ) can be obtained from the G table storage 1211.
Finally, the average delay time Ti(n,m) per one request, which occurs in each server, (hereinafter, also called as the average delay time of each server, simply) is calculated according to the expression (14), and stored into the server delay time storage 1214.
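The table lookup might be sketched as follows. The expression (3) is not reproduced here, so this sketch substitutes the standard M/M/c queueing result, G(C, ρ) = 1 + Pwait/(C(1−ρ)) with Pwait given by the Erlang C formula, purely as an assumption; it satisfies G(C, 0)=1 but may differ from the expression actually used. All function and variable names are hypothetical.

```python
def erlang_c(num_cpus, rho):
    """Probability that a request has to wait in an M/M/c queue (assumed model)."""
    if rho <= 0.0:
        return 0.0
    offered = num_cpus * rho  # offered load in units of one CPU
    blocking = 1.0  # Erlang B value, built up by the usual recursion
    for k in range(1, num_cpus + 1):
        blocking = offered * blocking / (k + offered * blocking)
    return blocking / (1.0 - rho * (1.0 - blocking))


def g_factor(num_cpus, rho):
    """Ratio of the delay at load rho to the delay at load 0 (assumed M/M/c form)."""
    if rho <= 0.0:
        return 1.0
    return 1.0 + erlang_c(num_cpus, rho) / (num_cpus * (1.0 - rho))


# 100 x 50 matrix: rows are utilization 0..99 %, columns are 1..50 CPUs.
G_TABLE = [[g_factor(c, pct / 100.0) for c in range(1, 51)] for pct in range(100)]


def server_delay(cpu_time_per_request, num_cpus, utilization):
    """Average delay per request: (1/mu) multiplied by the tabulated G value."""
    pct = min(int(round(utilization * 100)), 99)  # nearest 1 % step, capped at 99 %
    return cpu_time_per_request * G_TABLE[pct][num_cpus - 1]
```

Under the assumed model a single CPU at 50% utilization gives G=2, so a consumed CPU time of 0.01 sec per request corresponds to an average delay of 0.02 sec. Building the table once replaces the recursion with a single indexed read for each server and unit time.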
Next, the layer delay time calculator 1215 refers to the server delay time storage 1214 and the system configuration data storage 1207 to calculate the delay time Lin in each layer, and stores the delay time into the layer delay time storage 1216 (step S11). The delay time Lin in each layer is the sum of the average delay times of the servers for each layer. Mn is obtained from the system configuration data storage 1207.
Then, the system delay time calculator 1217 refers to the layer delay time storage 1216 and the system configuration data storage 1207 to calculate the delay time Di of the entire system, and stores the delay time into the system delay time storage 1218 (step S13). The delay time Di of the entire system is the sum of the delay times Lin in each layer n, and is represented as follows:
N is obtained from the system configuration data storage 1207.
After that, the remaining delay time calculator 1219 refers to the delay actual measurement value storage 1205 and the system delay time storage 1218 to calculate the delay time Ei consumed in the portion other than the server, and stores the delay time into the remaining delay time storage 1220 (step S15). The delay time Ei is a difference between the delay time Di of the entire system and the delay actual measurement value Ai, and is calculated as follows:
Ai<Di means that the aforementioned estimation result is not proper, and in such a case, Ei=0 is set.
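The steps S11 to S15 might be sketched as follows. The sketch follows the wording above literally (the layer delay as the sum over the servers of the layer, the system delay as the sum over the layers, and the remaining delay as Ai−Di clamped at 0); the function names and the nested-list input are hypothetical.

```python
def layer_delays(server_delays_by_layer):
    """Step S11: delay of each layer from the delays of its servers.

    server_delays_by_layer[n] holds the average delay times of the
    servers of the n-th layer for one unit time.
    """
    return [sum(delays) for delays in server_delays_by_layer]


def system_delay(layer_delay_values):
    """Step S13: delay of the entire system as the sum over all layers."""
    return sum(layer_delay_values)


def remaining_delay(measured_delay, estimated_system_delay):
    """Step S15: delay consumed outside the servers, set to 0 when Ai < Di."""
    return max(measured_delay - estimated_system_delay, 0.0)
```

For example, two layers with server delays of 0.01 and 0.02 sec and of 0.03 sec give a system delay of 0.06 sec; against a measured delay of 0.10 sec the remaining delay is 0.04 sec.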
Then, in order to correct the delay time mainly in a case of Ei=0, the confidence degree calculator 1221 refers to the remaining delay time storage 1220, the layer delay time storage 1216, the system configuration data storage 1207, the request frequency storage 1204, the CPU utilization ratio storage 1206 and the delay actual measurement value storage 1205 to carry out a calculation processing of the confidence degree of the average delay time for each layer, and stores the processing result into the confidence degree storage 1222 (step S17). This processing is explained by using
The first item of the correl function in the expression (15) is the total sum of the consumed CPU time in the n-th layer. Incidentally, because the correlation coefficient is also used for the later calculation, that is held for each layer.
Then, the confidence degree calculator 1221 judges whether or not the correlation coefficient Rin is negative (step S33). In a case of the correlation coefficient <0, the confidence degree calculator 1221 sets the confidence degree Rin=0 (step S37). This is because it is assumed that the positive correlation exists between the consumed CPU time and the request frequency, and there is no meaning for the negative correlation. On the other hand, in a case of the correlation coefficient ≧0, the confidence degree calculator 1221 judges whether or not the estimated delay time Di of the entire system is longer than the average delay actual measurement value Ai (step S35). When Di>Ai is satisfied, the processing shifts to step S37 because impossible estimation is made and the calculated delay time itself has the low confidence. That is, the confidence degree calculator 1221 sets the confidence degree Rin=0. On the other hand, in a case of Di≦Ai, the correlation coefficient calculated at the step S31 is used as the confidence degree as it is.
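The judgments of the steps S31 to S37 might be sketched as follows. The expression (15) is not reproduced here, so the sketch assumes that the correlation is a Pearson correlation coefficient taken over the unit times between the total consumed CPU time of the layer and the request frequency; the function names and the series inputs are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def layer_confidence(layer_cpu_totals, request_freqs,
                     estimated_system_delay, measured_delay):
    """Confidence degree of the estimated delay of one layer."""
    r = pearson(layer_cpu_totals, request_freqs)  # step S31 (assumed form)
    if r < 0:
        return 0.0  # steps S33 and S37: a negative correlation has no meaning
    if estimated_system_delay > measured_delay:
        return 0.0  # steps S35 and S37: an impossible estimation (Di > Ai)
    return r  # otherwise the correlation coefficient is used as it is
```

The correlation coefficient is returned separately by pearson so that it can be held for each layer for the later calculation.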
Returning to the explanation of
Then, the delay time corrector 1223 adds the delay times of the layers in descending order of the confidence degree according to the sorting result, and identifies an order B of the confidence degree at which the added value becomes the maximum value less than the average delay actual measurement value (step S43). Here, Px=n represents that the confidence degree Rin of the n-th layer is the x-th from the top. Then, RiPx>RiPx+1 is always satisfied. Then, at the step S43, the maximum y satisfying the following expression is calculated. This is B.
It is unnecessary to correct the delay times of the layers whose confidence degrees are the 1st to B-th, which were identified as described above. Therefore, the delay time corrector 1223 corrects the delay time of the layer whose confidence degree is the (B+1)-th as follows (step S45). That is, the estimated delay time LiPB+1 of the PB+1-th layer is corrected, and the result is L′iPB+1. The correction result and the delay times of the layers that need no correction (layers whose confidence degrees are the 1st to B-th) are stored into the corrected delay time storage 1224.
This expression represents that the delay time of the layer whose confidence degree is the (B+1)-th is corrected so that the delay actual measurement value is equal to the total sum of the delay times (estimated average values) of the layers from the top of the confidence degree to the (B+1)-th.
In addition, the delay time corrector 1223 corrects the confidence degree of the layer whose confidence degree is the (B+1)-th as follows (step S47). That is, the delay time corrector 1223 corrects the confidence degree RiPB+1 of the PB+1-th layer, and uses the result as R′iPB+1. The correction result and the confidence degree data of the layers that need no correction (layers whose confidence degrees are the 1st to B-th) are stored into the corrected delay time storage 1224.
This expression represents that the confidence degree is corrected so that the smaller the difference between the delay time before the correction and the delay time after the correction is, the higher the confidence degree becomes.
Furthermore, the delay time corrector 1223 corrects the delay time and the confidence degree of each layer whose confidence degree is the (B+2)-th or later as follows (step S49). The correction result is stored into the corrected delay time storage 1224.
L′iPn=0 (n>B+1)
R′iPn=0 (n>B+1)
A specific example of this correction processing will be explained by using
Then, when the sorting is carried out at the step S41, as shown in
Therefore, as shown in
By carrying out such processing, the estimated values are corrected so that they fit the actual measurement value.
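The correction at steps S41 to S49 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function and variable names are hypothetical, and the expressions for steps S43, S45 and S47 are not reproduced in this text. The sketch reads step S43 as "the largest B whose cumulative delay is strictly less than Ai" and step S45 as "Ai minus the sum of the first B delays", both of which follow from the verbal description. The confidence correction at step S47 is an assumption: it scales the confidence degree by the ratio of the corrected delay to the original delay, which merely satisfies the stated property that a smaller difference gives a higher confidence degree.

```python
from typing import Dict, Tuple


def correct_delays(delays: Dict[int, float],
                   confidences: Dict[int, float],
                   measured: float) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Fit per-layer delay estimates to the measured delay Ai.

    delays      -- layer number -> estimated delay time
    confidences -- layer number -> confidence degree
    measured    -- average delay actual measurement value Ai
    """
    # Step S41: sort layers in descending order of confidence degree.
    order = sorted(delays, key=lambda n: confidences[n], reverse=True)

    # Step S43: B is the largest rank whose cumulative delay is still
    # strictly less than the measured value.
    total = 0.0
    b = 0
    for n in order:
        if total + delays[n] < measured:
            total += delays[n]
            b += 1
        else:
            break

    new_delay: Dict[int, float] = {}
    new_conf: Dict[int, float] = {}
    for rank, n in enumerate(order, start=1):
        if rank <= b:
            # Ranks 1 to B need no correction.
            new_delay[n], new_conf[n] = delays[n], confidences[n]
        elif rank == b + 1:
            # Step S45: the total up to rank B+1 equals the measured value.
            new_delay[n] = measured - total
            # Step S47 (ASSUMED formula, not taken from the source):
            # scale the confidence by corrected delay / original delay.
            ratio = new_delay[n] / delays[n] if delays[n] > 0 else 0.0
            new_conf[n] = confidences[n] * ratio
        else:
            # Step S49: ranks B+2 and later are set to 0.
            new_delay[n], new_conf[n] = 0.0, 0.0
    return new_delay, new_conf
```

With three layers whose estimated delays are 100, 300 and 200 (in arbitrary units), confidence degrees 0.9, 0.5 and 0.7, and a measured delay of 250, the layer with confidence 0.9 is kept, the layer with confidence 0.7 is reduced from 200 to 150, and the remaining layer is set to 0, so that the corrected delays sum to the measured value.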
Returning to the explanation of
The categorization of the confidence degree such as “high”, “middle” and “low”, which is described above, is based on values generally used for judging the strength of correlation from the correlation coefficient. That is, generally, when the absolute value of the correlation coefficient is equal to or greater than 0.7, it is judged that there is strong correlation between two parameters; when it is within a range from 0.3 to 0.7, it is judged that there is weak correlation; and when it is equal to or less than 0.3, it is judged that there is almost no correlation. This is because the square of the correlation coefficient is the explanatory rate of the variance (the coefficient of determination). When the correlation coefficient is 0.7, the explanatory rate is 0.49 (about 50%). That is, about half of the variance of the dependent variable can be explained by the explanatory variable. In addition, when the correlation coefficient is 0.3, the explanatory rate is 0.09 (about 10%), and because only about 10% of the variance of the dependent variable is attributable to the explanatory variable, it is judged that there is almost no correlation between the explanatory variable and the dependent variable.
Applying the same consideration to this embodiment, when the correlation coefficient is equal to or greater than 0.7, there is sufficient correlation between the CPU utilization ratio and the request frequency, and because the consumed CPU time per one request can be appropriately estimated, the confidence degree is considered to be high. In addition, according to a guideline on the relation between this confidence degree and the prediction error, which was obtained from experimental results in an experiment environment, it is highly likely that the prediction error is within about ±50% when the confidence degree is “high”, within about ±100% when the confidence degree is “middle”, and greater than ±100% when the confidence degree is “low”. However, this is merely a guideline based on the experimental results, and the aforementioned accuracy (error range) is not guaranteed.
By carrying out the processing described above, it becomes possible to calculate the delay times of each server, each layer and the entire system by using elements that already exist in the monitor target system 100. In addition, it is possible to correct the delay time based on its relation to the delay actual measurement value, and further to present the confidence degree to the user.
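The relation between the correlation coefficient and the explanatory rate, and the resulting categorization, can be illustrated with a short Python sketch. The function names are hypothetical. The text gives ranges that touch at 0.3 and 0.7, so the handling of the boundaries here (0.7 counts as “high”, 0.3 counts as “low”) is an assumption made for the sake of the example.

```python
def explanatory_rate(correlation: float) -> float:
    """Share of the variance explained: the square of the coefficient."""
    return correlation * correlation


def categorize(confidence: float) -> str:
    """Map a confidence degree to "high", "middle" or "low".

    Boundary handling is an assumption: 0.7 is treated as "high"
    and 0.3 as "low".
    """
    if confidence >= 0.7:
        return "high"
    if confidence > 0.3:
        return "middle"
    return "low"


if __name__ == "__main__":
    for r in (0.7, 0.5, 0.3):
        print(r, categorize(r), round(explanatory_rate(r), 2))
```

Squaring the two thresholds gives 0.49 for a coefficient of 0.7 and 0.09 for a coefficient of 0.3, which are the "about 50%" and "about 10%" figures referred to in the text.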
Next, the performance prediction using the aforementioned model will be explained.
First, the estimation of the delay time change at the request frequency change will be explained by using
Then, the layer delay time calculator 1215 calculates the delay time in each layer after the change by using the delay time T′i(n,m) of each server after the change, which is stored in the server delay time storage 1214, and stores the calculated delay time into the layer delay time storage 1216 (step S55). Furthermore, the system delay time calculator 1217 calculates the delay time of the entire system after the change by using the delay time in each layer after the change, which is stored in the layer delay time storage 1216, and stores the calculated delay time into the system delay time storage 1218 (step S57).
After that, the input/output unit 121 outputs each delay time and the like before and after the change (step S59). Thus, the user can investigate the change of the delay time according to the change of the request frequency.
Next, the performance prediction at the change of the number of CPUs will be explained by using
Then, the layer delay time calculator 1215 calculates the delay time in the layer relating to the change by using the delay time T′i(n,m) of the server after the change, which is stored in the server delay time storage 1214, and stores the calculated delay time into the layer delay time storage 1216 (step S67). Furthermore, the system delay time calculator 1217 calculates the delay time of the entire system after the change by using the delay time in each layer, which is stored in the layer delay time storage 1216, and stores the calculated delay time into the system delay time storage 1218 (step S68).
After that, the input/output unit 121 outputs each delay time before and after the change (step S69). Thus, the user can examine the change of the delay time according to the change of the number of CPUs. For example, by using this result, the user can investigate the effect of increasing the number of CPUs.
Next, the performance prediction at the change of the number of servers will be explained by using
Incidentally, α(n,m) is an intercept obtained when 1/μ(n,m) is calculated, and is stored in the CPU time storage 1209. Therefore, this value is used.
Next, the server delay time calculator 1210 calculates the server delay time after the change by using the CPU utilization ratio ρ′ after the change, which is stored in the CPU utilization ratio storage 1206, and the consumed CPU time 1/μ′(n,m) per one request after the change, which is stored in the CPU time storage 1209, and stores the calculated server delay time into the server delay time storage 1214 (step S77). The server delay time T′i(n,m) after the change is represented as follows:
Then, the layer delay time calculator 1215 calculates the delay time in each layer by using the server delay time T′i(n,m) after the change, which is stored in the server delay time storage 1214, and stores the calculated delay time into the layer delay time storage 1216 (step S79). Incidentally, also at this step, Min from the performance prediction processor 1213 is used for the following calculation.
Incidentally, L′in is represented from the expression (16) as follows:
Furthermore, the system delay time calculator 1217 calculates the delay time of the entire system after the change by using the delay time in each layer, which is stored in the layer delay time storage 1216, and stores the calculated delay time into the system delay time storage 1218 (step S81).
After that, the input/output unit 121 outputs each delay time before and after the change (step S83). Thus, the user can examine the change of the delay time according to the change of the number of servers. For example, by using this result, the user can investigate the effect of increasing the number of servers.
Although the embodiment of this invention is described above, this invention is not limited to this. For example, the functional block diagrams shown in
Incidentally, the aforementioned delay analysis apparatus 120 is a computer device as shown in
This application is a continuing application, filed under 35 U.S.C. section 111(a), of International Application PCT/JP2004/016051, filed Oct. 28, 2004.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/JP04/16051 | Oct 2004 | US
Child | 11739946 | Apr 2007 | US