This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2014-215689, filed on Oct. 22, 2014, the entire contents of which are incorporated herein by reference.
The present invention relates to a computer-readable recording medium having stored therein an analysis program, an analysis apparatus, and an analysis method.
In an application program, a network service, or the like, attempts have been made to find a delay point or an abnormal point (see, for example, Patent Document 1 listed below).
A processing performed by an application program or a network component is realized by passing through a plurality of modules A to D, such as a processing sequence of start-A-B-C-D-end, and each of the modules may be used by a plurality of processings. Therefore, the application program or the network component performs a processing using a common module in the plurality of processings.
A delay of a particular module causes a delay in a plurality of relevant processings. To identify a module being delayed, the following technique is known (for example, see Patent Document 2 listed below).
For example, an analysis apparatus stores path information which identifies modules used by each of a plurality of processings from a result of performing the plurality of processings and extracts log information output during a predetermined time interval, which includes a time point at which a delay of a processing time is detected from the log information including processings and processing times. Therefore, the analysis apparatus can identify a module that causes a processing delay based on the extracted log information and the path information.
In the above-described technology, information on modules through which the plurality of processings pass (path information) is generated from the result of performing the processings in advance.
An analysis apparatus assumes that the path information generated in advance is correct. However, the generated path information may be incorrect; in other words, a module identified by the path information may differ from a module actually used in a processing included in the extracted log information. Such a situation occurs when conditions differ between the processing from which the path information was generated and the processing recorded in the log information, such as the execution timing or a designated parameter of each processing, for example, when a different module is executed due to a conditional branch.
When the path information is not correct, it is hard for the analysis apparatus to identify a module used in a processing included in the extracted log information, and it may therefore be difficult to identify the module that causes the delay of the processing.
According to an aspect of the embodiments, a computer-readable recording medium having stored therein an analysis program causes a computer to execute the following process. The process includes: storing information on modules through which each processing passes with respect to each of a plurality of processings in which shared modules exist; and determining a normal or abnormal state of each of the processings which are performed during a predetermined time interval based on log information related to the plurality of processings which are performed during the predetermined time interval. In addition, the process includes correcting the information on the modules according to each of the processings which are performed during the predetermined time interval, based on a predetermined condition, when an abnormal module is not identified in a process of identifying the abnormal module by using a determination result of the normal or abnormal state and the information on the modules according to each of the processings which are performed during the predetermined time interval. Furthermore, the process includes identifying the abnormal module by using the determination result of the normal or abnormal state and the corrected information on the modules.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Hereinafter, embodiments will be described with reference to the drawings. However, the embodiments to be described below are merely exemplary and are not intended to exclude a variety of unspecified modifications or technical applications. Note that, in the drawings referred to in the following embodiments, parts assigned with the same reference numerals indicate the same or similar parts unless otherwise particularly mentioned.
The AP server 40, for example, includes a pre-analysis block 401, an operation block 402, a user request database 403, and a path information database 404. Optionally, the AP server 40 may include an appearance probability database 405.
The AP server 40 may include a Central Processing Unit (CPU), a memory, such as a Random Access Memory (RAM) or the like, a storage device, such as a hard disk device, a display device, a printer, and the like (which are not illustrated in the drawings), as an example of a processing unit (not illustrated). In the AP server 40, the CPU implements a necessary function unit by reading a predetermined program from the memory or the storage device and executing it. For example, the program includes an analysis program as an example of the program that implements the function of the pre-analysis block 401 or the operation block 402. The display device or the printer, for example, may output the results of operations by the CPU. Note that the other server 20 or the web server 30 may also include a CPU, a memory, a storage device such as a hard disk device or the like, a display device, and a printer as hardware devices.
The function of the analysis program (all or partial function of each unit) is implemented in such a manner that a processing unit, such as, the CPU or the like executes the predetermined program.
The program, for example, is provided in a form of being stored in a computer-readable recording medium such as a floppy disk, a Compact Disc-Read Only Memory (CD-ROM), a CD-R, a CD-RW, a Magneto-Optical Disc (MO), a Digital Versatile Disc (DVD), a Blu-ray Disc, a portable hard disk, a Universal Serial Bus (USB) memory, or the like. In this case, the computer (processing unit of the computer) uses the program that is read from the recording medium, transmitted to an internal storage device or an external storage device, and stored in the storage device. Also, the program, for example, may be stored in a storage device (recording medium) such as a magnetic disk, an optical disk, a magneto-optical disk, or the like, and provided to the computer (information processing apparatus), such as the AP server 40, from the storage device via a communication line.
In addition, the computer, such as the AP server 40, can include means to read the program stored in the recording medium.
The application program includes a program code executing the functions of the analysis program in the computer as described above. Also, a part of the function may be implemented by the Operating System (OS) rather than by the application program.
Also, the recording medium may use a variety of computer-readable media such as an Integrated Circuit (IC) card, a ROM cartridge, a magnetic tape, a punch card, an internal storage device (memory such as a RAM or a ROM) of the computer, an external storage device of the computer, or a printed matter with a printed code such as a bar code, in addition to the floppy disk, the CD-ROM, the CD-R, the CD-RW, the DVD, the magnetic disk, and the optical magnetic disk as described above.
The user request database 403, the path information database 404, and the appearance probability database 405, for example, are implemented in the memory or the storage device of the AP server 40.
The pre-analysis block 401, for example, includes a pre-data collection unit 410 and a path analysis unit 420.
The pre-data collection unit 410 inputs (transmits) data (request or the like) of the user request database 403 to the network 10 as virtual user data. Note that, the pre-data collection unit 410 may store an actual request, state, and the like of an actual operation and reproduce an operational state of an actual operation.
The path analysis unit 420, for example, collects message data flowing through the respective servers 20, 30 and 40 as the result by the input of the virtual user data, performs the path analysis, and stores the analysis result in the path information database 404 as the path information.
The operation block 402, for example, includes an operational data collection unit 430, a function selection unit 440, a data slicing unit 450, and a problem point identifying unit 460.
The operational data collection unit 430 collects, for example, Uniform Resource Locator (URL)+Common Gateway Interface (CGI) parameter or the like from data flowing through the servers 20, 30 and 40 during the actual operation in the operation phase as, for example, log data. Note that, in the actual operation, only information of a “front server” may be collected. The “front server” refers to a server closest to the user side, which receives the request from the user, as compared with “all servers” in the pre-analysis phase. In the configuration illustrated in
The function selection unit 440 compares the collected log data with the path information of the path information database 404, and performs the function selection (classification) of the log data.
The data slicing unit 450 performs processing of cutting a time interval in which normal and abnormal states are not mixed in each selected function (processing of calculating a state change timing). Details will be described below.
The problem point identifying unit 460 performs a delay detection in the time interval cut by the data slicing unit 450, and narrows or identifies a problem point by comparison with the path information when the delay is detected. When the path information is incorrect, it may be hard to narrow or identify a problem point. In this case, the problem point identifying unit 460 may perform correction or re-generation (hereinafter, collectively referred to as “correction”) of the path information according to the below-described method, and narrow or identify the problem point by comparing the time interval in which the delay is detected with the corrected path information.
The “functions” (or “processings”) as used herein are classified as follows.
First, captured actual data is collected or data is collected by reproducing (replaying) test data by the pre-data collection unit 410, and the path analysis unit 420 classifies the path of each function of the system.
For example, as illustrated in
When F1 and F2 are delayed more than usual, the problem point identifying unit 460 may determine, in view of the analyzed path information, that the paths (check points) p1, p2, p3, p4 and p5, through which F1 and F2 pass, may have a problem (abnormality).
Also, for example, it may be determined that there are no problems in the common paths p1, p2, p3 and p4 of F1, F2 and F3 by information indicating that F3 and F4 are not delayed and path information of F3 and F4. As a result, the remaining path p5 may be diagnosed as the cause of the delay.
Note that, when the analysis target is a program, p1 to p5 may be processed as method (function) call unit, block unit, log output point unit designated by a user, or a combination thereof, as exemplarily described below.
As a simple example, as illustrated in
As illustrated in
Further, as illustrated in
(Analysis Phase)
As illustrated in
First, in the pre-analysis block 401, the pre-data collection unit 410 inputs a request message to the servers 20, 30 and 40 by reproducing request data prepared in advance in the user request database 403 (data reproduction: processing P10). The processing is repeated until a predetermined end condition is satisfied (until determined as Yes in processing P20) (No route of processing P20). Note that, as the request data, data collected in the actual operation, data generated as test data, or the like may be used.
The pre-data collection unit 410 acquires data by capturing network data called by the data input in the data reproduction or by acquiring log data of the servers 20, 30 and 40 (processing P30).
Next, in the pre-analysis block 401, for example, the path analysis unit 420 performs association processing on the acquired data and generates path information (processing P40). An example of the association processing is illustrated in
As illustrated in
Next, the path analysis unit 420 performs primary association processing on each selected type (processing P430). Further, the path analysis unit 420 checks whether a transaction is ended (processing P440). When all constituent data types of data are provided, it is determined as the transaction end (Yes route of processing P440), and the path analysis unit 420 performs secondary association processing on all constituent data types of data by using an identification key (processing P450). Note that, processings subsequent to processing P410 are repeated until it is determined as the transaction end (No route of processing P440).
The upper side of
Different types of data are secondarily associated by an identification key (for example, the transaction ID (t01, t02, and the like)). Note that not all data necessarily have the identification keys that are needed for the secondary association.
When the secondary association is completed, the path analysis unit 420 registers (stores) the associated result (processing P460).
When such association processing is completed, the path analysis unit 420 performs function extraction processing as illustrated in
The path analysis unit 420 registers the analysis result in the path information database 404 as the path information (processing P60). Note that, as described below, in order to improve the accuracy of the problem point identification, a method using appearance probability (frequency) information may be considered. In this case, the path analysis unit 420 stores the appearance probability information in the appearance probability information database 405 (see
(Operation Phase)
Next, an example of processing in the operation phase will be described with reference to
In the operation phase (operation block 402), the operational data collection unit 430 collects information such as the URL+CGI parameter and the response time among the actual operational data from the network switch 50 or the web server 30 (processing P100).
Next, in the operation block 402, the function selection unit 440 selects function units from the collected data, based on the parameters such as URL, CGI, and the like (processing P110).
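The function selection of processing P110 can be sketched as a lookup from the request URL to a function ID. This is a minimal illustration only: the rule table, the host names, and the name select_function are hypothetical, and the text does not specify how the URL and the CGI parameter are actually matched.

```python
from urllib.parse import urlsplit

# Hypothetical rule table: host part of the URL -> function ID.
# A real deployment would derive these rules from the path information.
FUNCTION_RULES = {
    "foo": "F1",
    "boo": "F2",
    "bar": "F3",
}


def select_function(url):
    """Return the function ID for a logged URL, or None if no rule matches."""
    host = urlsplit(url).netloc
    return FUNCTION_RULES.get(host)


log_urls = ["http://foo/settle?id=1", "http://bar/status?id=7", "http://other/x"]
selected = [select_function(u) for u in log_urls]
```

Entries for which no rule matches are returned as None here, so that the caller can decide how to treat a function that is not included in the path information.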
Further, in the operation block 402, the data slicing unit 450 performs function extraction processing, that is, processing of cutting a time interval in which normal and abnormal states are not mixed in each selected function (processing of calculating a state change timing) (processing P120). Note that, when the selected function is not included in the path information, it is applied to the function of the path information.
Thereafter, the data slicing unit 450 registers (stores) the function and response information in an analysis target data table (not illustrated) as aggregation information (processing P130). An example of a registration form is illustrated in Table 1 below.
In the example of Table 1 above, an entry in which data appears in an interval identified by the interval ID is registered. F3 represents that no data has existed in that interval. Note that, the interval information corresponding to the interval ID, for example, may be managed in other table (interval table) illustrated in Table 2 below. A length of the interval may be different at each slice.
Next, in the operation block 402, the problem point identifying unit 460 determines whether the response is degraded (processing P140). The determination may be performed in single response unit or aggregation unit.
When the response is not degraded, the operation block 402 repeats the processings subsequent to processing P100 (No route of processing P140). On the other hand, when the response is degraded (Yes in processing P140), the problem point identifying unit 460 performs the problem point identification by comparing the aggregation information and the path information (processing P150).
When the problem point identification is possible (Yes route of processing P160), the problem point identifying unit 460 outputs the information of the identified problem point on the display device or the like (processing P170). At this time, when a plurality of candidates exists, the plurality of candidates may be output after assigning priorities thereto. However, the assignment of the priorities may be omitted.
An example of output data is illustrated in
Herein, when wanting to know more information about the delay point, a details display window 510 illustrated on the right side of
When the problem point identification is impossible (No route of processing P160), the problem point identifying unit 460 performs correction of the path information and performs identification of a problem point similarly to processing P150 based on the path information after correction (processing P180). At this time, the problem point identifying unit 460 may perform storage of the corrected path information (accumulation), updating of the path information of the path information database 404, or the like (processing P190).
When the problem point is identified by correction of the path information, the problem point identifying unit 460 outputs information on the identified problem point to a display device or the like (processing P170).
Next, in connection with processing P120 and P130 of
In
Even in the same functions, normal data and abnormal data may be mixed depending on timings. In that case, the aforementioned narrowing using the matrix cannot be performed.
For example, when a threshold value of the response time is 1 second (abnormal if equal to or more than 1 second, and normal if less than 1 second), the analysis may be imprecise even if an average of 1 second is determined as abnormal (see, for example, an arrow 601 of
Therefore, the data slicing unit 450 automatically separates the region (time interval) in which the normal and abnormal states are not mixed, allowing for narrowing.
As an example of the basic processing, first, a timing of a change of the normal and abnormal states is calculated by each URL, and a time interval in which the normal and abnormal states are not mixed is separated by each URL, based on the corresponding timing. In a range where each time interval is superimposed, a matrix is made and an operation is performed (an abnormal module being a problem point is calculated (detected), based on “relationship information” between the plurality of processings (or functions) and the modules).
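The calculation of the state change timing for one URL can be sketched as follows. This is a minimal illustration under assumptions: the sample format (timestamp, response time), the name split_by_state, and the sample values are hypothetical, and only the 1-second threshold is taken from the text.

```python
THRESHOLD_SEC = 1.0  # abnormal if the response time is 1 second or more


def split_by_state(samples, threshold=THRESHOLD_SEC):
    """Split time-ordered (timestamp, response_time) samples of one URL
    into runs in which the normal and abnormal states are not mixed.

    Returns a list of (start, end, state) tuples; a new tuple starts at
    every state change timing.
    """
    intervals = []
    for timestamp, response_time in samples:
        state = "abnormal" if response_time >= threshold else "normal"
        if intervals and intervals[-1][2] == state:
            start = intervals[-1][0]
            intervals[-1] = (start, timestamp, state)  # extend the current run
        else:
            intervals.append((timestamp, timestamp, state))  # state changed
    return intervals


# Hypothetical samples: the average is 0.84 s, which hides the abnormal run.
samples = [(0, 0.2), (1, 0.3), (2, 1.8), (3, 1.7), (4, 0.2)]
intervals = split_by_state(samples)
```

In this example the average response time stays below the threshold even though two consecutive requests are abnormal, which is why the interval is separated before the matrix operation is applied.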
Note that, the “relationship information” (path information) may be appropriately updated. For example, the request data in the actual operation phase is stored in the user request database 403, and when unknown data having not appeared in the pre-analysis phase appears in the actual operation phase, the “relationship information” is updated by performing the pre-analysis again by using the stored request data.
However, if the interval in which the normal and abnormal states are not mixed in one URL is cut by a plurality of URLs, the interval is cut into too small pieces and thus combinations (computation time) become enormous. Therefore, among processings exemplified in the following (a) to (c), the abnormal point narrowing is performed by only (a), (a)+(b), (a)+(c), or (a)+(b)+(c).
(a) A slice that does not include an abnormal state is excluded.
(b) An operation is performed by selecting a slice that covers more points (components). For example, since the components used by each URL are already known (analyzed), a slice including more components by combination is selected. Candidates of the combination are prepared by calculating in advance which combination of URLs covers the most components.
(c) A slice that covers more URLs is selected and an operation is performed.
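The combination (a)+(b)+(c) can be sketched as a filter followed by a sort. This is a minimal illustration, not the claimed procedure: the slice representation, the URL names u1 to u3, their component sets, and the name rank_slices are all hypothetical.

```python
def rank_slices(slices, paths):
    """Order slices for analysis.

    slices: list of dicts {"id": str, "states": {url: "normal" | "abnormal"}}
    paths:  dict mapping each URL to the set of components it uses
    """
    # (a) exclude slices that do not include an abnormal state
    candidates = [s for s in slices if "abnormal" in s["states"].values()]

    def coverage(s):
        components = set().union(*(paths[url] for url in s["states"]))
        # (b) more covered components first, then (c) more covered URLs
        return (len(components), len(s["states"]))

    return sorted(candidates, key=coverage, reverse=True)


# Hypothetical path information and slices.
paths = {
    "u1": {"p1", "p2", "p4"},
    "u2": {"p1", "p2", "p3", "p4", "p5"},
    "u3": {"p1", "p3", "p5"},
}
slices = [
    {"id": "s1", "states": {"u1": "normal", "u3": "normal"}},
    {"id": "s2", "states": {"u1": "normal", "u3": "abnormal"}},
    {"id": "s3", "states": {"u1": "normal", "u2": "abnormal", "u3": "abnormal"}},
    {"id": "s4", "states": {"u3": "abnormal"}},
]
ranked_ids = [s["id"] for s in rank_slices(slices, paths)]
```

Slice s1 is dropped by (a); s3 and s2 cover the same five components, so (c) places s3, which covers more URLs, first.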
(Solution to Minimize Aggregation Interval)
Although it is desirable to make the corresponding operation applicable by adjusting the aggregation interval, effective data cannot be found by merely shortening the aggregation interval. If the aggregation interval is excessively shortened, fewer functions (URLs) appear at the same time and thus the analysis is not effective. Also, if data suitable for the analysis is searched for in various durations while changing the duration, the number of combinations explodes and the computation amount can no longer be estimated.
For example, as indicated by reference numeral 602 in
(Determination in Superimposed Manner by Separating Normal Interval and Abnormal Interval)
For example, as illustrated in
(Case Example in Business Processing System)
A problem occurring when a new service (airline ticketing system) of a business processing system is provided will be described with reference to
There were no problems at the beginning of the system operation, but a slowdown of the system occurred after one month. The direct cause is an increase in the load of the reservation inquiry (p3) by the airline ticketing status (F3) and the post-settlement (F2), because the reservation inquiry (p3) searches all cases and is performed in the post-settlement (F2) of the travel expense regardless of whether airline ticketing exists.
Since an operator could not anticipate the increase in the load of the airline reservation inquiry (p3) due to the post-settlement, it took a long time to isolate the problem.
(Occurrence of Symptom in Business Processing System)
For example, as illustrated in
Advance Preparation
First, the path analysis unit 420 (see
F1=//foo/ . . . pre-settlement:p1-p2-p4
F2=//boo/ . . . post-settlement:p1-p2-p3-p4-p5
F3=//bar/ . . . airline ticketing status:p1-p3-p5 (“http:”s are omitted from the above URLs)
Overview of Diagnosis
In a case where F1 is normal and F2 and F3 are delayed, abnormal components are diagnosed as follows. Since F2 and F3 are abnormal, it may be determined from the path information of F2 and F3 that p1, p2, p3, p4 and p5 (that is, all components in the case of the present example) may be abnormal. Herein, since F1 is normal, p1, p2 and p4 are excluded from the abnormality candidates based on the path information of F1.
As a result, p3 (reservation inquiry) and p5 (DB2) are diagnosed as the cause of the delay. Note that, with respect to the abnormal components primarily separated by the diagnosis, a prompt response is enabled by automatically performing additional monitoring or analysis.
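The diagnosis above amounts to a set difference between the components used by the abnormal functions and those used by the normal functions. The following is a minimal sketch; the name suspect_components is hypothetical, and the path of F1 is assumed to be p1, p2 and p4, as implied by the exclusion step described above.

```python
def suspect_components(paths, abnormal, normal):
    """Return the components used by an abnormal function and by no normal one."""
    suspects = set().union(*(paths[f] for f in abnormal))
    cleared = set().union(*(paths[f] for f in normal))
    return suspects - cleared


# Path information of the example (F1 assumed to pass through p1, p2, p4).
paths = {
    "F1": {"p1", "p2", "p4"},
    "F2": {"p1", "p2", "p3", "p4", "p5"},
    "F3": {"p1", "p3", "p5"},
}
result = suspect_components(paths, abnormal={"F2", "F3"}, normal={"F1"})
```

With F1 normal and F2 and F3 delayed, the sketch leaves p3 and p5 as candidates, which matches the diagnosis in the text.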
First, the data slicing unit 450 classifies the normal interval and the abnormal interval at each path (processing P221), and generates slices of all intervals in a range where the normal interval and the abnormal interval are not mixed at each path (processing P222).
Next, the problem point identifying unit 460 (see
The problem point identifying unit 460 updates a narrowing degree and records a more narrowed slice (processing P228). Next, the problem point identifying unit 460 determines whether the abnormal point can be identified (processing P229). When the abnormal point can be identified (Yes in processing P229), the problem point identifying unit 460 performs notification processing, for example, by displaying information of the identified abnormal point on the display device or the like (processing P230).
Note that, when the abnormal interval is not included in the slice (No in processing P225) or when the abnormal point cannot be identified (No in processing P229), the processing proceeds to processing P223. Meanwhile, when the next slice does not exist (No in processing P224), the notification processing is performed.
(Application to Business Processing System)
For example, as illustrated in
(Method of Classifying Normal Interval and Abnormal Interval)
A sparse case where the normal interval and the abnormal interval exist in a sparse manner and a superimposed case where the normal interval and the abnormal interval exist in a superimposing manner may be considered.
(Sparse Case)
In the sparse case, classification by the following methods may be considered.
(Method 1) A request-response data (hereinafter, referred to as “RR data”) unit is set as the determination interval (see
(Method 2) The normal RR data are merged and set as the normal interval, and the abnormal RR data are merged and set as the abnormal interval (see
(Method 2-1) The RR data non-existence interval of the switch of the normal interval and the abnormal interval is neither normal nor abnormal and is treated as “no data” (see
(Method 2-1′) Like the above method 2-1, the RR data non-existence interval exceeding the threshold values of the normal interval and the abnormal interval is treated as “no data” (see
(Method 2-2) The interval may be switched at a timing where next RR data of the switch (different type (normal/abnormal)) of the normal interval and the abnormal interval appears (see
(Method 2-3) The interval is switched at an end timing of the last RR data of the same type (same normal/abnormal type) of RR data (see
(Method 2-4) The interval is switched at a middle point of the normal RR data group and the abnormal RR data group (see
Basically, the method 2-1 or the method 2-1′ is used, and a case where the RR data non-existence interval is long may be treated as “no data”. This is because a correct result is not obtained even when identification processing is performed using the matrix based on ambiguous information (even when data does not exist, it is treated as normal). However, in a case where RR data are too small and interval information necessary for analysis is incomplete, identification processing may be performed at the expense of accuracy, for example, by loosening the threshold value.
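One possible reading of method 2 combined with methods 2-1 and 2-1′ can be sketched as follows. This is an illustration under assumptions: the RR data format (start, end, state), the name build_intervals, and the parameter max_gap standing in for the threshold value are hypothetical, and the RR data are assumed not to overlap (sparse case).

```python
def build_intervals(rr_data, max_gap):
    """Merge RR data into normal/abnormal intervals (sparse case).

    rr_data: iterable of (start, end, state), state "normal" or "abnormal"
    max_gap: longest RR data non-existence interval that may still be merged

    A gap at a switch between normal and abnormal (method 2-1), or a gap
    longer than max_gap (method 2-1'), is emitted as a "no data" interval.
    """
    intervals = []
    for start, end, state in sorted(rr_data):
        if intervals:
            prev_start, prev_end, prev_state = intervals[-1]
            gap = start - prev_end
            if state == prev_state and gap <= max_gap:
                intervals[-1] = (prev_start, end, state)  # merge same type
                continue
            if gap > 0:
                intervals.append((prev_end, start, "no data"))
        intervals.append((start, end, state))
    return intervals


rr_data = [
    (0, 1, "normal"), (2, 3, "normal"), (10, 11, "normal"),
    (12, 13, "abnormal"), (14, 15, "abnormal"),
]
merged = build_intervals(rr_data, max_gap=2)
```

Raising max_gap corresponds to loosening the threshold value: more gaps are absorbed into the neighboring interval, at the expense of accuracy.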
(Superimposed Case)
When the RR data are superimposed as illustrated in
(Method 1) The interval is separated at a start timing (appearance timing) of different types (normal/abnormal) of next RR data (see
(Method 2) The interval is separated at an end timing of a previous type of the last RR data upon appearance of different types (normal/abnormal) of RR data (see
(Method 3) The interval is separated as the normal interval at the start of the normal RR data, and the interval is separated at the end of the normal RR data (see
(Variation)
A timing covering as many components as possible may be found. This is because the narrowing degree becomes higher as more components appear. Also, a timing where as many functions (for example, URL types) as possible are gathered may be found. This is because the narrowing becomes easier as there are more patterns.
For example, at a timing A illustrated in
Further, the processing may wait until a plurality of RR data of the same function (for example, URL) appears. This is because a single occurrence may be coincidental. For example, at the timing A illustrated in
(Analysis Apparatus Notifying Point Having Conflict Probability)
As schematically illustrated in
(Detection of Concrete Conflict)
A point that can actually be narrowed down as the problem point is notified.
(Detection of Implicit Conflict)
The point that does not appear as a common problem point but generates the same problem at a high probability if a problem occurs is notified as an implicit conflict (it should not conflict but it conflicts with something in the back). This corresponds to a combination of a short-term analysis and a long-term analysis in a certain sense.
(Notification of Conflict-Possible Point)
The point is notified as the conflict-possible point, including the concrete conflict and/or the implicit conflict. The accuracy may be ranked from the narrowing degree and the simultaneous generation probability.
(Accuracy is Improved by Supplement with Information Upon Analysis)
When the narrowing is impossible in the information of the analysis phase, a point that “the identification is possible if no problem (or deterioration) is proved by this checkpoint” is extracted. For example, in
A flow of generating the supplementary table is illustrated in
For example, the path analysis unit 420 (see
As the checking result, when the point exists (Yes in processing P312), the path analysis unit 420 extracts all function IDs passing through the currently targeted point (key point) (x) (processing P313). For example, in
Next, the path analysis unit 420 extracts all points (Y) used by the extracted function ID group (processing P314). For example, when the extracted functions are F1 and F3, p1, p2, p3, p4 and p5 are extracted. Also, when the extracted functions are F1, F2, F3 and F4, p1, p2, p4 and p5 are extracted.
When there is a point (exclusive point) (z) not passing through self-function (a) in a point combination (x)-(Y) for each function ID (a), the path analysis unit 420 outputs a combination with (x) to the table (processing P315), and returns to processing P311.
For example, the function F1 passes through all of (Y)=p1, p2, p4 and p5. The function F3 does not pass through p5 among (Y)=p1, p2, p4 and p5. In this case, the path analysis unit 420 outputs the record of p4, p5 and F3 to the table. The corresponding record means that the function F3 passes through p4 but does not pass through p5 (see
Also, the function F1 passes through all points. The function F2 does not pass through the points p4 and p5. Therefore, the path analysis unit 420 outputs the record of (p1, p4, F2) and (p1, p5, F2) to the table. Also, since the function F3 does not pass through the point p5, the path analysis unit 420 outputs the record of p1, p5, (F2), and F3 to the table. Also, since the function F4 does not pass through the point p4, the path analysis unit 420 outputs the record of p1, p4, (F2), and F4 to the table.
In this manner, with respect to the path information illustrated in
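The generation of the exclusive point table (processings P311 to P315) can be sketched as follows. This is a minimal illustration: the name build_exclusive_point_table and the record layout (key point, exclusive point, function ID) are hypothetical, and the path sets below are assumed so as to agree with the records described in the text (F2 passes through neither p4 nor p5, F3 not through p5, F4 not through p4).

```python
def build_exclusive_point_table(paths):
    """Return records (key_point, exclusive_point, function_id).

    A record means: function_id passes through key_point but does not pass
    through exclusive_point, although another function sharing key_point does.
    """
    table = []
    all_points = sorted(set().union(*paths.values()))
    for key_point in all_points:                          # P311, P312
        group = sorted(f for f, pts in paths.items() if key_point in pts)  # P313
        used = set().union(*(paths[f] for f in group))    # P314: points (Y)
        for function_id in group:                         # P315
            for exclusive_point in sorted(used - paths[function_id]):
                table.append((key_point, exclusive_point, function_id))
    return table


# Assumed path information (hypothetical values consistent with the text).
paths = {
    "F1": {"p1", "p2", "p4", "p5"},
    "F2": {"p1", "p2"},
    "F3": {"p1", "p2", "p4"},
    "F4": {"p1", "p2", "p5"},
}
table = build_exclusive_point_table(paths)
records_p4 = [r for r in table if r[0] == "p4"]
records_p1 = [r for r in table if r[0] == "p1"]
```

For the key point p4 the sketch yields the single record (p4, p5, F3), and for the key point p1 it yields the records for F2, F3 and F4 described above.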
Note that it is more efficient to prepare a table (index) in advance so that the “candidate request” can be extracted by narrowing. A flow of supplementing data when data is lacking is illustrated in
The path analysis unit 420 performs analysis (processing P321), and checks whether a plurality of candidates exists (processing P322). For example, in the actual operation phase, p4 and p5 become delay candidates when data in which the function F1 is abnormal and the function F2 is normal exists and data regarding the functions F3 and F4 does not exist.
When the plurality of candidates exists (Yes in processing P322), the path analysis unit 420 divides points of the candidates (processing P323). For example, when the candidates are p4 and p5, the points of the candidates are divided into p4 and p5.
Next, the path analysis unit 420 searches the exclusive point table (see, for example,
The path analysis unit 420 checks whether the exclusive point exists (processing P325). When the exclusive point exists (Yes in processing P325), reanalysis is performed by searching a found function group from data of the pre-analysis phase and re-inputting the searched function group (processing P326). For example, data corresponding to the functions F3 and F4 found in processings P324 and P325 are re-input and analyzed.
By re-inputting the “candidate request”, the information of a deficient targeted check point can be supplemented, and the problem point can be narrowed (identified). For example, if there is no problem by re-inputting the request corresponding to the function F3, it may be determined (identified) that the cause of deterioration is p5.
Note that, when the narrowing (identification) cannot be performed, other “candidate request” may be used. For example, if the deterioration is caused by re-inputting the request corresponding to the function F4, it may be determined that p4 is suspected as the cause of deterioration. The reliability may be increased by re-inputting a plurality of requests.
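The narrowing by re-input described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the path analysis unit 420: the function name `narrow_candidates`, the dictionary layout, and the concrete path assigned to F3 are hypothetical. The sketch covers only the case where a re-input request responds normally.

```python
def narrow_candidates(candidates, paths, reinput_results):
    """Remove candidate points that a normally responding re-input request passed through.

    candidates: set of suspected points, e.g. {"p4", "p5"}
    paths: function name -> set of points on its path (hypothetical layout)
    reinput_results: function name -> True if the re-input request was normal
    """
    remaining = set(candidates)
    for func, is_normal in reinput_results.items():
        if is_normal:
            # A normal response clears every point on that function's path.
            remaining -= paths[func]
    return remaining


# Assumed path for illustration: F3 passes through p4 but not p5.
paths = {"F3": {"p1", "p2", "p3", "p4"}}
result = narrow_candidates({"p4", "p5"}, paths, {"F3": True})
print(sorted(result))
```

With these assumed inputs, only p5 remains as a candidate, which corresponds to the case where re-inputting the request for F3 shows no problem.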
(Accuracy Improvement 1 using Appearance Probability)
In order to improve the accuracy of the problem point identification, a method using an appearance probability (frequency) may be considered.
(Pre-Analysis Phase)
For example, as illustrated in
Herein, the path of p1-p2-p3 is set as F1-1, and the path of p1-p2-p3-p4 is set as F1-2. The parameter of F1 alone cannot classify which one of F1-1 and F1-2 the function F1 passes through, but can identify which path the function F1 passes through in the pre-analysis phase. Thus, the path analysis unit 420 counts each frequency. As a result, the appearance probability of F1, for example, may be prepared as follows: F1-1 is 70% and F1-2 is 30%.
(Actual Operation Phase)
From the information of the actual operation phase alone, it can be known from the parameter that the function is F1, but it cannot be identified whether the path is the F1-1 path or the F1-2 path. In a case where F1 has a good response at a probability of 70% and a bad response at a probability of 30%, the problem point identifying unit 460 may estimate that the point p4, which is the difference between F1-1 and F1-2, is the cause of deterioration.
(Flow Using Appearance Probability)
In processing P60 of the flow in the pre-analysis phase illustrated in
As illustrated in
For example, the data and the function are associated as follows: “data 1:F1=◯”, “data 2:F1=◯”, “data 3:F2=◯”, “data 4:F3=×”, and “data 5:F1=×”.
Next, the path analysis unit 420 assembles a data group of functions having a plurality of paths (processing P332). For example, it can be known from the frequency information table illustrated in
Further, the path analysis unit 420 calculates a normal to abnormal ratio with respect to data where a plurality of paths exists in one function (processing P333). In the case of the above-described example, 66.7% is normal and 33.3% is abnormal.
The path analysis unit 420 checks whether it can be considered that the normal to abnormal ratio of data is equal to the frequency information (processing P334). In the case of the above-described example, since 66.7% is normal and 33.3% is abnormal, it can be considered as equal to each other. When considered as equal to each other (Yes in processing P334), the path analysis unit 420 associates the frequency information with appropriate path information (processing P335). On the other hand, when not considered as equal to each other (No in processing P334), the path analysis unit 420 treats a high-frequency path as representative data (processing P336).
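The comparison between the normal to abnormal ratio and the frequency information can be sketched as follows. This is a minimal sketch under assumptions: the function names, the data layout, and the tolerance of 0.1 used to decide that two ratios are "considered as equal" are not given in the description and are chosen here only for illustration.

```python
from collections import Counter


def abnormal_ratio(results, func):
    """Return the share of abnormal results among the data items of one function."""
    counts = Counter(ok for f, ok in results if f == func)
    total = counts[True] + counts[False]
    return counts[False] / total if total else None


def match_path_by_frequency(ratio, frequencies, tolerance=0.1):
    """Return the path variant whose appearance probability is close to the ratio.

    Falls back to the most frequent path when nothing is close enough
    (corresponding to processing P336). The tolerance is an assumed value.
    """
    name, freq = min(frequencies.items(), key=lambda kv: abs(kv[1] - ratio))
    if abs(freq - ratio) <= tolerance:
        return name
    return max(frequencies, key=frequencies.get)


# data 1 to data 5 from the example: (function, True if normal)
results = [("F1", True), ("F1", True), ("F2", True), ("F3", False), ("F1", False)]
frequencies = {"F1-1": 0.7, "F1-2": 0.3}  # prepared in the pre-analysis phase
variants = {"F1-1": ["p1", "p2", "p3"], "F1-2": ["p1", "p2", "p3", "p4"]}

ratio = abnormal_ratio(results, "F1")
suspect_path = match_path_by_frequency(ratio, frequencies)
# Points that only the suspected path passes through.
suspect_points = set(variants[suspect_path]) - set(variants["F1-1"])
print(suspect_path, sorted(suspect_points))
```

Here the abnormal share of F1 (one out of three) is close to the 30% appearance probability of F1-2, so the abnormal data is associated with F1-2 and the difference point p4 is suspected.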
(Accuracy Improvement 2 Using Appearance Probability)
As illustrated in
The pre-data collection unit 410 reproduces the request data stored in the user request database 403, and the path analysis unit 420 counts a frequency of the request data passing through each function as illustrated in
In the actual operation phase, the function selection unit 440 counts the appearance frequency of each check point (pi) (see
In
In connection with the processings P150 to P190 in
There is a case in which a component different from the components used in the pre-analysis phase is used in the operation phase. In this case, the problem point identifying unit 460 has difficulty identifying the problem point.
As an example, as illustrated in
In addition,
As a case where the situation illustrated in FIGS. 43A and 43B occurs, there is a case where the processing content in a function Fi branches depending on an internal state (a time point, a value obtained from a DB, or the like) or on a parameter which is hard to identify externally. In this case, if the condition branch of the function Fi has not been covered in the pre-analysis phase, different components may be used by the function Fi in the pre-analysis phase and the operation phase. Even in any pattern of
In the present embodiment, a case where it is impossible to find, from an abnormal function Fi, a component pj (j is a natural number) which is a problem point is regarded as an “inconsistent state” between the path information and a system state (the path of the function Fi as recorded in the log information). The inconsistent state may refer to a state in which a component of a normal function hides a component of the abnormal function Fi.
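The detection of such a state can be sketched as follows. This is an illustration under assumptions rather than the actual processing of the problem point identifying unit 460: the function name, the dictionary form of the path information, and the concrete paths are hypothetical.

```python
def identify_problem_points(path_info, abnormal, normal):
    """Return components of the abnormal functions not used by any normal function.

    An empty result while abnormal functions exist corresponds to the
    inconsistent state: every component of the abnormal function is
    hidden by a component of some normal function.
    """
    cleared = set()
    for func in normal:
        cleared |= set(path_info[func])
    suspects = set()
    for func in abnormal:
        suspects |= set(path_info[func])
    return suspects - cleared


path_info = {  # hypothetical path information
    "F1": ["p1", "p2", "p3", "p4"],
    "F2": ["p1", "p4"],
    "F3": ["p1", "p2", "p3"],
}
points = identify_problem_points(path_info, abnormal=["F1"], normal=["F2", "F3"])
inconsistent = not points
print(inconsistent)
```

In this assumed example, every component of the abnormal function F1 is also used by a normal function, so no problem point remains and the state is treated as inconsistent.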
(Processing in Case of Detecting Inconsistent State)
Processing in a case where a problem point is not identified and an inconsistent state is detected in processing P160 of
As an example of a general processing, when detecting the inconsistent state in identification of a problem point (processings P150 and P160 in
In the upper sides of
As correction methods of the path information, there are the following methods.
(Method I) At least one function is deleted based on a predetermined condition from a plurality of functions including normal functions and abnormal functions.
In this method, as illustrated in
(Method I-1) In the example of
The reliability is an example of an evaluation value indicating a possibility that a normal function passes through an abnormal module candidate. In the example of
As illustrated in the lower side of
In addition, an example in the upper side of
(Method I-2) In the example of
As illustrated in the lower side of
In addition, an example in the upper side of
Although the examples in which a problem point is identified through the aforementioned (Method I-1) and (Method I-2) has been illustrated in
For example, in the upper side of
As an example, the problem point identifying unit 460 may delete all normal functions (in this case, F3 and F4) using p3 and identify p3 as a problem point in the abnormal function F1. Based on the identified result, a pattern (pattern of
Alternatively, the problem point identifying unit 460 may delete all normal functions (in this case, F2 and F6) using p4 and identify p4 as a problem point in the abnormal function F1. In this case, a pattern (pattern of
When the inconsistent state is detected, the problem point identifying unit 460 can determine which method is used among (I-1) and (I-2) of Method (I), and (II-1) and (II-2) of Method (II), through the below-described method. Alternatively, the problem point identifying unit 460 may select an optimal result (the number of components which are narrowed, for example) from results identified by using such two or more methods.
Although at least one function is described as being deleted from a plurality of functions in (Method I), all components (all components in the record of path information) using the function may be deleted instead of deleting all functions (record of path information).
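A correction in the style of (Method I) can be sketched as follows. This is a sketch under assumptions: the reliability is taken as the number of normal functions using each component of the abnormal function, the normal functions using the least reliable component are deleted, and the function names, tie-breaking rule, and paths are hypothetical.

```python
def reliability(path_info, abnormal_func, normal_funcs):
    """For each component of the abnormal function, count the normal functions using it."""
    return {
        comp: sum(comp in path_info[f] for f in normal_funcs)
        for comp in path_info[abnormal_func]
    }


def correct_by_function_deletion(path_info, abnormal_func, normal_funcs):
    """Method I sketch: drop the normal functions that use the least reliable component."""
    scores = reliability(path_info, abnormal_func, normal_funcs)
    # On ties, the first component in path order is chosen (an assumed rule).
    target = min(scores, key=scores.get)
    return {
        f: comps
        for f, comps in path_info.items()
        if not (f in normal_funcs and target in comps)
    }


path_info = {  # hypothetical path information
    "F1": ["p1", "p2", "p3", "p4"],  # abnormal
    "F2": ["p1", "p2", "p4"],        # normal
    "F3": ["p1", "p2", "p3"],        # normal
    "F4": ["p1", "p2", "p3"],        # normal
}
normal = ["F2", "F3", "F4"]
scores = reliability(path_info, "F1", normal)
corrected = correct_by_function_deletion(path_info, "F1", normal)
print(sorted(corrected))
```

With these assumed paths, p4 has the lowest reliability, so F2 is deleted and the corrected path information consists of F1, F3, and F4.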
(Method II) At least one component is changed based on a predetermined condition, from a plurality of functions including normal functions and abnormal functions.
In this method, as illustrated in
(Method II-1) As illustrated in the upper side of
As illustrated in the lower side of
In addition, in a case where at this timing (slice), another function (for example, F5) in which a request does not occur exists in the path information and the function F5 uses p4, since F5 is not related with an inconsistent state, p4 is not deleted.
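The handling of a function that has no request in the slice can be sketched as follows. This is an illustration under assumptions: the function name `delete_component`, the dictionary form of the path information, and the paths are hypothetical, and the component to delete is assumed to have been chosen already.

```python
def delete_component(path_info, component, observed_normal):
    """Method II sketch: remove one component from the normal functions seen in this slice.

    Functions with no request in the slice (not in observed_normal) are left untouched.
    """
    corrected = {}
    for func, comps in path_info.items():
        if func in observed_normal:
            corrected[func] = [c for c in comps if c != component]
        else:
            corrected[func] = list(comps)
    return corrected


path_info = {  # hypothetical path information
    "F1": ["p1", "p2", "p3", "p4"],  # abnormal in this slice
    "F2": ["p1", "p2", "p4"],        # normal in this slice
    "F5": ["p1", "p4"],              # no request in this slice
}
corrected = delete_component(path_info, "p4", observed_normal={"F2"})
print(corrected["F2"], corrected["F5"])
```

Only F2 loses p4; F5 keeps it because F5 is not related to the inconsistent state.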
(Method II-2) As illustrated in the upper side of
As illustrated in the lower side of
Even in
For example, in the upper side of
As an example, the problem point identifying unit 460 may delete p3 from all normal functions (in this case, F3 and F4) using p3 and identify p3 as a problem point in the abnormal function F1. Based on the identified result, a pattern (pattern of
Alternatively, the problem point identifying unit 460 may delete p4 from all normal functions (in this case, F2 and F6) using p4 and identify p4 as a problem point in the abnormal function F1. In this case, a pattern (pattern of
As in the example of
In addition, when the reliability of a deletion candidate component is not sufficiently lower than the reliability of another component, the problem point identifying unit 460 may give top priority to addition of a component. “Reliability is sufficiently low” refers to, for example, a case where, with respect to a deletion target component, the number of components of a normal function is smaller than 1 and a reliability thereof is equal to or smaller than ½ of a reliability of another component. In the example of
In addition, in operation after identification of a problem point in which correction of the path information is performed by the aforementioned method, the problem point identifying unit 460 may perform analysis by using at least one of corrected path information and non-corrected path information and output an analysis result. For example, the problem point identifying unit 460 may use at least one of the corrected path information (F1, F3, and F4) and the non-corrected path information (F1 to F4) in operation after correction of the path information in the example of
Next, an example of processing in the case of detecting the inconsistent state in the operation phase will be described with reference to
In the following description, it is assumed that the following information is generated by the pre-analysis block 401.
Example of Path Information (Reference Information)
Example of Component Defining Information
Example of Parameter Table
Example of Delay Determination Threshold Value
In addition, the parameter table is information representing handling of parameters included in a URL of an access log in function selection. In a case where the access log illustrated in
In the example illustrated in
When information on the access log illustrated in
As described above, when it is impossible to identify a problem point in processing P160 of
First, as illustrated in
As another example, when the number of components is larger than a specific value (for example, 10), (Method II) may be selected, and when the number of components is equal to or smaller than the specific value, (Method I) may be selected.
Alternatively, for example, when the number of functions is larger than a specific value (for example, 20), (Method I) may be selected, and when the number of functions is equal to or smaller than the specific value and the number of components is larger than a specific value (for example, 30) or the number of functions×a specific number (for example, 1.5), (Method II) may be selected. In this case, if neither condition is satisfied, (Method I) may be selected.
Such a condition may be arbitrarily set by an administrator of the analysis apparatus. In addition, a certain correction method for the path information may be fixedly used, without selecting the correction method according to a condition as described above.
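The selection conditions based on specific values can be sketched as follows. This is only one possible reading: the function names and parameter names are hypothetical, and the interpretation of "or" in the combined rule (the number of components exceeds either 30 or 1.5 times the number of functions) is an assumption.

```python
def select_method_simple(num_components, limit=10):
    """Select Method II when there are more components than the limit, else Method I."""
    return "II" if num_components > limit else "I"


def select_method_combined(num_functions, num_components,
                           func_limit=20, comp_limit=30, factor=1.5):
    """Sketch of the combined rule; the reading of "or" here is an assumption."""
    if num_functions > func_limit:
        return "I"
    if num_components > comp_limit or num_components > num_functions * factor:
        return "II"
    return "I"  # fallback when no condition is satisfied


print(select_method_simple(11), select_method_combined(10, 16))
```

Keeping the thresholds as parameters reflects that such conditions may be set arbitrarily by the administrator.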
In addition, the problem point identifying unit 460 may perform correction of the path information and analysis of a problem point by using the certain correction method, and then perform re-analysis by using another correction method depending on an analysis result. The problem point identifying unit 460 may employ the correction method in which the analysis result is satisfactory (for example, in which the number of identified components is smaller). For example, in the case of selecting a certain correction method, when identification of a problem point is insufficient (sufficient narrowing is impossible), such as when many components are identified as problem points continuously as an analysis result, the problem point identifying unit 460 can perform re-analysis through another correction method.
Since an inconsistent state can be removed by single execution of correction processing, the problem point identifying unit 460 does not need to repeatedly perform the same correction method several times when the inconsistent state is detected.
Next, the problem point identifying unit 460 performs correction on the path information by using the selected correction method (for example, (Method II) because the number of functions: 4<the number of components: 5 in the example of
The problem point identifying unit 460 performs analysis by using the path information after correction and identifies a problem point (processing P183). In the example of
In addition, the problem point identifying unit 460 can perform management of the path information on which correction is performed after identification of a cause component (see processing P190 in
For example, as illustrated in
Next, the problem point identifying unit 460 determines whether the accumulated path information after correction satisfies a condition for replacement (processing P192). When the condition for replacement is satisfied (Yes route of processing P192), the problem point identifying unit 460 performs replacement of the path information (processing P193) and processing proceeds to processing P170 of
As the condition for replacement, there is a case in which, for example, the number of times of correction of a pattern with the largest number of times of correction is equal to or larger than a predetermined number of times (for example, 15 times), and is equal to or larger than predetermined times (for example, 10 times) the number of times of correction of a pattern with the second largest number of times of correction. In the example of
In addition, the condition for replacement of the path information is not limited to the aforementioned condition and several methods can be used.
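The example condition for replacement (at least 15 corrections for the most frequent pattern, and at least 10 times the count of the second most frequent pattern) can be sketched as follows. The function name, the dictionary of correction counts, and the treatment of a single accumulated pattern (the second count is taken as 0) are assumptions made for illustration.

```python
def should_replace(correction_counts, min_count=15, ratio=10):
    """Return the pattern to adopt, or None if the replacement condition is not met."""
    ranked = sorted(correction_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    top_pattern, top = ranked[0]
    # With only one accumulated pattern, the second count is treated as 0 (assumption).
    second = ranked[1][1] if len(ranked) > 1 else 0
    if top >= min_count and top >= ratio * second:
        return top_pattern
    return None


# Hypothetical accumulated correction counts per corrected pattern.
print(should_replace({"pattern_a": 20, "pattern_b": 2, "pattern_c": 1}))
print(should_replace({"pattern_a": 20, "pattern_b": 3}))
```

In the first call the condition holds and pattern_a would replace the path information; in the second call 20 is less than 10 times 3, so no replacement is made.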
Furthermore, when the problem point is identified (or narrowed), the problem point identifying unit 460 outputs the problem point in the form illustrated in
Even when there is correction of the path information, the problem point identifying unit 460 may omit to notify that there is correction, and therefore, may omit to display an indication of whether there is correction on the details display window 510.
Another output example of the problem point is illustrated in
In this case, as illustrated in the details display window 510 of the right side of
(Another Example 1 of Reliability Used in Correction of Path Information)
In the aforementioned (Method I) or (Method II), there has been described an example of correcting the path information by using the number of components of a normal function overlapping with a delay function (with respect to each component using the delay function, the number of normal functions that use the component) as reliability. The reliability is not limited to what is described above and may include, for example, the following information.
In the upper right side of
In the pre-analysis phase, the pre-analysis block 401 may count a frequency (the number of times) of a component used whenever a function is called, with respect to each component of each function, and store the frequency as frequency information.
Correction of the path information based on the example (Method II) of
In this case, the problem point identifying unit 460 may identify p3 instead of p4 in the abnormal function F1 by using F3=p1-p2-p5, and F4=p1-p2, resulting from deleting p3 from the normal functions F3 and F4 respectively, as the path information after correction. Based on the identified result, a pattern (pattern of
As described above, it is possible to perform highly reliable analysis and improve analysis precision by weighting the reliability with frequency information representing how often a normal function uses a component used by an abnormal function.
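One possible way to weight the reliability by frequency information is sketched below. The weighting formula is an assumption, since the description does not fix it: for each component of the abnormal function, the sketch sums, over the normal functions, the share of calls in which that component was used. All names and counts are hypothetical.

```python
def weighted_reliability(usage_counts, call_counts, abnormal_components, normal_funcs):
    """Sum, per component, the usage rate observed for each normal function.

    usage_counts: function -> {component: number of calls that used it}
    call_counts: function -> total number of calls in the pre-analysis phase
    """
    scores = {}
    for comp in abnormal_components:
        scores[comp] = sum(
            usage_counts[f].get(comp, 0) / call_counts[f] for f in normal_funcs
        )
    return scores


# Hypothetical frequency information from the pre-analysis phase.
usage_counts = {"F2": {"p4": 10}, "F3": {"p3": 1}, "F4": {"p3": 2}}
call_counts = {"F2": 10, "F3": 10, "F4": 10}

scores = weighted_reliability(usage_counts, call_counts, ["p3", "p4"], ["F2", "F3", "F4"])
least_reliable = min(scores, key=scores.get)
print(least_reliable)
```

Although two normal functions use p3 and only one uses p4, p3 is rarely used in practice in this assumed data, so p3 becomes the least reliable component and thus the correction candidate.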
In the example of
In addition, the example of
(Another Example 2 of Reliability Used in Correction of Path Information)
Although whether a component is used or not is used as frequency information every time when a function Fi is called in the pre-analysis phase in the example of
Depending on systems, there is a case where, when a function is called once, the same component is used a plurality of times. The pre-analysis block 401 may count the frequency (the number of times) thereof with respect to each component of each function, and store the frequency as the frequency information.
In addition, since a method for calculation of reliability and correction of path information based on the frequency information illustrated in
Even in the example of
(Another Example 1 of Correction Processing of Path Information)
The problem point identifying unit 460 may delete all deletion component candidates when there is no significant difference in a reliability of a component which is the deletion candidate (correction candidate).
For example, as illustrated in
In addition, in a case where there is no significant difference in reliability among a plurality of components which are narrowed as problem points, the problem point identifying unit 460, as illustrated in
Since there is a high possibility that a component having a low reliability is not originally used and is a delay cause, it is possible to output an analysis result having high adequacy by ranking and outputting the cause components as in the example of
Even when there is no significant difference between reliabilities of deletion candidate components, it is possible to limit deletion targets by previously setting an upper limit of components to be deleted. In the case of the example of
In addition to designating an upper limit of components to be deleted, it is possible to set a deletion condition, such as a criterion for determining whether there is a significant difference or a criterion for determining the number of deletions, thereby limiting the deletion targets.
Examples according to (Method II) have been described so far. Similarly, with respect to (Method I), in a case where there is no significant difference in reliability of components of deletion candidates, all functions of deletion candidates may be deleted, or the functions to be deleted may be limited.
(Another Example 2 of Correction Processing of Path Information)
When the correction rate of the path information before and after correction exceeds a designated value, the problem point identifying unit 460 may suppress the processing of correcting the path information even when detecting an inconsistent state.
The correction rate can be calculated as described below, for example.
Calculation of Whole
Calculation of Each Function Fi
The problem point identifying unit 460 performs calculation for the whole, and when a rate resulting from the calculation (correction rate) exceeds a designated value (for example, 10%), may omit to perform correction processing of the entire path information. In the example of
The problem point identifying unit 460 performs, for example, calculation for each function Fi, and when a rate of a calculation result exceeds a designated value (for example, 20%), may omit to perform correction processing of the path information with respect to a corresponding function. In the example of
In both the aforementioned examples, when the correction rate is as large as exceeding the designated value, an effect of correction of the path information on the path information as reference information increases. In addition, a possibility that all correction target components are problem points in practice decreases as the number of correction target components (the number of non-narrowed components) increases.
Therefore, by suppressing the correction when the correction rate exceeds the designated value, it is possible to improve the precision of the analysis result by decreasing the possibility that an imprecise (misleading) analysis result is output.
In addition, (Method I) can be applied to the aforementioned example instead of the described (Method II).
(Another Example 3 of Correction Processing of Path Information)
Even when the correction rate exceeds the designated value in the aforementioned example, it may be possible to find out whether the correction target components at this time (correction pattern) are reasonable, that is, whether they are all problem points, from the accumulated records of the path information after correction and the correction frequency (the number of times of correction).
In other words, even when the correction rate exceeds the designated value, the problem point identifying unit 460 may perform the correction processing of the path information and accumulate the path information after correction in a memory, a storage device, or the like. In a case where a correction pattern of a case in which the correction rate exceeds the designated value is reasonable, as the operation block 402 operates, the number of times of correction of a correction pattern increases, and replacement of the path information is performed.
In this case, the problem point identifying unit 460 may suppress identification of a problem point through the path information after correction and output of an identification result, and output only a result of narrowing by the path information before correction.
(Another Example 4 of Correction Processing of Path Information)
Instead of correction of the path information by the aforementioned (Method I) or (Method II), the following (Method III) can be used.
(Method III) The path information is re-generated (corrected) by re-inserting, into a verification environment, a user request which occurs during a predetermined time interval including a timing at which an inconsistent state in which a problem point cannot be identified is detected.
For example, the operation block 402 always captures request packets, even in the operation phase. In addition, using the capture data around the timing at which the operation block 402 detects an inconsistent state, the pre-analysis block 401 re-enacts the accesses in the verification environment (pre-analysis phase), obtains a detailed log, and re-generates the path information.
The problem point identifying unit 460 may update the path information of the path information database 404 by the re-generated path information and accumulate the path information in the memory or the storage device as described above.
Therefore, since the path information for a component which actually uses a function causing an inconsistent state is re-generated, the problem point identifying unit 460 can accurately and easily identify the problem point by using the re-generated path information.
As described above, it is possible to identify the problem point and improve precision of the analysis result by re-enacting an access in the case of detecting the inconsistent state and re-generating the path information.
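The selection of capture data and the re-generation of the path information in (Method III) can be sketched as follows. This is a minimal sketch under assumptions: the record layouts, the function names, and the symmetric time window are hypothetical, and the re-enactment of the accesses in the verification environment itself is outside the scope of the sketch.

```python
def requests_around(captured, detected_at, window):
    """Select captured requests within +/- window seconds of the detection time."""
    return [r for r in captured if abs(r["time"] - detected_at) <= window]


def regenerate_path_info(detailed_log):
    """Rebuild function -> ordered components from (function, component) log records."""
    path_info = {}
    for func, comp in detailed_log:
        comps = path_info.setdefault(func, [])
        if comp not in comps:
            comps.append(comp)
    return path_info


# Hypothetical capture data; an inconsistent state is detected at time 150.
captured = [
    {"time": 100, "url": "/a"},
    {"time": 160, "url": "/b"},
    {"time": 400, "url": "/c"},
]
nearby = requests_around(captured, detected_at=150, window=60)

# Hypothetical detailed log obtained by re-enacting the selected requests.
detailed_log = [("F1", "p1"), ("F1", "p2"), ("F1", "p1"), ("F2", "p1")]
regenerated = regenerate_path_info(detailed_log)
print(len(nearby), regenerated)
```

The re-generated dictionary could then be used to update the stored path information or be accumulated, as described above.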
In addition, the aforementioned example may be performed by any combination of (Method I) to (Method III).
As described above, the problem point identifying unit 460 according to an embodiment corrects path information related to each processing performed during a predetermined time interval based on a predetermined condition even when the problem point is not identified in a processing of identifying a problem point. The problem point identifying unit 460 identifies an abnormal component by using a result of determining an abnormal or normal state and the corrected path information. Accordingly, it is possible to identify an abnormal component even when information on components through which each of a plurality of processings passes is incorrect, resulting in reduction in time used to resolve a problem when failure/trouble occurs.
According to an embodiment, it is possible to identify an abnormal module even when information on modules through which each of a plurality of processings passes is incorrect.
All examples and conditional language recited herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present inventions have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2014-215689 | Oct 2014 | JP | national |