This application is a National Stage of International Application No. PCT/JP2016/004844 filed Nov. 9, 2016, claiming priority based on Japanese Patent Application No. 2015-223053 filed Nov. 13, 2015, the disclosure of which is incorporated herein in its entirety by reference.
The present invention relates to a log analysis system, a log analysis method, and a log analysis program for performing log analysis.
In a system executed on a computer, logs each including a result of an event, a message, or the like are output. When analyzing logs in order to search for a cause of a system anomaly or the like, a user (operator or the like) is required to review a large number of logs output from the system. In order to reduce the burden on the user, there is a demand for logs to be output in a form that is easy to view.
Patent Literature 1 discloses an art of extracting logs in accordance with a keyword input by a user and displaying the appearance positions of the logs in a temporal map. Use of the art of Patent Literature 1 enables a user to acquire logs including a particular keyword and to see the timing or the distribution of appearances of the logs in a visible manner.
PTL 1: Japanese Patent Application Publication No. 2005-141663
A plurality of devices and programs are included in a general system, and multiple types of log data (for example, a log file or the like) are output from these devices and programs. However, since the art of Patent Literature 1 extracts logs based on whether or not a keyword is included, it is not possible to extract logs based on a correlation among logs from multiple types of log data. In order to analyze a correlation among a plurality of logs, the user is required to find a related part from multiple types of log data.
The present invention has been made in view of the problem described above and intends to provide a log analysis system, a log analysis method, and a log analysis program that can aggregate and output logs having a correlation.
The first example aspect of the present invention is a log analysis system including: a sequence determination unit that determines which predetermined sequence is matched with a plurality of logs of an analysis target log; and a log aggregation unit that, based on the sequence, aggregates and outputs the plurality of logs determined to match the sequence by the sequence determination unit.
The second example aspect of the present invention is a log analysis method including steps of: determining which predetermined sequence is matched with a plurality of logs of an analysis target log; and, based on the sequence, aggregating and outputting the plurality of logs determined to match the sequence.
The third example aspect of the present invention is a log analysis program that causes a computer to perform steps of: determining which predetermined sequence is matched with a plurality of logs of an analysis target log; and, based on the sequence, aggregating and outputting the plurality of logs determined to match the sequence.
According to the present invention, a plurality of logs having a predetermined correlation can be aggregated and output based on whether or not the plurality of logs match a predetermined sequence.
While example embodiments of the present invention will be described below with reference to the drawings, the present invention is not limited to these example embodiments. Note that, in the drawings described below, those having the same function are labeled with the same reference, and the duplicated description thereof may be omitted.
The log analysis system 100 has a log input unit 110, a format determination unit 120, a sequence determination unit 130, a log aggregation unit 140, and an output unit 150 as a processing unit. Further, the log analysis system 100 has a format storage unit 161 and a sequence storage unit 162 as a storage unit.
The log input unit 110 acquires an analysis target log 10 and inputs the analysis target log 10 to the log analysis system 100. The analysis target log 10 may be acquired from the outside of the log analysis system 100 or may be acquired by reading those recorded in advance inside the log analysis system 100. The analysis target log 10 includes one or more logs output from one or more devices or programs. The analysis target log 10 is a log that is represented in any data form (file form), which may be binary data or text data, for example. Further, the analysis target log 10 may be recorded as a table of a database or may be recorded as a text file.
The analysis target log 10 is made up of first log data 11 and second log data 12. The first log data 11 and the second log data 12 are recorded as separate data (for example, files, tables, or the like), each of which includes one or more logs. The first log data 11 and the second log data 12 are each assigned an identifier (for example, a file name such as message.log or syslog.log) and are thereby distinguished. The number of pieces of log data is not limited to the above and may be any number of one or more.
The format determination unit 120 is a form determination unit that determines which of the formats prerecorded in the format storage unit 161 each log included in the analysis target log 10 (the first log data 11 and the second log data 12) conforms to, and that uses the conforming format to separate each log into a variable and a common part. A format is a form of a known log. A variable is a changeable part in a format, and a common part is an unchanging part in a format of a log. A value (including a number, a character string, and other data) of a variable in the input log is referred to as a variable value.
For example, the format determination unit 120 determines that a log on the second row of the second log data 12 conforms to a format whose ID is 223 in FIG. 2B. The format determination unit 120 then processes the log based on the determined format and determines the timestamp “2015/08/17 08:29:59”, the character string “SV003”, and the IP address “192.168.1.23” as variable values.
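As a minimal sketch of the separation described above, the following Python code assumes that each format is held as a regular expression in which the named groups are the variables and the literal text is the common part. The format ID 223 and the three variable values are taken from the example above. The message wording ("connected from"), the variable names, and the function name determine_format are hypothetical, since the actual format contents appear only in the drawings.

```python
import re

# Hypothetical format table: format ID -> pattern. Named groups are the
# variables; the literal text between them is the common part.
FORMATS = {
    "223": re.compile(
        r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
        r"(?P<host>\S+) connected from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
    ),
}


def determine_format(log_line):
    """Return (format_id, variable_values), or None if no format conforms."""
    for format_id, pattern in FORMATS.items():
        match = pattern.fullmatch(log_line)
        if match:
            return format_id, match.groupdict()
    return None


result = determine_format("2015/08/17 08:29:59 SV003 connected from 192.168.1.23")
```

A log that conforms to no recorded format yields None in this sketch, which corresponds to the case in which the log is skipped or handed to a learning unit.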
While represented by a list of character strings for better visibility in
The sequence determination unit 130 is a log analysis unit that performs determination of a sequence based on sequence information recorded in the sequence storage unit 162. The sequence information is information that defines the order (sequence) by which logs are output based on information on logs.
For example, because the format IDs of the logs on the second row and the third row of the first log data 11 are 039 and the format ID of the log on the second row of the second log data 12 is 223, the sequence determination unit 130 determines that these logs match a sequence whose sequence ID is A in
The output unit 150 outputs aggregation results obtained by the log aggregation unit 140. In the present example embodiment, the output unit 150 outputs aggregation results to a display device 20 and displays the aggregation results to the user as an image. The display device 20 has a display unit such as a liquid crystal display, a cathode ray tube (CRT) display, or the like for displaying an image.
When the user selects a particular sequence heading C1 by using an input device, the window C displays developed logs C2 which match the sequence as illustrated in
The windows illustrated in
An output scheme of aggregation results is not limited to the image display to the user. For example, the output unit 150 may output aggregation results as data, and the log analysis system 100 or other systems may perform an analysis process, a statistics process, or the like on the data of aggregation results obtained from the output unit 150.
The communication interface 104 is a communication unit that transmits and receives data and is configured to be able to perform at least one of the communication schemes of wired communication and wireless communication. The communication interface 104 includes a processor, an electric circuit, an antenna, a connection terminal, or the like required for the above communication scheme. The communication interface 104 is connected to a network using the above communication scheme in accordance with signals from the CPU 101 for communication. For example, the communication interface 104 externally receives an analysis target log 10.
The storage device 103 stores a program executed by the log analysis system 100, data resulting from processing by the program, or the like. The storage device 103 includes a read only memory (ROM) that is dedicated to reading, a hard disk drive or a flash memory that is readable and writable, or the like. Further, the storage device 103 may include a computer readable portable storage medium such as a CD-ROM. The memory 102 includes a random access memory (RAM) or the like that temporarily stores data being processed by the CPU 101 or a program and data read from the storage device 103.
The CPU 101 is a processor as a processing unit that temporarily stores transient data used for processing in the memory 102, reads a program stored in the storage device 103, and performs various processing operations such as calculation, control, determination, or the like on the transient data in accordance with the program. Further, the CPU 101 stores data of a process result in the storage device 103 and also transmits the data of the process result externally via the communication interface 104.
The CPU 101 in the present example embodiment functions as the log input unit 110, the format determination unit 120, the sequence determination unit 130, the log aggregation unit 140, and the output unit 150 of
The log analysis system 100 is not limited to the specific configuration illustrated in
Further, at least a part of the log analysis system 100 may be provided in a form of Software as a Service (SaaS). That is, at least a part of the functions for implementing the log analysis system 100 may be performed by software executed via a network.
If the log to be determined does not conform to any of the formats recorded in the format storage unit 161 in step S102 (step S103, NO), the next log in the analysis target log 10 is designated as a log to be determined, and steps S102 to S103 are repeated.
If the log to be determined conforms to any format recorded in the format storage unit 161 in step S102 (step S103, YES), the format determination unit 120 uses the format to separate the log to be determined into a variable and a common part (step S104). The format determination unit 120 records variable values in the log to be determined.
If the analysis is not finished for all the logs in the analysis target log 10 (step S105, NO), the next log in the analysis target log 10 is designated as a log to be determined, and steps S102 to S105 are repeated.
If the analysis is finished for all the logs in the analysis target log 10 (step S105, YES), it is determined which sequence recorded in the sequence storage unit 162 is matched with the logs whose formats have been determined in step S104, and logs which match the sequence are extracted (step S106). The log aggregation unit 140 then rearranges, aggregates, and outputs the logs determined to match the sequence in step S106 in accordance with the sequence (step S107).
Finally, the output unit 150 outputs the aggregation result acquired in step S107 to the display device 20 for display to the user (step S108).
As described above, the log analysis system 100 aggregates and displays logs in accordance with the preregistered sequence. Thus, the user is able to perform log analysis with reference to logs extracted and arranged based on the correlation among the logs. Further, since logs can be aggregated from and across multiple types of log data, it is possible to reduce burden of log analysis which would otherwise be performed by reviewing multiple log data.
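The sequence determination and aggregation steps (steps S106 and S107) can be sketched as follows. This is only an illustration under stated assumptions: the matching procedure itself is not specified in the text, so a simple greedy in-order match over time-sorted logs is assumed here; the order of format IDs in sequence A, the timestamps of the logs with format ID 039, and all names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedLog:
    timestamp: str   # sortable "YYYY/MM/DD hh:mm:ss" string
    source: str      # identifier of the log data, e.g. a file name
    format_id: str   # result of the format determination
    text: str


# Hypothetical sequence information: sequence ID -> ordered format IDs.
SEQUENCES = {"A": ["039", "039", "223"]}


def match_sequence(logs, format_ids):
    """Greedily pick logs, in time order, whose format IDs follow format_ids."""
    matched = []
    for entry in sorted(logs, key=lambda item: item.timestamp):
        position = len(matched)
        if position < len(format_ids) and entry.format_id == format_ids[position]:
            matched.append(entry)
    return matched if len(matched) == len(format_ids) else None


def aggregate(logs):
    """Return {sequence_id: logs arranged in sequence order} for matched sequences."""
    results = {}
    for sequence_id, format_ids in SEQUENCES.items():
        matched = match_sequence(logs, format_ids)
        if matched is not None:
            results[sequence_id] = matched
    return results


# Logs from two pieces of log data, given in arbitrary order.
logs = [
    ParsedLog("2015/08/17 08:29:59", "syslog.log", "223", "log c"),
    ParsedLog("2015/08/17 08:28:10", "message.log", "039", "log a"),
    ParsedLog("2015/08/17 08:28:45", "message.log", "039", "log b"),
    ParsedLog("2015/08/17 08:29:00", "message.log", "101", "unrelated"),
]
aggregated = aggregate(logs)
```

In this sketch the logs are merged across both pieces of log data before matching, so the result for sequence A contains entries from message.log and syslog.log arranged in the order that the sequence defines, and the unrelated log is left out.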
While the first example embodiment uses the identifier (format ID) of a format to define a sequence, the present example embodiment uses other information on logs to define a sequence. The device to be used and the process to be performed in the present example embodiment are the same as those of the first example embodiment.
The sequence determination unit 130 extracts, out of logs whose formats have been determined by the format determination unit 120, logs each having the format that matches a format ID of a sequence whose format has been recorded in the sequence storage unit 162 and including a variable value paired with the format ID. Such a configuration enables the determination of a sequence based on not only the identifier of a format of the log but also a variable value included in the log.
The sequence determination unit 130 extracts logs which include variable values recorded in the sequence storage unit 162 out of logs whose formats have been determined by the format determination unit 120. Such a configuration enables the determination of a sequence based on a variable value included in the log.
The sequence determination unit 130 extracts, out of logs whose formats have been determined by the format determination unit 120, logs each having the format that matches a format ID of a sequence whose format has been recorded in the sequence storage unit 162 and included in log data having the file name paired with the format ID. Such a configuration can determine not only the identifier of the format of the log but also the identifier (file name) of the log data in which the log is included.
The sequence information of
As described above, in the present example embodiment, various information on a log to be analyzed can be used to determine a sequence and aggregate the logs.
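The conditions described in the present example embodiment can be sketched as one step record that optionally holds a format ID, variable values, and a file name. This is a hedged illustration only: the Step class, the satisfies function, the dictionary layout of a log, and the variable names host and ip are hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One entry of a sequence; unspecified conditions are not checked."""
    format_id: Optional[str] = None
    variables: dict = field(default_factory=dict)  # required variable values
    file_name: Optional[str] = None


def satisfies(log, step):
    """True if the log meets every condition that the step specifies."""
    if step.format_id is not None and log["format_id"] != step.format_id:
        return False
    if step.file_name is not None and log["file_name"] != step.file_name:
        return False
    return all(log["variables"].get(name) == value
               for name, value in step.variables.items())


log = {
    "format_id": "223",
    "file_name": "syslog.log",
    "variables": {"host": "SV003", "ip": "192.168.1.23"},
}

by_format_and_value = satisfies(log, Step(format_id="223", variables={"host": "SV003"}))
by_value_only = satisfies(log, Step(variables={"ip": "192.168.1.23"}))
by_format_and_file = satisfies(log, Step(format_id="223", file_name="syslog.log"))
wrong_file = satisfies(log, Step(format_id="223", file_name="message.log"))
```

Keeping every condition optional lets a single record type cover the three variations described above: format ID paired with a variable value, variable value alone, and format ID paired with a file name.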
When the format determination unit 120 determines a format and a log to be determined does not conform to any of the formats recorded in the format storage unit 161, the format learning unit 171 creates a new format and records the new format in the format storage unit 161.
As a first method for the format learning unit 171 to learn a format, the format learning unit 171 can define a new format by accumulating a plurality of logs whose formats are unknown and statistically separating the logs into changeable variables and unchangeable common parts. As a second method for the format learning unit 171 to learn a format, the format learning unit 171 can define a new format by reading a list of known variable values, determining, as a variable, a part which is the same as or similar to a known variable value out of a log whose format is unknown, and determining other parts as a common part. A value itself may be used as a known variable value, or a pattern such as a regular expression may be used. The learning method of a format is not limited to the above, and any learning algorithm that can define a new format for an input log may be used.
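The two learning methods can be sketched as follows. This is a simplified illustration, not the actual learning algorithm: the first function assumes whitespace-separated tokens and treats a token position as a variable whenever its value differs among the accumulated logs, and the second assumes that known variable values are given as regular expressions. The placeholder "<*>", the sample messages, and the function names are hypothetical.

```python
import re


def learn_format(logs):
    """First method: compare accumulated logs position by position.

    Token positions whose value is identical in every log are kept as the
    common part; positions whose value differs become the placeholder "<*>".
    """
    token_rows = [log.split() for log in logs]
    if not token_rows or len({len(row) for row in token_rows}) != 1:
        raise ValueError("logs must be non-empty and have the same token count")
    template = []
    for column in zip(*token_rows):
        template.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(template)


# Second method: hypothetical list of known variable patterns (IPv4-like text).
KNOWN_VARIABLES = [re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")]


def learn_format_from_known(log):
    """Mark tokens that match a known variable pattern; keep the rest as common."""
    return " ".join(
        "<*>" if any(p.fullmatch(token) for p in KNOWN_VARIABLES) else token
        for token in log.split()
    )


template = learn_format([
    "SV003 connected from 192.168.1.23",
    "SV001 connected from 192.168.1.10",
    "SV002 connected from 192.168.1.23",
])
template_from_known = learn_format_from_known("SV003 connected from 192.168.1.23")
```

The first method needs several logs before a position can be recognized as a variable, whereas the second can work on a single log but only detects variables covered by the known list, so the host name remains in the common part in the second result.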
When the sequence determination unit 130 determines a sequence and when a log to be determined does not match any of the sequences recorded in the sequence storage unit 162, the sequence learning unit 172 counts the number of appearances (frequency) for each format or format combination of the log. When the calculated frequency increases to a predetermined threshold or higher for a certain format combination, the sequence learning unit 172 creates and records a new sequence in the sequence storage unit 162 based on the format combination.
The sequence learning unit 172 may define a new sequence based on user input. In this case, once the user designates, via the input device, a combination of logs for which the user intends to define a sequence, the sequence learning unit 172 creates and records a new sequence based on the format combination of the log in the sequence storage unit 162. The learning method of a sequence is not limited to the above, and any learning algorithm that can define a new sequence from an input log may be used.
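The frequency-based method described above can be sketched as a counter keyed by format combination. The class name SequenceLearner, the threshold value of 3, and the representation of a combination as a tuple of format IDs are assumptions made for illustration only.

```python
from collections import Counter


class SequenceLearner:
    """Counts unmatched format combinations and learns frequent ones."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # hypothetical predetermined threshold
        self.counts = Counter()
        self.learned = []            # learned combinations, in order of creation

    def observe(self, format_ids):
        """Count one unmatched combination; return True if it is newly learned."""
        combination = tuple(format_ids)
        if combination in self.learned:
            return False
        self.counts[combination] += 1
        if self.counts[combination] >= self.threshold:
            self.learned.append(combination)
            return True
        return False


learner = SequenceLearner(threshold=3)
outcomes = [learner.observe(["039", "223"]) for _ in range(4)]
```

In this sketch the combination is recorded as a new sequence on the observation at which its count reaches the threshold, and later observations of the same combination are ignored because the sequence is already known.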
As described above, the log analysis system 100-1 has learning units of a format and a sequence and therefore can generate and record a new format and a new sequence from logs having an unknown format and an unknown sequence.
The present example embodiment provides a window used for editing sequence information.
The user uses the input device to select a sequence ID on the selection box D2 for one or more logs D1 that the user intends to add to a sequence and then presses down the setting button D3. Then, in the sequence information recorded in the sequence storage unit 162, the log analysis system 100 adds, to the sequence ID selected on the selection box D2, the format ID corresponding to the log D1 that the user intends to add to the sequence. When the sequence ID selected on the selection box D2 has not been registered in the sequence information in the sequence storage unit 162, a pair of the sequence ID selected on the selection box D2 and the format ID corresponding to the log D1 that the user intends to add to the sequence is newly recorded.
A window E illustrated in
The format ID field E2 displays format IDs associated with the sequence ID from the top to the bottom in the order defined by the sequence information in the sequence storage unit 162. The order change button E3 is a button used for moving a format ID of the format ID field E2, and a format ID moves upward or downward one by one each time the order change button E3 is pressed down. Further, the delete button E4 is a button used for deleting a format ID of the format ID field E2 and, when pressed down, deletes the format ID.
The user uses the input device to change the order of the format IDs by using the order change button E3 or to delete a format ID by using the delete button E4, and then presses down the set button E5. Then, the log analysis system 100 changes the order of the format IDs as set in the window E or deletes the format ID in the sequence information recorded in the sequence storage unit 162.
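The editing operations described above act on the stored sequence information and can be sketched as follows, assuming that the sequence information is held as a mapping from a sequence ID to an ordered list of format IDs. The function names, the sequence ID B, and the format ID 101 are hypothetical.

```python
def add_format(sequences, sequence_id, format_id):
    """Append format_id to the sequence, registering the sequence ID if new."""
    sequences.setdefault(sequence_id, []).append(format_id)


def move_format(sequences, sequence_id, index, offset):
    """Move the format ID at index up (offset=-1) or down (offset=+1) by one."""
    format_ids = sequences[sequence_id]
    target = index + offset
    if 0 <= target < len(format_ids):
        format_ids[index], format_ids[target] = format_ids[target], format_ids[index]


def delete_format(sequences, sequence_id, index):
    """Delete the format ID at index from the sequence."""
    del sequences[sequence_id][index]


sequences = {"A": ["039", "039", "223"]}
add_format(sequences, "B", "101")      # new sequence ID B is registered
move_format(sequences, "A", 2, -1)     # A becomes 039, 223, 039
delete_format(sequences, "A", 0)       # A becomes 223, 039
```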
The windows illustrated in
The present invention is not limited to the example embodiments described above and can be properly changed within a scope not departing from the spirit of the present invention.
Further, the scope of each of the example embodiments includes a processing method that stores, in a storage medium, a program causing the configuration of each of the example embodiments to operate so as to realize the function of each of the example embodiments described above (more specifically, a program causing a computer to perform the process illustrated in
As the storage medium, for example, a floppy (registered trademark) disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a nonvolatile memory card, or a ROM can be used. Further, the scope of each of the example embodiments is not limited to an example that performs a process by an individual program stored in the storage medium, and also includes an example that operates on an OS to perform a process in cooperation with other software or a function of an add-in board.
The whole or part of the example embodiments disclosed above can be described as, but not limited to, the following supplementary notes.
(Supplementary Note 1)
A log analysis system comprising:
a sequence determination unit that determines which predetermined sequence is matched with a plurality of logs of an analysis target log; and
a log aggregation unit that, based on the sequence, aggregates and outputs the plurality of logs determined to match the sequence by the sequence determination unit.
(Supplementary Note 2)
The log analysis system according to supplementary note 1, wherein the sequence is information that associates information on the plurality of logs with order of outputting the plurality of logs.
(Supplementary Note 3)
The log analysis system according to supplementary note 2 further comprising:
a form determination unit that determines which predetermined form is matched with the plurality of logs,
wherein the information on the plurality of logs includes information indicating the form.
(Supplementary Note 4)
The log analysis system according to supplementary note 3,
wherein the form determination unit extracts a variable value from the plurality of logs based on the form, and
wherein the information on the plurality of logs includes information indicating the variable value.
(Supplementary Note 5)
The log analysis system according to any one of supplementary notes 1 to 4,
wherein the analysis target log comprises a plurality of data, and
wherein the sequence determination unit is configured to perform determination on the plurality of logs read from the plurality of data.
(Supplementary Note 6)
The log analysis system according to any one of supplementary notes 1 to 5 further comprising:
a sequence learning unit that newly generates the sequence based on the plurality of logs determined to not match the sequence.
(Supplementary Note 7)
A log analysis method comprising steps of:
determining which predetermined sequence is matched with a plurality of logs of an analysis target log; and
based on the sequence, aggregating and outputting the plurality of logs determined to match the sequence.
(Supplementary Note 8)
A log analysis program that causes a computer to perform steps of:
determining which predetermined sequence is matched with a plurality of logs of an analysis target log; and
based on the sequence, aggregating and outputting the plurality of logs determined to match the sequence.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2015-223053, filed on Nov. 13, 2015, the disclosure of which is incorporated herein in its entirety by reference.
Number | Date | Country | Kind |
---|---|---|---|
JP2015-223053 | Nov 2015 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/004844 | 11/9/2016 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2017/081866 | 5/18/2017 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
9552249 | James | Jan 2017 | B1 |
9729671 | Faizanullah | Aug 2017 | B2 |
10528454 | Baraty | Jan 2020 | B1 |
20020026589 | Fukasawa | Feb 2002 | A1 |
20100115443 | Richstein | May 2010 | A1 |
20120143893 | Abraham | Jun 2012 | A1 |
20130073526 | Deluca | Mar 2013 | A1 |
20140317137 | Hanaoka et al. | Oct 2014 | A1 |
20160098342 | Faizanullah | Apr 2016 | A1 |
20170344413 | Chakra | Nov 2017 | A1 |
20180203795 | Gadiya | Jul 2018 | A1 |
20200073741 | Wang | Mar 2020 | A1 |
Number | Date | Country |
---|---|---|
2005-141663 | Jun 2005 | JP |
2006252459 | Sep 2006 | JP |
2006259811 | Sep 2006 | JP |
2007-249694 | Sep 2007 | JP |
2006-020984 | Jan 2008 | JP |
2008-041041 | Feb 2008 | JP |
2013136418 | Sep 2013 | WO |
Entry |
---|
International Search Report of PCT/JP2016/004844 dated Jan. 31, 2017. |
Communication dated Dec. 17, 2020, from the Japanese Patent Office in application No. 2017549986. |
Japanese Office Action for JP Application No. 2017-549986 dated Jun. 22, 2021 with English Translation. |
Number | Date | Country | |
---|---|---|---|
20200257610 A1 | Aug 2020 | US |