The present invention relates to a data processing method and system for managing computer data logs, and more particularly to a technique for generating a log parser.
A log parser is a set of regular expressions that are used to parse each line of a particular type of log file (i.e., a computer file that includes a computer data log). The log file may include, for example, a record of system activity events (e.g., login, login failed, logout, and password changed). In currently used techniques for generating log parsers, a user manually writes regular expressions for a log parser using a known interface. The known interface applies each manually written regular expression to a log file and presents information that allows the user to determine whether or not the regular expression is effective.
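The definition above can be illustrated with a minimal Python sketch of a log parser as a set of regular expressions applied line by line. The line format, the event keys, and the names `LOG_PARSER` and `parse_line` are hypothetical. They were chosen only to mirror the example events (login, login failed, logout, password changed) and are not taken from any particular log type.

```python
import re

# Hypothetical log parser: one regular expression per event type, each with
# named groups for the fields it extracts. The line format is an assumption.
LOG_PARSER = {
    "login": re.compile(r"^(?P<ts>\S+ \S+) LOGIN user=(?P<user>\w+)$"),
    "login_failed": re.compile(r"^(?P<ts>\S+ \S+) LOGIN_FAILED user=(?P<user>\w+)$"),
    "logout": re.compile(r"^(?P<ts>\S+ \S+) LOGOUT user=(?P<user>\w+)$"),
    "password_changed": re.compile(r"^(?P<ts>\S+ \S+) PASSWD_CHANGE user=(?P<user>\w+)$"),
}


def parse_line(line):
    """Return (event type, extracted fields) for the first regular expression
    that matches the whole line, or None if the parser cannot parse it."""
    for event, regex in LOG_PARSER.items():
        m = regex.match(line)
        if m:
            return event, m.groupdict()
    return None
```

For example, `parse_line("2013-05-01 12:00:00 LOGIN user=alice")` yields the event type `"login"` together with the timestamp and user name, while a line in an unknown format yields `None`.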
In first embodiments, the present invention provides a method of generating a log parser. The method includes a computer receiving regular expressions and storing the regular expressions in a crowd-sourced data repository. The method further includes, subsequent to receiving and storing the regular expressions, the computer receiving an instruction to create a log parser based on a sample log. The method further includes the computer receiving the sample log. The method further includes, based on the stored regular expressions and the received sample log, the computer identifying matches between a plurality of strings of characters included in the received sample log and a plurality of regular expressions included in the stored regular expressions. Each match indicates a regular expression included in the plurality of regular expressions is capable of parsing a respective string included in the plurality of strings. The method further includes, based on the identified matches, the computer generating the log parser as including the plurality of regular expressions that match the plurality of strings included in the sample log.
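The steps of the first embodiments can be sketched in Python as follows, under stated assumptions. `RegexRepository` is a hypothetical in-memory stand-in for the crowd-sourced data repository. Calling `generate_log_parser` plays the role of the instruction to create a log parser. "Capable of parsing a string" is approximated as the regular expression matching the entire line (`re.fullmatch`), which is an assumption rather than a criterion stated in the text.

```python
import re


class RegexRepository:
    """Hypothetical in-memory stand-in for the crowd-sourced data repository."""

    def __init__(self):
        self._patterns = []

    def store(self, pattern):
        re.compile(pattern)  # reject syntactically invalid submissions early
        if pattern not in self._patterns:
            self._patterns.append(pattern)

    def all(self):
        return list(self._patterns)


def identify_matches(repository, sample_log):
    """Return (line, pattern) pairs in which a stored pattern matches the
    whole line (assumed here to mean the pattern can parse that line)."""
    matches = []
    for line in sample_log.splitlines():
        for pattern in repository.all():
            if re.fullmatch(pattern, line):
                matches.append((line, pattern))
    return matches


def generate_log_parser(repository, sample_log):
    """Build the log parser from every stored pattern that matched at least
    one line of the sample log, preserving repository order."""
    matched = {pattern for _, pattern in identify_matches(repository, sample_log)}
    return [p for p in repository.all() if p in matched]
```

With a repository holding login, logout, and reboot patterns, a sample log containing only login and logout lines yields a parser made of the login and logout patterns; the reboot pattern is left out because nothing in the sample matched it.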
In second embodiments, the present invention provides a computer system including a central processing unit (CPU), a memory coupled to the CPU, and a computer-readable, tangible storage device coupled to the CPU. The storage device contains instructions that, when carried out by the CPU via the memory, implement a method of generating a log parser. The method includes the computer system receiving regular expressions and storing the regular expressions in a crowd-sourced data repository. The method further includes, subsequent to receiving and storing the regular expressions, the computer system receiving an instruction to create a log parser based on a sample log. The method further includes the computer system receiving the sample log. The method further includes, based on the stored regular expressions and the received sample log, the computer system identifying matches between a plurality of strings of characters included in the received sample log and a plurality of regular expressions included in the stored regular expressions. Each match indicates a regular expression included in the plurality of regular expressions is capable of parsing a respective string included in the plurality of strings. The method further includes, based on the identified matches, the computer system generating the log parser as including the plurality of regular expressions that match the plurality of strings included in the sample log.
In third embodiments, the present invention provides a computer program product including a computer-readable, tangible storage device and computer-readable program instructions stored in the computer-readable, tangible storage device. The computer-readable program instructions, when carried out by a central processing unit (CPU) of a computer system, implement a method of generating a custom log parser. The method includes the computer system receiving regular expressions and storing the regular expressions in a crowd-sourced data repository. The method further includes, subsequent to receiving and storing the regular expressions, the computer system receiving an instruction to create a log parser based on a sample log. The method further includes the computer system receiving the sample log. The method further includes, based on the stored regular expressions and the received sample log, the computer system identifying matches between a plurality of strings of characters included in the received sample log and a plurality of regular expressions included in the stored regular expressions, each match indicating a regular expression included in the plurality of regular expressions is capable of parsing a respective string included in the plurality of strings. The method further includes, based on the identified matches, the computer system generating the log parser as including the plurality of regular expressions that match the plurality of strings included in the sample log.
In fourth embodiments, the present invention provides a process for supporting computing infrastructure. The process includes a first computer system providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer-readable code in a second computer system. The computer-readable code contains instructions. The instructions, when carried out by a processor of the second computer system, implement a method of generating a log parser. The method includes the second computer system receiving regular expressions and storing the regular expressions in a crowd-sourced data repository. The method further includes, subsequent to receiving and storing the regular expressions, the second computer system receiving an instruction to create a log parser based on a sample log. The method further includes the second computer system receiving the sample log. The method further includes, based on the stored regular expressions and the received sample log, the second computer system identifying matches between a plurality of strings of characters included in the received sample log and a plurality of regular expressions included in the stored regular expressions, each match indicating a regular expression included in the plurality of regular expressions is capable of parsing a respective string included in the plurality of strings. The method further includes, based on the identified matches, the second computer system generating the log parser as including the plurality of regular expressions that match the plurality of strings included in the sample log.
Embodiments of the present invention save the user time by automating the generation of log parsers based on a sample log. An embodiment of the present invention automatically queries regular expressions from a database and attempts to match the regular expressions against a sample log to determine a log parser. An embodiment of the present invention leverages user-generated (i.e., crowd-sourced) regular expressions to populate a regular expression database that is subsequently queried to match the regular expressions in the database against a sample log, thereby determining a log parser.
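Querying a database of regular expressions against a sample log can be sketched with Python's built-in `sqlite3` module. SQLite defines the `REGEXP` operator syntax but no implementation, so the sketch registers one with `create_function`. SQLite evaluates `X REGEXP Y` as `regexp(Y, X)`, so the registered function receives the pattern first. The table name `regexes` and the functions `open_regex_db` and `query_matching_patterns` are hypothetical, and the whole-line match criterion is an assumption.

```python
import re
import sqlite3


def open_regex_db():
    """In-memory stand-in for the regular expression database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no built-in regexp(); "X REGEXP Y" calls regexp(Y, X), so the
    # function receives (pattern, text). Whole-line matching is an assumption.
    conn.create_function(
        "REGEXP", 2, lambda pattern, text: re.fullmatch(pattern, text) is not None
    )
    conn.execute("CREATE TABLE regexes (id INTEGER PRIMARY KEY, pattern TEXT UNIQUE)")
    return conn


def query_matching_patterns(conn, line):
    """Ask the database which stored patterns parse the given log line."""
    rows = conn.execute(
        "SELECT pattern FROM regexes WHERE ? REGEXP pattern ORDER BY id", (line,)
    )
    return [row[0] for row in rows]
```

Pushing the match into the query keeps the filtering next to the data. An invalid stored pattern would make the registered function raise, which `sqlite3` reports as an `OperationalError`, so validating patterns on insertion is advisable.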
Embodiments of the present invention generate a log parser based on a sample log by automatically querying a database of regular expressions to determine potential matches between the regular expressions and the sample log. An embodiment of the present invention employs crowd sourcing techniques to populate a database of regular expressions with new entries of user-generated regular expressions. The new entries of user-generated regular expressions are defined for data in a sample log that is identified as not being parsed by previously stored regular expressions. The crowd sourcing techniques allow a repository of log parsers to be built up that recognizes and supports logs from a greater number of platforms and applications, including platforms and applications that are currently unrecognized and unsupported. Herein, a regular expression is also referred to as a regex.
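The crowd-sourcing loop described above can be sketched as two steps. First, find the lines of the sample log that no stored regex parses. Second, accept a user-submitted regex only if it parses at least one of those lines. The acceptance rule, the whole-line match criterion, and the names `unparsed_lines` and `contribute_regex` are hypothetical illustrations rather than the method of any particular embodiment.

```python
import re


def unparsed_lines(patterns, sample_log):
    """Lines of the sample log that no stored pattern fully matches."""
    return [
        line
        for line in sample_log.splitlines()
        if not any(re.fullmatch(p, line) for p in patterns)
    ]


def contribute_regex(patterns, sample_log, submitted):
    """Hypothetical acceptance rule: add a user-submitted regex only if it is
    valid and parses at least one line the stored patterns cannot parse.
    Returns True if the regex was added to the stored patterns."""
    try:
        re.compile(submitted)
    except re.error:
        return False
    gaps = unparsed_lines(patterns, sample_log)
    if any(re.fullmatch(submitted, line) for line in gaps):
        patterns.append(submitted)
        return True
    return False
```

Tying each contribution to a line that was previously unparsed keeps the repository from accumulating duplicates of coverage it already has, while still letting it grow toward log formats it does not yet recognize.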
The regex database 110 may include regular expressions included in one or more manually generated custom log parsers 112 (i.e., log parsers generated by one or more methods other than the process of
Tool 106 identifies matches between elements of sample log 108 and regular expressions included in regex database 110. Each identified match indicates that a regular expression included in regex database 110 is potentially capable of correctly parsing a corresponding element of sample log 108. Tool 106 may identify one or more matches for any single element of sample log 108; therefore, one element of sample log 108 may be matched to one or more regular expressions included in regex database 110.
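The one-to-many relationship between an element and its candidate regular expressions can be sketched as a mapping from each element to every pattern that matches it. Treating an "element" as a plain string (a line or a field) and "match" as a whole-string match are assumptions. The function name `match_elements` is hypothetical.

```python
import re
from collections import defaultdict


def match_elements(elements, patterns):
    """Map each sample-log element (assumed to be a string) to every stored
    pattern that fully matches it. An element may map to several patterns;
    elements with no match are omitted from the result."""
    candidates = defaultdict(list)
    for element in elements:
        for pattern in patterns:
            if re.fullmatch(pattern, element):
                candidates[element].append(pattern)
    return dict(candidates)
```

For instance, the element `"alice"` matches both `\w+` and `[a-z]+`, so it maps to two candidate regular expressions. Choosing among such candidates is left to a later step, such as presenting the potential matches to the user.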
Based on the matches identified by tool 106, custom log parser generator 104 generates a custom log parser, which is stored in custom log parsers 116. The functionality of the components shown in
Although components 108, 112 and 114 of system 100 are shown in
In one embodiment, each regular expression is stored in step 202 along with indicator(s) of the computer application type and/or computer platform type that is associated with the regular expression. That is, regex database 110 (see
In one embodiment, regex database 110 (see
In step 206, computer system 102 (see
In step 208, custom log parser generator 104 (see
In step 210, custom log parser generator 104 (see
In step 212, custom log parser generator 104 (see
If the result of step 210 is a subset of the regular expressions in regex database 110 (see
In step 214, custom log parser generator 104 (see
In one example, step 214 includes custom log parser generator 104 (see
In step 216, based on the potential matches presented in step 214, custom log parser generator 104 (see
Step 216 may also include, based on the potential matches presented in step 214, custom log parser generator 104 (see
In step 218, custom log parser generator 104 (see
Further, step 218 optionally includes custom log parser generator 104 (see
After step 218, the process of
In step 222, custom log parser generator 104 (see
In step 224, custom log parser generator 104 (see
If custom log parser generator 104 (see
Memory 304 may comprise any known computer-readable storage medium, which is described below. In one embodiment, cache memory elements of memory 304 provide temporary storage of at least some program code (e.g., program code 314) in order to reduce the number of times code must be retrieved from bulk storage while instructions of the program code are carried out. Moreover, similar to CPU 302, memory 304 may reside at a single physical location, comprising one or more types of data storage, or be distributed across a plurality of physical systems in various forms. Further, memory 304 can include data distributed across, for example, a local area network (LAN) or a wide area network (WAN).
I/O interface 306 comprises any system for exchanging information to or from an external source. I/O devices 310 comprise any known type of external device, including a display device (e.g., monitor), keyboard, mouse, printer, speakers, handheld device, facsimile, etc. Bus 308 provides a communication link between each of the components in computer system 102, and may comprise any type of transmission link, including electrical, optical, wireless, etc.
I/O interface 306 also allows computer system 102 to store information (e.g., data or program instructions such as program code 314) on and retrieve the information from computer data storage unit 312 or another computer data storage unit (not shown). Computer data storage unit 312 may comprise any known computer-readable storage medium, which is described below. For example, computer data storage unit 312 may be a non-volatile data storage device, such as a magnetic disk drive (i.e., hard disk drive) or an optical disc drive (e.g., a CD-ROM drive which receives a CD-ROM disk).
Memory 304 and/or storage unit 312 may store computer program code 314 that includes instructions that are carried out by CPU 302 via memory 304 to generate a log parser. Although
Further, memory 304 may include other systems not shown in
Storage unit 312 and/or one or more other computer data storage units (not shown) that are coupled to computer system 102 may store regex database 110 (see
As will be appreciated by one skilled in the art, in a first embodiment, the present invention may be a system; in a second embodiment, the present invention may be a method; and in a third embodiment, the present invention may be a computer program product. A component of an embodiment of the present invention may take the form of an entirely hardware-based component, an entirely software component (including firmware, resident software, micro-code, etc.) or a component combining software and hardware sub-components that may all generally be referred to herein as a “module”.
An embodiment of the present invention may take the form of a computer program product embodied in one or more computer-readable medium(s) (e.g., memory 304 and/or computer data storage unit 312) having computer-readable program code (e.g., program code 314) embodied or stored thereon.
Any combination of one or more computer-readable media (e.g., memory 304 and computer data storage unit 312) may be utilized. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. In one embodiment, the computer-readable storage medium is a computer-readable storage device or computer-readable storage apparatus. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus, device or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be a tangible medium that can contain or store a program (e.g., program code 314) for use by or in connection with a system, apparatus, or device for carrying out instructions.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device for carrying out instructions.
Program code (e.g., program code 314) embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code (e.g., program code 314) for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java®, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Instructions of the program code may be carried out entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server, where the aforementioned user's computer, remote computer and server may be, for example, computer system 102 or another computer system (not shown) having components analogous to the components of computer system 102 included in
Aspects of the present invention are described herein with reference to flowchart illustrations (e.g.,
These computer program instructions may also be stored in a computer-readable medium (e.g., memory 304 or computer data storage unit 312) that can direct a computer (e.g., computer system 102), other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions (e.g., program code 314) stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowcharts and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer (e.g., computer system 102), other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer implemented process such that the instructions (e.g., program code 314) which are carried out on the computer, other programmable apparatus, or other devices provide processes for implementing the functions/acts specified in the flowcharts and/or block diagram block or blocks.
Any of the components of an embodiment of the present invention can be deployed, managed, serviced, etc. by a service provider that offers to deploy or integrate computing infrastructure with respect to generating a log parser. Thus, an embodiment of the present invention discloses a process for supporting computer infrastructure, wherein the process comprises a first computer system providing at least one support service for at least one of integrating, hosting, maintaining and deploying computer-readable code (e.g., program code 314) in a second computer system (e.g., computer system 102) comprising one or more processors (e.g., CPU 302), wherein the processor(s) carry out instructions contained in the code causing the second computer system to generate a log parser.
In another embodiment, the invention provides a method that performs the process steps of the invention on a subscription, advertising and/or fee basis. That is, a service provider, such as a Solution Integrator, can offer to create, maintain, support, etc. a process of generating a log parser. In this case, the service provider can create, maintain, support, etc. a computer infrastructure that performs the process steps of the invention for one or more customers. In return, the service provider can receive payment from the customer(s) under a subscription and/or fee agreement, and/or the service provider can receive payment from the sale of advertising content to one or more third parties.
The flowcharts in
While embodiments of the present invention have been described herein for purposes of illustration, many modifications and changes will become apparent to those skilled in the art. Accordingly, the appended claims are intended to encompass all such modifications and changes as fall within the true spirit and scope of this invention.