This application claims priority under 35 U.S.C. § 119 to Chinese Patent Application No. 2021103879434, filed on Apr. 9, 2021. The contents of Chinese Patent Application No. 2021103879434 are incorporated by reference in their entirety.
Embodiments of the present disclosure relate to the field of computers, and more particularly, to a method, an electronic device, and a program product for determining a score of a log file.
Network devices, systems, and service programs all generate event records during operation, and these event records may be stored as log files according to log entries (for example, in a form of lines). Each log entry may record operation-related description information such as date, time, user, and action. These log files may generally be used to train a related model to achieve a specific function. However, product development teams usually adopt different forms in writing logs, and therefore, availability of log files varies greatly, thus affecting subsequent operations such as model training.
A solution for determining a score of a log file is provided in the present disclosure.
In one aspect of the present disclosure, a method for determining a score of a log file is provided. The method may include acquiring a log file related to a monitored system and source code corresponding to the log file. The method may further include determining a first score of the log file based on a first log rule subset in a log rule set, wherein the log rule set is used to evaluate at least one of analyzability of the log file and supportability of the monitored system. The method may further include determining a second score of the source code based on a second log rule subset in the log rule set. Moreover, the method may include determining a third score of the log file at least based on the first score and the second score.
In another aspect of the present disclosure, an electronic device is provided, including a processor; and a memory coupled to the processor and having instructions stored therein, wherein the instructions, when executed by the processor, cause the electronic device to perform actions including: acquiring a log file related to a monitored system and source code corresponding to the log file; determining a first score of the log file based on a first log rule subset in a log rule set, wherein the log rule set is used to evaluate at least one of analyzability of the log file and supportability of the monitored system; determining a second score of the source code based on a second log rule subset in the log rule set; and determining a third score of the log file at least based on the first score and the second score.
In another aspect of the present disclosure, a computer program product is provided. The computer program product is tangibly stored on a computer-readable medium and includes machine-executable instructions. The machine-executable instructions, when executed, cause a machine to perform any steps of the method according to the first aspect.
The Summary of the Invention section is provided to introduce a selection of concepts in a simplified form, which will be further described in the Detailed Description below. The Summary of the Invention section is neither intended to identify key features or main features of the present disclosure, nor intended to limit the scope of the present disclosure.
The above and other objectives, features, and advantages of the present disclosure will become more apparent by describing the example embodiments of the present disclosure in more detail with reference to the accompanying drawings. In the example embodiments of the present disclosure, the same or similar reference numerals generally represent the same or similar parts. In the accompanying drawings,
The principles of the present disclosure will be described below with reference to some example embodiments shown in the accompanying drawings.
As used herein, the term “include” and variations thereof mean open-ended inclusion, that is, “including but not limited to.” Unless specifically stated, the term “or” means “and/or.” The term “based on” means “based at least in part on.” The terms “an example embodiment” and “an embodiment” indicate “a group of example embodiments.” The term “another embodiment” indicates “a group of additional embodiments.” The terms “first,” “second,” and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
As discussed above, when a model is used to efficiently analyze log files or log entries, a large number of log files need to be used as a training data set to train the model. However, because log files are written in different formats, a considerable part of the log files has little practical value. If a log file is directly used in an operation such as model training without any identification or evaluation, the operation may fail to meet predetermined requirements.
In order to address, at least in part, the above disadvantages, a solution for scoring a log file is provided in the embodiments of the present disclosure. This solution can score a log file (which contains log entries, for example, a line of log in the log file that may correspond to an event record) in a system, and can score source code corresponding to the log file. The two scoring operations may adopt different log rules. For example, a log rule set may be created in advance. The scoring operation on the log file may be performed based on a first log rule subset in the log rule set, and the scoring operation on the source code corresponding to the log file may be performed based on a second log rule subset in the log rule set. A third score of the log file may be determined at least based on a first score and a second score (which may even include other scores, such as a manual score).
This solution can acquire a maturity or usability score of each log file, and further, can use this score as a basis to use log entries whose scores meet predetermined requirements for subsequent processing such as model training. The processing may also include, for example, storage and retrieval, and additionally or alternatively, also include analysis processing of content recorded in the log entries to facilitate subsequent processing.
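The selection step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `select_log_files`, the file names, and the threshold value of 0.7 are all assumptions introduced for the example.

```python
from typing import Dict, List


def select_log_files(scores: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Return the names of log files whose score meets the threshold.

    The threshold stands in for the "predetermined requirements" and is a
    hypothetical value chosen only for illustration.
    """
    return sorted(name for name, score in scores.items() if score >= threshold)


# Hypothetical third scores for three log files.
scores = {"auth.log": 0.82, "db.log": 0.41, "api.log": 0.70}
selected = select_log_files(scores)
```

Only the files in `selected` would then be passed on to subsequent processing such as model training, storage, or retrieval.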
In some embodiments, computing device 130 may include dynamic analysis unit 131 and static analysis unit 132. The log file acquired by log file acquisition unit 110 and the source code acquired by source code acquisition unit 120 may be input to dynamic analysis unit 131 and static analysis unit 132, respectively. It should be understood that dynamic analysis unit 131 is configured to analyze a log file dynamically generated by the monitored system and determine a score, and static analysis unit 132 is configured to analyze static code of the monitored system and determine a score.
As shown in
After that, the determined first score and second score are input to summarization unit 140. Correspondingly, summarization unit 140 may determine a third score, that is, a comprehensive score of the log file based on the first score and the second score.
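One possible way for a summarization unit to combine the two scores is a weighted mean. The disclosure does not specify the combination formula, so the weighting below, the function name `combine_scores`, and the default weight of 0.5 are assumptions made for this sketch.

```python
def combine_scores(first_score: float, second_score: float,
                   first_weight: float = 0.5) -> float:
    """Combine the log-file score and the source-code score into a third score.

    A weighted mean is assumed here; the weight is a hypothetical parameter.
    """
    if not 0.0 <= first_weight <= 1.0:
        raise ValueError("first_weight must be between 0 and 1")
    return first_weight * first_score + (1.0 - first_weight) * second_score


# Example: equal weighting of a first score of 0.8 and a second score of 0.6.
third_score = combine_scores(0.8, 0.6)
```

Because both inputs are ratios in the range 0 to 1, the weighted mean also stays in that range, which keeps the third score comparable across log files.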
It should be understood that although
The computing device may be any device with a computing capability. As a non-limiting example, the computing device may be any type of fixed computing device, mobile computing device, or portable computing device, including but not limited to a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a smart phone, and the like. All or part of components of the computing device may be distributed in cloud. The computing device may also adopt a cloud-edge architecture.
A storage apparatus (not shown) includes one or more storage disks for storing data. The storage disks may be various types of devices with a storage function, including but not limited to a hard disk drive (HDD), a solid state disk (SSD), a removable disk, any other magnetic storage device and any other optical storage device, or any combination thereof. The computing device may be configured to store data such as a group of log entries 110 in the storage apparatus in an indexable manner.
In some embodiments, example environment 100 may also include a model to be trained and a model training apparatus (not shown). As an example, the model training apparatus may use scored and filtered log files to train, for example, a natural language processing model. In the description of the embodiments of the present disclosure, the term “model” refers to a construct that may learn correlations between corresponding inputs and outputs from training data, so that, after the training is completed, a given input is processed based on a parameter set obtained by the training to generate a corresponding output. The “model” may sometimes be referred to as a “neural network,” a “learning model,” a “learning network,” or a “network.” These terms are used interchangeably herein.
In the model training apparatus, a training data acquisition apparatus may acquire scored and filtered log files as input data and provide them to the model. The input data may be one of a training set, a validation set, and a test set. Herein, each sample in the input data may be a text recorded by one or more log entries. The model training apparatus may train the model based on the input data. In a model training stage, parameters (e.g., weight and bias) of the model may be adjusted based on at least one constraint (sometimes referred to as loss), and the constraint may represent a performance index (e.g., accuracy) of the model. The training process may adjust the parameters of the model so that at least one constraint moves in a decreasing direction.
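The parameter adjustment described above can be illustrated with a deliberately small stand-in. The sketch below trains a one-feature linear model by gradient descent on a mean squared error loss. The linear model, the loss function, the learning rate, and the sample data are assumptions chosen for brevity; an actual natural language processing model would be far larger.

```python
from typing import List, Tuple


def train(samples: List[Tuple[float, float]], learning_rate: float = 0.05,
          epochs: int = 500) -> Tuple[float, float, List[float]]:
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    losses = []
    n = len(samples)
    for _ in range(epochs):
        loss = grad_w = grad_b = 0.0
        for x, y in samples:
            error = weight * x + bias - y
            loss += error * error / n
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        losses.append(loss)
        # Step against the gradient so that the loss moves in a decreasing direction.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias, losses


# Hypothetical training data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias, losses = train(samples)
```

The list `losses` records the constraint value at each epoch, so its trend shows whether the adjustment is moving in the intended direction.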
It should be understood that the architecture and functions of example environment 100 are described for illustrative purposes only, and do not imply any limitation to the scope of the present disclosure. The embodiments of the present disclosure may also be applied to other environments having different structures and/or functions.
Log rule set 150 will be described in detail below with reference to
In some embodiments, computing device 130 may evaluate, based on any rule in log rule set 150, the log file to be analyzed or its corresponding source code. As an example, dynamic analysis unit 131 configured in computing device 130 may score the log file based on the first log rule subset in log rule set 150. For example, the first log rule subset may include “Does the log file have a format header?”, “Does the log entry have a consistent structure?”, “Does the log entry contain a source class name?”, “Does the log entry contain a source function name?”, “Does the log entry exclude personal data or security-related data?”, “Does the log entry use a timestamp for each event?”, “Does the timestamp of the log entry contain a time zone?”, “Is the accuracy of the timestamp of the log entry in milliseconds?”, “Does the log entry contain context for troubleshooting?”, “Does the log define start and stop of a service?”, “Does the log entry have an event ID of an event to be tracked?”, and other log rules.
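A few of the rules listed above can be expressed as predicates over a single log entry. The sketch below is illustrative only: it assumes an ISO 8601 style timestamp at the start of each entry and an `event_id=` key, neither of which is specified in the disclosure, and the rule keys and function names are hypothetical.

```python
import re
from typing import Callable, Dict

# Assumed entry layout: "2021-04-09T10:15:30.123+08:00 INFO message event_id=42"
TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def has_timestamp(entry: str) -> bool:
    """Rule: does the log entry use a timestamp for the event?"""
    return TIMESTAMP.match(entry) is not None


def has_time_zone(entry: str) -> bool:
    """Rule: does the timestamp of the log entry contain a time zone?"""
    match = TIMESTAMP.match(entry)
    return match is not None and match.group(3) is not None


def has_event_id(entry: str) -> bool:
    """Rule: does the log entry have an event ID of an event to be tracked?"""
    return re.search(r"\bevent_id=\S+", entry) is not None


# A hypothetical first log rule subset, keyed by a short rule name.
FIRST_RULE_SUBSET: Dict[str, Callable[[str], bool]] = {
    "timestamp": has_timestamp,
    "time_zone": has_time_zone,
    "event_id": has_event_id,
}


def evaluate_entry(entry: str) -> Dict[str, bool]:
    """Apply every rule in the subset to one log entry."""
    return {rule: check(entry) for rule, check in FIRST_RULE_SUBSET.items()}


result = evaluate_entry("2021-04-09T10:15:30.123+08:00 INFO service started event_id=42")
```

Keeping each rule as a separate predicate makes it straightforward to add or remove rules from a subset without changing the evaluation loop.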
As an example, static analysis unit 132 configured in computing device 130 may score the source code corresponding to the log file based on the second log rule subset in log rule set 150. For example, the second log rule subset may include “Does the log entry use a key-value pair?”, “Does the log entry use a jsonl format to record a class variable?”, “Does the log entry define start and end of a task?”, “Does the log entry provide context when an exception/error occurs?”, and other log rules.
As another example, a staff member may also score text information. For example, the second log rule subset may include “Is a log retention strategy flexible?”, “Is the log stored in a single storage location?”, “Is a rotation strategy of a log application flexible?”, “Has the log been encrypted during transmission?”, “Is a log level configurable?”, and other log rules. It should be understood that in addition to scoring manually, text information may further be scored based on a machine learning model.
It is understandable that although only the method of scoring by rules in four dimensions for maturity rating is shown, rules in more or fewer dimensions may further be set as needed. In addition, each dimension may be divided into three levels according to two score thresholds, but more score thresholds may be set to be divided into more levels as needed, or different score thresholds may be set. The present disclosure is not limited thereto.
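The division of a dimension into three levels by two score thresholds can be sketched as below. The threshold values of 0.4 and 0.7 and the level names are assumptions for illustration; the disclosure leaves both open.

```python
def maturity_level(score: float, low: float = 0.4, high: float = 0.7) -> str:
    """Map a score in [0, 1] to one of three levels using two thresholds.

    The thresholds and level names are hypothetical.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


levels = [maturity_level(s) for s in (0.25, 0.55, 0.90)]
```

Adding further thresholds would extend the same comparison chain to produce more levels.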
A process according to one or more embodiments of the present disclosure will be described in detail below with reference to
As shown in
In 204, computing device 130 may determine a first score of the log file based on a first log rule subset in log rule set 150. The log rule set is used to evaluate at least one of analyzability of the log file and supportability of the monitored system. In some embodiments, the log rule set is used to evaluate at least one of the analyzability, maintainability, security, and supportability of the log file.
In some embodiments, in order to determine the first score, computing device 130 may score log entries in the acquired log file.
As shown in
As an example, computing device 130 may extract timestamps of the log entries in the log file, and determine the number of a group of timestamps meeting a predetermined time accuracy requirement (i.e., “Is the accuracy of the timestamp of the log entry in milliseconds?”) among the extracted timestamps as the first number.
In 304, computing device 130 may determine the first score by determining a ratio of the first number to a total number of the log entries in the log file. In this way, scoring information of the log file can be acquired.
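The ratio computation for the millisecond-accuracy rule can be sketched as follows. The regular expression assumes that millisecond accuracy appears as at least three fractional-second digits after an `HH:MM:SS` time; that format, the sample log lines, and the function name `first_score` are assumptions for the example.

```python
import re
from typing import Iterable

# Assumed form: HH:MM:SS followed by "." or "," and at least three digits.
MILLISECOND_TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[.,]\d{3}")


def first_score(entries: Iterable[str]) -> float:
    """Ratio of entries whose timestamp meets the accuracy rule to all entries."""
    lines = [entry for entry in entries if entry.strip()]
    if not lines:
        return 0.0  # Avoid division by zero for an empty log file.
    first_number = sum(1 for line in lines if MILLISECOND_TIMESTAMP.search(line))
    return first_number / len(lines)


# Hypothetical log file content, one log entry per line.
log_lines = [
    "2021-04-09 10:15:30.123 INFO service started",
    "2021-04-09 10:15:31 WARN cache miss",
    "2021-04-09 10:15:32.007 ERROR timeout",
    "2021-04-09 10:15:33.450 INFO service stopped",
]
score = first_score(log_lines)
```

In this sample, three of the four entries carry a millisecond timestamp, so the first number is 3 and the total number of log entries is 4.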
After that, returning to
In some embodiments, in order to determine the second score, computing device 130 may score log entries in the acquired source code.
As shown in
As an example, computing device 130 may extract log entries in the function of the source code, and determine the number of a group of log entries including key-value pairs (i.e., “Does the log entry use a key-value pair?”) among the extracted log entries as the second number.
In 404, computing device 130 may determine the second score by determining a ratio of the second number to a total number of the log entries in the function of the source code. In this manner, scoring information of the source code may be determined.
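A static analysis of this kind can be sketched with Python's standard `ast` module. The sketch assumes that the monitored system is written in Python, that log entries are produced by calls of the form `logger.<level>("literal message", ...)`, and that a key-value pair looks like `key=value` inside the message. These are illustrative assumptions, as are the names `log_messages`, `second_score`, and the sample source.

```python
import ast
import re
from typing import List

LOG_METHODS = {"debug", "info", "warning", "error", "critical", "exception"}
KEY_VALUE = re.compile(r"\b\w+=\S+")


def log_messages(source: str, function_name: str) -> List[str]:
    """Collect literal message strings of logging calls inside one function."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            for call in ast.walk(node):
                if (isinstance(call, ast.Call)
                        and isinstance(call.func, ast.Attribute)
                        and call.func.attr in LOG_METHODS
                        and call.args
                        and isinstance(call.args[0], ast.Constant)
                        and isinstance(call.args[0].value, str)):
                    messages.append(call.args[0].value)
    return messages


def second_score(source: str, function_name: str) -> float:
    """Ratio of log entries using key-value pairs to all log entries found."""
    messages = log_messages(source, function_name)
    if not messages:
        return 0.0
    second_number = sum(1 for message in messages if KEY_VALUE.search(message))
    return second_number / len(messages)


# Hypothetical source code of the monitored system.
SAMPLE_SOURCE = '''
def handle(order):
    logger.info("order received order_id=%s user=%s", order.id, order.user)
    logger.warning("slow response")
    logger.error("payment failed order_id=%s", order.id)
'''
score = second_score(SAMPLE_SOURCE, "handle")
```

Logging calls whose first argument is not a string literal are skipped in this sketch, so a fuller analysis would need to handle formatted strings and variables as well.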
After that, returning to
In order to comprehensively consider the scoring of the log file, scoring of the text information by a user may be further added.
In some embodiments, based on a group of log entries, the computing device may determine at least one first performance metric for representing the group of log entries, and the at least one first performance metric indicates at least one of the following: analyzability, maintainability, security, and supportability.
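One way to derive a per-dimension performance metric is to group rule outcomes by dimension and take the pass ratio within each group. The assignment of rules to dimensions, the sample outcomes, and the function name `dimension_metrics` below are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def dimension_metrics(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of passed rules for each dimension.

    Each item in results is a (dimension, passed) pair for one evaluated rule.
    """
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for dimension, ok in results:
        totals[dimension] += 1
        passed[dimension] += int(ok)
    return {dimension: passed[dimension] / totals[dimension] for dimension in totals}


# Hypothetical rule outcomes for a group of log entries.
results = [
    ("analyzability", True),
    ("analyzability", False),
    ("security", True),
    ("supportability", True),
    ("supportability", True),
    ("maintainability", False),
]
metrics = dimension_metrics(results)
```

Each resulting value lies between 0 and 1 and can be read alongside the other dimensions when assessing a group of log entries.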
Multiple components in device 600 are connected to I/O interface 605, including: input unit 606, such as a keyboard and a mouse; output unit 607, such as various types of displays and speakers; storage unit 608, such as a magnetic disk and an optical disc; and communication unit 609, such as a network card, a modem, and a wireless communication transceiver. Communication unit 609 allows device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
Processing unit 601 performs the various methods and processing described above, such as processes 200, 300, and 400. For example, in some embodiments, the various methods and processing described above may be implemented as a computer software program or a computer program product, which is tangibly included in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 600 via ROM 602 and/or communication unit 609. When the computer program is loaded into RAM 603 and executed by CPU 601, one or more steps of any process described above may be implemented. Alternatively, in other embodiments, CPU 601 may be configured in any other suitable manners (for example, by means of firmware) to perform a process such as processes 200, 300, and 400.
The present disclosure may be a method, an apparatus, a system, and/or a computer program product. The computer program product may include a computer-readable storage medium on which computer-readable program instructions for performing various aspects of the present disclosure are loaded.
The computer-readable storage medium may be a tangible device capable of retaining and storing instructions used by an instruction-executing device. For example, the computer-readable storage medium may be, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, any non-transient storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, for example, a punch card or a raised structure in a groove with instructions stored thereon, and any appropriate combination of the foregoing. The computer-readable storage medium used herein is not to be interpreted as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
The computer program instructions for executing the operation of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in one programming language or any combination of several programming languages, including an object oriented programming language, such as Smalltalk and C++, and a conventional procedural programming language, such as the “C” language or similar programming languages. The computer-readable program instructions may be executed entirely on a user's computer, partly on a user's computer, as a stand-alone software package, partly on a user's computer and partly on a remote computer, or entirely on a remote computer or a server. In a case where a remote computer is involved, the remote computer may be connected to a user computer through any kind of networks, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, connected through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), may be customized by utilizing status information of the computer-readable program instructions. The electronic circuit may execute the computer-readable program instructions to implement various aspects of the present disclosure.
Various aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of the method, the apparatus (system), and the computer program product implemented according to the embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams and combinations of blocks in the flowcharts and/or block diagrams may be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or a further programmable data processing apparatus, thereby producing a machine, such that these instructions, when executed by the processing unit of the computer or the further programmable data processing apparatus, produce means for implementing functions/actions specified in one or more blocks in the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner; and thus the computer-readable medium having instructions stored includes an article of manufacture that includes instructions that implement various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded to a computer, a further programmable data processing apparatus, or a further device, so that a series of operating steps may be performed on the computer, the further programmable data processing apparatus, or the further device to produce a computer-implemented process, such that the instructions executed on the computer, the further programmable data processing apparatus, or the further device may implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings illustrate the architectures, functions, and operations of possible implementations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or part of an instruction, the module, program segment, or part of an instruction including one or more executable instructions for implementing specified logical functions. In some alternative implementations, functions marked in the blocks may also occur in an order different from that marked in the accompanying drawings. For example, two successive blocks may actually be executed substantially in parallel, and sometimes they may also be executed in the reverse order, depending on the functions involved. It should be further noted that each block in the block diagrams and/or flowcharts as well as a combination of blocks in the block diagrams and/or flowcharts may be implemented using a dedicated hardware-based system that executes specified functions or actions, or using a combination of special hardware and computer instructions.
Various implementations of the present disclosure have been described above. The foregoing description is illustrative rather than exhaustive, and is not limited to the disclosed implementations. Numerous modifications and alterations are apparent to persons of ordinary skill in the art without departing from the scope and spirit of the illustrated implementations. The selection of terms used herein is intended to best explain the principles and practical applications of the implementations or the improvements to technologies on the market, or to enable other persons of ordinary skill in the art to understand the implementations disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
2021103879434 | Apr 2021 | CN | national |