The present disclosure relates to processing of alarm messages in computing systems, and in particular to the clustering of alarm messages based on semantic relationships using knowledge graphs.
Computer networks, particularly large, distributed computer networks, are managed by computer network management systems that receive and process alarm messages from various network elements. Alarm messages may be presented to computer administrators, who may determine what caused the alarm message and how to address it. In a large computer network, the volume of messages can become large to the point of being intractable, particularly if multiple issues arise in the computer network in a short period of time.
In such instances, it is helpful for the computer administrators to have the alarm messages organized in a manner such that related messages are grouped together so that they can be processed and addressed together, rather than as unrelated incidents. The process of grouping related alarm messages is referred to as “clustering.” Unfortunately, however, it may be difficult to determine which alarm messages are related, as many alarm messages have similar structure and content.
Some efforts have been undertaken to computationally cluster documents for various purposes, such as searching for related documents. Historically, grouping of documents has been performed by measuring syntactical relationships between the documents using schemes such as a term frequency-inverse document frequency (TF-IDF) weighting scheme. In a TF-IDF approach, both the frequency of appearance of individual words in a document and the frequency of appearance of those words in the overall corpus of documents are measured. The relative importance of a particular word in a document is determined based on its frequency of appearance in the document and its inverse frequency in the overall corpus. Thus, if a term appears frequently in a given document but infrequently overall, then the document in question is deemed to be more relevant to that term.
Using a TF-IDF approach, each document is represented as a vector of terms, and a similarity function that compares the document vectors is used to group documents into related clusters. Latent Semantic Analysis (LSA) is a technique that employs TF-IDF to analyze relationships between documents. Latent Semantic Analysis assumes that the cognitive similarity between any two words is reflected in the way they co-occur in small subsamples of the language. LSA is implemented by constructing a matrix whose rows correspond to the d documents in the corpus and whose columns correspond to the a attributes (words, phrases). The entries are the number of times the column attribute occurs in the row document. Each entry is then normalized, for example by taking its logarithm and dividing by the number of documents in which the attribute occurs. This results in a sparse but high-dimensional matrix A. Typical approaches to LSA then reduce the dimensionality of the matrix by projecting it into a subspace of lower dimension using singular value decomposition. The cosine between the resulting vectors is then evaluated as an estimate of similarity between the terms. However, applying LSA to large datasets may be computationally challenging, and may not adequately capture semantic relationships between documents.
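As a point of reference, the TF-IDF and LSA pipeline described above can be sketched in a few lines of Python. The corpus, the log-based normalization, and the choice of k dimensions are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def lsa_similarity(docs, k=2):
    """Build a TF-IDF style term-document matrix, reduce it with
    SVD, and return pairwise cosine similarities between documents."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Raw term counts: one row per document, one column per term.
    counts = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.split():
            counts[r, index[w]] += 1
    # Normalize: log of the count, scaled by inverse document frequency.
    df = (counts > 0).sum(axis=0)
    A = np.log1p(counts) * np.log(len(docs) / df)
    # Project into a k-dimensional subspace via truncated SVD.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    reduced = U[:, :k] * s[:k]
    # Cosine similarity between the reduced document vectors.
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    unit = reduced / np.where(norms == 0, 1, norms)
    return unit @ unit.T
```

Two alarm messages that share terms end up closer in the reduced space than messages with no terms in common.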
Some embodiments provide a method of processing alarm messages generated by a computer network administration system. The method includes, for each one alarm message of a plurality of alarm messages, selecting a plurality of n-grams from the one alarm message, where n is greater than 1, assigning each of the plurality of n-grams to a node in a knowledge graph, generating a node weight for each node in the knowledge graph based on a popularity of the n-gram associated with the node, generating an edge weight for each of a plurality of edges connecting nodes in the knowledge graph to each other, extracting semantic relationships between nodes in the knowledge graph based on the node weights and the edge weights, and grouping selected ones of the plurality of alarm messages into a cluster based on the extracted semantic relationships between nodes corresponding to n-grams in the selected ones of the plurality of alarm messages.
The method may further include, before selecting the plurality of n-grams, excluding stop words from the plurality of alarm messages and performing lemmatization on remaining words in the plurality of alarm messages.
Excluding stop words may include excluding words other than nouns and verbs from the terms in the alarm messages.
The method may further include grouping selected ones of the plurality of alarm messages into a plurality of clusters based on the extracted semantic relationships between nodes corresponding to n-grams in the selected ones of the plurality of alarm messages.
The method may further include providing a corpus, C, of alarm messages, where C={d1, d2, d3, . . . , dn} and d1, d2, d3, . . . , dn represent the alarm messages, generating a set, S, of terms in the alarm messages in the corpus, where S={t1, t2, t3, . . . , tn} and t1, t2, t3, . . . , tn represent terms in the alarm messages in the corpus, and generating n-grams as sequences of terms used in the alarm messages.
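The corpus-to-n-gram step might be sketched as follows, using bigrams (n=2) for simplicity; the stop-word list and the example messages are hypothetical:

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"to", "the", "on", "for", "a", "an", "of", "is"}

def bigrams(message):
    """Drop stop words, then emit each adjacent pair of remaining
    terms as a bigram (an n-gram with n = 2)."""
    terms = [t.lower() for t in message.split() if t.lower() not in STOP_WORDS]
    return [(terms[i], terms[i + 1]) for i in range(len(terms) - 1)]

# A toy corpus C of alarm messages d1, d2.
corpus = [
    "Unable to connect to axa elasticsearch",
    "Unable to connect to inventory database",
]
# The set S of terms, and the n-grams derived from the corpus.
S = {t.lower() for d in corpus for t in d.split() if t.lower() not in STOP_WORDS}
grams = [g for d in corpus for g in bigrams(d)]
```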
Extracting the semantic relationships may include extracting a semantic relationship between a first node and a second node by comparing an edge weight of an edge between the first node and the second node to a metric.
The metric may include an average edge weight or a median edge weight.
Extracting the semantic relationship between the first node and the second node may include comparing a node weight of the first node and a node weight of the second node to an edge weight between the nodes.
Extracting the semantic relationship between the first node and the second node may further include generating a metric based on a node weight of the first node, a node weight of the second node and the edge weight of the edge between the first node and the second node, and comparing the metric to a threshold.
Generating the edge weight between a first node and a second node may include generating the edge weight based on anterior popularity and posterior popularity of an n-gram associated with the first node and an n-gram associated with the second node.
The method may further include receiving a new alarm message, extracting a plurality of n-grams from the new alarm message, grouping the new alarm message into an existing cluster of alarm messages based on semantic relationships between nodes in the knowledge graph corresponding to n-grams in the cluster and nodes in the knowledge graph corresponding to the plurality of n-grams in the new alarm message, and displaying the new alarm message in association with the existing cluster of alarm messages.
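Assigning a new alarm message to an existing cluster might be sketched as follows; the similarity function and the threshold of 70 are illustrative assumptions:

```python
def assign_to_cluster(new_grams, clusters_grams, similarity, threshold=70):
    """Place a new alarm message's n-grams into the existing cluster
    with the highest similarity, provided it meets the threshold;
    otherwise return None so the message can seed a new cluster."""
    best, best_score = None, 0.0
    for cluster_id, grams in clusters_grams.items():
        score = similarity(new_grams, grams)
        if score > best_score:
            best, best_score = cluster_id, score
    return best if best_score >= threshold else None
```

A caller might use any graph-derived similarity here; a simple set-overlap measure is enough to exercise the logic.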
The n-grams may include bigrams.
The method may further include, for each n-gram, calculating a background popularity metric based on popularity of the n-gram in the plurality of alarm messages and a foreground popularity metric based on popularity of the n-gram within a subset of alarm messages in the cluster, and adjusting the node weights and edge weights for each node in the knowledge graph based on the background popularity metric and foreground popularity metric of the n-gram associated with the node.
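The disclosure does not fix a particular adjustment formula; one plausible sketch boosts a node whose n-gram is proportionally more popular inside the cluster (foreground) than across the whole corpus (background), and damps it in the opposite case:

```python
def adjusted_weight(base_weight, foreground_count, cluster_size,
                    background_count, corpus_size):
    """Adjust a node weight by the difference between the n-gram's
    foreground popularity (share of cluster messages containing it)
    and its background popularity (share of corpus messages).
    The linear combination is an illustrative assumption."""
    foreground = foreground_count / max(cluster_size, 1)
    background = background_count / max(corpus_size, 1)
    return base_weight * (1.0 + foreground - background)
```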
Other methods, devices, and computers according to embodiments of the present disclosure will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such methods, devices, and computers be included within this description, be within the scope of the present inventive subject matter, and be protected by the accompanying claims.
Other features of embodiments will be more readily understood from the following detailed description of specific embodiments thereof when read in conjunction with the accompanying drawings, in which:
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention. It is intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination.
Some embodiments provide systems and/or methods that employ n-gram based processing methods to extract semantic relationships between alarm messages generated in a computer network and group the alarm messages based on the discovered semantic relationships. Some embodiments use an n-gram based approach to build an enhanced knowledge graph from which semantic relationships can be extracted and used to determine similarity between alarm messages. The manual creation of an n-gram based knowledge graph from a large corpus of alarm messages is not practicable. Some embodiments provide computer-based methods that can generate n-gram based knowledge graphs from large data sets from which semantic relationships can be extracted. These relationships can be used to group alarm messages into semantically related groups that a network administrator can process and handle more easily and/or more efficiently.
One or more of the nodes 130 may host one or more agents 120, which are software applications configured to perform functions in the nodes. In the distributed computing environment illustrated in
In the distributed computing network illustrated in
As noted above, one problem faced by a network management function 112 is that a very large number of alarm messages can be generated in a distributed communication network, and it can be very difficult for a network operator to process all of the alarm messages. Accordingly, in such instances, it is helpful for the computer administrators to have the alarm messages organized in a manner such that related messages are grouped together so that they can be processed and addressed together, rather than as unrelated incidents, in a process known as clustering. Some embodiments described herein process alarm messages using an n-gram based knowledge graph to extract semantic relationships between alarm messages that can be used to cluster the alarm messages in a semantically meaningful way. Such clustered alarm messages may then be processed by a network management function in a more efficient manner.
According to some embodiments, a knowledge graph may be constructed from a corpus of documents, such as alarm messages, by extracting semantically significant n-grams from the corpus and assigning each extracted n-gram to a node. Nodes in the graph that have a semantic relationship with one another are connected by edges. For example,
To populate a bigram-based knowledge graph based on this message, the message is first parsed into bigrams (i.e., two-word phrases), shown in Table 1, while excluding stop words:
After lemmatization, the bigrams appear as shown in Table 2.
Each of the bigrams shown in Table 2 may be assigned to a node in the knowledge graph. Since each of the bigrams appears in the same alarm message, an edge is drawn between the two nodes associated with the bigrams, and a weight may be assigned to the edge. The edge weight may be assigned, for example, based on a number of words (distance) between the two terms in the alarm message. Thus, for example, an edge between the bigrams “unable connect” and “axa elasticsearch” would have a higher edge weight than an edge between the terms “unable connect” and “inventory update.” If the alarm message shown above were being processed to supplement an existing knowledge graph, then already existing edge weights may be increased or decreased based on the distance between the nodes in this alarm message.
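A minimal sketch of this distance-based edge weighting follows; the inverse-distance increment and the bigram positions are illustrative assumptions:

```python
from collections import defaultdict

def add_message(graph, gram_positions):
    """Add one alarm message to the knowledge graph.

    gram_positions maps each bigram (node) to its starting word
    position in the message; the edge weight between two co-occurring
    bigrams grows inversely with the distance separating them, so
    nearby bigrams end up more strongly connected."""
    grams = list(gram_positions)
    for i in range(len(grams)):
        for j in range(i + 1, len(grams)):
            a, b = grams[i], grams[j]
            distance = abs(gram_positions[a] - gram_positions[b])
            # Closer bigram pairs contribute a larger increment.
            graph[frozenset((a, b))] += 1.0 / max(distance, 1)
    return graph

graph = defaultdict(float)
# Word positions of three bigrams within a single hypothetical message.
add_message(graph, {"unable connect": 0, "axa elasticsearch": 2, "inventory update": 6})
```

Reprocessing further messages through `add_message` supplements the same graph, strengthening edges between bigrams that repeatedly co-occur.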
Weights may be assigned to individual nodes to indicate the relative importance of the node in a particular corpus or cluster of documents. The weight of a node may be based, in some embodiments, on a term frequency-inverse document frequency analysis that takes into account the frequency of occurrence of the term in both the document in question and the overall corpus of documents.
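A TF-IDF style node weight might be sketched as follows; the exact tf and idf formulas are illustrative assumptions, since the disclosure only requires that both document frequency and corpus frequency be taken into account:

```python
import math

def node_weights(docs_grams):
    """Weight each n-gram node by term frequency within a document
    times inverse document frequency across the corpus of alarm
    messages, keeping the highest weight seen for each n-gram."""
    n_docs = len(docs_grams)
    # Document frequency: how many messages contain each n-gram.
    df = {}
    for grams in docs_grams:
        for g in set(grams):
            df[g] = df.get(g, 0) + 1
    weights = {}
    for grams in docs_grams:
        for g in grams:
            tf = grams.count(g) / len(grams)
            idf = math.log(n_docs / df[g]) + 1  # +1 keeps ubiquitous grams visible
            weights[g] = max(weights.get(g, 0.0), tf * idf)
    return weights
```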
In this manner, an n-gram based knowledge graph may be generated by analyzing a corpus of alarm messages, which may include thousands or even millions of alarm messages. Because the alarm message is analyzed as a collection of n-grams instead of individual terms, semantic relationships among documents in the corpus may be more efficiently identified.
Moreover, some embodiments enable the discovery of deeper semantic relationships by analyzing both forward and reverse relatedness between bigrams. For example,
The anterior edge weight EWA and posterior edge weight EWP provide a measure of how related the terms are based on their order of appearance. That is, from the standpoint of the first node, the anterior edge weight EWA is strengthened when the first n-gram appears more often and closer before the second n-gram, while the posterior edge weight EWP is strengthened when the second n-gram appears more often and closer before the first n-gram. For example, in the above example, if the bigram “axa elasticsearch” appears more often after the bigram “unable connect” than before it, then the anterior edge weight between the nodes “unable connect” and “axa elasticsearch” may be stronger than the posterior edge weight. By assigning both anterior and posterior edge weights between nodes, semantic relationships between nodes may be more fully characterized, allowing for more efficient identification of semantic relationships between documents containing such terms.
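The anterior/posterior bookkeeping might be sketched as follows; the inverse-distance "closeness bonus" is an illustrative assumption:

```python
from collections import defaultdict

def directional_weights(messages_grams):
    """Accumulate anterior and posterior edge weights for each
    ordered node pair. For a pair (a, b), the anterior weight grows
    when a appears before b (more so when they are close), and the
    posterior weight grows when b appears before a."""
    anterior = defaultdict(float)   # anterior[(a, b)]: a precedes b
    posterior = defaultdict(float)  # posterior[(a, b)]: b precedes a
    for grams in messages_grams:    # n-grams in order of appearance
        for i, a in enumerate(grams):
            for j in range(i + 1, len(grams)):
                b = grams[j]
                contribution = 1.0 / (j - i)  # closeness bonus
                anterior[(a, b)] += contribution
                posterior[(b, a)] += contribution
    return anterior, posterior
```

In a corpus where "unable connect" usually precedes "axa elasticsearch", the anterior weight of that ordered pair exceeds its posterior weight, capturing the asymmetry described above.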
Each node may also be characterized by a self-referential weight (SRW), which may simply be the node weight or a function of the node weight. The SRW may be used in the process of evaluating semantic relatedness as described in more detail below.
Node weights may also be generated in some embodiments in a way that enables more efficient characterization of semantic relationships. As shown in
Some embodiments extract semantic relationships between nodes in the knowledge graph. In particular, some embodiments may extract semantic relationships between nodes by comparing an edge weight of an edge between a first node and a second node to a threshold metric, such as an average edge weight or a median edge weight. In some embodiments, a semantic relationship between a first node and a second node may be extracted by comparing a node weight of the first node and a node weight of the second node to an edge weight between the nodes.
Extracting the semantic relationship between the first node and the second node may further include generating a metric based on a node weight of the first node, a node weight of the second node and the edge weight of the edge between the first node and the second node, and comparing the metric to a threshold.
Once a knowledge graph has been created with associated node weights and edge weights, in some embodiments, a total semantic weight may be generated for each document in the corpus by, for example, generating a sum of all node weights of all nodes associated with n-grams in the document. In some embodiments, the total semantic weight of a document may be generated by generating a sum of all node weights of all nodes associated with n-grams in the document and all edge weights between nodes associated with n-grams in the document. Other metrics for calculating the total semantic weight of a document may be employed within the scope of the inventive concepts.
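The total semantic weight of a document, as the sum of its node weights plus the edge weights between its nodes, can be sketched as follows (the dictionary-based graph representation is an illustrative assumption):

```python
def total_semantic_weight(doc_grams, node_weights, edge_weights):
    """Sum the node weights of every n-gram in the document plus
    the edge weights between those n-grams' nodes."""
    total = sum(node_weights.get(g, 0.0) for g in doc_grams)
    for i, a in enumerate(doc_grams):
        for b in doc_grams[i + 1:]:
            total += edge_weights.get(frozenset((a, b)), 0.0)
    return total
```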
In addition, one or more relatedness metrics may be generated for each pair of documents in the corpus that was used to create the knowledge graph by analyzing n-grams in the documents associated with nodes in the knowledge graph that are connected by an edge. One relatedness metric may be generated by iterating through each n-gram in each document and evaluating edge weights and/or node weights of common and/or connected nodes associated with the n-grams. For example, referring to
A metric of semantic relatedness may be generated by summing the edge weights or self-referential weights between nodes in Document 1 and Document 2. For example, a metric SS of semantic similarity between documents D1 and D2 may be generated by evaluating the following equation:
SS=EW1+EW2+EW3+SRW1 [1]
More generally, the semantic similarity (SS) between Documents 1 and 2 may be expressed as the sum of all n edge weights that exist between nodes associated with n-grams in the two documents:
SS=EW1+EW2+ . . . +EWn [2]
where the edge weights between nodes corresponding to the same n-gram are the self-referential weights. Other formulas may be used within the scope of the inventive concepts.
In some embodiments, the metric of semantic similarity may be normalized to a value between 0 and 100 according to the following formula:
SSnorm=100×SS/MAX SS [3]
where MAX SS is the maximum semantic similarity across all documents in the corpus.
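Equations [1] and [2] and the normalization step might be sketched as follows; the dictionary-based edge and self-referential weight lookups are illustrative assumptions:

```python
def semantic_similarity(grams1, grams2, edge_weights, srw):
    """Equation [2]: sum the edge weights between nodes associated
    with n-grams in Document 1 and Document 2; a shared n-gram
    contributes its self-referential weight (SRW) instead."""
    ss = 0.0
    for a in set(grams1):
        for b in set(grams2):
            if a == b:
                ss += srw.get(a, 0.0)
            else:
                ss += edge_weights.get(frozenset((a, b)), 0.0)
    return ss

def normalized_similarity(ss, max_ss):
    """Scale a raw similarity onto [0, 100] using the corpus maximum."""
    return 100.0 * ss / max_ss if max_ss else 0.0
```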
Some embodiments evaluate the semantic similarity between all documents in a corpus, such as all alarm messages received by a particular network management server or all alarm messages transmitted in a distributed computing system over a period of time. Once the semantic similarity between documents has been generated, the semantic similarity may be used to group documents in the corpus into clusters. Referring to
Ranking the documents by semantic weight yields the result shown in Table 4.
Taking document D5 as the document having the highest total semantic weight, the remaining documents are then ranked according to semantic similarity to document D5, resulting in the result shown in Table 5.
A cluster may then be generated by grouping all documents having a semantic similarity to D5 greater than a predetermined threshold. For example, if the threshold is set at 70, then documents D2, D6 and D8 may be grouped into a cluster with D5. The process may then be repeated with the remaining ungrouped documents (D1, D3, D4, D7, D9 and D10) until all documents have been placed into a cluster or until no more documents remain to be grouped. For example, the process would select document D9 as the document having the highest total semantic weight of the remaining documents, and then group all documents having a semantic similarity of 70 or more with document D9 into a second cluster, and so on.
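The greedy clustering procedure described above can be sketched as follows; the threshold of 70 follows the example, and the data structures are illustrative:

```python
def cluster(documents, total_weight, similarity, threshold=70):
    """Greedy clustering: repeatedly seed a cluster with the
    ungrouped document of highest total semantic weight, then pull
    in every ungrouped document whose normalized similarity to the
    seed meets the threshold, until no documents remain."""
    remaining = set(documents)
    clusters = []
    while remaining:
        seed = max(remaining, key=total_weight)
        members = {d for d in remaining
                   if d == seed or similarity(seed, d) >= threshold}
        clusters.append(members)
        remaining -= members
    return clusters
```

With the example values above, D5 seeds the first cluster and attracts D2 and D6, leaving the remaining documents for subsequent passes.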
An approach such as that described above may generate a cluster such as the cluster 230B shown in
Before the plurality of n-grams are selected, the method may exclude stop words from the plurality of alarm messages and perform lemmatization on remaining words in the plurality of alarm messages. In some embodiments, stop words may include any words other than nouns and verbs in the alarm messages.
The method may further include grouping selected ones of the plurality of alarm messages into a plurality of clusters based on the extracted semantic relationships between nodes in the alarm messages corresponding to n-grams in the selected ones of the plurality of alarm messages.
The method may further include providing a corpus, C, of alarm messages, where C={d1, d2, d3, . . . , dn} and d1, d2, d3, . . . , dn represent the alarm messages, generating a set, S, of terms in the alarm messages in the corpus, where S={t1, t2, t3, . . . , tn} and t1, t2, t3, . . . , tn represent terms in the alarm messages in the corpus, and generating n-grams as sequences of terms used in the alarm messages. In some embodiments, the method may generate a metric of semantic similarity between each pair of documents (d1, d2) in the corpus, and the clusters may be generated based on the metric of semantic similarity between the documents.
Extracting the semantic relationship between a first node and a second node may be performed by comparing an edge weight of an edge between the first node and the second node to a threshold metric. The threshold metric may include an average edge weight or a median edge weight.
Extracting the semantic relationship between the first node and the second node may further include generating a metric based on a node weight of the first node, a node weight of the second node and the edge weight of the edge between the first node and the second node, and comparing the metric to a threshold.
Referring to
The processor 800 may include one or more data processing circuits, such as a general purpose and/or special purpose processor (e.g., microprocessor and/or digital signal processor) that may be collocated or distributed across one or more networks. The processor 800 is configured to execute computer program code in the memory 810, described below as a non-transitory computer readable medium, to perform at least some of the operations described herein. The network management server 50 may further include a user input interface 820 (e.g., touch screen, keyboard, keypad, etc.) and a display device 822.
The memory 810 includes computer readable code that configures the network management server 50 to implement the data collection component 106, the alarm message processor 102, the alert queue 105 and the network management function 112. In particular, the memory 810 includes alarm message analysis code 812 that configures the network management server 50 to analyze and cluster alarm messages according to the methods described above and alarm message presentation code 814 that configures the network management server to present alarm messages for processing based on the clustering of alarm messages as described above.
In the above description of various embodiments of the present disclosure, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of software and hardware that may all generally be referred to herein as a “circuit,” “module,” “component,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product comprising one or more computer readable media having computer readable program code embodied thereon.
Any combination of one or more computer readable media may be used. The computer readable media may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an appropriate optical fiber with a repeater, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS).
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable instruction execution apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that when executed can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions when stored in the computer readable medium produce an article of manufacture including instructions which when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatuses or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various aspects of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Like reference numbers signify like elements throughout the description of the figures.
The corresponding structures, materials, acts, and equivalents of any means or step plus function elements in the claims below are intended to include any disclosed structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. The aspects of the disclosure herein were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure with various modifications as are suited to the particular use contemplated.