The technology disclosed relates to graph presentation for prioritization of security incidents and incident analysis.
The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.
Security analysts use log data generated by security and operations systems to identify and protect enterprise networks against cybersecurity threats. Gigabytes of security and operations log data can be generated in a short time. These logs contain security events with varying levels of threat. Firstly, it is difficult for an analyst to go through these logs and identify the alerts that need immediate attention. Secondly, it is difficult to identify the different computer network entities related to a particular alert. Graphs can be used to visualize computer network entities, which are connected to other entities through edges. However, for a typical enterprise network, graphs can become very large, with hundreds of thousands of entities connected through tens of millions of edges. Security analysts are overwhelmed by such graphs of security events and can miss the most important alerts and the entities related to those alerts. Some of these alerts are false positives. In most cases, a well-planned cyberattack impacts more than one entity in the enterprise network. It is difficult for security analysts to review the graph and identify groups of entities impacted by one or more alerts in the logs.
Therefore, an opportunity arises to automatically identify groups of entities in an enterprise network that are impacted by one or more alerts in the logs of data generated by security systems in a computer network and to present analysts with the most important nodes in graphs representing computer network entities.
In the drawings, like reference characters generally refer to like parts throughout the different views. Also, the drawings are not necessarily to scale, with an emphasis instead generally being placed upon illustrating the principles of the technology disclosed. In the following description, various implementations of the technology disclosed are described with reference to the following drawings, in which:
The following discussion is presented to enable any person skilled in the art to make and use the technology disclosed, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed implementations will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other implementations and applications without departing from the spirit and scope of the technology disclosed. Thus, the technology disclosed is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Protecting enterprise networks against cybersecurity attacks is a priority of every organization. Gigabytes of security log data can be generated by packet filters, firewalls, anti-malware software, intrusion detection and prevention systems, vulnerability management software, authentication servers, network quarantine servers, application servers, database servers and other devices, even in a single 24-hour period. The logs generated by these systems contain alerts for different entities of the computer network. Some security systems assign scores to such alerts. However, not all alerts are equal and some alerts are false positives. Security analysts must determine, from voluminous logs, which alerts present a threat that requires immediate attention. Groups of security alerts, spanning different entities in the enterprise network, can be more telling than individual alerts, but grouping is challenging and time-consuming.
More generally, log records are generated by both security systems and operation systems. The operational systems, such as servers, caches and load balancers, report audit logs that detail all activity of the systems. Log information is presented to security analysts for a variety of purposes, including investigating security incidents and identifying potential threats.
Graphs of enterprise networks can help security analysts visualize entities in the computer network and their alert status. The technology disclosed builds on a graph of the enterprise network, with nodes representing entities in the network. The technology disclosed assigns alert scores generated by security systems to nodes or to edges connecting the nodes. We refer to these assigned alert scores as “native” scores, to distinguish them from scores resulting from propagation through the graph. Different types of edges represent different types of relationships between the nodes. Consistent with edge types, we assign weights to edges representing the strength of the relationship between the connected nodes. Simply rendering an annotated graph would create a visualization of logs, but would be too cluttered to facilitate prioritization of threats to the enterprise network, so we do more.
The technology disclosed reduces the burden on security analysts by automatically finding groups of security alerts and presenting prioritized groups to the security analyst. This includes applying rules to propagate the native scores through the graph, leading to node clusters based on an aggregation of native and propagated alert scores.
Graph traversal determines the propagated impact of a native alert score on connected, neighboring nodes. The technique can involve an extra step if alert scores are assigned to edges: a step of imputing the assigned alert scores to one node or to both connected nodes, in the cases of a directed edge or of an undirected or bi-directed edge, respectively. Alternatively, scores on edges can be propagated in the same way that we describe propagating scores on nodes. For each starting node with a native alert score, we traverse the graph following edges from the starting node to propagate the starting node's native alert score to neighboring nodes. Native scores of other nodes encountered during the propagation are ignored; they are handled when those other nodes become starting nodes. Traversal can be terminated after a predetermined number of edges/nodes, such as five, or when propagation attenuates the score below a predetermined threshold. Weights on edges attenuate propagation. We normalize the propagated score at each visited node using the number of edges of the same type connected to the visited node, which also attenuates propagation. For instance, a node representing a server may be connected to a hundred client nodes and so receives only a small contribution propagated from each client node. Over multiple propagations from starting nodes, we sum the propagated scores at visited nodes to accumulate aggregate scores. The sum of propagated scores can be further normalized based on a sum of weights of relationship strengths on edges connected to the visited node. Scoring supports clustering for prioritized display.
The technology disclosed clusters connected nodes based on uninterrupted chains of summed propagated scores. Connected nodes are clustered when they have aggregate scores above a selected threshold. Clusters are separated by at least one node that has an aggregated score below the selected threshold, effectively breaking the chain. The threshold can be a predetermined score, a ratio of scores between connected nodes, or a combination of both. For instance, a pair of connected nodes can be separated into different clusters when one node has a score 10× the other node. We calculate cluster scores by summing aggregate scores of nodes in the cluster and, in some instances, normalizing the sum. We rank and prioritize clusters for display and potential analysis using the cluster scores.
Graphs are one way to help analysts visualize the computer network entities, both for incident response and threat hunting. Logs for an enterprise network can identify hundreds of thousands of nodes connected through tens of millions of edges, referred to as a graph. Graphs become more complex over larger windows, such as a week or month of security events. Presenting a detailed graph with the month of security events is overwhelming or meaningless to a security analyst. It is overwhelming if the analyst tries to make sense of individual edges. It is meaningless when the graphic visualization looks like a ball of string.
The technology disclosed includes two collapsing methods, equivalence collapsing and chain collapsing, which can be used to simplify graph structures without hiding nodes of high interest to analysts. In equivalence collapsing, a group of nodes can be collapsed into a single representative node, a so-called equivalence node, when nodes in the group are equivalent, in the sense that the nodes have matching degrees, are connected to the same endpoint nodes, and are connected by matching edge types. To avoid hiding nodes of high interest, equivalent nodes are scored before the collapse. Nodes that score above a predetermined threshold are excluded from collapsing.
In chain collapsing, a chain of nodes can be collapsed into a single representative node, a so-called chain-collapsed node, when nodes in the chain have a degree of one or two. Chain collapsing is only applied to simple chains, not chains with branches. Slightly different cases are presented by a chain of nodes that forms a whisker ending in a leaf node (degree of one at the end) and by a chain of nodes connected at both ends to two other nodes (degree of two for all nodes). Before collapsing, nodes in the chain are scored. Chains that score above a predetermined threshold are excluded from collapsing. After collapsing, the representative chain-collapsed node is given a score that combines scores of the collapsed nodes.
Chain-collapsed nodes can be further equivalence collapsed. When equivalence collapsing follows chain collapsing, an additional factor is considered: whether the chain-collapsed nodes being judged for equivalence represent chains of matching length.
We describe a system to group security alerts generated in a computer network and prioritize grouped security alerts for analysis. The system also simplifies graph structures without hiding nodes of high interest to analysts. The system is described with reference to
Servers 161a-m and user endpoints 121 such as computers 131a-n, tablets 141a-n, and cell phones 151a-n access and interact with the Internet-based services 117. In one implementation, this access and interaction is modulated by an inline proxy (not shown in
In a so-called managed device implementation, user endpoints 121 are configured with routing agents (not shown) which ensure that requests for the Internet-based services 117 originating from the user endpoints 121 and response to the requests are routed through the inline proxy for policy enforcement. Once the user endpoints 121 are configured with the routing agents, they are under the ambit or purview of the inline proxy, regardless of their location (on premise or off premise).
In a so-called unmanaged device implementation, certain user endpoints that are not configured with the routing agents can still be under the purview of the inline proxy when they are operating in an on premise network monitored by the inline proxy. Both managed and unmanaged devices can be configured with security software to detect malicious activity and store logs of security events in the security log database 175.
The enterprise users access Internet-based services 117 to perform a wide variety of operations such as search for information on webpages hosted by the Internet-based hosting service 136, send and receive emails, upload documents to a cloud-based storage service 139, and download documents from the cloud-based storage service 139. The log database accumulates logs of events related to users and the enterprise from multiple sources. Two sources of such log data include security systems and operations systems. Security systems include packet filters, firewalls, anti-malware software, intrusion detection and prevention systems, vulnerability management software, authentication servers, and network quarantine servers. Operations systems include servers, workstations, caches and load balancers, and networking devices (e.g., routers and switches). These systems can report hundreds, thousands or millions of events in an enterprise network in one day. Some security systems apply scores (such as on a scale of 1 to 100) indicating the risk associated with an individual event. An alert with a score of 100 likely poses a higher threat to the organization's network as compared to an alert with a score of 10. Not all alerts reported in the logs present the same level of threat and some alerts are false positives. Security analysts can review these logs to identify and analyze high priority alerts that present threats to the enterprise network 111 by well-equipped adversaries, but doing so is tedious.
High priority situations are often presented as a group of interrelated security alerts generated for different entities in the computer network. It is challenging and time-consuming to identify these groups of alerts using logs of security data. The technology disclosed reduces the burden on security analysts by automatically finding groups of security alerts and presenting prioritized groups to the security analyst. This grouping of security alerts and prioritizing of grouped alerts enables security analysts to focus on nodes that are of interest for high risk security events. Consider a first example of a log entry in the security log database 175 reporting a security event indicating a failed authentication from a user endpoint 121. Now consider a second example of a log entry in the security log database 175 which is also an authentication failure but represents a high risk to the organization. In the second example, an attacker has gained access to a user endpoint 121 in the enterprise network 111. The attacker steals confidential information from the compromised user endpoint. Such information can include a list of servers 161a-m in the enterprise network. The attacker then attempts to authenticate to the servers. This can result in a spike in the number of failed authentications from the compromised user endpoint. The attacker can also move laterally to other user endpoints in the enterprise network. The second example presents a situation which requires accelerated investigation by a security analyst.
A serious cyberattack on an enterprise network will likely raise interrelated alerts from multiple, disjoint security systems. Alerts from some of the monitored entities present higher risks than alerts from other entities. For example, a malware execution on a user endpoint 121 may not have the same priority level as compared to a malware execution on a system used as a jump box to access other user endpoints in the network. The security analyst can be well advised to analyze the jump box alert before the endpoint alert, as the jump box immediately impacts many entities in the network. When the analyst reviews a log that doesn't highlight the roles of the jump box and endpoint, it is difficult to prioritize the alerts.
Security analysts analyze these logs to identify threats to the enterprise network 111. A security analyst is overwhelmed when presented with hundreds of events to analyze. The technology disclosed can be used in other contexts and can include collection of data from a variety of data sources, beyond the example operations performed by users visiting the Internet-based services 117. Of course, other contexts, in addition to security monitoring, can make use of the technology disclosed, such as network operations and social networks and, more generally, any network represented by a large graph of nodes connected by relationships that can be analyzed to identify collapsible groups of nodes.
Not all security events present the same level of anomalous behavior in the enterprise network. Consider a first example of a log entry in the security log database 175 reporting a failed authentication from a user endpoint, which is common with long passphrases and frequently changed passwords. A second example of a log entry is also an authentication failure but represents a high risk to the organization. In the second example, an attacker gained access to a user endpoint 121 in the enterprise network 111 and obtained a list of servers 161a-m in the enterprise network. The attacker attempted to authenticate to the servers. This resulted in a spike in the number of failed authentications originating from the compromised user endpoint. The attacker can also move laterally to other user endpoints in the enterprise network. The second example requires accelerated investigation by a security analyst. The investigation in such situations is sometimes referred to as threat hunting, as it requires the security analyst to proactively and iteratively search through the enterprise network to detect and isolate threats that evade existing security solutions. A real time response from the security analyst can limit the loss to the organization. This is somewhat different from another type of analysis referred to as incident response. Consider, for example, a file containing malware that is downloaded to a server in the enterprise network. The malware can start several processes on the server. The security analyst will perform incident response analysis to determine the computer network entities that are impacted by the malware. Such security events also need to be prioritized to get the security analyst's attention as they can potentially impact a large number of computer network entities.
Graphs are one way to help analysts visualize the computer network entities, both for threat hunting and incident response types of analysis. Logs for an enterprise network can identify hundreds of nodes connected through thousands of edges, referred to as a graph. Graphs become more complex over larger windows, such as a week or month of security events. Presenting a detailed graph with the month of security events is also overwhelming or meaningless to a security analyst.
Graphs of enterprise networks can help security analysts visualize entities in the computer network and their alert status. The technology disclosed builds on a graph of the enterprise network, with nodes representing entities in the network. Examples of entities include user endpoints 121, servers 161a-m, file names, usernames, hostnames, IP addresses, MAC addresses, email addresses, physical locations, instance identifiers, and autonomous system numbers (ASNs). These example entities typically exist across a longer time scale in an enterprise network; however, entities that are short-lived can also be included in the graph if they are important for presenting the correlations, for example, certain emails and transaction identifiers. The technology disclosed builds on a graph of the enterprise network with nodes, representing entities, connected with each other by edges representing different connection types. The technology disclosed assigns alert scores generated by security systems to respective nodes or edges connecting the nodes.
The nodes in graphs of an enterprise computer network are connected to each other with different types of edges representing different types of relationships between the nodes. Examples of connection types can include an association connection type, a communication connection type, a failure connection type, a location connection type, and an action or operation connection type. The first, the association connection type, indicates that two entities are associated, for example, a host is assigned an IP address statically or via dynamic host configuration protocol (DHCP). The second, the communication connection type, indicates that network communication is observed between two connected entities in the enterprise network. The third, the failure connection type, indicates that an action was attempted but failed, for example a failed authentication attempt. The fourth, the location connection type, indicates geographical relationships between connected entities, for example, an IP address is associated with a geographic region. The fifth, the action or operation connection type, indicates that an action or an operation was performed by one of the connected entities. Entities can perform actions, for example, a user can perform an authentication action on a host or a host can execute a process. Additional connection types can be present between entities in the enterprise computer network.
The technology disclosed assigns weights to edges representing the strength of the relationship between the connected nodes. Alerts can also be represented as edges between nodes representing entities in the network. Alert edges can be in addition to other types of edges connecting nodes. The weights reflect the connection types represented by the edges. For example, an association connection between a user and an IP address is stronger than an authentication action connection between a user and a host, because the IP address is associated with the user for longer than the authenticated session of the user on the host. Under these circumstances, the weight assigned to an edge representing an association connection type would be greater than the weight assigned to an edge representing an authentication action connection type.
We refer to these assigned alert scores as “native” scores to distinguish them from scores resulting from propagation through the graph. Graph traversal determines the impact of native alert scores of nodes on connected, neighboring nodes. If alert scores are assigned to edges, the technology disclosed imputes the score to one node or to both connected nodes, in the cases of a directed edge or of an undirected or bi-directed edge, respectively. In another implementation, the technology disclosed propagates alert scores on edges in the same way as described for propagation of scores assigned to nodes.
The technology disclosed propagates native scores from starting nodes with non-zero native scores. For each starting node, we traverse the graph to propagate the starting node's native score to connected, neighboring nodes. Native scores of other nodes encountered during the propagation are ignored until those score-loaded nodes become starting nodes themselves. Traversal can be terminated after a predetermined span from the starting node or when the propagated score falls below a threshold. Weights on edges attenuate (or amplify) the propagated score. Additionally, we normalize the propagated score at each visited node using the number of edges of the same type connected to the visited node, which further attenuates the propagated score. The propagated scores at visited nodes are accumulated over multiple traversals from different starting nodes, to determine aggregate scores.
The technology disclosed reduces the burden on the security analyst by clustering connected nodes based on uninterrupted chains of aggregate scores. Connected nodes are clustered when they have aggregate scores above a threshold. The threshold can be a predetermined value of the aggregate score, a ratio of scores between connected nodes, or a combination of both. Cluster scores are calculated by summing aggregate scores of nodes in the clusters. Clusters with higher cluster scores are prioritized for review by the security analyst.
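The clustering step described above can be sketched as follows. This is a minimal illustration, assuming an adjacency-list graph and a simple predetermined score threshold; the function name and data shapes are hypothetical, not the disclosed implementation.

```python
from collections import deque

def cluster_high_score_nodes(adjacency, aggregate_scores, threshold=5.0):
    """Group connected nodes whose aggregate scores exceed the threshold.

    Nodes at or below the threshold break the chain, separating clusters.
    Returns (cluster_score, member_nodes) pairs, highest score first.
    """
    qualifying = {n for n, s in aggregate_scores.items() if s > threshold}
    seen, clusters = set(), []
    for start in qualifying:
        if start in seen:
            continue
        # Breadth-first search restricted to qualifying nodes.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency.get(node, ()):
                if neighbor in qualifying and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(component)
    # Score each cluster by summing member aggregate scores, then rank.
    scored = [(sum(aggregate_scores[n] for n in c), c) for c in clusters]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

For instance, in a chain a-b-c-d with scores 9, 8, 1, 7 and a threshold of 5, node c breaks the chain, yielding a cluster {a, b} scored 17 ranked ahead of a cluster {d} scored 7.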
The technology disclosed simplifies graph structures for the security analyst by providing two node collapsing techniques performed by the equivalence collapser 149 and the chain collapser 159. Nodes that are of high interest to the security analyst are not hidden in the graph, while the nodes that represent other computer network entities can be collapsed into a single representative node. Application of equivalence and chain collapsing to security events graphs simplifies complex graphs so that the security analyst can focus on nodes that are of interest for high risk security events. The two node collapsing techniques apply to two different types of graph structures. Nodes in the graph can represent a variety of network resources in a computer network. Network resources can include data, hardware devices, or services that can be accessed from a remote computer in an enterprise network. Examples of nodes include servers, clients, services, applications, service principals, load balancers, routers, switches, storage buckets, databases, hubs, IP addresses, etc. There can be tens to hundreds of different types of nodes in a computer network graph. Some examples of services built on open source frameworks and represented as nodes include Zookeeper™, Kafka™, Elasticsearch™, etc. In other contexts, graphs can represent people, departments, organizations, etc.
Equivalence collapsing applies to a first type of graph structure, consisting of multiple nodes connected to the same node with the same type of edge, and simplifies such graphs by collapsing the multiple nodes to a single representative node. In the simplified graph, the multiple collapsed nodes are represented by a single representative node, a so-called equivalence node. This scenario occurs frequently in graphs representing computer network entities. For example, consider multiple user endpoints connected to a server, or multiple processes started by a user via a user endpoint. In these examples, the nodes representing multiple user endpoints or multiple processes can be respectively collapsed to an “equivalence node”. The nodes collapsed into an equivalence node are equivalent in the sense that the nodes have matching degrees, are connected to the same node (such as the server or the user endpoint in the two examples above), and are connected by matching edge types. In the examples above, all endpoints have the same type of connection to the server and all processes have the same type of connection to the user. Entities in a computer network can be connected to each other through different types of connections such as association, action, or communication. For example, an IP address entity is associated with a user endpoint entity, or a user endpoint entity performs an action, such as authentication, with a server entity. Equivalence nodes simplify the graph for visualization purposes by collapsing nodes presenting similar information, including connections to other entities.
The technology disclosed avoids hiding nodes of high interest by scoring nodes before applying equivalence collapsing. Nodes that score above a predetermined threshold are excluded from collapsing. In the example of multiple user endpoints connected to a server, if one user endpoint has been compromised by an attacker, its score is increased. This keeps the compromised node visible after the application of equivalence collapsing, while the remaining equivalent nodes in the group are collapsed and represented by an equivalence node.
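Equivalence collapsing as described above can be sketched by grouping nodes on a connection signature. This is a minimal illustration under assumed data shapes (edges as (node, node, edge_type) tuples) and a hypothetical threshold; matching signatures imply matching degrees, endpoints, and edge types.

```python
from collections import defaultdict

def equivalence_collapse(edges, scores, threshold=5.0):
    """Collapse equivalent nodes into a single representative node.

    Nodes are equivalent when they connect to the same endpoint nodes
    with matching edge types (which also implies matching degrees).
    Nodes scoring above the threshold stay visible, not collapsed.
    Returns a mapping from collapsed node -> equivalence node label.
    """
    # Build each node's connection signature: which nodes it touches, and how.
    signature = defaultdict(set)
    for a, b, edge_type in edges:
        signature[a].add((b, edge_type))
        signature[b].add((a, edge_type))
    groups = defaultdict(list)
    for node, sig in signature.items():
        if scores.get(node, 0) > threshold:
            continue  # high-interest node is excluded from collapsing
        groups[frozenset(sig)].append(node)
    # Map every collapsible group of two or more nodes to one equivalence node.
    mapping = {}
    for members in groups.values():
        if len(members) > 1:
            representative = "+".join(sorted(members))
            for node in members:
                mapping[node] = representative
    return mapping
```

In the server example, three clients with identical communication edges to one server collapse into a single equivalence node, unless a client's score (say, after compromise) exceeds the threshold, in which case it remains visible.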
The second method for simplifying graphs is chain collapsing, which applies to a second type of graph structure consisting of multiple nodes connected in a chain, each having a degree of one or two. Chain collapsing simplifies such graphs by collapsing multiple nodes to a single representative node. In the simplified graph, the multiple collapsed nodes are represented by a single representative node, a so-called chain-collapsed node. These types of graph structures also appear frequently in graphs of computer network entities. For example, a file that is renamed many times will appear as a chain of nodes connected to each other in which each node indicates a new file name. Another example which will form a chain of nodes in a graph of computer network entities is that of a process connected to its long-path filename, which is further connected to a pathless filename. The equivalence collapsing technique does not simplify the chains of nodes in the graph, as the nodes connected in the chain do not fulfill the conditions for equivalence nodes. Chain collapsing is only applied to simple chains, which consist of nodes having a degree of one or two, and not to chains with branches.
Chain collapsing can be applied to two slightly different cases of chains. A first case is that of a chain of nodes that forms a whisker by ending in a leaf node. In this type of chain, all nodes have a degree of two except one node at the end of the chain, which has a degree of one. A second case is that of a chain of nodes that is connected at both ends to two other nodes. In this type of chain, all nodes have a degree of two. The technology disclosed can also collapse chains that are a variation of the second case, in which the starting and ending nodes are the same. This type of chain is in the form of a loop, with all nodes in the chain having a degree of two and the starting/ending node having a degree greater than two.
Scores are assigned to nodes in the chains before collapsing the chains. In one implementation, all nodes in chains are assigned an equal score. Scores for chains are calculated by summing the scores of the nodes in respective chains. Chains that have scores above a threshold are not collapsed. This avoids collapsing chains of unusual length so that these remain visible to the security analyst. The technology disclosed can apply other criteria to score nodes in a chain. For example, if one or more nodes in a chain have an alert associated with them, their scores are increased above the threshold so that this chain of nodes is not collapsed. This causes nodes of high interest to remain visible to the security analyst. After the chains are collapsed, each chain is represented by a single chain-collapsed node.
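Chain collapsing as described above can be sketched as follows. This is a minimal illustration, assuming an adjacency-list graph, per-node scores, and a hypothetical threshold; it collapses maximal runs of degree-one or degree-two nodes and leaves high-scoring chains expanded.

```python
def collapse_chains(adjacency, scores, threshold=5.0):
    """Collapse simple chains (nodes of degree one or two) into one node.

    Each chain's score is the sum of its member scores; chains scoring
    above the threshold stay expanded so unusually long or alerted
    chains remain visible to the analyst.
    Returns a mapping from collapsed node -> chain-collapsed node label.
    """
    degree = {n: len(neighbors) for n, neighbors in adjacency.items()}
    chain_eligible = {n for n, d in degree.items() if d <= 2}
    visited, mapping = set(), {}
    for start in chain_eligible:
        if start in visited:
            continue
        # Flood-fill the maximal run of chain-eligible nodes; branches
        # only occur at degree > 2 nodes, which are excluded.
        chain, stack = [], [start]
        visited.add(start)
        while stack:
            node = stack.pop()
            chain.append(node)
            for neighbor in adjacency[node]:
                if neighbor in chain_eligible and neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)
        chain_score = sum(scores.get(n, 0) for n in chain)
        if len(chain) > 1 and chain_score <= threshold:
            representative = "chain(" + "-".join(sorted(chain)) + ")"
            for node in chain:
                mapping[node] = representative
    return mapping
```

In the file-rename example, a run f1-f2-f3 hanging off a busy process node collapses into one chain-collapsed node, while the degree-four process node itself is never eligible.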
Chain-collapsed nodes can be further equivalence collapsed if the chain-collapsed nodes fulfill an additional factor: whether the chain-collapsed nodes being considered for equivalence collapsing have matching lengths, represented by their respective scores. Applying the two collapsing techniques sequentially considerably reduces the complexity of the graph representing computer network entities.
Completing the description of
The technology disclosed presents an enterprise network in a graph, with computer network entities represented by nodes connected by edges. The graph generator 225 can use the information in the security log database 175 to determine entities in the network. If a network topology is available, it can begin with a previously constructed node list. The graph generator can also use log data of operations systems such as servers, workstations, caches and load balancers, and network devices (e.g., routers and switches) to build a network topology. The graph generator connects entity nodes using edges that represent different types of relationships between the entities. Examples of entities and relationships are presented above. Alerts can also be represented as edges between nodes. Alert edges can be in addition to other types of edges connecting the nodes. The graph generator further assigns native alert scores to nodes that capture alert scores generated by security systems. In one implementation, the graph generator distributes alert scores assigned to edges to the nodes connected by the edges. In the case of a directed edge, an edge-assigned score can be distributed to the node at the source or destination of the edge, instead of both. In the case of an undirected or bi-directed edge, the score is distributed between the two nodes connected by the edge. In the case of an edge connecting a node to itself (i.e., a loop), the entire score is assigned to the connected node. In another implementation, the technology disclosed retains the scores on edges and uses the edge scores in propagation of native scores to nodes connected with the edges. The graph generator 225 assigns weights to edges connecting the nodes based on a connection type of the edge. The weights represent relationship strength between the nodes connected by the edge. For example, an association type relationship is stronger than an action type relationship, as explained above.
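The distribution of edge-assigned scores to nodes can be sketched as follows. This is a minimal illustration under assumptions: directed-edge scores go to the destination node (the text permits source or destination), and undirected-edge scores are split evenly between the endpoints.

```python
def distribute_edge_scores(edge_scores, node_scores):
    """Fold alert scores assigned to edges into native node scores.

    edge_scores: iterable of (source, target, directed, score) tuples.
    A directed edge's score is assigned here to its destination node
    (assigning to the source instead is equally valid per the text);
    an undirected or bi-directed edge splits the score between both
    endpoints; a loop (source == target) assigns the entire score to
    the one connected node.
    """
    for source, target, directed, score in edge_scores:
        if source == target:              # loop: whole score to the node
            node_scores[source] = node_scores.get(source, 0) + score
        elif directed:                    # directed: destination only
            node_scores[target] = node_scores.get(target, 0) + score
        else:                             # undirected: split evenly
            node_scores[source] = node_scores.get(source, 0) + score / 2
            node_scores[target] = node_scores.get(target, 0) + score / 2
    return node_scores
```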
Therefore, the graph generator can, for example, assign a weight of 1.0 to an edge representing an association type connection and a weight of 0.9 to an edge representing an action type connection. In one implementation, the edge weights are on a scale of 0 to 1. A higher weight is given to edges of a connection type representing a stronger relationship.
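As an illustration, the weight assignment can be sketched as a lookup table. The values for association (1.0) and action (0.9) follow the example above; the communication entry and the default for unlisted types are assumptions for illustration only.

```python
# Hypothetical weight table on the 0-to-1 scale described above.
# Association and action weights follow the example in the text; the
# communication weight and the fallback default are assumed values.
EDGE_WEIGHTS = {
    "association": 1.0,    # strongest relationship type
    "action": 0.9,
    "communication": 0.8,  # assumed value for illustration
}

def edge_weight(connection_type):
    """Return the weight assigned to an edge's connection type."""
    return EDGE_WEIGHTS.get(connection_type, 0.5)  # assumed default
```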
The graph traverser 235 systematically traverses the graph representing the computer network to propagate the native alert scores of starting nodes with non-zero scores. For each starting node with a non-zero native score, the graph traverser 235 follows edges from the starting node to propagate the starting node's native alert score to neighboring nodes. The traversal terminates after visiting a predetermined number of edges or nodes, or when the propagated score attenuates below a predetermined threshold. Example pseudocode of a recursive graph traversal algorithm is presented below. In the following pseudocode, the native score of a starting node is referred to as an “initial_score” and the aggregated score of a visited node is referred to as a “priority_score”.
Prerequisites: Every node in the graph has an initial_score which can be non-zero or zero. The algorithm computes a new score, the priority_score, for each node by propagation and aggregation. This is initialized to zero for all nodes. The priority_score will be used to determine the clusters.
Comment: Starting with each node with non-zero initial score, we traverse. Every time we start a traversal, we empty the set of visited nodes and spread the initial score of the starting node around.
Comment: For the traversal, we propagate the starting score around to its neighbors until its magnitude falls below a preset threshold. The scorePropagator function calculates the score to propagate to the neighbors.
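Since the pseudocode listing itself is not reproduced here, the traversal it describes can be sketched in Python. The adjacency-dict graph representation, the function names, and the threshold value are illustrative assumptions, not the disclosed implementation.

```python
# Minimal sketch of the recursive score propagation described above.
# Graph shape, helper names, and THRESHOLD are assumed for illustration.
THRESHOLD = 0.01  # assumed cutoff below which propagation stops

def propagate(adjacency, weights, priority, node, score, visited):
    """Spread `score` from `node` to unvisited neighbors, attenuated."""
    visited.add(node)
    for neighbor, edge_type in adjacency[node]:
        if neighbor in visited:
            continue
        # Attenuation mirrors equation (1), computed at the receiving
        # node: edge-type weight, 1/(1 + sum of incident type weights),
        # and an average over same-type neighbors.
        incident_types = {t for _, t in adjacency[neighbor]}
        total_weight = sum(weights[t] for t in incident_types)
        n_same_type = sum(1 for _, t in adjacency[neighbor] if t == edge_type)
        out = score * weights[edge_type] / ((1.0 + total_weight) * n_same_type)
        if out < THRESHOLD:
            continue  # propagated score has attenuated away
        priority[neighbor] += out
        propagate(adjacency, weights, priority, neighbor, out, visited)

def compute_priority_scores(adjacency, weights, initial):
    """Aggregate score = native (initial) score + propagated scores."""
    priority = dict(initial)
    for node, score in initial.items():
        if score != 0:  # every non-zero node starts a fresh traversal
            propagate(adjacency, weights, priority, node, score, set())
    return priority
```

Each traversal starts with a fresh visited set, matching the comment above, so every starting node with a non-zero initial score spreads that score independently.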
As the graph traverser 235 traverses the graph representing the computer network, the alert score propagator 245 calculates the propagated score at visited nodes. For a visited node v, equation (1) represents the aggregate score of the node, α(v), as the sum of its native score and the scores propagated to the visited node. The aggregate score, native score, and propagated score are referred to as priority_score, initial_score, and neighbor_score, respectively, in the graph traversal pseudocode presented above.
The aggregate score of a node α(v) can be recursively calculated by applying equation (1):

α(v) = s(v) + [1 / (1 + W)] Σ_{edge type T} wgms(T) [1 / N_T(v)] Σ_{n of type T} α(n)    (1)

where s(v) is the native alert score of node v, W = Σ(wgms(T)) is the sum of the weights of all connection types T incident on node v, N_T(v) is the number of neighbors connected to node v by edges of type T, and the inner sum runs over those neighbors n. The base case, for a visited node with no further neighbors to draw propagated scores from, is presented below in equation (2):

α(v) = s(v)    (2)
Tail recursion is one way to propagate a native score from a starting node through connected nodes in the graph.
Equation (1) has two parts. The simpler, first part of equation (1) is a native alert score, a sum of alert scores (α) generated by security systems and assigned to node (v) and/or edges connected to node (v). The same approach can, of course, be applied to scores other than security alert scores, for collapsible graphs other than security graphs.
The second part of equation (1) is a nested sum comprising three terms that represent propagated scores contributed by neighboring nodes n to the visited node v's score. The propagated score from a neighboring node n is attenuated by the three terms.
The outer term 1 / (1 + W) attenuates the propagated score by an inverse of the sum of weights of edges W = Σ(wgms(T)) of all connection types T incident on the visited node v. The added 1 in the denominator assures attenuation and prevents a divide-by-zero condition.
The outer summation Σ_{edge type T} wgms(T) iterates over edge types incident to the visited node v. This term attenuates propagated scores for a particular edge type T by the weight wgms(T) assigned to an edge of connection type T. In general, edge types in the graph are assigned weights corresponding to the strength of the relationship of the connection type represented by the edge. The stronger the relationship, the higher the weight. The weight for a particular edge type T is applied to an inner sum—the outer and inner sums are not calculated independently.
Finally, the inner summation [1 / N_T(v)] Σ_{n of type T} α(n) iterates over the neighbors connected to the visited node v by edges of type T to calculate an average score for each edge type. The denominator factor N_T(v) represents the number of neighbors of the visited node that are connected to the visited node with a same edge connection type T.
Equation (1) is conceptual, whereas the graph traversal pseudocode, above, is practical. Equation (1) defines one recursive calculation of an aggregate score that applies to any node in a graph representing the computer network. Applying Equation (1) directly would involve fanning out from a starting node to draw propagated scores from neighboring nodes into the starting node. The graph traversal pseudocode follows a different, more practical strategy to calculate the propagated scores. It can start at nodes that have non-zero native alert scores and traverse the graph of the computer network, avoiding cyclic calculation mistakes. Alternatively, the traversal can start at leaf nodes, or at both leaf and interior nodes, or at some selected subset of all the nodes in the graph. In some implementations, the propagation can be stopped after the propagated score falls below a threshold, as reflected in the pseudocode for graph traversal. In another implementation, the native scores can be propagated from a starting node for a given number of network hops, for example, the scores can be propagated up to five edge or network hops from the starting node. The graph traversal pseudocode presented above is one practical example that applies Equation (1). It is understood that other graph traversal algorithms can be applied to propagate scores.
The cluster formation engine 255 uses the aggregate scores for nodes in the graph of the computer network to form clusters of connected nodes. The cluster formation engine sorts the nodes in the graph in a descending order of aggregate scores. Starting with the node with the highest aggregate score, the cluster formation engine 255 traverses the graph and adds a neighboring node to a cluster if the aggregate score of the neighboring node is above a selected threshold. When the cluster formation engine 255 reaches a node that has an aggregate score below a set threshold, the chain of connected nodes in the cluster is broken. The threshold can be a predetermined aggregate score, a ratio of scores between nodes, or a combination of both. When using a ratio of scores, the chain of connected nodes can be broken when one node in a pair of connected nodes has a score greater than ten times the score of the other node in the pair. It is understood that when using a ratio of scores, the threshold for breaking the chain of connected nodes can be set at a higher value. For example, the chain of connected nodes can be broken when one node in a pair of connected nodes has a score fifteen, twenty, or twenty-five times the score of the other node. Similarly, the threshold for breaking the chain of connected nodes can be set at lower values. For example, the chain of connected nodes can be broken when one node in a pair of connected nodes has a score five times, three times, or two times the score of the other node.
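As a sketch, the cluster formation described above might be implemented as follows, using the absolute-score variant of the chain-breaking rule. The graph representation and the default threshold value are illustrative assumptions.

```python
def form_clusters(adjacency, aggregate, threshold=10.0):
    """Group connected nodes whose aggregate score exceeds `threshold`.

    A sketch of the absolute-score chain-breaking rule: nodes at or
    below the threshold break the chain and separate clusters.
    """
    keep = {n for n, s in aggregate.items() if s > threshold}
    clusters, seen = [], set()
    # Visit high-scoring nodes in descending order of aggregate score.
    for node in sorted(keep, key=aggregate.get, reverse=True):
        if node in seen:
            continue
        cluster, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in seen or n not in keep:
                continue  # low-scoring nodes break the chain
            seen.add(n)
            cluster.add(n)
            stack.extend(m for m, _ in adjacency[n])
        clusters.append(cluster)
    return clusters
```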
The alert cluster ranker 265 ranks and prioritizes clusters formed by the cluster formation engine 255. Alert cluster ranker 265 calculates cluster scores by summing aggregate scores of nodes in the cluster. The clusters of connected nodes are displayed to the security analyst who can then focus on high ranking clusters first before reviewing the other clusters.
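The ranking step described above reduces to a sort by summed aggregate score; the function name below is illustrative.

```python
def rank_clusters(clusters, aggregate_scores):
    """Order clusters by summed aggregate score, highest priority first."""
    return sorted(
        clusters,
        key=lambda cluster: sum(aggregate_scores[n] for n in cluster),
        reverse=True,
    )
```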
In the following sections, we describe the technology disclosed using two example graphs of computer networks. The examples start with a graph of a computer network in which native scores are assigned to nodes, based on alerts, and weights are applied to edges, based on connection types. The formulation presented in equation (1) is used to propagate native scores of starting nodes to connected nodes. Aggregate scores for nodes are calculated by summing propagated scores and native scores. Finally, clusters are formed. The first example results in one large cluster of connected nodes in the graph. The second example results in two small clusters of connected nodes in the graph.
Two nodes, IP 1.1.1.1 and database 2, in the graph 301 have non-zero native alert scores and are thus selected as starting nodes. The node representing IP address 1.1.1.1 has a native alert score of 100 and the node representing the database 2 also has a native alert score of 100. Starting nodes are shown with a cross-hatch pattern in the graph. All other nodes in the graph have native scores of zero. Edges representing association type connections, drawn as solid lines, have a weight of 1. Edges representing action type connections, drawn as broken lines, have a weight of 0.9. As described above, an association type connection is stronger than an action type connection.
A first set of figures (
in equation (1) as 100. Therefore, the contribution of user 1 node to the propagated score of host A node is very small (0.105) because there are a hundred similar users connected to the same host A node.
Propagated scores from IP 1.1.1.1 node in a third iteration are illustrated in a graph 402 in
Continuing with the first example, a second set of figures (
A third iteration of propagation of scores from the database 2 node is shown in
In the following two figures, we illustrate the propagated impact of native alert scores on connected, neighboring nodes when each of the two nodes with non-zero alert scores is selected as a starting node, one by one.
Cluster formation is illustrated in a graph 1201 in
The above examples illustrate that propagated score on a visited node depends on the strength of the relationship from the starting node and the number of edges of the same type connected with the visited node. The attenuation in the propagated score is greater if the relationship strength is weak and many edges of the same connection type are connected with the visited node. This attenuation is illustrated in the two examples above when propagating native score from user node to host node. As there are a hundred user nodes connected to the same host node, the host receives a very small amount of propagated score when traversal is from the user node to the host node.
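The fan-in attenuation can be illustrated numerically with the attenuation factors of equation (1); the scores, weights, and neighbor counts below are assumed for illustration.

```python
def propagated_contribution(score, edge_weight, total_incident_weight, n_same_type):
    # One neighbor's contribution per equation (1): attenuated by the
    # edge-type weight, by 1/(1 + sum of type weights incident on the
    # visited node), and by the count of same-type neighbors (average).
    return score * edge_weight / ((1.0 + total_incident_weight) * n_same_type)

# A host reached by a single association edge (weight 1.0) keeps far
# more of a propagated score of 100 than a host shared by 100 users.
lone_user_host = propagated_contribution(100, 1.0, 1.0, 1)    # 50.0
shared_host = propagated_contribution(100, 1.0, 1.0, 100)     # 0.5
```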
The first step to perform equivalence labeling, according to a method disclosed, is to assign degree labels to nodes in the graph, which aid in determining equivalent nodes. A group of nodes with a same label belong to the same equivalence class and can be collapsed to a single equivalence node. Equivalence labeler 1325 assigns these labels to nodes. In one implementation, the equivalence labeler assigns labels to nodes in an increasing order of degree of connectedness of the nodes. For example, all nodes with a degree of 1 in the graph are assigned labels before the nodes with a degree of 2, and so on. In such an implementation, the process to assign labels starts with the nodes having a degree of 1 in the graph. The equivalence labeler 1325 assigns labels to nodes with a degree of 1 such that nodes with matching labels are in the same group of equivalent nodes. The equivalence labeler 1325 considers the degree of the node, its neighboring node, and the connection type of the node when assigning labels. Nodes having the same degree, connected to the same neighbor node with the same connection type, are given the same label. The label assignment process continues until all equivalent nodes in the graph have been assigned labels.
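A simplified sketch of label assignment follows. The label encoding, a tuple of the node's degree and its sorted (neighbor, connection type) pairs, is an assumption chosen to capture the matching criteria described above; the degree cutoff follows the efficiency heuristic discussed below.

```python
def equivalence_labels(adjacency, max_degree=4):
    """Assign labels so that nodes with matching labels are equivalent.

    Sketch: the label combines a node's degree with its sorted
    (neighbor, connection type) pairs, so nodes of the same degree
    connected to the same neighbors by the same edge types match.
    """
    labels = {}
    for node, edges in adjacency.items():
        degree = len(edges)
        if degree > max_degree:
            continue  # high-degree nodes are unlikely to be collapsible
        labels[node] = (degree, tuple(sorted(edges)))
    return labels

def equivalence_groups(labels):
    """Group nodes by label; only groups of two or more can collapse."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return [g for g in groups.values() if len(g) > 1]
```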
Efficiency can be improved by limiting application of labels to nodes, based on rules of thumb regarding nodes that are unlikely to be collapsible. In one implementation, the equivalence labeler 1325 assigns labels to nodes up to a degree of 4 connectedness and not for degrees of five and greater. In another implementation, labels are assigned up to a degree of 3 connectedness for equivalence collapsing. In most graphs, nodes with higher degrees of connectedness are less likely to be collapsible. Therefore, limiting the labeling of nodes to a degree of 4 reduces the computational resources required for the labeling process and also reduces the time required to complete it.
Nodes with the same labels can be collapsed into an equivalence node. However, the technology disclosed identifies nodes of high interest to the analyst before collapsing equivalent nodes so that nodes of high interest remain visible to the analyst and are not included in a collapse. The node scorer 1335 assigns scores to the nodes. In one implementation, the scores are assigned according to a severity level of the alert generated for the computer network entity. In one implementation, alerts are generated by the security systems, such as firewalls and antivirus software, along with a score. The network-based security systems can assign scores to security events or entities related to a security event. Host-based security systems deployed on user endpoints or other computing devices can also score security events. In one implementation, the initial alert scores assigned to network entities by one or more security systems are used to determine a node score by combining them with other factors. An example of such a factor is the number of neighboring nodes with edge connections. If there are fewer nodes in the neighborhood of the node being scored, then a high score can be assigned to the node so that the node is not collapsed into an equivalent node. This represents a scenario in which the node being scored is located in a part of the graph which is already sparse. In one implementation, the scores assigned by the security systems are related to a connection between two entities in the computer network. For example, consider an “action” type connection between a user endpoint and a server when the user endpoint is attempting to authenticate to a host. Now consider that this user endpoint is compromised: an attacker has gained access to it and is attempting to authenticate to the server without valid credentials. This results in a spike in authentication actions from the compromised user endpoint, which is observed by the security system.
The connection between the user endpoint and the host is then labeled as an alert. The node (representing user endpoint) is connected to an edge (representing authentication action) that is labeled as an alert and therefore, the node is given a high score.
The technology disclosed avoids hiding nodes of high interest in the graph by comparing the scores of the nodes with a threshold. The threshold adjuster 1345 sets a value of the threshold, which is compared with node scores to exclude nodes of high interest from hiding. The nodes having scores above the threshold are not collapsed into equivalence nodes. The technology disclosed can aggregate the nodes less aggressively by setting a low value of the threshold. This results in a higher number of nodes avoiding collapse into equivalence nodes, thus displaying more detail to the analyst in the graph. On the other hand, the technology disclosed can also aggregate more aggressively by setting a high value of the threshold. This results in collapsing more nodes that have scores lower than the set threshold and displays less detail in the graph, because only nodes with high scores above the set threshold avoid collapsing into equivalence nodes.
The node pinner 1355 marks a node as “do not collapse”. The nodes that are pinned are not collapsed in equivalence collapsing. Nodes that are important for a particular analysis carried out by the security analyst can be pinned. The node aggregator 1365 traverses through the graph and aggregates nodes with matching labels that belong to the same equivalence group provided their score is below the threshold set by the threshold adjuster. The nodes in each group are then replaced with corresponding equivalence nodes.
The chain collapser 159 implements the second of the two collapsing methods proposed by the technology disclosed. Chain collapsing focuses on collapsing graph structures that are in the form of chains of nodes. Equivalence collapsing does not simplify chains of nodes because the nodes in a chain are not connected to matching nodes. The chain labeler 1425 assigns labels to nodes such that all nodes in a chain have the same label. Chain collapsing is applied to simple chains; chains with branches are not considered. The technology disclosed applies chain collapsing to two slightly different cases of chain structures. The first type of chain structure, also referred to as a whisker chain, ends in a leaf node with a degree of one. The second type of chain is connected at both ends to two other nodes, which means that all nodes in the chain have a degree of 2. The technology disclosed can also collapse chains that are a variation of the second case, in which the starting and the ending nodes are the same. This type of chain is in the form of a loop, with all nodes in the chain having a degree of two and the starting/ending node having a degree greater than two.
The chain labeler 1425 traverses the graph and labels nodes in a chain. In one implementation, to label nodes connected in a chain structure, the chain labeler finds a node with a degree of 2 with a first adjacent node having a degree of 2 and a second adjacent node with degree not equal to 2. The second adjacent node is the end node of the chain structure. If the chain is in the form of a whisker, the second adjacent node has a degree of 1 otherwise, the second adjacent node has a degree equal to or greater than 3. The chain labeler then traverses the nodes in the chain and assigns labels to the nodes, until it reaches a node with a degree equal to or greater than 3 which is the other end of the chain. The chain scorer 1435 scores the chains. In one implementation, the scores are calculated using the number of nodes in the chains.
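The chain walk described above can be sketched as follows; the graph representation and function name are illustrative assumptions.

```python
def find_chain(adjacency, start):
    """Collect a run of degree-2 nodes beginning at `start`.

    Sketch of the labeling walk: `start` should be a degree-2 node
    adjacent to the chain's end node (degree 1 for a whisker, degree
    3 or more otherwise). The walk follows degree-2 neighbors until
    it reaches the other end of the chain.
    """
    def degree(n):
        return len(adjacency[n])

    if degree(start) != 2:
        return []  # chain interiors consist only of degree-2 nodes
    chain, prev, node = [start], None, start
    while True:
        nxt = [n for n, _ in adjacency[node] if n != prev and degree(n) == 2]
        if not nxt:
            break  # reached a node adjacent to a chain end
        prev, node = node, nxt[0]
        chain.append(node)
    return chain
```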
The threshold adjuster 1455 sets a value of a threshold with which scores of chains are compared before collapsing the chains into single representative chain-collapsed nodes. The node aggregator 1465 collapses nodes in chains to chain-collapsed nodes if the score of the chain is less than the threshold. This allows chains of unusual length to be excluded from collapsing and to remain visible to the security analyst. In the following paragraphs, examples of simplification of graph structures using equivalence and chain collapsing, without hiding nodes of high interest, are presented.
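Chain scoring and the threshold comparison can be sketched as follows. The dictionary form of a chain-collapsed node is an assumption; the threshold of 10 matches the whisker-chain example presented later.

```python
def collapse_chain(chain, threshold=10):
    """Score a chain by node count and decide whether to collapse it.

    Returns a single chain-collapsed node carrying the chain's score,
    or None when the chain is unusually long and must stay visible.
    """
    score = len(chain)  # one implementation scores by node count
    if score >= threshold:
        return None     # unusually long chain: exclude from collapsing
    return {"members": tuple(chain), "score": score}
```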
The second type of collapsing method proposed by the technology disclosed applies to nodes connected in a chain. The application of this method is presented in
The chains are scored before they are collapsed using the chain collapsing method. This identifies unusually long chains that may represent an anomaly and therefore need to be excluded from collapsing. In one implementation, the chains are scored based on the number of nodes connected in the chain. The three whisker chains 1711, 1713, and 1715 each have three nodes and therefore, each has a score of 3. The scores are compared with a threshold to determine if the chain is excluded from collapsing. Assume the threshold is set at 10; the three whisker chains 1711, 1713, and 1715 are then collapsed to respective chain-collapsed nodes 1711A, 1713A, and 1715A shown in a graph 1702. The chain-collapsed nodes are shown with a hatch pattern to differentiate them from other nodes in the graph. The scores for the chain-collapsed nodes are presented beside the respective chain-collapsed nodes. In this example, each of the three chains has a score of 3. Chain collapsing simplifies the structure of the graph 1701 to the graph 1702.
Chain-collapsed nodes can be further equivalence collapsed as shown in
In the following example, chain collapsing method is applied to a second type of chains which are connected to nodes on both ends.
The seven chains in graph 1802 are collapsed to chain-collapsed nodes 1842A, 1852A, 1862A, 1826A, 1836A, 1866A and 1876A, respectively. The scores of the chain-collapsed nodes are also shown beside the respective chain-collapsed nodes. All chain-collapsed nodes have a score of 2, as each has two nodes in its chain. Following chain collapsing, equivalence collapsing is applied to the graph 1802 to further simplify the graph. Two groups 1811 and 1817 of equivalent nodes are identified. The resulting graph 1803 shows equivalence nodes 1811A and 1817A.
In one implementation, the alert prioritization engine 158 of
User interface input devices 1938 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computer system 1900.
User interface output devices 1976 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display such as audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computer system 1900 to the user or to another machine or computer system.
Storage subsystem 1910 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. Subsystem 1978 can be graphics processing units (GPUs) or field-programmable gate arrays (FPGAs).
Memory subsystem 1922 used in the storage subsystem 1910 can include a number of memories including a main random access memory (RAM) 1932 for storage of instructions and data during program execution and a read only memory (ROM) 1934 in which fixed instructions are stored. A file storage subsystem 1936 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by file storage subsystem 1936 in the storage subsystem 1910, or in other machines accessible by the processor.
Bus subsystem 1955 provides a mechanism for letting the various components and subsystems of computer system 1900 communicate with each other as intended. Although bus subsystem 1955 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple busses.
Computer system 1900 itself can be of varying types including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely-distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of computer system 1900 depicted in
The technology disclosed relates to grouping security alerts generated in a computer network and prioritizing grouped security alerts for analysis.
The technology disclosed can be practiced as a system, method, device, product, computer readable media, or article of manufacture. One or more features of an implementation can be combined with the base implementation. Implementations that are not mutually exclusive are taught to be combinable. One or more features of an implementation can be combined with other implementations. This disclosure periodically reminds the user of these options. Omission from some implementations of recitations that repeat these options should not be taken as limiting the combinations taught in the preceding sections—these recitations are hereby incorporated forward by reference into each of the following implementations.
A system implementation of the technology disclosed includes one or more processors coupled to memory. The memory is loaded with computer instructions to group security alerts generated from a computer network and prioritize grouped security alerts for analysis. The system graphs entities in the computer network as nodes connected by one or more edges. The system assigns a connection type to each edge. The connection type represents a relationship type between the nodes connected by the edge. The system assigns a weight to each edge representing a relationship strength between the nodes connected. The system assigns native scores from the security alerts to the nodes or to edges between the nodes. The system includes logic to traverse the graph, starting at the starting nodes with non-zero native scores, visiting the nodes in the graph and propagating the native scores from the starting nodes attenuated by the weights assigned to an edge traversed. The traversing extends for at least a predetermined span from the starting nodes, through and to neighboring nodes connected by the edges. The system normalizes and accumulates propagated scores at visited nodes, summed with the native score assigned to the visited nodes to generate aggregate scores for the visited nodes. The normalizing of the propagated scores at the visited nodes includes attenuating a propagated score based on a number of contributing neighboring nodes of a respective visited node to form a normalized score. The system forms clusters of connected nodes in the graph that have a respective aggregate score above a selected threshold. The clusters are separated from other clusters through nodes that have a respective aggregate score below the selected threshold. Finally, the system ranks and prioritizes clusters for analysis according to the aggregate scores of the nodes in the formed clusters.
The system implementation and other systems disclosed optionally include one or more of the following features. System can also include features described in connection with methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.
The nodes in the graph representing entities in the computer network can be connected by one or more directed edges. The nodes in the graph can also be connected by directed and bi-directed or undirected edges.
The system includes logic to assign native alert scores for pending alerts to edges between the nodes. The system includes logic to distribute native alert scores from edges to nodes connected to the edges. The edges can include a loop edge connected to a single node. In this case, the system assigns the native alert score from a loop edge to the single node connected to the edge. The connection type assigned to edges can include an association connection type, a communication connection type, a failure connection type, a location connection type, and an action or an operation connection type.
When traversing the graph from the starting node to propagate native alert scores, the predetermined span is up to five edge or node hops from the starting node.
The system's propagation of native scores from the starting nodes, through and to neighboring nodes connected by the edges, continues while the propagated score is above a selected threshold and stops when the propagated score falls below the selected threshold.
When normalizing the propagated score at the visited node, the system includes logic to attenuate the propagated score at the visited node in proportion to the number of neighboring nodes connected to the visited node by edges of the same connection type.
The system includes logic to attenuate the propagated score at the visited node by dividing the propagated score by a sum of weights of relationship strengths on edges connected to the visited node.
When forming clusters of connected nodes, the system includes logic to separate clusters by at least one node that has an aggregate score below a selected threshold. In another implementation, the system includes logic to separate clusters of connected nodes by at least one node in a pair of connected nodes that has an aggregate score less than ten times the aggregate score of the other node in the pair of connected nodes. In other implementations, higher values of the threshold can be used. For example, the system can include logic to separate clusters of connected nodes by at least one node in a pair of connected nodes that has an aggregate score less than fifteen times, twenty times or twenty-five times the aggregate score of the other node in the pair of connected nodes. Similarly, in other implementations, lower values of the threshold can be used. For example, the system can include logic to separate clusters of connected nodes by at least one node in a pair of connected nodes that has an aggregate score less than five times, three times or two times the aggregate score of the other node in the pair of connected nodes.
Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform functions of the system described above. Yet another implementation may include a method performing the functions of the system described above.
A method implementation of the technology disclosed includes graphing entities in the computer network as nodes connected by one or more edges. The method includes assigning a connection type to each edge. The connection type represents a relationship type between the nodes connected by the edge. The method includes assigning a weight to each edge representing a relationship strength between the nodes connected. The method includes assigning native scores from the security alerts to the nodes or to edges between the nodes. The method includes traversing the graph, starting at the starting nodes with non-zero native scores, visiting the nodes in the graph and propagating the native scores from the starting nodes attenuated by the weights assigned to an edge traversed. The traversing extends for at least a predetermined span from the starting nodes, through and to neighboring nodes connected by the edges. The method includes normalizing and accumulating propagated scores at visited nodes, summed with the native score assigned to the visited nodes to generate aggregate scores for the visited nodes. The normalizing of the propagated scores at the visited nodes includes attenuating a propagated score based on a number of contributing neighboring nodes of a respective visited node to form a normalized score. The method includes forming clusters of connected nodes in the graph that have a respective aggregate score above a selected threshold. The clusters are separated from other clusters through nodes that have a respective aggregate score below the selected threshold. Finally, the method includes ranking and prioritizing clusters for analysis according to the aggregate scores of the nodes in the formed clusters.
Each of the features discussed in this particular implementation section for the system implementation apply equally to this method implementation. As indicated above, all the system features are not repeated here and should be considered repeated by reference.
Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform the method described above. Yet another implementation may include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform the method described above.
Computer readable media (CRM) implementations of the technology disclosed include a non-transitory computer readable storage medium impressed with computer program instructions that, when executed on a processor, implement the method described above.
Each of the features discussed in this particular implementation section for the system implementation applies equally to the CRM implementation. As indicated above, the system features are not repeated here and should be considered repeated by reference.
The technology disclosed relates to clutter reduction during graph presentation for security incident analysis.
The technology disclosed can be practiced as a system, method, device, product, computer readable media, or article of manufacture. One or more features of an implementation can be combined with the base implementation. Implementations that are not mutually exclusive are taught to be combinable. One or more features of an implementation can be combined with other implementations. This disclosure periodically reminds the user of these options. Omission from some implementations of recitations that repeat these options should not be taken as limiting the combinations taught in the preceding sections; these recitations are hereby incorporated forward by reference into each of the following implementations.
A system implementation of the technology disclosed includes one or more processors coupled to memory. The memory is loaded with computer instructions to reduce clutter during graph presentation for security incident analysis of a computer network. The system scores nodes that are of indicated interest for security incident analysis and potentially collapsed by equivalence. The system aggregates and hides equivalent nodes that have matching degrees. The equivalent nodes are connected to matching nodes by matching edge types, and have scores below a first selected threshold. The system leaves interesting nodes having scores above the first selected threshold visible.
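The equivalence collapsing just described can be sketched, for illustration only, by grouping low-scoring nodes on a signature of degree plus (neighbor, edge type) pairs. The data layout and names below are assumptions for the sketch, not the claimed implementation:

```python
from collections import defaultdict

def collapse_equivalent(adj, scores, threshold):
    """Illustrative sketch of equivalence collapsing. `adj` maps
    node -> {neighbor: edge_type}. Nodes with matching degrees that connect
    to matching nodes by matching edge types, and whose scores fall below
    `threshold`, are aggregated into hidden groups."""
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        if scores.get(node, 0.0) >= threshold:
            continue                  # interesting nodes stay visible
        # Signature: degree plus the set of (neighbor, edge_type) pairs.
        sig = (len(nbrs), frozenset(nbrs.items()))
        groups[sig].append(node)
    collapsed = [sorted(g) for g in groups.values() if len(g) > 1]
    hidden = {n for g in collapsed for n in g}
    visible = [n for n in adj if n not in hidden]
    return collapsed, visible
```

Each collapsed group could be rendered as a single aggregate node in the presented graph, with the member count shown as a label.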
The system implementation and other systems disclosed optionally include one or more of the following features. The system can also include features described in connection with the methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.
The nodes in the graph of the computer network can represent network resources in the computer network.
The score for a particular node is increased when the particular node is connected to an edge representing a security incident alert.
During a threat hunting alert analysis, the system increases the score for a particular node when the node represents a user entity type. The threat hunting analysis includes displaying nodes, representing users in a computer network, to a security analyst as potential threats.
During malware response alert analysis, the system increases the score for a particular node when the node represents a server type entity.
In response to receiving a node pinning message for a node corresponding to a particular user in a computer network for whom the threat hunting alert was generated, the system increases the score for the pinned node representing the particular user above the first selected threshold.
In response to receiving a node pinning message for a node corresponding to a particular server in a computer network for which the malware response alert was generated, the system increases the score for the pinned node representing the particular server above the first selected threshold.
Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform functions of the system described above. Yet another implementation may include a method performing the functions of the system described above.
A method implementation of the technology disclosed includes scoring nodes that are of indicated interest for security incident analysis and potentially collapsed by equivalence. The method includes aggregating and hiding equivalent nodes that have matching degrees. The equivalent nodes are connected to matching nodes by matching edge types, and have scores below a first selected threshold. The method leaves interesting nodes having scores above the first selected threshold visible.
Each of the features discussed in this particular implementation section for the system implementation applies equally to this method implementation. As indicated above, the system features are not repeated here and should be considered repeated by reference.
Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform the first method described above. Yet another implementation may include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform the first method described above.
Computer readable media (CRM) implementations of the technology disclosed include a non-transitory computer readable storage medium impressed with computer program instructions that, when executed on a processor, implement the method described above.
Each of the features discussed in this particular implementation section for the first system implementation applies equally to the CRM implementation. As indicated above, the system features are not repeated here and should be considered repeated by reference.
A system implementation of the technology disclosed includes one or more processors coupled to memory. The memory is loaded with computer instructions to reduce clutter during graph presentation for security incident analysis. The system identifies chains of at least three nodes having degrees of 1 or 2, without branching from any node in the chain. The system collapses the identified chains into chain-collapsed single nodes.
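For illustration only, chain identification can be sketched as walking from chain endpoints through nodes of degree 1 or 2. The adjacency representation and names below are assumptions for the sketch, and linear chains (not cycles) are assumed:

```python
def find_chains(adj, min_len=3):
    """Illustrative sketch of chain detection. `adj` maps node -> set of
    neighbors. A chain is a maximal run of degree-1 or degree-2 nodes with
    no branching; runs of at least `min_len` nodes qualify for collapsing
    into a single chain-collapsed node."""
    in_chain = {n for n, nbrs in adj.items() if len(nbrs) in (1, 2)}

    def chain_nbrs(n):
        return [m for m in adj[n] if m in in_chain]

    chains, seen = [], set()
    # Endpoints are chain nodes with at most one chain neighbor.
    for node in sorted(in_chain):
        if node in seen or len(chain_nbrs(node)) > 1:
            continue
        chain, cur = [], node
        while cur is not None and cur not in seen:
            seen.add(cur)
            chain.append(cur)
            nxt = [m for m in chain_nbrs(cur) if m not in seen]
            cur = nxt[0] if nxt else None
        if len(chain) >= min_len:
            chains.append(chain)
    return chains
```

Each returned chain could then be replaced by one chain-collapsed node carrying the chain length, for the further scoring and equivalence collapsing described below.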
The system implementation and other systems disclosed optionally include one or more of the following features. The system can also include features described in connection with the methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.
In one implementation, at least one of the chains is a whisker chain having at least three nodes and ending in a leaf node having a degree of 1.
The system scores a plurality of the chain-collapsed nodes that are of interest for security incident analysis so that further equivalence collapsing does not aggregate them. The system aggregates and hides chain-collapsed nodes that are connected to matching nodes by matching edge types and that have scores below a second selected threshold. Interesting chain-collapsed nodes having scores above the second selected threshold are left visible and not collapsed.
The system scores a particular chain-collapsed node by increasing its score when the chain length of the particular chain-collapsed node does not match the chain lengths of chain-collapsed nodes connected to the matching nodes. The chain lengths of chain-collapsed nodes indicate the number of nodes in the respective chains.
Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform functions of the system described above. Yet another implementation may include a method performing the functions of the system described above.
A method implementation of the technology disclosed includes reducing clutter during graph presentation for security incident analysis. The method includes identifying chains of at least three nodes having degrees of 1 or 2, without branching from any node in the chain. The method includes collapsing the identified chains into chain-collapsed single nodes. The chain-collapsed nodes can be further collapsed by applying the equivalence collapsing described above, and any or all of its features.
Each of the features discussed in this particular implementation section for the system implementation applies equally to this method implementation. As indicated above, the system features referenced from the method are not repeated here and should be considered repeated by reference.
Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform the methods described above and any combination of associated features. Yet another implementation may include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform the methods described above and any combination of associated features. As indicated above, the referenced method features are not repeated here and should be considered repeated by reference. Computer readable media (CRM) implementations of the technology disclosed include a non-transitory computer readable storage medium impressed with computer program instructions that, when executed on a processor, implement the method described above.
Each of the features discussed in this particular implementation section for the system implementation applies equally to the CRM implementation. As indicated above, the system features are not repeated here and should be considered repeated by reference.
This application is a continuation-in-part of U.S. Ser. No. 18/069,146, titled “Security Events Graph For Alert Prioritization,” filed on 20 Dec. 2022 (Atty. Docket No. NSKO 1022-3), which is a continuation of U.S. Ser. No. 16/361,023, titled “Systems and Methods for Alert Prioritization Using Security Events Graph,” filed 21 Mar. 2019, now U.S. Pat. No. 11,539,749, issued 27 Dec. 2022 (Atty. Docket No. NSKO 1022-2), which claims the benefit of U.S. Provisional Patent Application No. 62/683,795, titled “Alert Prioritization Using Graph Algorithms,” filed on 12 Jun. 2018 (Atty. Docket No. NSKO 1022-1). These priority applications are incorporated by reference as if fully set forth herein, just as Ser. Nos. 18/069,146 and 16/361,023 previously were. This application is also a continuation-in-part of U.S. Ser. No. 17/516,689, titled “Systems And Methods For Controlling Declutter of a Security Events Graph,” filed on 1 Nov. 2021, now U.S. Pat. No. 11,856,016, issued 26 Dec. 2023 (Atty. Docket No. NSKO 1024-3), which is a continuation of U.S. patent application Ser. No. 16/361,039, titled “Systems and Methods To Show Detailed Structure in a Security Events Graph,” filed on 21 Mar. 2019, now U.S. Pat. No. 11,165,803, issued 2 Nov. 2021 (Atty. Docket No. NSKO 1024-2), which claims the benefit of U.S. Provisional Patent Application No. 62/683,789, titled “System To Show Detailed Structure In A Moderately Sized Graph,” filed on 12 Jun. 2018 (Atty. Docket No. NSKO 1024-1). Priority application Ser. Nos. 16/361,039 and 62/683,789 are incorporated by reference as if fully set forth herein, just as they were incorporated in U.S. Ser. Nos. 18/069,146 and 16/361,023. Parts of these priority applications are now bodily incorporated in this application.
| Number | Date | Country |
| --- | --- | --- |
| 62683795 | Jun 2018 | US |
| 62683789 | Jun 2018 | US |
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 16361023 | Mar 2019 | US |
| Child | 18069146 | | US |
| Parent | 16361039 | Mar 2019 | US |
| Child | 17516689 | | US |
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 18069146 | Dec 2022 | US |
| Child | 18395379 | | US |
| Parent | 17516689 | Nov 2021 | US |
| Child | 16361023 | | US |