According to Kathleen A. Jackson, INTRUSION DETECTION SYSTEM (IDS) PRODUCT SURVEY, Version 2.1, Los Alamos National Laboratory 1999, Publication No. LA-UR-99-3883, Chapter 1.2, IDS OVERVIEW, intrusion detection systems attempt to detect computer misuse. Misuse is the performance of an action that is not desired by the system owner; one that does not conform to the system's acceptable use and/or security policy. Typically, misuse takes advantage of vulnerabilities attributed to system misconfiguration, poorly engineered software, user neglect or abuse of privileges and to basic design flaws in protocols and operating systems.
Intrusion detection systems analyze activities of internal and/or external users for explicitly forbidden and anomalous behavior. They are based on the assumption that misuse can be detected by monitoring and analyzing network traffic, system audit records, system configuration files or other data sources (see also Dorothy E. Denning, IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-13, NO. 2, February 1987, pages 222-232).
The methods an intrusion detection system can use to detect misuse vary. Essentially, two main intrusion detection methods are known, which are described, for example, in EP 0 985 995 A1 and U.S. Patent document No. 5,278,901.
The first method uses knowledge accumulated about attacks and looks for evidence of their exploitation. This method, which on a basic level can be compared to virus checking methods, is referred to as knowledge-based, also known as signature-based, pattern-oriented or misuse detection. A knowledge-based intrusion detection system therefore looks for patterns of attacks while monitoring a given data source. As a consequence, attacks for which no signatures or patterns are stored will not be detected.
According to the second method, a reference model is built that represents the normal behavior or profile of the monitored system, and the intrusion detection system looks for anomalous behavior, i.e. for deviations from the previously established reference model. Reference models can be built in various ways. For example, in S. Forrest, S. A. Hofmeyr, A. Somayaji and T. A. Longstaff, A Sense of Self for Unix Processes, Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press 1996, pages 120-128, normal process behavior is modeled by means of short sequences of system calls.
The second method is therefore referred to as behavior-based, also known as profile-based or anomaly-based. Behavior-based intrusion detection, which relies on the assumption that the “behavior” of a system will change when an attack is carried out, therefore makes it possible to detect previously unknown attacks, as long as they deviate from the previously established model of normal behavior. As long as the normal behavior of the monitored system does not change, a behavior-based intrusion detection system remains up to date without having to collect signatures of new attacks.
However, since the behavior of a system normally changes over time, e.g. due to changes in the activities of authorized users or the installation of new or updated system elements, deviations from the modeled behavior will frequently be detected without any intrusion taking place, unless the reference model is adapted immediately. Behavior-based intrusion detection systems therefore normally produce a large number of false alarms (false positives) arising from non-threatening events.
Knowledge-based intrusion detection systems tend to generate fewer false alarms. However, depending on the quality of the stored knowledge of known attacks and the condition of the monitored system, these systems may also produce numerous false alarms that cannot easily be handled by human system administrators. For example, some network applications and operating systems may cause numerous ICMP (Internet Control Message Protocol) messages (see Douglas E. Comer, INTERNETWORKING with TCP/IP, PRINCIPLES, PROTOCOLS, AND ARCHITECTURES, 4th EDITION, Prentice Hall 2000, pages 129-144), which a knowledge-based detection system may interpret as an attacker's attempt to map out a network segment. ICMP messages that do not correspond to normal system behavior may also occur during periods of increased network traffic with local congestion.
It is further known that an intrusion detection system may interpret sniffed data differently than the monitored network elements, see Thomas H. Ptacek, Timothy N. Newsham, Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection, Secure Network Inc., January 1998, which under certain conditions could also lead to false alarms.
False alarms appearing in large numbers are a severe problem because investigating them requires time and effort. If the load of false alarms in a system becomes high, human system administrators or security personnel might become negligent. In Klaus Julisch, Dealing with False Positives in Intrusion Detection, RAID, 3rd Workshop on Recent Advances in Intrusion Detection, 2000, it is described that filters could be applied to remove false alarms. Filters can also use a knowledge-based approach (discarding what are known to be false positives) or a behavior-based approach (discarding what follows a model of normal alarm behavior). Either way, maintaining and updating the models or knowledge bases of filters and intrusion detection systems requires additional effort.
It would therefore be desirable to create an improved method and a system for processing alarms triggered by a monitoring system such as an intrusion detection system, a firewall or a network management system in order to efficiently extract relevant information about the state of the monitored system or activities of its users.
It would further be desirable for this method and system to operate in the presence of a large number of false alarms that are received at a rate that cannot be handled efficiently by human system administrators.
Still further, it would be desirable to obtain the results of said data processing in a short form with a high information content that can easily be interpreted by human system administrators or automated post-processing modules.
In accordance with the present invention there is now provided a method, a computer program element and a system according to claim 1, claim 14 and claim 15.
The method makes it possible to process alarms triggered by a monitoring system such as an intrusion detection system, a firewall or a network management system in order to extract relevant information about the state of the monitored system or activities of its users.
In order to obtain relevant information about the state of the monitored system or activities of its users,
In the event of high rates of alarm messages, possibly containing a high percentage of false alarms, human system administrators will not be confronted with a flood of messages with little significance. Instead, only generalized alarms, which are more meaningful and less in number, are presented to human system administrators. This fosters understanding of alarm root causes and facilitates the conception of an appropriate response to alarms (e.g. by suppressing false alarms in the future, or by repairing a compromised system component).
Key to alarm clustering is the notion of alarm similarity. Different definitions of alarm similarity are possible, but in a preferred embodiment, alarm similarity is defined as the sum of attribute similarities, and attribute similarity is preferably defined via taxonomies. Examples of attributes include the alarm source, the alarm destination, the alarm type, and the alarm time. A taxonomy is an “is-a” generalization hierarchy that shows how attribute values can be generalized to more abstract concepts. Finally, the more closely two attribute values are related in their taxonomy, the more similar they are.
By way of illustration, a taxonomy on the time attribute might establish the following “is-a” hierarchy:
Given this taxonomy, timestamp t1 is more similar to t2 than to t3. This is because t1 and t2 are related via the concept “workday”. In contrast, t1 and t3 are only related via the concept “day of the week”, which is less specific, thus resulting in a smaller similarity value. Finally, as stated earlier, alarm similarity is defined as the sum of attribute similarities.
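The time-taxonomy example can be sketched in Python as follows. The hierarchy is an assumption, since the original figure is not reproduced here: each weekday generalizes to “workday” or “holiday”, both of which generalize to “day of the week”, and a timestamp is first mapped to its weekday. As a hypothetical choice, attribute similarity is taken to be the depth of the most specific common generalization, so that sharing a more specific concept yields a higher value.

```python
from datetime import datetime

# Hypothetical time taxonomy (child -> parent); only the concepts named in the
# text are used, the exact hierarchy of the original figure is assumed.
PARENT = {
    "monday": "workday", "tuesday": "workday", "wednesday": "workday",
    "thursday": "workday", "friday": "workday",
    "saturday": "holiday", "sunday": "holiday",
    "workday": "day of the week", "holiday": "day of the week",
}
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def ancestors(value):
    """Return the path [value, parent, ..., root] in the taxonomy."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def time_similarity(t_a, t_b):
    """Assumed measure: depth of the most specific common generalization."""
    up_b = set(ancestors(DAYS[t_b.weekday()]))  # weekday(): Monday == 0
    for concept in ancestors(DAYS[t_a.weekday()]):
        if concept in up_b:
            return len(ancestors(concept)) - 1  # the root has depth 0
    return 0


t1 = datetime(2001, 11, 26, 10, 0)   # a Monday
t2 = datetime(2001, 11, 27, 15, 30)  # a Tuesday
t3 = datetime(2001, 11, 25, 9, 0)    # a Sunday
print(time_similarity(t1, t2), time_similarity(t1, t3))  # 1 0
```

Summing such attribute similarities over all attributes would then give an alarm similarity in the sense described above.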
Alarm clusters can easily comprise thousands of alarms. Therefore, it is not viable to represent alarm clusters by means of their constituent alarms. Indeed, doing so would overwhelm the recipient with a vast amount of information that is hard to make sense of. To solve this problem, alarm clusters are represented by so-called generalized alarms. Generalized alarms are like ordinary alarms, but their attributes can assume higher-level concepts from the taxonomies. To continue the above example, the time attribute of a generalized alarm might assume any of the values “monday”, . . . , “sunday”, “workday”, “holiday”, or “day of the week”.
The rationale for clustering similar alarms stems from the observation that a given root cause generally results in similar alarms. Thus, by clustering similar alarms, the aim is to group alarms that have the same root cause. Finally, generalized alarms provide a convenient vehicle for summarizing similar alarms in a succinct and intuitive manner. The end result is a highly comprehensible and succinct summary of an alarm log that is well suited to identifying alarm root causes. Identifying alarm root causes is valuable because it is the basis for finding an appropriate response to alarms (such as shunning attackers at the firewall, or suppressing false positives in the future). In this way, the described invention offers an effective and efficient method for managing large numbers of alarms.
Some of the objects and advantages of the present invention have been stated; others will become apparent when the following description is considered together with the accompanying drawings.
In the examples presented below, alarms are modeled as tuples over a multidimensional space. The dimensions are called alarm attributes, or attributes for short. Examples of alarm attributes include the source and destination IP address, the source and destination port, the alarm type, which encodes the observed attack, and the timestamp, which also includes the date.
Formally, alarms are defined as tuples over the Cartesian product $\mathrm{dom}(A_1) \times \cdots \times \mathrm{dom}(A_n)$, where $\{A_1, \ldots, A_n\}$ is the set of attributes and $\mathrm{dom}(A_i)$ is the domain (i.e. the range of possible values) of attribute $A_i$. Furthermore, for an alarm $a$ and an attribute $A_i$, the projection $a[A_i]$ is defined as the $A_i$ value of alarm $a$. Next, an alarm log is modeled as a set of alarms. This model is correct if the alarms of an alarm log are pairwise distinct, an assumption made to keep the notation simple. Unique alarm IDs can be used to make all alarms pairwise distinct.
$A_i$ shall be an alarm attribute. A tree $T_i$ on the elements of $\mathrm{dom}(A_i)$ is called a taxonomy (or a generalization hierarchy). For two elements $x, \hat{x} \in \mathrm{dom}(A_i)$, $\hat{x}$ is called a parent of $x$, and $x$ a child of $\hat{x}$, if there is an edge $\hat{x} \rightarrow x$ in $T_i$. Furthermore, $\hat{x}$ is called a generalization of $x$ if the taxonomy $T_i$ contains a path from $\hat{x}$ to $x$, in symbols: $x \trianglelefteq \hat{x}$. The length of this path is called the distance $\delta(x, \hat{x})$ between $x$ and $\hat{x}$. $\delta(x, \hat{x})$ is undefined if $x \trianglelefteq \hat{x}$ is not satisfied. Finally, $x \trianglelefteq \hat{x}$ is trivially satisfied for $x = \hat{x}$, and $\delta(x, \hat{x})$ equals 0 in this case.
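The relation $x \trianglelefteq \hat{x}$ and the distance $\delta(x, \hat{x})$ can be computed by walking up the taxonomy from $x$ and counting edges until $\hat{x}$ is reached. The following sketch does this for a port taxonomy over the domain {1, ..., 65535, PRIV, NON-PRIV, ANY-PORT}; placing ports below 1024 under PRIV is an assumption for illustration, and the function names are hypothetical.

```python
def parent(value):
    """Parent of a value in an assumed port taxonomy, or None for the root."""
    if isinstance(value, int):
        return "PRIV" if value < 1024 else "NON-PRIV"  # assumed split at 1024
    if value in ("PRIV", "NON-PRIV"):
        return "ANY-PORT"
    return None  # ANY-PORT is the root


def distance(x, x_hat):
    """delta(x, x_hat): length of the path from x_hat down to x, or None
    if x_hat is not a generalization of x (x ⊴ x_hat does not hold)."""
    steps = 0
    while x is not None:
        if x == x_hat:
            return steps
        x = parent(x)
        steps += 1
    return None


def generalizes_to(x, x_hat):
    """x ⊴ x_hat holds iff x_hat is x itself or one of its ancestors."""
    return distance(x, x_hat) is not None
```

For example, `distance(80, "ANY-PORT")` is 2, while `distance(8080, "PRIV")` is undefined (None).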
By way of illustration, consider taxonomies for IP addresses and port numbers.
The domain of IP addresses is the union of “elementary” IP addresses (i.e. the set $\{p.q.r.s \mid p, q, r, s \in \{0, \ldots, 255\}\}$) and “generalized” IP addresses (i.e. the set {FIREWALL, WWW/FTP, DMZ, EXTERN, ANY-IP}).
Analogously, the domain of port numbers is {1, ..., 65535, PRIV, NON-PRIV, ANY-PORT}.
Note that $\delta(ip1, ip2)$ is undefined because $ip1 \trianglelefteq ip2$ does not hold.
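Next, the notation is extended from attributes to alarms. To this end, $a, \hat{a} \in \mathrm{dom}(A_1) \times \cdots \times \mathrm{dom}(A_n)$ shall denote two alarms. The alarm $\hat{a}$ is called a generalization of alarm $a$ if $a[A_i] \trianglelefteq \hat{a}[A_i]$ holds for all attributes $A_i$. In this case, one writes $a \trianglelefteq \hat{a}$.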
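Furthermore, if $a \trianglelefteq \hat{a}$ holds, then the distance $\delta(a, \hat{a})$ between the alarms $a$ and $\hat{a}$ is defined as

$$\delta(a, \hat{a}) = \sum_{i=1}^{n} \delta(a[A_i], \hat{a}[A_i]).$$

If $a \trianglelefteq \hat{a}$ is not satisfied, then $\delta(a, \hat{a})$ is undefined. Finally, in the case of $a \trianglelefteq \hat{a}$, $a$ is more specific than $\hat{a}$, and $\hat{a}$ is more abstract than $a$.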
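The alarm-level relation and distance simply apply the attribute-level definitions to every attribute and add up the results. The sketch below models alarms as (IP, port) tuples. Both taxonomies are assumptions: the generalized IP values are the ones named above, but the placement of ip1 to ip4 in the hierarchy is hypothetical, as is the PRIV/NON-PRIV split at port 1024.

```python
# Hypothetical IP taxonomy (child -> parent); placement of ip1..ip4 is assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "EXTERN", "ip3": "WWW/FTP", "ip4": "FIREWALL",
    "WWW/FTP": "DMZ", "FIREWALL": "DMZ", "DMZ": "ANY-IP", "EXTERN": "ANY-IP",
}


def port_parent(p):
    """Parent in an assumed port taxonomy; None for the root ANY-PORT."""
    if isinstance(p, int):
        return "PRIV" if p < 1024 else "NON-PRIV"
    return "ANY-PORT" if p in ("PRIV", "NON-PRIV") else None


PARENT_FUNCS = [IP_PARENT.get, port_parent]  # one taxonomy per attribute


def attr_distance(parent, x, x_hat):
    """delta(x, x_hat) in one taxonomy, or None if x ⊴ x_hat does not hold."""
    steps = 0
    while x is not None:
        if x == x_hat:
            return steps
        x, steps = parent(x), steps + 1
    return None


def alarm_distance(a, a_hat):
    """delta(a, a_hat) = sum of attribute distances; None unless a ⊴ a_hat."""
    total = 0
    for parent, x, x_hat in zip(PARENT_FUNCS, a, a_hat):
        d = attr_distance(parent, x, x_hat)
        if d is None:  # some attribute of a does not generalize to a_hat
            return None
        total += d
    return total
```

With these assumed taxonomies, `alarm_distance(("ip1", 80), ("DMZ", "PRIV"))` is 2 + 1 = 3.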
As a convention, the symbols A1, . . . , An are used to stand for alarm attributes. Furthermore, the symbols T1, . . . , Tn are reserved for taxonomies on the respective attributes. Finally, the symbol L will be used to denote an alarm log and the symbol G will be used to denote a cluster log.
Below, similarity is defined. To this end, $S \subset L$ shall denote a set of alarms $a$. The cover of $S$ is the most specific alarm $c \in \mathrm{dom}(A_1) \times \cdots \times \mathrm{dom}(A_n)$ to which all alarms $a$ in $S$ can be generalized. Thus, the cover $c$ satisfies $\forall a \in S: a \trianglelefteq c$, and there is no more specific alarm $c'$ ($c' \trianglelefteq c$, $c' \neq c$) that would also have this property. The cover of $S$ is denoted by $\mathrm{cover}(S)$.
For example, according to the taxonomies shown in
Finally, the dissipation of $S$ is defined as

$$\Delta(S) = \frac{1}{|S|} \sum_{a \in S} \delta(a, \mathrm{cover}(S)).$$
It can be verified that $\Delta(\{(ip1, 80), (ip4, 21)\}) = \frac{1}{2}(3 + 3) = 3$ (cf. FIGS. 2, 3, 4). Intuitively, the dissipation measures the average distance between the alarms of $S$ and their cover. The smaller $\Delta(S)$ is, the more similar the alarms in $S$ are. Therefore, dissipation is minimized in order to maximize intra-cluster alarm similarity.
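The cover and the dissipation can be computed attribute by attribute: because each taxonomy is a tree, the values shared by all ancestor paths form a chain, and the lowest value on that chain is the cover's attribute value. The following sketch uses hypothetical taxonomies (ip1 below WWW/FTP, ip4 below FIREWALL, both below DMZ; ports below 1024 under PRIV), chosen so that the distances agree with the value 3 quoted above; the actual taxonomies in the figures may differ.

```python
# Hypothetical taxonomies (child -> parent); the placement of ip1..ip4 is assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "EXTERN", "ip3": "WWW/FTP", "ip4": "FIREWALL",
    "WWW/FTP": "DMZ", "FIREWALL": "DMZ", "DMZ": "ANY-IP", "EXTERN": "ANY-IP",
}


def port_parent(p):
    if isinstance(p, int):
        return "PRIV" if p < 1024 else "NON-PRIV"  # assumed split at 1024
    return "ANY-PORT" if p in ("PRIV", "NON-PRIV") else None


PARENT_FUNCS = [IP_PARENT.get, port_parent]  # one taxonomy per attribute


def ancestors(parent, x):
    """[x, parent(x), ..., root]: every value to which x generalizes."""
    path = [x]
    while parent(path[-1]) is not None:
        path.append(parent(path[-1]))
    return path


def cover(alarms):
    """Most specific alarm c with a ⊴ c for every alarm a in `alarms`."""
    c = []
    for i, parent in enumerate(PARENT_FUNCS):
        paths = [ancestors(parent, a[i]) for a in alarms]
        common = set(paths[0]).intersection(*paths[1:])
        # In a tree the shared ancestors form a chain; take the lowest one.
        c.append(next(v for v in paths[0] if v in common))
    return tuple(c)


def dissipation(alarms):
    """Delta(S): average distance delta(a, cover(S)) over the alarms in S."""
    c = cover(alarms)
    total = sum(
        ancestors(parent, a[i]).index(c[i])  # path length = attribute distance
        for a in alarms
        for i, parent in enumerate(PARENT_FUNCS)
    )
    return total / len(alarms)
```

Under these assumptions, `cover([("ip1", 80), ("ip4", 21)])` is `("DMZ", "PRIV")` and the dissipation is (3 + 3) / 2 = 3.0.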
Next, the alarm clustering problem is described. To this end, $L$ shall be an alarm log, $\text{min-size} \in \mathbb{N}$ an integer, $\mathbb{N}$ being the set of natural numbers, and $T_i$, $i = 1, \ldots, n$, a taxonomy for each attribute $A_i$ in $L$.
Definition 1 (Alarm Clustering Problem)
$(L, \text{min-size}, T_1, \ldots, T_n)$ shall be an $(n+2)$-tuple with symbols as defined above. The alarm clustering problem is to find a set $C \subset L$ that minimizes the dissipation $\Delta$, subject to the constraint that $|C| \geq \text{min-size}$ holds. $C$ is called an alarm cluster, or cluster for short.
In other words, among all sets $C \subset L$ that satisfy $|C| \geq \text{min-size}$, a set with minimum dissipation is to be found. If there are multiple such sets, then any one of them can be picked. Once the cluster $C$ has been found, the remaining alarms in $L \setminus C$ can be mined for additional clusters. One might consider using a different min-size value for $L \setminus C$, an option that is useful in practice. Further, another criterion may also be defined for the completion of a cluster.
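For very small alarm logs, the problem as defined can be solved exactly by enumerating every subset $C$ with $|C| \geq \text{min-size}$ and keeping one with minimum dissipation. The sketch below does exactly that; its running time grows exponentially with $|L|$, which is why an approximation method is used in practice. The taxonomies and the helper names are hypothetical, as in the previous examples.

```python
from itertools import combinations

# Hypothetical taxonomies (child -> parent); the placement of ip1..ip4 is assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "EXTERN", "ip3": "WWW/FTP", "ip4": "FIREWALL",
    "WWW/FTP": "DMZ", "FIREWALL": "DMZ", "DMZ": "ANY-IP", "EXTERN": "ANY-IP",
}


def port_parent(p):
    if isinstance(p, int):
        return "PRIV" if p < 1024 else "NON-PRIV"  # assumed split at 1024
    return "ANY-PORT" if p in ("PRIV", "NON-PRIV") else None


PARENT_FUNCS = [IP_PARENT.get, port_parent]


def ancestors(parent, x):
    path = [x]
    while parent(path[-1]) is not None:
        path.append(parent(path[-1]))
    return path


def cover(alarms):
    c = []
    for i, parent in enumerate(PARENT_FUNCS):
        paths = [ancestors(parent, a[i]) for a in alarms]
        common = set(paths[0]).intersection(*paths[1:])
        c.append(next(v for v in paths[0] if v in common))
    return tuple(c)


def dissipation(alarms):
    c = cover(alarms)
    total = sum(ancestors(p, a[i]).index(c[i])
                for a in alarms for i, p in enumerate(PARENT_FUNCS))
    return total / len(alarms)


def best_cluster(log, min_size):
    """Exhaustive search: a subset C of log with |C| >= min_size and minimum
    dissipation. Exponential in len(log); suitable only for tiny examples."""
    best, best_value = None, float("inf")
    for k in range(min_size, len(log) + 1):
        for subset in combinations(log, k):
            value = dissipation(list(subset))
            if value < best_value:
                best, best_value = list(subset), value
    return best, best_value
```

For instance, with the log `[("ip1", 80), ("ip3", 80), ("ip4", 21), ("ip2", 8080)]` and min-size 2, the search returns the two alarms on port 80, whose cover is `("WWW/FTP", 80)` and whose dissipation is 1.0.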
Imposing a minimum size on alarm clusters has two advantages. First, it decreases the risk of clustering small sets of unrelated but coincidentally similar alarms. Second, large clusters are of particular interest because identifying and resolving their root causes has a high payoff. Finally, the decision to maximize similarity as soon as the minimum size has been exceeded minimizes the risk of including unrelated alarms in a cluster.
Clearly, stealthy attacks that trigger fewer than min-size alarms do not yield any clusters. The intention here, however, is to identify a predominant root cause that accounts for a predetermined number of alarms. By removing the root cause, the number of newly generated alarms can be reduced. This reduction is advantageous because screening the reduced alarm stream for attacks is much more efficient.
For a practical alarm clustering method, the following result is relevant:
Theorem 1: The alarm clustering problem $(L, \text{min-size}, T_1, \ldots, T_n)$ is NP-complete. The proof can be obtained by reducing the CLIQUE problem to the alarm clustering problem.
Below, an approximation method for the alarm clustering problem will be described. First, however, assume that alarm clusters can be discovered. Then the question arises how alarm clusters are best presented, e.g. to the system administrator. Alarm clusters can comprise thousands of alarms. Therefore, it is not viable to represent clusters by means of their constituent alarms. Indeed, doing so would overwhelm the receiving system administrator with a vast amount of information that is hard to make sense of. To solve this problem, clusters are represented by their covers. Covers correspond to what is informally called “generalized alarms”.
In order to obtain generalized alarms that are meaningful and indicative of their root cause, it is valuable to take advantage of several or even all alarm attributes. In particular, string and time attributes can contain valuable information, and the following discussion shows how to include these attribute types in this framework. For brevity, the discussion relies on examples, but the generalizations are straightforward.
Time attributes are considered first. Typically, one wishes to capture temporal information such as the distinction between weekends and workdays, between business hours and off hours, or between the beginning of the month and the end of the month. To make the clustering method aware of concepts like these, one can use a taxonomy such as the ones in
String attributes are considered next. String attributes can assume arbitrary text values with completely unforeseeable contents. Therefore, the challenge lies in tapping the semantic information of the strings. This problem is solved by means of a feature extraction step that precedes the actual alarm clustering. Features are bits of semantic information that, once extracted, replace the original strings. Thus, each string is replaced by the set of its features. Subset-inclusion defines a natural taxonomy on feature sets. For example, the feature set {f1, f2, f3} can be generalized to the sets {f1, f2}, {f1, f3}, or {f2, f3}, which in turn can be generalized to {f1}, {f2}, or {f3}. The next level is the empty set, which corresponds to “ANY-FEATURE”.
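The subset-inclusion taxonomy on feature sets can be sketched as follows. The feature extractor is a toy assumption (lowercase alphabetic tokens) and stands in for whatever feature selection technique is actually used; the parents of a feature set are the sets with exactly one feature removed, and the distance to a generalization is the number of features dropped.

```python
import re


def extract_features(text):
    """Toy, hypothetical feature extractor: the set of lowercase word tokens."""
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def parents(features):
    """Direct generalizations of a feature set: remove exactly one feature."""
    return [features - {f} for f in features]


def generalizes_to(x, x_hat):
    """x ⊴ x_hat under subset inclusion: x_hat keeps only features of x."""
    return x_hat <= x


def distance(x, x_hat):
    """Number of features dropped on the way from x to x_hat, or None."""
    return len(x) - len(x_hat) if generalizes_to(x, x_hat) else None


ANY_FEATURE = frozenset()  # the empty set generalizes every feature set
```

For example, `distance(frozenset({"f1", "f2", "f3"}), frozenset({"f1"}))` is 2, matching the two levels between {f1, f2, f3} and {f1} described above.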
Features that capture as much semantic information as possible can be selected using well-established feature selection techniques.
Given the NP-completeness of alarm clustering, an approximation method has been developed as follows. An approximation method for the problem $(L, \text{min-size}, T_1, \ldots, T_n)$ finds a cluster $C \subset L$ that satisfies the predetermined criterion $|C| \geq \text{min-size}$ but does not necessarily minimize $\Delta$. The closer an approximation method pushes $\Delta$ to its minimum, the better.
The proposed approximation method is a variant of attribute-oriented induction (AOI). It differs from known AOI in two respects: first, attributes are generalized more conservatively than in known AOI; second, a different termination criterion is used, which is reminiscent of density-based clustering.
To begin with, the proposed approximation method directly constructs the generalized alarm c that constitutes the algorithm's output. In other words, the method does not take the detour of first finding an alarm cluster and then deriving its cover. The method starts with the alarm log L and repeatedly generalizes the alarms a in L. The alarms in L are generalized by choosing an attribute $A_i$ and replacing the $A_i$ values of all alarms by their parents in $T_i$. This process continues until an alarm c has been found to which at least min-size of the original alarms a can be generalized. This alarm constitutes the output of the method. Below, the resulting method is shown.
In more detail, line 1 of table 1 makes a copy of the initial alarm log L. This is done because the initial alarm log L is used in line 4. Below, this copy of the alarm log L is called the cluster log G, since it will contain generalized alarms c that cover clusters C of alarms a contained in the alarm log L. The alarm log L therefore contains the initial, unchanged alarms a, while the cluster log G contains covers, or generalized alarms, c that may change during the generalization process.
In line 5, the method terminates when a generalized alarm c has been found to which the predetermined criterion applies, i.e. here, to which at least min-size alarms $a \in L$ can be generalized. If the method does not terminate, then the generalization step (lines 8 and 9) is executed. Here, the selection of an attribute $A_i$ is guided by the following heuristic:
For each attribute $A_i$, $f_i \in \mathbb{N}$, with $\mathbb{N}$ being the set of natural numbers, shall be maximal with the property that there is an alarm $c^* \in G$ such that $a[A_i] \trianglelefteq c^*[A_i]$ holds for $f_i$ of the original alarms $a \in L$. If $f_i$ is smaller than min-size, then clearly no solution can be found without generalizing $A_i$, and $A_i$ is therefore selected for generalization. This does not eliminate the optimal solution from the search space. If, on the other hand, $f_i \geq \text{min-size}$ holds for all attributes, then the attribute $A_i$ with the smallest $f_i$ value is selected.
Although further heuristics are applicable, it has been found that the above heuristic works well in practice, and it is the heuristic of the preferred embodiment.
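The generalization loop and the attribute-selection heuristic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a line-by-line rendering of table 1: the taxonomies are hypothetical, identical alarms in the cluster log G are merged by storing G as a set, and when several attributes have $f_i < \text{min-size}$ the first one is chosen (the text leaves this tie open). Attributes already at the root of their taxonomy are skipped so that every step makes progress.

```python
# Hypothetical taxonomies (child -> parent); the placement of ip1..ip4 is assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "EXTERN", "ip3": "WWW/FTP", "ip4": "FIREWALL",
    "WWW/FTP": "DMZ", "FIREWALL": "DMZ", "DMZ": "ANY-IP", "EXTERN": "ANY-IP",
}


def port_parent(p):
    if isinstance(p, int):
        return "PRIV" if p < 1024 else "NON-PRIV"  # assumed split at 1024
    return "ANY-PORT" if p in ("PRIV", "NON-PRIV") else None


PARENT_FUNCS = [IP_PARENT.get, port_parent]  # one taxonomy per attribute
N_ATTRS = len(PARENT_FUNCS)


def ancestors(parent, x):
    """[x, parent(x), ..., root]: every value to which x generalizes."""
    path = [x]
    while parent(path[-1]) is not None:
        path.append(parent(path[-1]))
    return path


def covered(log, c, attrs):
    """Number of original alarms a with a[A_i] ⊴ c[A_i] for every i in attrs."""
    return sum(
        all(c[i] in ancestors(PARENT_FUNCS[i], a[i]) for i in attrs) for a in log
    )


def cluster(log, min_size):
    """Return one generalized alarm covering at least min_size alarms of log."""
    if len(log) < min_size:
        raise ValueError("alarm log has fewer than min_size alarms")
    g = set(log)  # cluster log G; the set merges identical alarms
    while True:
        # Termination test: a generalized alarm covering >= min_size originals.
        best = max(sorted(g, key=repr), key=lambda c: covered(log, c, range(N_ATTRS)))
        if covered(log, best, range(N_ATTRS)) >= min_size:
            return best
        # Only attributes not yet at the root everywhere can be generalized.
        candidates = [i for i in range(N_ATTRS)
                      if any(PARENT_FUNCS[i](c[i]) is not None for c in g)]
        # f_i: best single-attribute coverage achieved by some c* in G.
        f = {i: max(covered(log, c, [i]) for c in g) for i in candidates}
        below = [i for i in candidates if f[i] < min_size]
        chosen = below[0] if below else min(candidates, key=f.get)
        parent = PARENT_FUNCS[chosen]
        # Replace the chosen attribute of every alarm in G by its parent
        # (values already at the root stay unchanged).
        g = {c[:chosen] + (parent(c[chosen]) or c[chosen],) + c[chosen + 1:]
             for c in g}
```

For the log `[("ip1", 80), ("ip3", 80), ("ip1", 21), ("ip4", 21), ("ip2", 8080)]` and min-size 3, the IP attribute is generalized first, then the port attribute, and the result is the generalized alarm `("WWW/FTP", "PRIV")`. To mine further clusters, the alarms covered by the result could be removed from the log and the function called again on the remainder.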
Based on the above, one could conceive a completely different approximation method, for example one that is based on partitioning or hierarchical clustering. The above method is advantageous for its simplicity, scalability, and noise tolerance.
The cluster log G therefore contains generalized alarms c, each with a size field indicating the number of alarms a covered in the alarm log L.
Before an attribute of an alarm is selected for generalization as indicated in line 7 of the alarm clustering method, generalized alarms c are preferably created for alarms that are identical. The section of the alarm log L shown in
However, as long as z < min-size, an attribute $A_i$ is selected and generalized in each alarm $a \in G$. As shown in
Another example is given in
In the example of
For IP addresses and port numbers, the taxonomies in
Each line of the cluster log G describes one generalized alarm c indicating in the “Size” column the size of the covered cluster C. The size of the cluster is the number of covered alarms. The AT column shows the Alarm Types, for which mnemonic names are provided below the table. Within the cluster log G, “ANY” is generically written for attributes that have been generalized to the root of their taxonomy Ti. It is worth noting that only alarm types 1 and 2 have context attributes. Therefore, the context attribute is undefined for all the other alarm types. Also, the port attributes are occasionally undefined. For example, the ICMP protocol has no notion of ports. As a consequence, the port attributes of alarm type 5 are undefined. Finally, the names ip1, ip2, . . . refer to the clients and servers in
The clusters in cluster log G shown in
What has been described above is merely illustrative of the application of the principles of the present invention. Other arrangements can be implemented by those skilled in the art without departing from the spirit and scope of protection of the present invention. In particular, the application of the inventive method is not restricted to processing alarms sensed by an intrusion detection system. The method can be implemented in any kind of decision support application that processes large amounts of data.
The method can be implemented by means of a computer program element operating in a system 20 as shown in
Number | Date | Country | Kind |
---|---|---|---|
01811155.9 | Nov 2001 | EP | regional |
This application is a continuation of U.S. Ser. No. 10/287,132, filed Nov. 1, 2002, the entire contents of which are incorporated herein by reference. The present invention generally relates to a method, a computer program element and a system for processing alarms that have been triggered by a monitoring system such as an intrusion detection system, a firewall or a network management system. The present invention specifically relates to a method and a system for processing alarms triggered by a host or network intrusion detection system, operating by means of behavior-based or knowledge-based detection, in order to extract information about the state of the monitored system or activities of its users. More particularly, the present invention relates to a method and a system for processing alarms, possibly containing a high percentage of false alarms, which are received at a rate that cannot be handled efficiently by human system administrators. This invention is related to an invention disclosed in copending U.S. patent application Ser. No. 10/286,708 entitled “METHOD, COMPUTER PROGRAM ELEMENT AND A SYSTEM FOR PROCESSING ALARMS TRIGGERED BY A MONITORING SYSTEM”, filed in the name of International Business Machines Corporation, claiming as priority EP patent appl. EP 01811155.9 filed on Nov. 29, 2001, which is herewith incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
Parent | 10287132 | Nov 2002 | US |
Child | 12142497 | US |