The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses.
Starting with
An advantage of this innovative technique is that it uses a statistical model for the system operation. This model can learn from available training data 110 used by training module 100 to establish the model. It can also be retrained once more training data, such as suspect events and intrusion types 170, becomes available. It can further be trained continuously with additional data to adapt to any shift in the normal system activities 130.
Additionally, the technique can be applied to different types of logged data. It can also be applied to port scanning to analyze network activities. It further applies to operating system security as well as network security issues.
As mentioned above, this innovative technique can detect abnormal operations by using the system log files. The system log utilities record all the system and network activities. Any individual system activity, such as opening a new session, can look benign. However, a combination of seemingly harmless activities can imply a malicious attack. These attacks, especially known attacks, generally follow certain patterns. Because the quantity of system log data is large, a statistical model can be used. In other words, the system activities can be modeled using a Markov model of a log file, based on the fact that the current event mainly depends on the event that just happened. For example, if an attacker just failed to gain access to the system, he is likely to try again. Additionally, there is a correlation among the different log files in the system. Research has shown that different attacks have distinctive patterns in how they show up in different log files. This insight provides us with another dimension that can be modeled statistically. For example, if an event belongs to one specific attack, the probability that the event is recorded in file B is high given that it is recorded in file A.
The parameters of the Markov process and the correlation among the different log files can be determined by applying standard training techniques to available log data. The Markov model, also termed a double Markov chain or double Markov process, represents known or generalized attack scenarios. Based on the pre-trained parameters and the observation sequence obtained from the various log files under examination, the probability of an attack occurring given this observation sequence is calculated. If this probability is high, an attack is suspected.
In case of an intrusion notification by this system, the system administrator can review in detail the flagged log data and decide if the system should be isolated from other machines on the network. For maximum security, the system should take itself off the network before notifying the system administrator.
The intrusion detection approach according to the various embodiments can be advantageous over previous approaches in one or more of several ways. For example, some embodiments of the present approach can look at multiple log files and the correlation among the different log files. Thus, some embodiments of the present approach can yield a double Markov chain. Also, some embodiments of the present approach can look for an intrusion directly. In other words, the transitional, initial and conditional probabilities can be trained using abnormal activities. Even though new viruses constantly emerge, they normally bear a striking resemblance to known ones in at least some of their activities when reflected by system call level logging. Thus, this new model can reflect the typical behavior of an intrusion. Additionally, the model can be retrained every time a new intrusion is detected. Further, some embodiments of the present approach can perform initial state identification as pre-processing by using the time stamp in the log files. Events that are separated by a large time interval can be considered separate sequences of operations. Additionally or alternatively, it is possible to screen the events by looking for the possible initial state, such as a login or port scanning, to filter out the most obvious normal activities and decrease the overhead on the system. Finally, some embodiments of the present approach can treat a high frequency of one single operation, such as repeated logins or port scans within a very short time, as an indication of possible intrusions. Thus, it is possible to utilize this consideration as part of a pre-processing procedure.
The statistical model takes advantage of the fact that the system log files record all the activities in a computer system. This record includes all logins, network activities, and so on. Statistical methods can be used to model the system activities. In particular, a double Markov chain can be used to model the system activities via the system log files. Using this statistical model, it is possible to detect abnormalities in a system on the fly.
Regarding the Markov model, a statistical process X is a Markov process if and only if
P(X_{n+1} | X_0, X_1, ..., X_n) = P(X_{n+1} | X_n),
i.e., the probability of X_{n+1} occurring depends only on X_n.
Markov modeling of log data takes advantage of the fact that the system activities have the Markovian property. For example, the likelihood of a user login after system boot-up is much greater than that of any other activity. Each specific event, e.g., a login, contains system operations that relate to each other, particularly the current operation and the one right after it. These operations form a Markov chain. Similarly, in the event of an attack, the activities can also be considered as a Markov chain. An attacker will likely begin with port scanning. Once he manages to gain access to a machine, he will try to set up an account, possibly with superuser privilege, and open a backdoor for future use.
There are multiple log files associated with a system. Each of these log files monitors the related activities in the system. While one log file records one aspect of the system activity, normally at least one of the remaining log files records activity of one specific interest. For example, /var/log/messages logs all the system activities in the Linux kernel; /var/log/boot.log logs only the booting activities. The relationship of these log files with regard to one specific event can be considered as a Markov chain.
Let Λ denote an event and O the observation sequence. The probability of seeing the observation sequence O given the event Λ is
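P(O | Λ) = P(O_{11} O_{12} ... O_{1,n_1} | Λ) P_{12} P(O_{21} O_{22} ... O_{2,n_2} | Λ) P_{23} ... P_{m-1,m} P(O_{m1} O_{m2} ... O_{m,n_m} | Λ)
where n_1, n_2, ..., n_m are the numbers of operations recorded in the log files, m is the number of log files, and P_{12}, ..., P_{m-1,m} are the transitional probabilities between the log files. Within the i-th log file,
P(O_{i1} O_{i2} ... O_{i,n_i} | Λ) = P(O_{i1}) P(O_{i2} | O_{i1}) ... P(O_{i,n_i} | O_{i,n_i-1})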
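As a rough illustration of how such a probability could be evaluated, the following Python sketch assumes that the probability of the observation sequence factors into one first-order Markov chain per log file, multiplied by one transitional probability for each step from one log file to the next. The function names, the dictionary layout of the parameters, and the use of log-probabilities (to avoid numerical underflow on long sequences) are assumptions made for this example and are not taken from the disclosure.

```python
import math

def chain_log_prob(ops, initial, transition):
    """Log-probability of one log file's operations under a first-order Markov chain.

    initial:    operation -> initial probability (hypothetical layout)
    transition: previous operation -> {next operation -> conditional probability}
    """
    if not ops:
        return 0.0
    p = initial.get(ops[0], 0.0)
    if p == 0.0:
        return float("-inf")
    logp = math.log(p)
    for prev, cur in zip(ops, ops[1:]):
        p = transition.get(prev, {}).get(cur, 0.0)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
    return logp

def double_chain_log_prob(sequences, models, file_transition):
    """Log-probability of an event observed across several log files.

    sequences:       ordered list of (file_name, operations)
    models:          file_name -> (initial, transition)
    file_transition: (file_a, file_b) -> transitional probability between files
    """
    logp = 0.0
    for idx, (name, ops) in enumerate(sequences):
        initial, transition = models[name]
        logp += chain_log_prob(ops, initial, transition)
        if idx > 0:
            prev_name = sequences[idx - 1][0]
            p = file_transition.get((prev_name, name), 0.0)
            if p == 0.0:
                return float("-inf")
            logp += math.log(p)
    return logp
```

A result of negative infinity means that at least one factor has zero probability under the model, so the event cannot belong to that model.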
The observation sequence consists of the system operations associated with each event. In each log file, different events can be segmented using the time stamp associated with each system operation. If two consecutive operations happen within a small interval of time, these operations are considered to belong to the same event. If there is a considerable temporal gap between two consecutive operations, these two operations belong to different events, and the first of the two marks the end of its event. Normal users and intruders differ in that the normal user has access to and familiarity with the system, and thus works in a more relaxed manner. The intruder, on the other hand, must first gain access to the system and works in an unfamiliar environment. The intruder also needs to work quickly in order to avoid detection. Thus, an intruder's system operations tend to be closely spaced in time.
In terms of training, the aim is to detect intrusions as soon as possible. Training is performed on known abnormal system log data. These log files are parsed according to the time stamp information as just discussed. In our model, the transitional probabilities between different log files need to be estimated using the log files. Within each log file, the conditional probabilities of the next operation given the current operation need to be calculated, as well as the initial probability P(O_{i1}). When additional data is available, even after the Markov model is established, the model can be retrained or modified.
Pre-processing can be performed based on an initial state of the observation sequence. The observation sequence is obtained by grouping the system operations by their time stamps. If two consecutive system operations are separated by a large time interval, we consider these two operations to belong to two separate events. The initial state of the observation sequence is the first system operation that starts a new event.
Possible intrusions normally start with a limited variety of system operations. Pre-processing can be implemented to eliminate the events that are highly unlikely to be intrusions. This pre-processing can greatly reduce the overhead on the system. One or more trained Markov models can be used for this purpose. Any zero initial probability can be interpreted as an indication that the system call is unlikely to start an intrusion.
In some embodiments, pre-processing can also be based on the frequency of one operation. For example, an operation repeated many times within a short time can be an indication of possible intrusions. In particular, repeated failed logins or a reported port scan can indicate malicious events. This behavior can be caught by the Markov model. However, since it is easier to distinguish, screening for the repeated malicious pattern as part of pre-processing can reduce the overhead on the system.
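The time-stamp segmentation described above can be sketched as follows. This is only a minimal example: the function name, the input format of (time stamp, operation) pairs already sorted by time, and the 30-second gap are hypothetical choices, since the disclosure does not specify how large a "considerable temporal gap" is.

```python
from datetime import datetime, timedelta

def segment_events(operations, max_gap=timedelta(seconds=30)):
    """Group time-sorted (timestamp, operation) pairs into events.

    A new event starts whenever the gap to the previous operation
    exceeds max_gap (an assumed, tunable value).
    """
    events = []
    current = []
    last_time = None
    for ts, op in operations:
        if last_time is not None and ts - last_time > max_gap:
            events.append(current)
            current = []
        current.append(op)
        last_time = ts
    if current:
        events.append(current)
    return events
```

Each returned list is one observation sequence, and its first element is the initial state of that event.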
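One simple way to estimate the initial and conditional probabilities for a single log file is relative-frequency counting over the segmented training events. The sketch below assumes this maximum-likelihood counting approach; the disclosure only refers to standard training techniques, so the method, the function name, and the data layout are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_chain(sequences):
    """Estimate initial and conditional probabilities by counting.

    sequences: list of operation lists, one per known attack event.
    Returns (initial, transition) as plain dictionaries.
    """
    init_counts = Counter()
    trans_counts = defaultdict(Counter)
    for ops in sequences:
        if not ops:
            continue
        init_counts[ops[0]] += 1
        for prev, cur in zip(ops, ops[1:]):
            trans_counts[prev][cur] += 1
    total = sum(init_counts.values())
    initial = {op: c / total for op, c in init_counts.items()}
    transition = {
        prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
        for prev, nxt in trans_counts.items()
    }
    return initial, transition
```

Retraining with additional data then amounts to calling the function again on the enlarged set of sequences. The transitional probabilities between log files could be estimated in the same way by counting which log file follows which within each event.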
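The initial-state screening can be illustrated with a short sketch. It assumes each trained model exposes its initial probabilities as a dictionary mapping an operation to a probability; that layout and the function names are hypothetical.

```python
def may_start_intrusion(event, initial_models):
    """Return True if any trained model assigns the event's first
    operation a non-zero initial probability."""
    if not event:
        return False
    first = event[0]
    return any(initial.get(first, 0.0) > 0.0 for initial in initial_models)

def prefilter(events, initial_models):
    """Keep only the events that could start an intrusion under some model."""
    return [e for e in events if may_start_intrusion(e, initial_models)]
```

Only the events that survive this filter need to be scored against the full Markov models.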
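A frequency-based screen of this kind might look like the following sliding-window sketch. The threshold of five occurrences within ten seconds, the operation label used in the usage note, and the function name are assumptions chosen for illustration only.

```python
from collections import deque
from datetime import datetime, timedelta

def has_burst(operations, target, min_count=5, window=timedelta(seconds=10)):
    """Return True if `target` occurs at least min_count times within
    any time span of length `window`.

    operations: time-sorted (timestamp, operation) pairs.
    """
    recent = deque()
    for ts, op in operations:
        if op != target:
            continue
        recent.append(ts)
        # Drop occurrences that have fallen out of the window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= min_count:
            return True
    return False
```

For example, calling has_burst(operations, "failed_login") on the parsed log would flag a rapid series of failed logins without evaluating the Markov model at all.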
Turning now to
Given one or more initial models, intrusion detection can then be performed at step 310. For example, the system log is routinely scanned and data processed periodically at step 360. Events are grouped by time stamps at step 370. Preprocessing is performed at step 380 to reduce system overhead. For each new system event that is under examination after the models have been established, the conditional probabilities P(O|Λj), j=1, 2, . . . N, where N is the number of Markov models, are calculated at step 390. The maximum of P(O|Λj), j=1, 2, . . . , N is examined at decision step 400. If this probability is over some threshold, we consider this event to belong to the specific model, possibly a known attack. If the event is classified as one of the known attacks, immediate protection steps are taken at step 330 while notifying the system administrator. Once a new event or a new attack is identified, new training for the model can be performed at step 340 to enable the system to automatically detect and protect from the new attack scenario. If desired, models can also be updated for normal system activity at step 350 when an intrusion is not detected.
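The decision made after the per-model probabilities are calculated can be sketched as below. The example assumes the scores are log-probabilities, one per trained model, and that the threshold is expressed on the same scale; the function name and the model names in the usage note are hypothetical.

```python
def classify_event(log_probs, threshold):
    """Pick the best-matching model and compare its score to a threshold.

    log_probs: model name -> log P(O | model)
    Returns (model name, score) if the best score exceeds the threshold,
    otherwise (None, score).
    """
    if not log_probs:
        return None, float("-inf")
    best = max(log_probs, key=log_probs.get)
    score = log_probs[best]
    if score > threshold:
        return best, score
    return None, score
```

For instance, classify_event({"trojan": -2.1, "port_scan": -7.5}, threshold=-5.0) selects the "trojan" model, whereas a threshold of -1.0 would leave the event unclassified.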
There are common features associated with most attacks. For example, the attacker performs port scanning on multiple ports consecutively. This activity is recorded in system log files such as /var/log/messages as well as in the network log files. In practice, the conditional probability that an intruder performs a port scan, given that the previous operation was a port scan, is considerably larger than the probability that a port scan is followed by some other operation. The transitional probability from /var/log/messages to the network log is also reasonably high in this case. For a typical Trojan virus, the operations can be summarized as follows: the attacker remotely gains root privilege; the attacker creates an account with superuser privilege; sessions are opened for the newly created account; the attacker transfers an attack toolkit from another system via ftp.
A typical Markov model for the system log file such as /var/log/messages is illustrated in
These activities are recorded in one or more log files. The above example illustrates the Markov model for one log file; similar models can be obtained for all the different log files. The relationships among the different log files are represented using the set of probabilities P_{12}, ..., P_{m-1,m}.
As we have discussed earlier, the high correlation between the operations and the log files can be modeled using a Markov process. Using known attack data, we can train a Markov model for this scenario. With the trained model, log files are segmented using time stamp information or by predetermined window sizes. This segmented group of operations is then passed to the model to see if it fits the model. If the probability of belonging to the attack model is high, the system administrator is notified.
In conclusion, the proposed technique utilizes the Markov model as the statistical model of a running computer system. This model can be updated as more training data becomes available. It can be applied to various system logging data and can use these data to detect any abnormality in a running system. Once a potential problem is identified, the system administrator can be notified. The administrator can decide if the system needs to be isolated from the network. The system can also be configured such that if a potential problem is found, it will automatically be taken off-line to prevent further damage to the overall system.
The same algorithm applies to various data. One such example can be the port activity data obtained from routine port scanning. Using the port data, potential break-ins can be detected before they cause severe damage to the entire network. When applying this algorithm to other computer data, only the observation sequence needs to be defined accordingly, so as to reflect the characteristics of that specific application.
It should be readily understood that various embodiments of the intrusion detection technique can be combined in various ways. For example, one way embodiments of the intrusion detection technique can be combined is to take different countermeasures based on the dangerousness of a recognized attack pattern and/or the level of confidence with which an attack pattern is recognized. In such cases, the system can take itself offline if a dangerous attack pattern is recognized with a probability exceeding a first threshold selected to reflect near certainty that the attack is taking place. Yet, the system can merely flag suspect log data and notify the administrator if the probability falls below the first threshold but above a second threshold selected to reflect mere possibility that the attack is taking place. Moreover, it is possible that a less dangerous attack can have the maximum probability, while a more dangerous attack can still have a sufficient probability to warrant countermeasures. In this case, there is a lack of confidence that a particular attack is taking place, but the countermeasures can still be applied based on either or both of the attack patterns being recognized. For example, the system can be taken offline, the suspect log data flagged, and the system administrator notified that both types of attacks are possible. It is envisioned that the countermeasures taken and the criteria for taking the countermeasures can be specified by the system administrator. Moreover, if routine benign behavior repeatedly triggers a possible-intrusion notification, the system administrator's negative feedback that no intrusion took place can be used to retrain the double Markov model.
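One possible form of such a two-threshold policy is sketched below. The threshold values, the action labels, the attack names in the usage note, and the function name are all hypothetical; as noted above, the actual countermeasures and criteria would be specified by the system administrator.

```python
def choose_countermeasures(attack_probs, dangerous, high=0.9, low=0.5):
    """Map per-attack probabilities to a set of countermeasures.

    attack_probs: attack name -> probability that the attack is taking place
    dangerous:    set of attack names considered dangerous
    high, low:    first and second thresholds (assumed values)
    """
    actions = set()
    for name, p in attack_probs.items():
        if name in dangerous and p >= high:
            # Near certainty of a dangerous attack: isolate the system.
            actions.update({"take_offline", "flag_logs", "notify_admin"})
        elif p >= low:
            # Mere possibility: flag the data and notify only.
            actions.update({"flag_logs", "notify_admin"})
    return actions
```

With these assumed thresholds, choose_countermeasures({"trojan": 0.95}, dangerous={"trojan"}) includes taking the system offline, while a probability of 0.7 for the same attack only flags the log data and notifies the administrator.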