The rapid detection of security threats is critical for organizations to prevent the compromise of their computer systems, data, networks and applications. Organizations and other enterprises, whether commercial, educational or governmental, store and transfer the majority of their data in digital form in computer systems and databases. Much of this data is valuable confidential commercial information or private information about individual employees or members that is not intended for public view, and any exposure or manipulation of this data could cause the organization and individuals great financial or reputational damage. Organizations are consistently challenged by threats aimed at stealing, exposing or manipulating this digital data. A large number of these attacks, as reported by the news media, have involved fraud, data breaches, intellectual property theft or national security. Some attackers, who may be backed by nation states or organizations with political agendas, have turned to more sinister attacks aimed at gaining control of or damaging critical infrastructure.
Organizations typically employ a multi-layered network topology to separate the various components of their IT infrastructure from the Internet. Workstations and servers are generally protected from direct access via the Internet or other external networks by a web proxy server; Internet traffic is typically terminated at "demilitarized network zones" (DMZs); and incoming traffic is filtered through a firewall. External attackers normally attempt to penetrate the defenses set up at an organization's network perimeter, and many security solutions exist to address such external attacks. However, once external attackers breach the perimeter and get onto the internal network, they typically operate under the guise of an internal user, either by hijacking an existing user's account or by creating a new user. Internal attackers are more insidious and more difficult to defend against because they are users of the organization's computer network systems. They have legitimate IT accounts, and their unauthorized or illicit activities may generally fall within their areas of responsibility while exceeding what is normal behavior. Attacks may even involve a nexus between external and internal attackers. For instance, illicit activity by an insider, such as a customer service representative granting a customer an inappropriately large refund, may be very difficult to detect.
Most security solutions primarily utilize signatures of known attacks to identify and alert on similar attacks. In order to define signatures for any new threat, the underlying components of the associated threat vector must be studied in detail, and signatures of these threat vectors must be made available to a threat detection system. There are several major shortcomings to these signature-based threat detection approaches. The development of signatures for new threats requires an in-depth analysis of an infected system, which is time consuming and resource intensive, and may be too slow to address quickly evolving threats. Signatures do not adapt themselves to changes in threat vectors. Moreover, signature-based approaches are ineffective against zero-day attacks that exploit previously unknown vulnerabilities, and are not available for detecting insider threats originating from within an organization. Identifying insider attacks typically involves constructing various profiles for the normal behaviors of insiders, detecting anomalous deviations from these profiles, and estimating the threat risk of these anomalies. However, constructing profiles that accurately characterize normal insider behavior is difficult and is not an exact science. For example, many profiles are constructed using statistical approaches for observables that are assumed, incorrectly, to be normally distributed when they are not. Using such profiles for detecting behavior anomalies can produce erroneous results and lead to many false positive alerts that overwhelm security analysts. Balancing the risk of missing an actual threat by using high confidence levels for detection to minimize false positives against the risk of an overly permissive approach that floods security analysts with alerts is a difficult trade-off.
There is a need for systems and methods that address these and other anomaly detection problems in protecting organizations from data breaches and other losses. In particular, there is a need for proactive, reliable adaptive defense capabilities for detecting anomalous activity within an organization's IT infrastructure to identify threats while minimizing false positive alerts. It is to these ends that this invention is directed.
The invention provides a system and method for automatic creation of adaptive behavioral profiles for observables associated with resource states and events in a computer network (IT) infrastructure of an enterprise, and for detecting anomalies that represent potential malicious activity and threats as deviations from normal behavior. Separate profiles may be created for each behavioral indicator, as well as for each time series of measurements, and aggregated to create an overall behavioral profile. An anomaly probability is determined from the behavioral profile and used to evaluate the data values of observables. Outlier data values which deviate from normal behavior by more than a predetermined probability threshold are identified for risk analysis as possible threats, while inliers within the range of normal behavior are used to update the behavioral profile. Behavioral profiles are created for behavioral indicators based upon observables measured over predetermined time periods using algorithms employing statistical analysis approaches that work for any type of data distribution, and profiles are adapted over time using data aging to more closely represent current behavior. Algorithm parameters for creating profiles are based on the type of the data, i.e., its metadata.
The invention is particularly well adapted to adaptive profile generation and anomaly detection for risk assessment in computer network infrastructures of enterprises, and will be described in that environment. It will be appreciated, however, that this is illustrative of only one utility of the invention, and that the invention has greater applicability and utility in other contexts.
The invention affords a machine learning system and method that comprise a computer of an organization's computer network infrastructure and executable instructions stored in a computer readable non-transitory medium that control the computer to create behavioral profiles and anomaly probability characteristics based upon a time series of observable events and/or network resource states for evaluating activities to detect anomalies. As will be described in more detail, a behavioral profile may be created for each behavioral indicator of an activity for any entity, whether it is a person, computer system or application. Identity aggregation monitors all entities associated with an activity to capture entity activities that otherwise could be obscured or masked by multiple entity identifiers or aliases. A profile is created to comprise a condensed cyclical representation of past behavior, organized according to the time series the behavior represents. For instance, a simple daily profile comprises a statistical description of data for any given day, while a day of the week profile is a collection of seven daily profiles, one for each day of the week, as sketched below. The statistical description depends upon the distribution of the data, which may be uniform, as for a range of data; unimodal, having a single peak; or multimodal, having multiple peaks. Regardless of how complex the data distribution is, the invention enables automated creation of behavioral profiles for a practically unlimited number of observations. Adaptive profile aging affords incremental real-time updates to profiles to accommodate changing behavioral patterns.
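As a concrete illustration of this cyclical organization, the following minimal Python sketch shows a day-of-week profile as a collection of seven daily profiles. The class and field names are hypothetical, not from the specification, and the "condensed representation" is simplified to a list of raw values here; the preferred embodiment condenses observations into a kernel density estimate, as described later.

```python
from datetime import datetime

class DailyProfile:
    """A statistical description of one day's observations.

    Illustrative only: raw values are accumulated here, whereas the
    preferred embodiment keeps a condensed kernel density estimate.
    """
    def __init__(self):
        self.values = []

    def add(self, value):
        self.values.append(value)

class DayOfWeekProfile:
    """A cyclical profile: a collection of seven daily profiles,
    one for each day of the week."""
    def __init__(self):
        self.days = [DailyProfile() for _ in range(7)]

    def add(self, timestamp: datetime, value: float):
        # datetime.weekday(): Monday = 0, ..., Sunday = 6
        self.days[timestamp.weekday()].add(value)

profile = DayOfWeekProfile()
profile.add(datetime(2015, 1, 30, 9, 15), 42.0)  # logged on a Friday
```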
Peer group analysis 108 identifies functionally similar groups of actors (users or resources) based on their attributes as provided by the inventory systems and predefined grouping rules. For example, users can be grouped by their job title, organizational hierarchy, location, or any combination of attributes that indicate similarity of job function. Systems and devices can be grouped by the function they perform (e.g., database, application, or web server), network location (e.g., DMZ or other network segment), or organizational environment (e.g., production, test, or development). Peer groups may be further refined by observing similarities in access patterns, based on granted access entitlements or actual logged resource access, as sketched below. It is desirable to accurately identify peer groups to ensure low false positive rates in detecting access or behavior outliers.
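A minimal sketch of attribute-based peer grouping follows. The attribute names and the inventory record format are assumptions for illustration, not part of the specification:

```python
from collections import defaultdict

def group_peers(users, attributes=("job_title", "department", "location")):
    """Group users whose selected inventory attributes match exactly."""
    groups = defaultdict(list)
    for user in users:
        key = tuple(user.get(attr) for attr in attributes)
        groups[key].append(user["id"])
    return dict(groups)

inventory = [
    {"id": "alice", "job_title": "DBA", "department": "IT", "location": "NYC"},
    {"id": "bob",   "job_title": "DBA", "department": "IT", "location": "NYC"},
    {"id": "carol", "job_title": "Analyst", "department": "Finance", "location": "LON"},
]
print(group_peers(inventory))
# {('DBA', 'IT', 'NYC'): ['alice', 'bob'],
#  ('Analyst', 'Finance', 'LON'): ['carol']}
```

Refinement by access patterns would further split or merge these groups based on which resources the members actually touch.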
Behavioral profiles 122 are illustrated in the referenced figure. In accordance with the invention, a behavioral profile is created at 122 for each behavioral indicator 120, whether it be for an individual, a peer group, an actor or a resource, which is then used to create a normalized anomaly probability for detecting anomalies 124 (also shown in the referenced figure).
The invention employs an algorithmic process that automates the creation of a behavioral profile by reducing a large set of observations, regardless of distribution, to a small set of statistical parameters, and continuously updates and adapts the profile using current data to improve anomaly detection. The behavioral profile establishes a baseline for what is considered to be normal behavior, and the anomalies are detected as deviations from that normal behavior. In accordance with a preferred embodiment, the invention uses Gaussian kernel density estimation to build the behavioral profile, although other analytical approaches may be used as well.
As shown on the plot of the referenced figure, the behavioral profile may be constructed as a Gaussian kernel density estimate over the set of past observations $v_1, v_2, \ldots, v_N$, in which each observation contributes a Gaussian kernel centered at its value:

$$\varphi^*(v) = \sum_{i=1}^{N} \exp\!\left(-\frac{(v - v_i)^2}{2h^2}\right)$$
where h is the kernel bandwidth that controls how much blur or noise is introduced. The bandwidth, h, may be selected based upon the type of data, i.e., the metadata that describes the type characteristics of the data. The minimum bandwidth may be selected based upon the maximum resolution of the data; e.g., event counts comprise discrete numbers and have an increment of 1. Accordingly, for count-type data, 1 should be the minimum bandwidth. For data with an unbounded range, such as the count of bytes in a transaction, the bandwidth is preferably linearly dependent on the count value to maintain a consistent error range. For instance, to allow a 10% variation in the data, the bandwidth should increase as 0.1 of the value, i.e., $h = 1 + 0.1v$. For data having a bounded range, such as event frequency, the bandwidth should preferably be constant and may be estimated from the observed data using, for example, the median absolute deviation (MAD) and Silverman's rule, but preferably is not less than the 10% variation. Assuming a midrange frequency of 0.5, the minimum bandwidth would be 0.05:

$$h = \max\left(0.9\,\frac{\mathrm{MAD}}{0.6745}\,N^{-1/5},\ 0.05\right)$$

where N is the number of data points and MAD is the median absolute deviation.
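The sketch below illustrates these bandwidth rules and the kernel density profile. It is a minimal reading of the description above: applying the per-observation bandwidth for unbounded data is one plausible interpretation, and the Silverman-style constants follow the reconstructed formula rather than a form stated verbatim in the specification.

```python
import numpy as np

def bandwidth(value, kind, n=None, mad=None):
    """Select the kernel bandwidth h from the data type (metadata).

    kind = "count":     discrete event counts; minimum increment of 1
    kind = "unbounded": e.g. bytes per transaction; h = 1 + 0.1 * v
    kind = "bounded":   e.g. event frequency; constant h from MAD and
                        Silverman's rule, floored at 0.05
    """
    if kind == "count":
        return 1.0
    if kind == "unbounded":
        return 1.0 + 0.1 * value
    if kind == "bounded":
        return max(0.9 * (mad / 0.6745) * n ** (-0.2), 0.05)
    raise ValueError(f"unknown data type: {kind}")

def profile(v, observations, kind, **kw):
    """Unnormalized Gaussian kernel density phi*(v): one kernel per
    past observation, each centered at the observed value."""
    obs = np.asarray(observations, dtype=float)
    h = np.array([bandwidth(o, kind, **kw) for o in obs])
    return float(np.sum(np.exp(-0.5 * ((v - obs) / h) ** 2)))

logons = [20, 21, 22, 20, 23, 21]    # daily logon counts
print(profile(21, logons, "count"))  # ~3.96: dense region, normal
print(profile(90, logons, "count"))  # ~0:    candidate anomaly
```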
The invention preferably employs an adaptive profile aging process which adapts a profile by weighting it in favor of the most recent data, where older observations are eventually phased out after a selected aging period. This allows the profile to mutate over time to maintain currency with changing behavior. The profile aging process of the invention preferably adapts a profile by using an exponential decay factor to gradually forget old data while adapting to the new behavior. The exponential decay factor may be determined from a function of the form $N(t) = N_0\,2^{-t/h}$, where h is the desired half-life of the decay, i.e., the time at which the weight of the original behavior decreases by half. Preferably, the adaptive profile aging process is performed at the beginning of each cycle, where the process multiplies the previous profile by the decay factor to deemphasize older data before any new observations are added.
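A minimal sketch of this aging step follows, assuming the profile keeps a weight per observation so that $\varphi^*$ becomes a weighted kernel sum; the 30-day half-life is an illustrative choice, not a value from the specification.

```python
import numpy as np

def age_weights(weights, elapsed, half_life):
    """Multiply existing observation weights by the decay factor
    2**(-t/h) at the start of a cycle, before new observations are
    added, so the weight of old behavior halves every half_life units."""
    return np.asarray(weights, dtype=float) * 2.0 ** (-elapsed / half_life)

w = np.ones(5)                                # five older observations
w = age_weights(w, elapsed=7, half_life=30)   # one week passes (days)
print(w)                                      # each weight is now ~0.85
```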
As indicated above, anomalies may be detected by determining their deviations from associated normal behavioral profiles. Since profiles are not normalized, profiles of different behavioral indicators cannot be directly compared with one another, which complicates characterizing deviations of different indicators. Accordingly, the invention introduces a new approach to detecting anomalies that normalizes deviations from a profile by defining and employing an anomaly probability function which measures the probability that a deviation from the normal behavioral profile is an anomaly. In accordance with the invention, the anomaly probability function P(v) may be defined as a Lorentz function of the behavioral profile:

$$P(v) = \frac{1}{1 + \left(\varphi^*(v)/k\right)^2}$$
where $\varphi^*(v)$ is the behavioral profile, and k is the number of observations at which the probability is 0.5. The anomaly probability function has a value between 0 and 1 that indicates the probability that a deviation is an anomaly. The anomaly probability function produces a characteristic profile that is substantially a normalized inverse of the behavioral profile, as can be seen in the referenced figure.
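A sketch of this Lorentzian mapping, using the formula as reconstructed above; the value of k is a tunable sensitivity parameter and the 5.0 used here is illustrative:

```python
def anomaly_probability(phi, k=5.0):
    """Map the (unnormalized) profile density phi*(v) to an anomaly
    probability in (0, 1]: P = 1 for a never-observed value (phi = 0),
    P = 0.5 when phi equals k, and P -> 0 for well-established behavior."""
    return 1.0 / (1.0 + (phi / k) ** 2)

print(anomaly_probability(0.0))    # 1.0   (completely novel value)
print(anomaly_probability(5.0))    # 0.5   (at the k threshold)
print(anomaly_probability(50.0))   # ~0.01 (routine behavior)
```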
An anomaly of an individual behavioral indicator, $P_I$, may be defined as a deviation from the normal profile, and is the opposite of normality ($P_N = 1 - P_I$), which may be measured as the statistical probability that a new observation comes from the same population as described by the behavioral profile. Therefore, if the observation fits the profile within a statistically determined margin of error, then $P_N = 1$ and $P_I = 0$. If the observation deviates significantly from the profile, then $P_N$ goes to 0 and $P_I$ approaches 1. An individual anomaly may then be compared to the corresponding peer group anomaly, $P_G$. If the observed behavior is normal for peers ($P_G$ is low), the effective anomaly may be discounted to reduce false positives:

$$\hat{P} = P_I \, P_G$$
The effective anomaly may then be compared to the resource profile. In this case, any resource anomaly, $P_R$, will be an amplifying factor, i.e.:

$$\hat{P} = 1 - (1 - P_I P_G)(1 - P_R)$$
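Putting the combination steps together, a hedged sketch based on the formulas as reconstructed above:

```python
def effective_anomaly(p_individual, p_group, p_resource):
    """Effective anomaly P-hat combining the three indicator anomalies."""
    # Peer-group discount: behavior that is normal for peers
    # (low p_group) suppresses the individual anomaly.
    p = p_individual * p_group
    # Resource amplification: an anomalous resource state can only
    # raise the effective anomaly (a probabilistic OR).
    return 1.0 - (1.0 - p) * (1.0 - p_resource)

# Unusual for the user (0.9) but common among peers (0.1), on a
# resource behaving normally (0.05): the alert is heavily discounted.
print(effective_anomaly(0.9, 0.1, 0.05))   # ~0.14
```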
As will be appreciated from the foregoing, an anomaly detection process using adaptive behavioral profiles in accordance with the invention enables automated anomaly detection in real time. It is easy to implement, is computationally efficient and is readily adaptable to different purposes. It has wide applicability to both internal and external activities and events of individuals, groups, and resources within a computer network of an organization. Moreover, the process may be used for fine-grained as well as large scale detection of anomalies, has good accuracy, and affords low false positive rates.
While the foregoing has been with respect to particular embodiments of the invention, it will be appreciated that changes to these embodiments may be made without departing from the principles of the invention, the scope of which is defined by the appended claims.
This application claims the benefit of U.S. application Ser. No. 62/110,031, filed Jan. 30, 2015, the disclosure of which is incorporated by reference herein.