This invention relates generally to user behavior analytics systems that detect cyber threats and more specifically to determining the risk associated with a user's first-time access of an entity in a computer network.
From theft of corporate account information to targeted campaigns against governments, major network and data breaches have become larger in scale and more serious in recent years. The need to identify network security threats has never been greater. Traditionally, enterprises rely on matching data against known blacklist signatures or on approaches based on hand-written rules. Such methods do not have enough coverage and are ineffective against signature-less attacks from either an ill-intentioned insider whose traffic is legitimate or an external adversary who conducts lateral movement activity.
User Behavior Analytics (UBA) has emerged in the cybersecurity industry as a viable approach to detecting anomalous user behavior through machine learning and statistical analysis. Commercial security monitoring systems for UBA maintain a database of individual behavior profiles for users and groups. Statistical indicators are designed to detect anomalies against the profiles for alerting. Scores from triggered indicators in a user session are then fused into a final score for prioritization. An example of a UBA cybersecurity monitoring system is described in U.S. Pat. No. 9,798,883 issued on Oct. 24, 2017 and titled “System, Method, and Computer Program for Detecting and Assessing Security Risks in a Network,” the contents of which are incorporated by reference herein.
A critically important criterion for a successful commercial UBA system is the ability to explain its alerts. For this reason, state-of-the-art commercial UBA systems widely use a class of detection indicators that detect whether a user has accessed a network entity for the first time relative to the user's built profile; for example, whether a user accesses an entity such as a server, a network zone, a cloud application, or an endpoint process for the first time with respect to her history. Similarly, corporate insider threat detection systems employ numerous features focusing on user or entity access behaviors that are new or first-time. Alerts from these indicators correlate well with malicious insider or compromised account activities like lateral movement. However, since user behavior is highly dynamic on the network, some legitimate user activities will trigger such alerts, which are therefore false positives. Therefore, there is demand for a cybersecurity monitoring system that is able to ascertain the risk associated with a first-time access event and reduce false positive alerts without compromising cybersecurity.
The present disclosure describes a system, method, and computer program for determining the risk associated with a first-time, user-to-entity access event in a computer network. A “first-time, user-to-entity access event” or a “first-time access event” is the first time a user is observed to access an entity, such as a server, network zone, cloud application, or endpoint process, in a computer network. The method is performed by a computer system, such as a UBA system or a user-and-entity analytics system (UEBA system) that detects cyber threats in a network and performs risk assessments of user network activity. For purposes of this disclosure, the term “UEBA system” may be either a cybersecurity UEBA system or a UBA system.
In response to receiving an alert that a user has accessed a network entity (e.g., a host) for the first time, the UEBA system uses a factorization machine to calculate an affinity measure between the accessing user and the accessed entity. The affinity measure is based on the user's historical access patterns in the network, as well as the user's context data and entity context data.
In certain embodiments, the affinity measure is used to filter first-time access alerts. In such case, the UEBA system compares the calculated user-to-entity affinity measure to an affinity threshold. The threshold is based on an ROC curve constructed from affinity score distributions from legitimate first-time access events and affinity score distributions from randomly sampled, never-observed user-entity pairs.
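The threshold construction described above may be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes two lists of affinity scores (one from legitimate first-time access events, one from randomly sampled never-observed user-entity pairs) and, as one plausible selection rule, picks the candidate threshold that maximizes Youden's J statistic on the resulting ROC sweep.

```python
def choose_affinity_threshold(legit_scores, illegit_scores):
    """Pick an affinity threshold via an ROC-style sweep.

    legit_scores:   affinity scores from legitimate first-time access events
    illegit_scores: affinity scores from randomly sampled, never-observed
                    user-entity pairs
    "affinity >= threshold" is treated as predicting a legitimate access;
    the candidate maximizing Youden's J = TPR - FPR is returned.
    """
    candidates = sorted(set(legit_scores) | set(illegit_scores))
    best_t, best_j = candidates[0], -1.0
    for t in candidates:
        tpr = sum(s >= t for s in legit_scores) / len(legit_scores)
        fpr = sum(s >= t for s in illegit_scores) / len(illegit_scores)
        j = tpr - fpr
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

In practice the operating point would be chosen to match an acceptable false-positive rate rather than a single summary statistic.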
If the user-to-entity affinity measure does not satisfy the affinity threshold, it means that the user's access of the entity was unexpected (i.e., not predicted) and, thus, a potential cyber threat. In such case, the UEBA system uses the first-time access event to increase a cybersecurity risk assessment associated with the user's network activity. For example, the system may add points to a risk score associated with the user's current logon session. If the user-to-entity affinity measure satisfies the affinity threshold, it means that the user's access of the entity was predictable. In such case, the first-time access alert is treated as a false positive (with respect to being malicious activity), and the UEBA system disregards the first-time access event in performing a risk assessment of the user's network activity. For example, the system does not add points to a risk score associated with the user's logon session due to the first-time access event.
In certain embodiments, the user-to-entity affinity measure is used to weight risk score points associated with a first-time access event. For example, the risk score points associated with a first-time access event may be weighted by the affinity score in calculating the risk score associated with a period of user network activity.
The present disclosure describes a system, method, and computer program for determining the risk associated with a first-time, user-to-entity access event in a computer network. The method is executed by a UEBA system that detects cyber threats in a network and performs risk assessments of user network activity. An example of performing a risk assessment is assigning a risk score to a period of user network activity, such as a user logon session.
As described in more detail below, in response to receiving a first-time access alert, the UEBA system uses a factorization machine to determine the affinity between a user and an accessed entity, and thus determine the risk associated with the access event. The affinity is represented by a numerical user-to-entity affinity measure score (“the affinity measure”), which is based on the accessing user's historical access patterns in the network, as well as the accessing user's context data and accessed entity's context data (i.e., the input to the factorization machine is the user's historical access patterns, the user's context data, and the accessed entity's context data). The affinity measure effectively represents the degree to which the access event was expected or predicted. The stronger the affinity between the user and an entity, the less risk associated with the event. The affinity measures for first-time access events may be used to filter first-time access alerts or to weight first-time access alerts in performing risk assessments of user network activity (e.g., calculating a risk score for a user logon session). The result is that many false-positive first-time access alerts are suppressed and not factored (or not factored heavily) into the risk assessments.
User and entity context data are attributes of the user and entity, respectively. Examples of user context data include the user's peer group, the user's network zone, the user's time zone, and other static user or peer data available in a directory service (e.g., Active Directory) or elsewhere. Examples of entity context data include a peer group associated with the entity and labels associated with the entity, such as “critical” or “non-critical.”
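One simple way to assemble such inputs is a concatenation of one-hot encodings. The layout below is a hypothetical sketch (the patent does not prescribe a specific encoding): one-hot user ID, one-hot entity ID, then one-hot context attributes for the user's peer group and zone and the entity's label.

```python
def build_feature_vector(user, entity, user_ids, entity_ids,
                         peer_groups, zones, entity_labels):
    """Assemble an input vector x for the factorization machine.

    Hypothetical layout: one-hot user id | one-hot entity id |
    one-hot user peer group | one-hot user network zone | one-hot entity label.
    `user` and `entity` are dicts carrying the context attributes.
    """
    def one_hot(value, vocabulary):
        return [1.0 if value == v else 0.0 for v in vocabulary]

    return (one_hot(user["id"], user_ids)
            + one_hot(entity["id"], entity_ids)
            + one_hot(user["peer_group"], peer_groups)
            + one_hot(user["zone"], zones)
            + one_hot(entity["label"], entity_labels))
```

In a production system the user's historical access pattern features would be appended as well; they are omitted here for brevity.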
In one embodiment, the affinity measure values are normalized. For example, the affinity measure may be a numerical value between 0 and 2, wherein 0 represents an illegitimate access and 2 represents a strong affinity (i.e., frequent access in the past).
After calculating the user-to-entity affinity measure, the UEBA system compares the calculated user-to-entity affinity measure to an affinity threshold (step 130). As described in more detail below with respect to
If the user-to-entity affinity measure does not satisfy the affinity threshold, it means that the user's access of the entity was unexpected (i.e., not predicted) and, thus, a potential cyber threat. In such case, the UEBA system uses the first-time access event to elevate a cybersecurity risk assessment of the user's network activity (step 140). For example, the first-time access event may have triggered a rule associated with points for a cybersecurity risk score for the user's current logon session, and such risk points are included in the user's risk score.
If the user-to-entity affinity measure satisfies the affinity threshold, it means that the user's access of the entity was predictable. In such case, the first-time access alert is treated as a false positive (with respect to being malicious activity), and the UEBA system disregards the first-time access event in performing a cybersecurity risk assessment of the user's network activity (i.e., it does not elevate the risk assessment in response to the first-time access event) (step 150). For example, in calculating a cybersecurity risk score for a user's current logon session, any risk points associated with a rule triggered by the first-time access event are not factored into the risk score.
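The decision logic of steps 130-150 can be sketched as a small function. This is an illustrative reduction (function and parameter names are hypothetical), assuming the session risk score and the triggered rule's points are simple numeric values.

```python
def apply_first_time_access_alert(affinity, threshold, session_risk, rule_points):
    """Steps 130-150: filter a first-time access alert by affinity.

    If the affinity measure fails the threshold, the access was unexpected
    and the triggered rule's points are added to the session risk score;
    otherwise the alert is treated as a false positive and the score is
    left unchanged.
    """
    if affinity < threshold:          # unexpected access: potential threat
        return session_risk + rule_points
    return session_risk               # predictable access: suppress alert
```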
In steps 130-150, the user-to-entity affinity measure is used to filter first-time access event alerts in performing cybersecurity risk assessments. However, in an alternate embodiment, the user-to-entity affinity measure is used to weight risk score points associated with a first-time access event. For example, risk score points associated with a rule triggered by a first-time access event may be weighted based on the corresponding user-to-entity affinity measure.
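The weighting embodiment can likewise be sketched. The linear down-weighting below is one plausible choice, not mandated by the text: with affinity normalized to the 0-2 range described earlier, points shrink as affinity grows, so a never-expected access keeps full points and a maximally expected one contributes nothing.

```python
def weighted_rule_points(rule_points, affinity, max_affinity=2.0):
    """Alternate embodiment: scale rule points by affinity instead of filtering.

    Assumed weighting (illustrative): points decrease linearly in affinity,
    so affinity 0 keeps full points and affinity == max_affinity yields 0.
    """
    weight = 1.0 - (affinity / max_affinity)
    return rule_points * weight
```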
Every observed access event in the training data is assumed to be a legitimate access event. The challenge is finding illegitimate access events to use for training the factorization machine, as these events are not observed in the training data. This differs from training factorization machines for retail recommendations, in which the values predicted (e.g., 1-5 stars for a movie) are values that are observed in the training data.
To address the lack of observed illegitimate accesses in the training data, a random sampling of never-observed user-entity accesses is treated as illegitimate accesses for training purposes. For example, the entities accessed by two different groups (e.g., engineering and HR) in an organization during a period of time are identified. Any entities accessed by at least one user in one group and never accessed by any user in the other group are also identified. This is done for both groups. Users in the first group (e.g., the engineering group) are randomly paired with entities accessed by the second group (e.g., HR) and never by the first group. This is done for both groups. For each such random, never-observed user-entity pairing, the system creates an input feature vector that includes the applicable user's access pattern data, the user's context data, and the applicable entity's context data (step 230).
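The sampling procedure above can be sketched as follows. This is an illustrative reduction with hypothetical names, assuming `accesses` maps each user ID to the set of entity IDs that user has accessed during the period.

```python
import random

def sample_illegitimate_pairs(group_a_users, group_b_users,
                              accesses, n_samples, rng=random.Random(0)):
    """Randomly pair users with entities their own group never touched.

    Entities seen only by group B are paired with group A users, and vice
    versa; the resulting never-observed pairs stand in for illegitimate
    accesses when training the factorization machine.
    """
    seen_by_a = set().union(*(accesses.get(u, set()) for u in group_a_users))
    seen_by_b = set().union(*(accesses.get(u, set()) for u in group_b_users))
    only_b = sorted(seen_by_b - seen_by_a)   # never accessed by group A
    only_a = sorted(seen_by_a - seen_by_b)   # never accessed by group B
    pairs = []
    for _ in range(n_samples):
        pairs.append((rng.choice(group_a_users), rng.choice(only_b)))
        pairs.append((rng.choice(group_b_users), rng.choice(only_a)))
    return pairs
```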
For each input feature vector x, the system assigns a target value y that indicates the frequency of access for the user and entity in the vector, wherein feature vectors for never-observed user-entity accesses are assigned a target value that represents an illegitimate access (step 240). In one embodiment, y is a value between 0 and 2, where 0 represents no previous access and 2 represents “frequent” accesses in a period of time. For example, in such embodiment, user-entity pairs with two or more accesses in a period of time (e.g., two or more times within thirty days) are associated with a y value of 2, and user-entity pairs with one access in the period of time are associated with a y value of 1. The never-observed user-entity pairs are associated with a y value of zero.
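The target assignment in step 240 reduces to a small mapping, sketched here with the example cutoffs from the text (two or more accesses in the period counts as "frequent"):

```python
def target_value(access_count, frequent_cutoff=2):
    """Assign target y per the 0-2 scheme described in step 240:
    0 = never observed, 1 = one access in the period,
    2 = frequent (>= frequent_cutoff accesses in the period)."""
    if access_count == 0:
        return 0
    return 2 if access_count >= frequent_cutoff else 1
```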
The system trains the factorization machine by applying a factorization machine algorithm to the input feature vectors and corresponding target values (step 250). In one embodiment, such algorithm is represented by the following equation (the standard degree-2 factorization machine model, reconstructed here from the surrounding description):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j$$

where:

- $\hat{y}(x)$ is the predicted target value (the affinity measure) for input feature vector $x$;
- $n$ is the number of features in $x$;
- $w_0$ is a global bias term;
- $w_i$ is the weight of the $i$-th feature;
- $v_i$ is a $k$-dimensional latent factor vector for the $i$-th feature, and $\langle v_i, v_j \rangle$ is the dot product of the latent vectors for features $i$ and $j$; and
- $k$ is a hyper-parameter specifying the dimensionality of the latent factor vectors.
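A minimal prediction routine for the degree-2 factorization machine model can be sketched as follows. It uses the well-known O(k·n) identity for the pairwise term, sum over i&lt;j of ⟨v_i, v_j⟩·x_i·x_j = ½·Σ_f[(Σ_i v_if·x_i)² − Σ_i v_if²·x_i²]; variable names are illustrative.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction.

    x:  input feature vector (length n)
    w0: global bias term
    w:  linear weights (length n)
    V:  latent factor vectors, one k-dimensional list per feature
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)  # = sum_{i<j} <v_i,v_j> x_i x_j
    return w0 + linear + pairwise
```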
In one embodiment, to optimize hyper-parameters of the factorization machine, such as k in the list above, the training data is randomly split into two parts: 80% for the training dataset and 20% for the validation dataset. The training dataset is used to train the factorization machine in accordance with the above-described training method, and the validation dataset is used to find the optimal hyper-parameters, such as k, for the factorization machine.
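The 80/20 split can be sketched as a simple shuffled partition (names and the fixed seed are illustrative):

```python
import random

def split_train_validation(examples, train_frac=0.8, seed=7):
    """Randomly split labeled examples into training and validation sets,
    per the 80/20 split used to tune hyper-parameters such as k."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Candidate values of k would then be evaluated by training on the first set and scoring prediction error on the second.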
The methods described with respect to
The UEBA system 400 is in communication with computer system 440, which trains the Factorization Machine 420 and calculates the affinity threshold.
Those skilled in the art will appreciate that a UEBA system has other modules not shown in
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosure is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
This application claims the benefit of U.S. Provisional Application No. 62/458,496 filed on Feb. 13, 2017, and titled “Computing User-to-Entity Affinity for User and Entity Behavior Analytics Applications,” the contents of which are incorporated by reference herein as if fully disclosed herein.
Number | Name | Date | Kind |
---|---|---|---|
5941947 | Brown et al. | Aug 1999 | A |
6223985 | DeLude | May 2001 | B1 |
6594481 | Johnson et al. | Jul 2003 | B1 |
7668776 | Ahles | Feb 2010 | B1 |
8326788 | Allen et al. | Dec 2012 | B2 |
8443443 | Nordstrom et al. | May 2013 | B2 |
8479302 | Lin | Jul 2013 | B1 |
8539088 | Zheng | Sep 2013 | B2 |
8583781 | Raleigh | Nov 2013 | B2 |
8606913 | Lin | Dec 2013 | B2 |
8676273 | Fujisaki | Mar 2014 | B1 |
8881289 | Basavapatna et al. | Nov 2014 | B2 |
9055093 | Borders | Jun 2015 | B2 |
9081958 | Ramzan et al. | Jul 2015 | B2 |
9189623 | Lin et al. | Nov 2015 | B1 |
9680938 | Gil et al. | Jun 2017 | B1 |
9692765 | Choi et al. | Jun 2017 | B2 |
9760240 | Maheshwari et al. | Sep 2017 | B2 |
9779253 | Mahaffey et al. | Oct 2017 | B2 |
9798883 | Gil | Oct 2017 | B1 |
9843596 | Averbuch et al. | Dec 2017 | B1 |
9898604 | Fang et al. | Feb 2018 | B2 |
10095871 | Gil et al. | Oct 2018 | B2 |
10178108 | Lin et al. | Jan 2019 | B1 |
10419470 | Segev et al. | Sep 2019 | B1 |
10467631 | Dhurandhar et al. | Nov 2019 | B2 |
10474828 | Gil et al. | Nov 2019 | B2 |
10496815 | Steiman | Dec 2019 | B1 |
20020107926 | Lee | Aug 2002 | A1 |
20030147512 | Abburi | Aug 2003 | A1 |
20040073569 | Knott et al. | Apr 2004 | A1 |
20060090198 | Aaron | Apr 2006 | A1 |
20070156771 | Hurley et al. | Jul 2007 | A1 |
20070282778 | Chan et al. | Dec 2007 | A1 |
20080040802 | Pierson et al. | Feb 2008 | A1 |
20080170690 | Tysowski | Jul 2008 | A1 |
20080301780 | Ellison et al. | Dec 2008 | A1 |
20090144095 | Shahi et al. | Jun 2009 | A1 |
20090171752 | Galvin | Jul 2009 | A1 |
20090293121 | Bigus et al. | Nov 2009 | A1 |
20100125911 | Bhaskaran | May 2010 | A1 |
20100269175 | Stolfo et al. | Oct 2010 | A1 |
20120278021 | Lin et al. | Nov 2012 | A1 |
20120316835 | Maeda et al. | Dec 2012 | A1 |
20120316981 | Hoover et al. | Dec 2012 | A1 |
20130080631 | Lin | Mar 2013 | A1 |
20130117554 | Ylonen | May 2013 | A1 |
20130197998 | Buhrmann | Aug 2013 | A1 |
20130227643 | Mccoog | Aug 2013 | A1 |
20130305357 | Ayyagari et al. | Nov 2013 | A1 |
20130340028 | Rajagopal et al. | Dec 2013 | A1 |
20140315519 | Nielsen | Oct 2014 | A1 |
20150046969 | Abuelsaad et al. | Feb 2015 | A1 |
20150121503 | Xiong | Apr 2015 | A1 |
20150205944 | Turgeman | Jul 2015 | A1 |
20150339477 | Abrams | Nov 2015 | A1 |
20150341379 | Lefebvre et al. | Nov 2015 | A1 |
20160005044 | Moss | Jan 2016 | A1 |
20160021117 | Harmon et al. | Jan 2016 | A1 |
20160306965 | Iyer et al. | Oct 2016 | A1 |
20160364427 | Wedgeworth, III | Dec 2016 | A1 |
20170019506 | Lee | Jan 2017 | A1 |
20170024135 | Christodorescu | Jan 2017 | A1 |
20170155652 | Most | Jun 2017 | A1 |
20170161451 | Weinstein et al. | Jun 2017 | A1 |
20170213025 | Srivastav et al. | Jul 2017 | A1 |
20170236081 | Grady Smith et al. | Aug 2017 | A1 |
20170318034 | Holland et al. | Nov 2017 | A1 |
20180004961 | Gil et al. | Jan 2018 | A1 |
20180048530 | Nikitaki et al. | Feb 2018 | A1 |
20180069893 | Amit | Mar 2018 | A1 |
20180144139 | Cheng | May 2018 | A1 |
20180165554 | Zhang et al. | Jun 2018 | A1 |
20190034641 | Gil et al. | Jan 2019 | A1 |
20190334784 | Kvernvik et al. | Oct 2019 | A1 |
20200021607 | Muddu et al. | Jan 2020 | A1 |
20200082098 | Gil et al. | Mar 2020 | A1 |
Entry |
---|
Cooley et al., “Web mining: information and pattern discovery on the World Wide Web”, Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence, Date of Conference: Nov. 3-8, 1997. |
Ioannidis, Yannis, “The History of Histograms (abridged)”, Proceedings of the 29th VLDB Conference (2003), pp. 1-12. |
Chen, Jinghui, et al., “Outlier Detection with Autoencoder Ensembles”, Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 90-98. |
DatumBox Blog, “Machine Learning Tutorial: The Naïve Bayes Text Classifier”, DatumBox Machine Learning Blog and Software Development News, Jan. 2014, pp. 1-11. |
Freeman, David, et al., “Who are you? A Statistical Approach to Measuring User Authenticity”, NDSS, Feb. 2016, pp. 1-15. |
Malik, Hassan, et al., “Automatic Training Data Cleaning for Text Classification”, 11th IEEE International Conference on Data Mining Workshops, 2011, pp. 442-449. |
Wang, Alex Hai, “Don't Follow Me Spam Detection in Twitter”, International Conference on Security and Cryptography, 2010, pp. 1-10. |
Number | Date | Country | |
---|---|---|---|
62458496 | Feb 2017 | US |