This invention relates generally to security analytics in computer networks, and more specifically to detecting unmanaged and unauthorized assets in a computer network by using a recurrent neural network to identify anomalously-named assets.
Devices unknown to corporate information technology (IT) teams pose security threats. Whether they are legitimate but unmanaged devices or unauthorized rogue devices, they represent a security blind spot because they are potential entry points for malware or adversarial actions. In 2016, the world discovered the Mirai botnet, which targeted Internet-of-Things (IoT) devices that are generally not managed by businesses. The rapid growth of bring-your-own-device (BYOD) initiatives invites security risks as employees, contractors, and partners bring unvetted devices onto corporate networks. Unknown devices are not limited to physical hardware such as users' laptops or employees' smartphones. With compromised accounts, adversaries can create unmonitored virtual machines (VMs) at will for malicious purposes and delete the VMs afterwards to hide their tracks. These devices present an attack surface from multiple points. The risks include compromised intellectual property, leaked sensitive data, and a tarnished company reputation.
Current approaches to device management range from deployment of mobile device management (MDM) tools to cloud access security broker (CASB) enforcement. Nonetheless, these solutions are costly and require administration as well as compliance, and they do not address devices brought in by non-employees or virtual machines created and used maliciously. Reducing the security risk from unknown physical or virtual devices is multifaceted. A key first step toward reducing risk from unknown devices is to recognize and identify their presence.
In a large corporate network, managed devices adhere to some official naming conventions. In practice, groups of unmanaged devices may have their own unofficial naming conventions that are unknown to the IT department. Some such groups belong to internal departments outside of formal control policy; some are from legacy systems or domains; some belong to external vendors or partners; and some are communication devices brought in by employees. Outside of these groups, unmanaged or unauthorized devices with arbitrary names and no naming peers are the most interesting, as they are anomalous. An example is a freely-named VM created via compromised credentials.
There is demand for a system that can detect anomalously-named devices on a network. Such a system can be part of a comprehensive risk detection system. Known industry solutions rely on policies to manage controlled devices. We are unaware of any prior work that investigates the presence of unknown devices from the device name alone.
The present disclosure describes a system, method, and computer program for detecting unmanaged and unauthorized assets on an IT network by identifying anomalously-named assets. A recurrent neural network (RNN), such as a long short-term memory (LSTM) network or a bidirectional RNN, is trained to identify patterns in asset names in a network. The RNN learns the character distribution patterns of the names of all observed assets in the training data, effectively capturing the hidden naming structures followed by a majority of assets on the network. The RNN is then used to identify and flag assets with names that deviate from the hidden asset naming structures. Specifically, the RNN is used to measure the reconstruction errors of input asset name strings. Asset names with high reconstruction errors are anomalous because they cannot be explained by the learned naming structures. These identified assets make up an initial pool of potentially unmanaged and unauthorized assets.
In certain embodiments, the initial pool is then filtered to remove assets with attributes that mitigate the cybersecurity risk associated with them. For example, the filtering may remove assets whose names deviate from network-wide naming conventions but are consistent with naming conventions in their peer group.
The assets in the pool that remain after filtering are associated with a higher cybersecurity risk, as these assets are likely unmanaged and unauthorized assets. In certain embodiments, this means that these assets are presented in a user interface for administrative review. In addition or alternatively, in a system that computes a cybersecurity risk score for user sessions, the presence of these assets for the first time in a user session elevates the risk score for the session.
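By way of illustration only, the overall flow can be sketched as the composition below; `reconstruction_error` and `fits_peer_group_convention` are hypothetical stand-ins for the RNN scoring and filtering components detailed later in this disclosure, not names used by it:

```python
import numpy as np

def detect_unmanaged_assets(asset_names, model, reconstruction_error,
                            fits_peer_group_convention, top_percent=1.0):
    """Hedged sketch of the disclosed flow: score names, flag the worst, filter."""
    # 1. Reconstruction error of each asset name under the trained RNN.
    errors = np.array([reconstruction_error(model, name) for name in asset_names])

    # 2. Initial pool: names that the learned naming structures cannot explain.
    threshold = np.percentile(errors, 100.0 - top_percent)
    initial_pool = [n for n, e in zip(asset_names, errors) if e > threshold]

    # 3. Remove names with mitigating attributes (e.g., peer-group conventions);
    #    the remainder are associated with elevated cybersecurity risk.
    return [n for n in initial_pool if not fits_peer_group_convention(n)]
```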
In one embodiment, a method for detecting anomalously-named assets comprises the steps described below.
The present disclosure describes a system, method, and computer program for detecting unmanaged and unauthorized assets on an IT network by identifying anomalously-named assets. Examples of assets are virtual machines and physical devices, such as computers, printers, and smart phones. The method is performed by a computer system (the “system”), such as a computer system that detects cyber threats in a network. The system may be a user behavior analytics (UBA) system or a user-and-entity behavior analytics system (UEBA). An example of a UBA/UEBA cybersecurity monitoring system is described in U.S. Pat. No. 9,798,883 issued on Oct. 24, 2017 and titled “System, Method, and Computer Program for Detecting and Assessing Security Risks in a Network,” the contents of which are incorporated by reference herein.
1. Identify Assets with Anomalous Names
The system applies the set of input vectors to an RNN that has been trained to identify patterns in asset names in the IT network (step 120). As described below, during training the RNN learns the character distribution patterns of the names of all observed assets in the training data, effectively capturing the hidden naming structures followed by a majority of assets on the network.
The RNN comprises an encoder and a decoder. The system uses the RNN encoder to compress the set of input vectors into a single latent vector that is representative of the asset name and that is generated based on patterns in asset names in the IT network learned by the RNN during training (step 130). The system then uses the decoder, the single latent vector, and the set of input vectors to reconstruct the asset name. Specifically, the decoder receives the single latent vector output of the encoder as its initial state (step 140). With the state initialized by the single latent vector, the set of input vectors is then applied to the decoder to reconstruct the asset name (step 150).
In one embodiment, the RNN is a seq2seq long short-term memory (LSTM) network, and the asset name is reconstructed one character at a time using a teacher forcing method in which the set of input vectors, offset by one time step, is applied to the LSTM decoder. In other words, the LSTM decoder predicts the character at time t given the character at time t−1 and the state of the LSTM decoder at time t. In an alternate embodiment, the RNN is a bidirectional recurrent neural network.
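A minimal Keras sketch of one way such a seq2seq LSTM autoencoder could be wired is shown below. It assumes each asset name has already been converted into a fixed-length sequence of one-hot character vectors (length n over a vocabulary of vocab_size characters); the layer sizes are illustrative and not taken from this disclosure:

```python
from tensorflow.keras import layers, Model

n = 20           # fixed asset-name length (illustrative)
vocab_size = 40  # number of characters in the asset-name alphabet (illustrative)
latent_dim = 64  # size of the single latent vector (illustrative)

# Encoder: compresses the character sequence into a single latent state.
enc_in = layers.Input(shape=(n, vocab_size), name="asset_name")
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_in)

# Decoder: initialized with the encoder's latent state; during training it
# receives the same character sequence offset by one time step (teacher forcing).
dec_in = layers.Input(shape=(n, vocab_size), name="asset_name_shifted")
dec_out = layers.LSTM(latent_dim, return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])
pred = layers.Dense(vocab_size, activation="softmax")(dec_out)

autoencoder = Model([enc_in, dec_in], pred)
```

In this sketch the single latent vector corresponds to the encoder's final hidden and cell states, which initialize the decoder as described in steps 130-140.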
The system receives the reconstructed asset name from the decoder and determines a degree of reconstruction error between the reconstructed asset name and the original asset name (steps 160, 170). The system ascertains whether the reconstruction error is above a threshold (e.g., in the top 1% of largest errors) (step 180). If the reconstruction error is above the threshold, the asset is flagged as anomalous (step 185). Otherwise, the system concludes that the asset name is not anomalous (step 190). In one embodiment, the system computes the categorical cross-entropy loss values of the asset name character sequences, and flags the top r percent (e.g., top 1%) of asset names with the largest loss as the initial candidates of anomalous asset names to review.
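As an illustrative sketch only, using the `autoencoder` model from the preceding sketch and one-hot encoded names, the per-name reconstruction error can be computed as the mean categorical cross-entropy of the predicted character distributions against the true characters, and the top r percent flagged:

```python
import numpy as np

def name_reconstruction_errors(autoencoder, X):
    """Mean categorical cross-entropy per asset name.

    X: one-hot array of shape (num_names, n, vocab_size). The decoder input
    is X offset by one time step, matching the teacher-forced training setup.
    """
    X_shifted = np.zeros_like(X)
    X_shifted[:, 1:, :] = X[:, :-1, :]
    probs = autoencoder.predict([X, X_shifted], verbose=0)
    char_loss = -np.sum(X * np.log(probs + 1e-9), axis=-1)   # (num_names, n)
    return char_loss.mean(axis=1)                            # (num_names,)

def initial_candidates(names, errors, r=1.0):
    """Flag the top r percent (e.g., top 1%) of names by loss as candidates."""
    threshold = np.percentile(errors, 100.0 - r)
    return [name for name, err in zip(names, errors) if err >= threshold]
```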
2. Filtering Assets Flagged as Anomalous and Associating Elevated Risk with Anomalously-Named Assets that Pass Filtering
As illustrated in
Assets that satisfy the filter criteria are de-flagged as anomalous (steps 230, 240). The system provides an indicator of elevated cybersecurity risk for anomalously-named assets that pass through the filter criteria, as these assets are likely to be unmanaged and unauthorized assets (steps 230, 250). For example, these assets may be presented in a user interface for administrative review. The displayed assets may be ranked based on reconstruction error (i.e., the higher the reconstruction error, the higher the rank). In addition or alternatively, whether an asset has an anomalous name may be used by a UBA/UEBA system as input to a risk rule. Anomalously-named assets that pass the filter criteria may trigger a risk rule, resulting in a higher risk score for the applicable user or asset session. For example, the system may add points to a user session risk score if it is the first time the system is seeing an asset in the network and the asset has an anomalous name.
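The following is a minimal sketch of this filtering and risk-elevation step, assuming hypothetical `in_peer_group_convention` and `seen_before` predicates for the mitigating-attribute and first-time-seen checks, and an illustrative point value for the risk rule; none of these names or values come from the disclosure:

```python
def apply_filter_and_risk(flagged, errors, sessions, session_risk_scores,
                          in_peer_group_convention, seen_before, risk_points=25):
    """De-flag assets that satisfy the filter criteria; elevate risk for the rest.

    flagged: asset names in the initial anomaly pool.
    errors: dict mapping asset name -> reconstruction error.
    sessions: iterable of (session_id, asset_name) observations.
    session_risk_scores: dict mapping session_id -> current risk score.
    """
    # Keep only assets whose names do not fit a peer group's naming convention.
    remaining = [a for a in flagged if not in_peer_group_convention(a)]

    # Rank for administrative review: highest reconstruction error first.
    remaining.sort(key=lambda a: errors[a], reverse=True)

    # Risk rule: a first-seen, anomalously-named asset raises the session score.
    anomalous = set(remaining)
    for session_id, asset in sessions:
        if asset in anomalous and not seen_before(asset):
            session_risk_scores[session_id] = (
                session_risk_scores.get(session_id, 0) + risk_points)
    return remaining
```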
3. Training the RNN to Identify Patterns in Asset Names
The RNN is trained to identify patterns in asset names by performing steps 110-170 with respect to a training data set and training the RNN to minimize the reconstruction errors. For example, the training data set may be created by extracting asset names from a window (e.g., 3 months) of domain controller logs in which user-to-asset authentications are recorded. One portion (e.g., 80%) of the training data set is used for training, and another portion (e.g., 20%) is used for parameter validation. To prepare the input data for RNN training, asset names may be fixed to a length n, where n is the 99% quantile of asset-name lengths in the environment.
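One possible way to prepare such input data is sketched below, assuming the asset names have already been extracted from the domain controller logs into a Python list; the padding character and one-hot encoding details are illustrative choices, while the fixed length n and the 80/20 split follow the description above:

```python
import numpy as np

def prepare_training_data(asset_names, train_fraction=0.8):
    """Fix names to length n (99% quantile), one-hot encode, and split 80/20."""
    lengths = np.array([len(name) for name in asset_names])
    n = int(np.quantile(lengths, 0.99))          # fixed asset-name length n

    # Character vocabulary, plus a padding symbol for short names (assumption).
    chars = sorted(set("".join(asset_names))) + ["\0"]
    char_to_idx = {c: i for i, c in enumerate(chars)}

    X = np.zeros((len(asset_names), n, len(chars)), dtype=np.float32)
    for i, name in enumerate(asset_names):
        fixed = name[:n].ljust(n, "\0")          # truncate or pad to length n
        for t, c in enumerate(fixed):
            X[i, t, char_to_idx[c]] = 1.0

    # One portion for training, another for parameter validation.
    split = int(train_fraction * len(asset_names))
    idx = np.random.permutation(len(asset_names))
    return X[idx[:split]], X[idx[split:]], char_to_idx, n
```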
During training, the decoder receives the ground truth at the current time step as its input at the next time step (the teacher forcing method). At the output of the decoder, a densely connected layer predicts the sequential characters one by one. The loss function is categorical cross-entropy, since the prediction of each character is a multi-class classification. This results in an RNN that learns the character distribution patterns of the names of all observed assets in the training data, effectively capturing the hidden naming structures followed by a majority of assets on the network.
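Continuing the earlier model sketch, training could proceed as follows: the teacher-forced decoder input is the target sequence shifted right by one time step (using an all-zeros vector at the first position is an assumption, not something specified in the disclosure), and the model is fit to minimize categorical cross-entropy; the optimizer, epoch, and batch-size values are illustrative:

```python
import numpy as np

def make_decoder_input(X):
    """Teacher forcing: decoder input is the target shifted right by one step."""
    X_shifted = np.zeros_like(X)
    X_shifted[:, 1:, :] = X[:, :-1, :]   # character at t-1 fed as input at t
    return X_shifted

# Assumes `autoencoder` from the model sketch and X_train, X_val from
# prepare_training_data above.
autoencoder.compile(optimizer="adam", loss="categorical_crossentropy")
autoencoder.fit(
    [X_train, make_decoder_input(X_train)], X_train,
    validation_data=([X_val, make_decoder_input(X_val)], X_val),
    epochs=20, batch_size=128)
```

The held-out portion passed as validation data serves as the parameter validation set described above.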
The methods described with respect to
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosure is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
This application claims the benefit of U.S. Provisional Application No. 62/672,379, filed on May 16, 2018 and titled “Detecting Unmanaged and Unauthorized Devices on the Network with Long Short-Term Memory Network,” the contents of which are incorporated by reference herein as if fully disclosed herein.
Number | Name | Date | Kind |
---|---|---|---|
5941947 | Brown et al. | Aug 1999 | A |
6223985 | DeLude | May 2001 | B1 |
6594481 | Johnson et al. | Jul 2003 | B1 |
7181768 | Ghosh | Feb 2007 | B1 |
7624277 | Simard | Nov 2009 | B1 |
7668776 | Ahles | Feb 2010 | B1 |
8326788 | Allen et al. | Dec 2012 | B2 |
8443443 | Nordstrom et al. | May 2013 | B2 |
8479302 | Lin | Jul 2013 | B1 |
8484230 | Harnett et al. | Jul 2013 | B2 |
8539088 | Zheng | Sep 2013 | B2 |
8583781 | Raleigh | Nov 2013 | B2 |
8606913 | Lin | Dec 2013 | B2 |
8676273 | Fujisake | Mar 2014 | B1 |
8850570 | Ramzan | Sep 2014 | B1 |
8881289 | Basavapatna et al. | Nov 2014 | B2 |
9055093 | Borders | Jun 2015 | B2 |
9081958 | Ramzan et al. | Jul 2015 | B2 |
9185095 | Moritz et al. | Nov 2015 | B1 |
9189623 | Lin et al. | Nov 2015 | B1 |
9202052 | Fang et al. | Dec 2015 | B1 |
9680938 | Gil et al. | Jun 2017 | B1 |
9690938 | Saxe | Jun 2017 | B1 |
9692765 | Choi et al. | Jun 2017 | B2 |
9760240 | Maheshwari et al. | Sep 2017 | B2 |
9779253 | Mahaffey et al. | Oct 2017 | B2 |
9798883 | Gil et al. | Oct 2017 | B1 |
9843596 | Averbuch et al. | Dec 2017 | B1 |
9898604 | Fang et al. | Feb 2018 | B2 |
10063582 | Feng et al. | Aug 2018 | B1 |
10095871 | Gil et al. | Oct 2018 | B2 |
10178108 | Lin et al. | Jan 2019 | B1 |
10354015 | Kalchbrenner | Jul 2019 | B2 |
10397272 | Bruss | Aug 2019 | B1 |
10419470 | Segev et al. | Sep 2019 | B1 |
10445311 | Saurabh et al. | Oct 2019 | B1 |
10467631 | Dhurandhar et al. | Nov 2019 | B2 |
10474828 | Gil et al. | Nov 2019 | B2 |
10496815 | Steiman et al. | Dec 2019 | B1 |
10621343 | Maciejak | Apr 2020 | B1 |
10645109 | Lin et al. | May 2020 | B1 |
10685293 | Heimann | Jun 2020 | B1 |
10803183 | Gil et al. | Oct 2020 | B2 |
10819724 | Amiri | Oct 2020 | B2 |
10841338 | Lin et al. | Nov 2020 | B1 |
10887325 | Lin et al. | Jan 2021 | B1 |
10944777 | Lin et al. | Mar 2021 | B2 |
11017173 | Lu | May 2021 | B1 |
11080483 | Islam | Aug 2021 | B1 |
11080591 | van den Oord | Aug 2021 | B2 |
11140167 | Lin et al. | Oct 2021 | B1 |
20020107926 | Lee | Aug 2002 | A1 |
20030147512 | Abburi | Aug 2003 | A1 |
20040073569 | Knott et al. | Apr 2004 | A1 |
20060090198 | Aaron | Apr 2006 | A1 |
20070156771 | Hurley | Jul 2007 | A1 |
20070282778 | Chan et al. | Dec 2007 | A1 |
20080028467 | Kommareddy et al. | Jan 2008 | A1 |
20080040802 | Pierson et al. | Feb 2008 | A1 |
20080170690 | Tysowski | Jul 2008 | A1 |
20080262990 | Kapoor | Oct 2008 | A1 |
20080301780 | Ellison et al. | Dec 2008 | A1 |
20090144095 | Shahi et al. | Jun 2009 | A1 |
20090171752 | Galvin et al. | Jul 2009 | A1 |
20090293121 | Bigus et al. | Nov 2009 | A1 |
20100125911 | Bhaskaran | May 2010 | A1 |
20100269175 | Stolfo et al. | Oct 2010 | A1 |
20100284282 | Golic | Nov 2010 | A1 |
20110167495 | Antonakakis | Jul 2011 | A1 |
20120278021 | Lin et al. | Nov 2012 | A1 |
20120316835 | Maeda et al. | Dec 2012 | A1 |
20120316981 | Hoover et al. | Dec 2012 | A1 |
20130080631 | Lin | Mar 2013 | A1 |
20130117554 | Ylonen | May 2013 | A1 |
20130197998 | Buhrmann et al. | Aug 2013 | A1 |
20130227643 | Mccoog et al. | Aug 2013 | A1 |
20130305357 | Ayyagari et al. | Nov 2013 | A1 |
20130340028 | Rajagopal et al. | Dec 2013 | A1 |
20140007238 | Magee | Jan 2014 | A1 |
20140090058 | Ward | Mar 2014 | A1 |
20140101759 | Antonakakis | Apr 2014 | A1 |
20140315519 | Nielsen | Oct 2014 | A1 |
20150026027 | Priess et al. | Jan 2015 | A1 |
20150039543 | Athmanathan | Feb 2015 | A1 |
20150046969 | Abuelsaad et al. | Feb 2015 | A1 |
20150121503 | Xiong | Apr 2015 | A1 |
20150205944 | Turgeman | Jul 2015 | A1 |
20150215325 | Ogawa | Jul 2015 | A1 |
20150339477 | Abrams et al. | Nov 2015 | A1 |
20150341379 | Lefebvre et al. | Nov 2015 | A1 |
20150363691 | Gocek | Dec 2015 | A1 |
20160005044 | Moss et al. | Jan 2016 | A1 |
20160021117 | Harmon et al. | Jan 2016 | A1 |
20160063397 | Ylipaavalniemi et al. | Mar 2016 | A1 |
20160292592 | Patthak et al. | Oct 2016 | A1 |
20160306965 | Iyer et al. | Oct 2016 | A1 |
20160364427 | Wedgeworth, III | Dec 2016 | A1 |
20170019506 | Lee et al. | Jan 2017 | A1 |
20170024135 | Christodorescu et al. | Jan 2017 | A1 |
20170127016 | Yu | May 2017 | A1 |
20170155652 | Most et al. | Jun 2017 | A1 |
20170161451 | Weinstein et al. | Jun 2017 | A1 |
20170213025 | Srivastav et al. | Jul 2017 | A1 |
20170236081 | Grady Smith et al. | Aug 2017 | A1 |
20170264679 | Chen et al. | Sep 2017 | A1 |
20170318034 | Holland et al. | Nov 2017 | A1 |
20170323636 | Xiao | Nov 2017 | A1 |
20180004961 | Gil et al. | Jan 2018 | A1 |
20180048530 | Nikitaki et al. | Feb 2018 | A1 |
20180063168 | Sofka | Mar 2018 | A1 |
20180069893 | Amit et al. | Mar 2018 | A1 |
20180075343 | van den Oord | Mar 2018 | A1 |
20180089304 | Vizer et al. | Mar 2018 | A1 |
20180097822 | Huang | Apr 2018 | A1 |
20180144139 | Cheng et al. | May 2018 | A1 |
20180157963 | Salti | Jun 2018 | A1 |
20180165554 | Zhang | Jun 2018 | A1 |
20180190280 | Cui | Jul 2018 | A1 |
20180234443 | Wolkov et al. | Aug 2018 | A1 |
20180248895 | Watson et al. | Aug 2018 | A1 |
20180285340 | Murphy | Oct 2018 | A1 |
20180288063 | Koottayi et al. | Oct 2018 | A1 |
20180288086 | Amiri | Oct 2018 | A1 |
20180307994 | Cheng et al. | Oct 2018 | A1 |
20180322368 | Zhang | Nov 2018 | A1 |
20190014149 | Cleveland | Jan 2019 | A1 |
20190028496 | Fenoglio | Jan 2019 | A1 |
20190034641 | Gil et al. | Jan 2019 | A1 |
20190066185 | More | Feb 2019 | A1 |
20190080225 | Agarwal | Mar 2019 | A1 |
20190089721 | Pereira | Mar 2019 | A1 |
20190103091 | Chen | Apr 2019 | A1 |
20190114419 | Chistyakov | Apr 2019 | A1 |
20190124045 | Zong et al. | Apr 2019 | A1 |
20190132629 | Kendrick | May 2019 | A1 |
20190149565 | Hagi | May 2019 | A1 |
20190171655 | Psota | Jun 2019 | A1 |
20190182280 | La Marca | Jun 2019 | A1 |
20190205750 | Zheng | Jul 2019 | A1 |
20190213247 | Pala et al. | Jul 2019 | A1 |
20190244603 | Angkititrakul | Aug 2019 | A1 |
20190303703 | Kumar | Oct 2019 | A1 |
20190318100 | Bhatia | Oct 2019 | A1 |
20190334784 | Kvernvik et al. | Oct 2019 | A1 |
20190349400 | Bruss | Nov 2019 | A1 |
20190378051 | Widmann | Dec 2019 | A1 |
20200021607 | Muddu et al. | Jan 2020 | A1 |
20200082098 | Gil et al. | Mar 2020 | A1 |
20200228557 | Lin et al. | Jul 2020 | A1 |
20200302118 | Cheng | Sep 2020 | A1 |
20200327886 | Shalaby | Oct 2020 | A1 |
20210089884 | Macready | Mar 2021 | A1 |
20210125050 | Wang | Apr 2021 | A1 |
20210182612 | Zeng | Jun 2021 | A1 |
20210232768 | Ling | Jul 2021 | A1 |
20220006814 | Lin et al. | Jan 2022 | A1 |
Entry |
---|
Adrian Taylor et al., Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks, IEEE (Year: 2016). |
Jonathan Goh et al., Anomaly Detection in Cyber Physical Systems using Recurrent Neural Networks, IEEE (Year: 2017). |
Adrian Taylor, Anomaly-based detection of malicious activity in in-vehicle networks, Ph.D. thesis, University of Ottawa (Year: 2017). |
Alejandro Correa Bahnsen et al., Classifying Phishing URLs Using Recurrent Neural Networks, IEEE (Year: 2017). |
Jihyun Kim et al., Long Short Term Memory Recurrent Neural Network Classifier for Intrusion Detection, IEEE (Year: 2016). |
Shuhao Wang et al., Session-Based Fraud Detection in Online E-Commerce Transactions Using Recurrent Neural Networks, Springer (Year: 2017). |
Ke Zhang et al., Automated IT System Failure Prediction: A Deep Learning Approach, IEEE (Year: 2016). |
Malik, Hassan, et al., “Automatic Training Data Cleaning for Text Classification”, 11th IEEE International Conference on Data Mining Workshops, 2011, pp. 442-449. |
Chen, Jinghui, et al., “Outlier Detection with Autoencoder Ensembles”, Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 90-98. |
Ioannidis, Yannis, “The History of Histograms (abridged)”, Proceedings of the 29th VLDB Conference (2003), pp. 1-12. |
DatumBox Blog, “Machine Learning Tutorial: The Naïve Bayes Text Classifier”, DatumBox Machine Learning Blog and Software Development News, Jan. 2014, pp. 1-11. |
Freeman, David, et al., “Who are you? A Statistical Approach to Measuring User Authenticity”, NDSS, Feb. 2016, pp. 1-15. |
Wang, Alex Hai, “Don't Follow Me Spam Detection in Twitter”, International Conference on Security and Cryptography, 2010, pp. 1-10. |
Cooley, R., et al., “Web Mining: Information and Pattern Discovery on the World Wide Web”, Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence, Nov. 3-8, 1997, pp. 558-567. |
Poh, Norman, et al., “EER of Fixed and Trainable Fusion Classifiers: A Theoretical Study with Application to Biometric Authentication Tasks”, Multiple Classifier Systems, MCS 2005, Lecture Notes in Computer Science, vol. 3541, pp. 1-11. |
Number | Date | Country | |
---|---|---|---|
62672379 | May 2018 | US |