This document relates generally to computer network intrusion analysis and more particularly to unsupervised modeling systems and methods for analyzing computer network intrusions.
As the Internet has become more widely used, it has also created new risks for corporations and other types of organizations. Breaches of computer security by hackers and intruders and the potential for compromising sensitive information are a very real and serious threat. The risk is further compounded by the difficulty in determining from the many daily accesses of an organization's network or networks what constitutes legitimate network accesses (e.g., employees accessing their organization's network for work-related purposes, etc.) versus what constitutes malicious network accesses. Examples of malicious network accesses include attempts to access sensitive and confidential information that is contained within an organization's network (e.g., for fraudulent purposes) as well as an attempt to place a virus within the network. Current approaches have difficulty in discerning legitimate network accesses from malicious network accesses.
In accordance with the teachings provided herein, systems and methods for operation upon data processing devices are provided for analyzing activities associated with accesses of a computer network. As an example, a computer-implemented method and system can be configured to receive data related to the activities associated with the accesses of a computer network. The network activities data are segmented into a plurality of network activities segments. The segmented data are used to generate predictive models for analyzing activities associated with computer networks.
As another example, a computer-implemented method and system can be configured to receive data related to the activities associated with accesses of a computer network. The network activities data are segmented into a plurality of network activities segments. For each of the network activities segments, an anomaly detection predictive model is generated, wherein the model generation includes generating, for a network activity segment, a predictive model that is a model of the segment. The generated predictive models are for use in analyzing the activities associated with the computer network or other computer networks.
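As a non-limiting illustration, the segmentation and per-segment model generation described above might be sketched as follows. The field names (`protocol`, `bytes`) and the mean/standard-deviation summary used as the "model" are assumptions made only for this sketch; the description itself does not prescribe a particular segmentation variable or model form.

```python
from collections import defaultdict
from statistics import mean, pstdev


def segment_events(events, key="protocol"):
    """Group network events into segments by a (hypothetical) field."""
    segments = defaultdict(list)
    for event in events:
        segments[event[key]].append(event)
    return dict(segments)


def fit_segment_model(segment, feature="bytes"):
    """Summarize one segment; a stand-in for a real anomaly detection model."""
    values = [event[feature] for event in segment]
    # Fall back to 1.0 so a constant segment does not yield a zero spread.
    return {"mean": mean(values), "std": pstdev(values) or 1.0}


def fit_all(events, key="protocol", feature="bytes"):
    """Generate one model per network activities segment."""
    segments = segment_events(events, key)
    return {name: fit_segment_model(seg, feature) for name, seg in segments.items()}


# Hypothetical network activities data.
events = [
    {"protocol": "tcp", "bytes": 100},
    {"protocol": "tcp", "bytes": 120},
    {"protocol": "udp", "bytes": 40},
    {"protocol": "udp", "bytes": 60},
]
models = fit_all(events)
```

Each key of `models` corresponds to one segment, so a model is only ever compared against activity of the kind it was built from.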
Malicious network accessing 42 can take many different forms, such as a single intrusion to access or steal assets 60 that are available via the network 50. As another example, a malicious network accessing 42 can involve low and slow intrusions. These attacks (or portscans), usually performed by skilled intruders, are characterized by their lengthy duration (possibly weeks or months at a time), precision, and methodical execution. Usually these attacks are intended to gather information about points of weakness in the network for future pervasive and possibly more malicious purposes.
The assets 60 may be a primary goal of the malicious network accessing 42. The assets 60 can include confidential or sensitive information as well as access to proprietary applications that are on the network 50.
To assist in detecting malicious network accessing 42, process 70 captures network event information regarding any accesses of the network 50, such as accesses from origins external to the network 50. Process 80 analyzes the network events and provides intrusion detection analysis data 90 for use in sifting through the many network events to locate potential malicious network accessing(s) 42.
With reference to
For each of the network activities segments, a predictive model is generated at process 230 to assist in predicting anomalous network behavior within the model's associated network activity segment. The models generated via process 230 can be unsupervised learning models, such as compression neural networks (e.g., nonlinear replicator neural networks which are generally known in the art and discussed in such references as: S. Hawkins, H. X. He, G. J. Williams, and R. A. Baxter “Outlier detection using replicator neural networks,” Proceedings of the Fifth International Conference on Data Warehousing and Knowledge Discovery 2002; G. J. Williams, R. A. Baxter, H. X. He, S. Hawkins, and L. Gu “A comparative study of RNN for outlier detection in data mining,” Proceedings of the 2002 IEEE International Conference on Data Mining; and O. Abdel-Wahhab and M. Fahmy, “Image compression using multi-layer neural networks,” Proceedings of the 2nd IEEE Symposium on Computers and Communications (ISCC 1997)).
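As a non-limiting illustration of the compression idea, the sketch below trains a network to replicate its own input through a single hidden unit and uses the reconstruction error as an outlier measure. It is a deliberately simplified linear stand-in, not the nonlinear replicator neural network of the cited references; the two features, the initial weights, the learning rate, and the epoch count are all assumptions chosen for the sketch.

```python
def train_replicator(rows, init=0.5, lr=0.05, epochs=200):
    """Train a linear n -> 1 -> n network to reproduce its input (plain SGD)."""
    n = len(rows[0])
    w = [init] * n  # encoder weights: n inputs -> 1 hidden unit
    v = [init] * n  # decoder weights: 1 hidden unit -> n outputs
    for _ in range(epochs):
        for x in rows:
            h = sum(wi * xi for wi, xi in zip(w, x))        # compressed value
            err = [vi * h - xi for vi, xi in zip(v, x)]     # output minus input
            back = sum(vi * ei for vi, ei in zip(v, err))   # error reaching hidden unit
            v = [vi - lr * 2 * ei * h for vi, ei in zip(v, err)]
            w = [wi - lr * 2 * back * xi for wi, xi in zip(w, x)]
    return w, v


def reconstruction_error(model, x):
    """Squared error between an input and its replica; larger means more unusual."""
    w, v = model
    h = sum(wi * xi for wi, xi in zip(w, x))
    return sum((vi * h - xi) ** 2 for vi, xi in zip(v, x))


# Hypothetical normalized features in which the two values normally move together.
training_rows = [(t, t) for t in (0.2, 0.4, 0.6, 0.8, 1.0)]
model = train_replicator(training_rows)

typical_error = reconstruction_error(model, (0.5, 0.5))
unusual_error = reconstruction_error(model, (1.0, -1.0))
```

Because the hidden layer is narrower than the input, the network can only reproduce inputs that follow the pattern seen in training, so an input that breaks the pattern is reconstructed poorly.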
A predictive model produces indicators (e.g., segment scores 240) for its associated segment. The scores 240 are used in analyzing which activities associated with the computer network may constitute anomalous behavior. A segment score is indicative of how anomalous a computer network access may be. As an example, the scoring mechanism can be configured such that the higher the score, the more likely a particular network activity is anomalous and possibly constitutes a malicious network access. In this way, the scoring mechanism is also an indicator of the degree of uncertainty of how anomalous a computer network access may be.
The scores 240 generated by the models can then be used in many different ways for the detection of network intrusions. As an illustration,
To assist in model improvement, the scores 240 indicate which activities should be scrutinized to determine whether they are malicious activities. This determination of what activities should be acted upon based upon the scores 240 is performed at process 300. After the actions 310 are completed (e.g., manually and/or using additional detection tools to investigate the true nature of an anomalous network event), the outcomes of taking the action are analyzed at process 320. Process 320 generates analysis results 330, such as a determination of which network events that triggered relatively high anomaly scores turned out to be legitimate network events and which turned out to be malicious network events. The analysis results 330 are funneled back into the model building process so that the models can be retrained and improved via process 340.
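As a non-limiting illustration, the feedback flow of processes 300 through 340 might be sketched as follows. The threshold rule, the outcome labels, and the choice to retrain on data that excludes confirmed malicious events are assumptions made for this sketch; the description does not fix how the analysis results are applied during retraining.

```python
def select_for_action(scores, threshold):
    """Process 300 (sketch): pick events whose score warrants investigation."""
    return [event_id for event_id, score in scores.items() if score >= threshold]


def analyze_outcomes(flagged, outcomes):
    """Process 320 (sketch): sort investigated events by their true nature."""
    results = {"malicious": [], "legitimate": []}
    for event_id in flagged:
        results[outcomes[event_id]].append(event_id)
    return results


def retraining_set(events, analysis):
    """Process 340 (sketch): drop confirmed malicious events before retraining."""
    confirmed = set(analysis["malicious"])
    return {eid: ev for eid, ev in events.items() if eid not in confirmed}


# Hypothetical scores and investigation outcomes.
scores = {"e1": 0.20, "e2": 0.90, "e3": 0.95}
events = {"e1": {"bytes": 100}, "e2": {"bytes": 900}, "e3": {"bytes": 5000}}
outcomes = {"e2": "legitimate", "e3": "malicious"}  # result of actions 310

flagged = select_for_action(scores, threshold=0.8)
analysis = analyze_outcomes(flagged, outcomes)
next_training_data = retraining_set(events, analysis)
```

Here event `e2` scored high but proved legitimate, so it stays in the data used for retraining, while `e3` is withheld.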
After the segments 400 have been generated, model generation operations 410 construct models 420 for each of the segments 400. It should be understood that similar to the other processing flows described herein, the steps and the order of the steps in the flow of this figure may be altered, modified, removed and/or augmented and still achieve the desired outcome. As an illustration, the model generation operations 410 may be performed in parallel or in serial fashion depending upon the desired performance goals.
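As a non-limiting illustration, serial and parallel model generation over the same segments might be sketched as follows. The `build_model` function is a hypothetical placeholder for whatever model generation operation is used, and a thread pool is only one of several ways the work could be distributed.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def build_model(segment_values):
    """Hypothetical placeholder for one model generation operation."""
    return {"mean": mean(segment_values), "n": len(segment_values)}


def build_serial(segments):
    """Generate the segment models one after another."""
    return {name: build_model(values) for name, values in segments.items()}


def build_parallel(segments, max_workers=4):
    """Generate the segment models concurrently, one task per segment."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(build_model, values)
                   for name, values in segments.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical segments keyed by segment name.
segments = {"tcp": [100, 120], "udp": [40, 60], "icmp": [8, 8, 8]}
serial_models = build_serial(segments)
parallel_models = build_parallel(segments)
```

Because each segment's model depends only on that segment's data, the two approaches yield the same models and differ only in elapsed time.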
Each of the models 420 acts as a predictor of anomalous activity for that model's respective segment 400. For example, a first model is generated in order to act as a predictor for a first segment, a second model is generated in order to act as a predictor for a second segment, etc.
The completion of the training of the models 420 results in network activity segment scores 430 being generated. The scores act as an indicator of how anomalous a particular entry is within the segment. For example, the first model generates network activity scores that indicate how anomalous each entry in the first segment is.
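As a non-limiting illustration, per-entry scoring within each segment might be sketched as follows. The models are given here as already-trained summaries (a mean and a standard deviation per segment), and the distance-from-mean score is an assumption made for the sketch rather than the scoring mechanism of any particular model type.

```python
def score_segment(entries, model, feature="bytes"):
    """Score each entry by its distance from the segment's typical value."""
    return [abs(entry[feature] - model["mean"]) / model["std"] for entry in entries]


def score_all(segments, models, feature="bytes"):
    """Apply each segment's own model to that segment's entries only."""
    return {name: score_segment(entries, models[name], feature)
            for name, entries in segments.items()}


# Hypothetical trained models and segment entries.
models = {
    "first": {"mean": 100.0, "std": 10.0},
    "second": {"mean": 1000.0, "std": 200.0},
}
segments = {
    "first": [{"bytes": 100}, {"bytes": 150}],
    "second": [{"bytes": 1100}, {"bytes": 1000}],
}
segment_scores = score_all(segments, models)
```

An entry of 150 bytes is far from typical for the first segment and receives a high score there, whereas a deviation of the same absolute size would barely register under the second segment's model.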
With reference to
To continue this example,
A diverse set of variables can form the basis for segmentation. For example, static or non-time based variables can be used as the basis for forming segments. A static variable can be a network access at a single point in time. In addition to or in place of static variables, time-based variables can be used as a basis for forming segments for which predictive models are to be generated. A time-based variable can include stringing together network accesses that occur over a period of time and that originate from the same entity.
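As a non-limiting illustration, time-based variables might be derived by stringing together the accesses of each originating entity over a look-back window, as sketched below. The event fields (`src`, `ts`, `port`), the window length, and the particular derived variables are assumptions made for this sketch.

```python
from collections import defaultdict


def derive_time_based(events, window_seconds):
    """Derive per-entity variables from accesses within a trailing time window."""
    by_source = defaultdict(list)
    for event in events:
        by_source[event["src"]].append(event)

    derived = {}
    for src, accesses in by_source.items():
        accesses.sort(key=lambda e: e["ts"])
        latest = accesses[-1]["ts"]
        # Keep only accesses inside the window ending at the latest access.
        recent = [e for e in accesses if latest - e["ts"] <= window_seconds]
        derived[src] = {
            "access_count": len(recent),
            "distinct_ports": len({e["port"] for e in recent}),
            "span_seconds": recent[-1]["ts"] - recent[0]["ts"],
        }
    return derived


# Hypothetical accesses; timestamps are seconds from an arbitrary start.
events = [
    {"src": "10.0.0.5", "ts": 0, "port": 22},
    {"src": "10.0.0.5", "ts": 86400, "port": 23},
    {"src": "10.0.0.5", "ts": 172800, "port": 80},
    {"src": "10.0.0.9", "ts": 100, "port": 443},
    {"src": "10.0.0.9", "ts": 160, "port": 443},
]
derived = derive_time_based(events, window_seconds=7 * 86400)
```

An entity that touches several distinct ports across days, as the first source does here, looks unremarkable in any single access but stands out once its accesses are viewed together.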
The use of time-based derived variables within a network intrusion detection system is illustrated in
As illustrated in
While examples have been used to disclose the invention, including the best mode, and also to enable any person skilled in the art to make and use the invention, the patentable scope of the invention is defined by claims, and may include other examples that occur to those skilled in the art. Accordingly the examples disclosed herein are to be considered non-limiting. As an illustration, the systems and methods disclosed herein may be implemented on various types of computer architectures, such as for example on a networked system, on a single general purpose computer, etc. As an illustration,
The users 1032 can interact with the network intrusion analysis system 1034 in a number of ways, such as over one or more networks 1036. A server 1038 accessible through the network(s) 1036 can host the network intrusion analysis system 1034. The same server or different servers can contain various software instructions 1035 (e.g., instructions for segmenting the network activities data, instructions for generating anomaly detection predictive models, etc.) or modules of the network intrusion analysis system 1034. Data store(s) 1040 can store the data to be analyzed as well as any intermediate or final data calculations and data results.
The network intrusion analysis system 1034 can be a web-based analysis and reporting tool that provides users flexibility and functionality for performing network intrusion problem identification. Moreover, the network intrusion analysis system 1034 can be used separately or in conjunction with other software programs, such as other network intrusion detection techniques.
It should be understood that the network intrusion analysis system 1034 can be implemented in other ways, such as on a stand-alone computer for access by a user as shown in
It is further noted that the systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, combinations thereof, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.
Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform methods described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
The systems' and methods' data (e.g., associations, mappings, etc.) may be stored and implemented in one or more different types of computer-implemented ways, such as different types of storage devices and programming constructs (e.g., data stores, RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
The systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions (e.g., software) for use in execution by a processor to perform the methods' operations and implement the systems described herein.
The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
It should be understood that as used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of “and” and “or” include both the conjunctive and disjunctive and may be used interchangeably unless the context expressly dictates otherwise; the phrase “exclusive or” may be used to indicate a situation where only the disjunctive meaning may apply.
This application claims priority to and the benefit of U.S. Application Ser. No. 60/902,378, (entitled “Computer-Implemented Modeling Systems and Methods for analyzing Computer Network Intrusions” and filed on Feb. 20, 2007), of which the entire disclosure (including any and all figures) is incorporated herein by reference. This application contains subject matter that may be considered related to subject matter disclosed in: U.S. Application Ser. No. 60/902,380, (entitled “Computer-Implemented Semi-supervised Learning Systems And Methods” and filed on Feb. 20, 2007); U.S. Application Ser. No. 60/902,379, (entitled “Computer-Implemented Systems and Methods For Action Determination” and filed on Feb. 20, 2007); U.S. Application Ser. No. 60/902,381, (entitled “Computer-Implemented Guided Learning Systems and Methods for Constructing Predictive Models” and filed on Feb. 20, 2007); U.S. Application Ser. No. 60/786,039 (entitled “Computer-Implemented Predictive Model Generation Systems And Methods” and filed on Mar. 24, 2006); U.S. Application Ser. No. 60/786,038 (entitled “Computer-Implemented Data Storage For Predictive Model Systems” and filed on Mar. 24, 2006); and to U.S. Provisional Application Ser. No. 60/786,040 (entitled “Computer-Implemented Predictive Model Scoring Systems And Methods” and filed on Mar. 24, 2006); of which the entire disclosures (including any and all figures) of all of these applications are incorporated herein by reference.
This invention was made with US Naval Research Laboratory support under N00173-06-P-2001 awarded by the US Naval Research Laboratory. The Government has certain rights in the invention.
Number | Date | Country |
---|---|---|
60902378 | Feb 2007 | US |