The present invention relates to cyber penetration testing, including “Red Team” testing.
Attacks on computer systems are becoming more frequent and the attackers are becoming more sophisticated. These attackers generally exploit security weaknesses or vulnerabilities in these systems in order to gain access to them. However, access may even be gained because of risky or improper end-user behavior.
Organizations which have or operate computer systems may employ penetration testing (a “pen test”) to look for system security weaknesses. A pen test is an authorized simulated attack on the system, or other evaluation of the system, conducted to assess the security of the system and identify security weaknesses.
Pen testing may take various forms. For example, one type of penetration testing is “red-team” testing, in which a group of white-hat hackers tests an organization's defenses, including to identify vulnerabilities in the organization's system. Of course, penetration testing might be conducted by an individual and may have various levels of complexity.
One commonality of existing penetration testing is that it is generally executed manually: one or more testers manually execute attacks on the target system via their computer(s). This has a number of drawbacks, including that the penetration testing may be slow, may not always be consistently implemented, may not be adequately recorded, and the like.
Some attempts have been made to at least partially automate aspects of pen testing. For example, systems using data models to automatically generate exploits (e.g., DeepHack presented at DEF CON 25, and Mayhem from the DARPA Cyber Grand Challenge) exist; however, these systems lack the disclosed functionality. One such model, known as DeepHack, learns to generate exploits but acquired its training data from variations on tools such as sqlmap. The disclosed invention provides the ability to source training data and labels from human testers on an ongoing basis and to use Machine Learning functionality to create dynamic models based on the actions of the trainers and trainees during cyber attack training sessions.
Prior art systems incorporating exploit generation only work on program binaries and do not extend to the full scope of an engagement based on a tester's real-time activity.
Other prior art platforms for Red Teaming testers such as Cobalt Strike have reporting features, but the reports lack Machine Learning functionality to classify or cluster commands that a tester has entered during a training session.
Additionally, prior art systems lack the mechanisms to aid the tester in his or her work in actually going through an engagement by suggesting commands to enter during a training session. For example, the product Faraday does not utilize Machine Learning or related functionality for classification or other aspects of report generation.
Additionally, prior art systems lack the mechanisms to allow classification or labeling of a type (or types) of a tool which a tester is using in his or her work during a penetration testing session. Such classification would allow evaluators to easily see which types of tools are being used by the penetration testers.
Therefore, it would be advantageous if a system and method could be developed to allow such classification or labeling of a type of a tool which a tester is using in his or her work during a penetration testing session.
One aspect of the invention relates to a system incorporating a plurality of methods to collect and use crowd-sourced penetration tester data (i.e., data from one or more hackers who attack an organization's digital infrastructure as an attacker would, in order to test the organization's defenses) and tester feedback to train machine learning models. These models further aid testers in documenting their training session work by automatically logging, classifying, or clustering engagements or parts of engagements, and suggest commands or hints for a tester to run during certain types of engagement training exercises, based on what the system has learned from previous tester training activities.
Another aspect of the invention is a system which automatically builds models able to operate autonomously and perform certain penetration testing activities, allowing testers to focus their efforts on tasks which only humans can perform, thus creating a dynamic and focused system-driven training environment.
Another aspect of the invention is systems and methods configured for classifying unknown cybersecurity tools used in penetration testing based upon monitored penetration testing of a penetration tester testing a target computing system using at least one penetration testing tool. The method captures raw log data associated with the penetration testing relative to the target computing system, parses the raw log data into a graph having nodes, each node corresponding to an actor or a resource in the raw log data, connects the nodes with edges, each of the edges corresponding to an action of the actor or resource in the raw log data, determines features of the nodes and edges from the graph, and classifies the nodes of the graph into one or more of a plurality of testing tool type categories used in the penetration testing based on the determined features of the nodes and edges.
Further objects, features, and advantages of the present invention over the prior art will become apparent from the detailed description of the drawings which follows, when considered with the attached figures.
In the following description, numerous specific details are set forth in order to provide a more thorough description of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known features have not been described in detail so as not to obscure the invention.
One embodiment of the invention is a system which creates an environment for aiding cyber penetration testing (including Red Team) activities and crowd-sourcing of offensive security tradecraft and methods for automating aspects of network security evaluations. In a preferred embodiment, this environment consists of a set of tester virtual machines (VMs) running Kali Linux or similar digital forensics and penetration testing distributions, all connected to one or more physical server(s) which can host and provide the computing power to process large amounts of data and perform machine learning/modeling tasks.
Another embodiment of the invention is a cyber testing system providing each tester virtual machine (VM) with one or more graphical user interfaces (GUI) which provide a one-stop platform for penetration testing activities (e.g. independent entity network security evaluations/assessments). Additionally, besides the Kali Linux command line terminal (and all the pre-loaded offensive security tools in Kali Linux), the testing system provides the tester with a web browser, a specialized task management dashboard for a team leader to assign activities to team members, a detailed session analysis tool for reporting and helping with automatic documentation of a tester's session including classification or clustering of engagements or parts of engagements, a dynamic area for team members to simultaneously collaborate, and an innovative cyber tool to automate the launching of attacks.
As depicted in
The penetration testers target the target system using one or more system-generated tester virtual machines (VMs) 106. These tester VMs 106 may be supported or implemented via one or more servers or the like of the tester system and are preferably instrumented to capture syslog, auditd, terminal commands, and network traffic (pcap) data as the penetration testers work. Regardless of how many instances of tester VMs are running and where they are being used, the raw log data from all of these VMs is captured and stored for processing (as described below) in order to provide the specific training session data needed to train models created by the disclosed system and methods which learn offensive security tradecraft. In one embodiment, the log data is stored in one or more databases 108 or memories associated with at least one processing server 110 of the tester system (which may be the same server(s) which support the tester VMs or may be one or more different servers). The server 110 may be, for example, a supercomputer which provides high performance data processing and includes a machine-learning function. Of course, the one or more servers or other computing devices of the tester system may have various configurations. In general, these devices include at least one processor or controller for executing machine-readable code or “software”, at least one memory for storing the machine-readable code, one or more communication interfaces, and one or more input/output devices.
One aspect of the invention is machine-readable code, such as stored in a memory associated with the testing system server, which is configured to implement the functionality/methods described below.
Aspects of a method in accordance with the invention will be described with reference to
In building the training data set, a tester would work through some tasks, then navigate to a session analysis area of the GUI where he/she would document the work by providing tags or labels on the engagements or parts of engagements. Ideally, for building a starter training set, the tasks a tester performs would be relatively well-defined or structured, and the testers would be experienced and have a similar level of proficiency. While the penetration tester system (including the tester VMs) may be configured to capture different types of raw data as described above, in one embodiment, the data may be focused on tester terminal commands, i.e. the commands typed in by a human tester. This data is preferably captured by the tester VMs and then stored in the one or more databases associated with the tester system server.
Sequences of such terminal commands (or variations on terminal commands, to be described later) are extracted from the logs, such as via the tester system server, and are used as representatives of tradecraft. The assumption in using terminal commands is that the sequence of commands that a tester issues during a particular type of engagement should be different (in general) from the sequences of commands typical of a non-tester (such as a non-Red Team user), or from the sequences of commands a tester with a different type of task would issue. Features like the types of programs, the order in which programs are used, the values of their parameters like arguments, flags, paths, etc., capture the tester's activities and are sufficient to characterize a type of engagement (or part of engagement) and differentiate one engagement from another.
While the preferred embodiment of the system and methods focuses on engagements which can be captured almost entirely with terminal commands, an alternative embodiment in the form of a subsystem or system modules may further be integrated into the preferred embodiment to handle other types of attacks, e.g., an application with which the tester interacts using mouse clicks rather than typed commands, or an application whose input is not entirely captured through terminal commands.
As an example to further explain how the tester system operates, the first step 202 is to collect and label log data (sequences of tester terminal commands) used to train models appropriate for modeling sequences, e.g., Hidden Markov Models (HMMs) or recurrent neural network (RNN) models of the long short-term memory (LSTM) variety. The tester system server creates the models and then stores them. Once trained models are deployed (the models would reside on the tester system server, along with the raw data and the scripts to process that data), testers are aided in their training documentation process as follows: after a tester completes his/her work and navigates to the session analysis tool, the trained models automatically populate tags or labels on the engagements or parts of engagements, having learned from being trained on previous testers' data.
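By way of illustration only, the following Python sketch shows one simplified way such labeled command sequences could be used. Per-label first-order Markov chains stand in here for the HMM or LSTM models named above, and all commands and labels shown are hypothetical rather than taken from the disclosure.

```python
from collections import defaultdict
import math

# Hypothetical labeled training data: each engagement is a sequence of
# terminal commands (program names only, for brevity) plus a tester-supplied label.
training_sessions = [
    (["nmap", "nikto", "searchsploit"], "vulnerability-analysis"),
    (["nmap", "hydra", "john"], "password-attack"),
    (["nmap", "nikto", "sqlmap"], "vulnerability-analysis"),
]

def train_markov_models(sessions):
    """Estimate first-order transition probabilities per engagement label."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for commands, label in sessions:
        seq = ["<start>"] + commands
        for prev, curr in zip(seq, seq[1:]):
            counts[label][prev][curr] += 1
    models = {}
    for label, transitions in counts.items():
        models[label] = {
            prev: {curr: n / sum(nexts.values()) for curr, n in nexts.items()}
            for prev, nexts in transitions.items()
        }
    return models

def classify(commands, models, smoothing=1e-6):
    """Return the label whose model assigns the highest log-likelihood."""
    seq = ["<start>"] + commands
    scores = {}
    for label, model in models.items():
        score = 0.0
        for prev, curr in zip(seq, seq[1:]):
            score += math.log(model.get(prev, {}).get(curr, smoothing))
        scores[label] = score
    return max(scores, key=scores.get)

models = train_markov_models(training_sessions)
print(classify(["nmap", "nikto", "sqlmap"], models))  # -> vulnerability-analysis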
Additionally, the tester system incorporates a feedback system. If the models misclassified or were unable to classify the tester's work, the tester is able to manually change the tags or labels to improve accuracy (they could select from the known list of labels, or an “other” option, and this would update the label field). This feedback incorporated within the system may be used in a future round of re-training models to improve the models or to create new models.
In other embodiments, the tester system may accumulate a large number of engagement sequences which have been reviewed by a tester and classified as “other.” Based on a predetermined time or volume threshold, the system applies sequence clustering to such engagements.
For large enough/significant clusters, the tester system may train new models on the sequences in those clusters, then deploy the models to the training environment, adding to the current ones, within the larger system.
At intervals, the tester may decide to review, using appropriate distance measures, the distance between elements within clusters and the distance between clusters themselves to determine the system's current accuracy. If there is too much variation within one cluster, the tester system enables the tester to make an accuracy determination. If the system indicates that two clusters are very similar, the tester is notified that it may be more appropriate to combine the clusters.
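The following is a minimal, illustrative sketch of such clustering and review, assuming command sequences labeled “other” have already been extracted. The distance measure (a simple sequence-similarity ratio), the clustering method, and the sample sequences are assumptions rather than the disclosed implementation.

```python
from difflib import SequenceMatcher
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
import numpy as np

# Hypothetical engagement sequences that testers labeled "other".
other_sessions = [
    ["wget", "tar", "gcc", "chmod", "./exploit"],
    ["wget", "tar", "make", "chmod", "./exploit"],
    ["tcpdump", "wireshark"],
    ["tcpdump", "tshark"],
]

def seq_distance(a, b):
    """Distance in [0, 1]; 0 means identical command sequences."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

n = len(other_sessions)
dist = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    dist[i, j] = dist[j, i] = seq_distance(other_sessions[i], other_sessions[j])

# Average-linkage hierarchical clustering; cut the tree at a distance threshold.
labels = fcluster(linkage(squareform(dist), method="average"), t=0.5, criterion="distance")

# Review intra-cluster variation; a high mean distance suggests a cluster is too loose.
for c in sorted(set(labels)):
    members = [i for i, lab in enumerate(labels) if lab == c]
    pairs = list(combinations(members, 2))
    spread = np.mean([dist[i, j] for i, j in pairs]) if pairs else 0.0
    print(f"cluster {c}: {len(members)} sequences, mean intra-cluster distance {spread:.2f}")
```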
As depicted in
For example, this would mean the models could generate sequences of tester terminal commands or variations on such commands, which is one advantage of focusing on terminal commands as the type of data the system uses to characterize engagements.
A new or inexperienced tester tasked with a known type of engagement could call upon the disclosed tester system for assistance, asking the model to generate a sequence of terminal commands for a given type of engagement as an example.
Over time with labeled data from other testers, the model used by the system will learn the most probable sequence of commands for a given type of engagement and could display it for the tester who is learning. This approach is an advantage over having to manually browse through many specific examples, and subsequent trainings of the models would allow for changes in the tradecraft.
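As a simplified illustration (not the disclosed model), the sketch below greedily walks a learned transition table to produce the most probable command sequence for one hypothetical engagement type; the probabilities and command names are invented for the example.

```python
# Hypothetical transition probabilities learned for one engagement type; in the
# disclosed system these would come from models trained on labeled tester sessions.
transitions = {
    "<start>": {"nmap": 0.9, "whois": 0.1},
    "nmap": {"nikto": 0.7, "dirb": 0.3},
    "nikto": {"sqlmap": 0.6, "<end>": 0.4},
    "dirb": {"sqlmap": 0.5, "<end>": 0.5},
    "sqlmap": {"<end>": 1.0},
}

def most_probable_sequence(transitions, max_len=10):
    """Greedily follow the highest-probability transition from <start> to <end>."""
    sequence, current = [], "<start>"
    for _ in range(max_len):
        options = transitions.get(current, {"<end>": 1.0})
        nxt = max(options, key=options.get)
        if nxt == "<end>":
            break
        sequence.append(nxt)
        current = nxt
    return sequence

print(most_probable_sequence(transitions))  # ['nmap', 'nikto', 'sqlmap']
```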
Further embodiments of the tester system provide the ability to automatically execute a generated sequence of commands, incorporating a model to produce a more generalized or templated version of the commands which requires some tester input, such as a target IP address or flags.
To this end, for suitable engagements, the tester system prompts the tester when needed, but otherwise uses the sequence of commands generated by the model to call modular scripts which can take the tester's specific input, run the program/system call in the command generated by the model, record its output, and use that output as potential input for the next script which can execute the next program in the generated sequence of commands, thereby semi-automating the generation of attacks as shown in
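One illustrative, simplified way such semi-automation could be driven is sketched below; the templated commands, placeholder names, and file handling are assumptions for the example and not the disclosed modular scripts.

```python
import shlex
import subprocess

# Hypothetical templated command sequence produced by a trained model; the
# placeholders in curly braces are filled from tester input or prior output.
generated_templates = [
    "nmap -p 80,443 {target}",
    "grep open {prev_output_file}",
]

def run_generated_sequence(templates, tester_input):
    """Run each templated command, recording its output for the next step."""
    context = dict(tester_input)
    for step, template in enumerate(templates):
        command = template.format(**context)
        try:
            completed = subprocess.run(shlex.split(command), capture_output=True, text=True)
            output, status = completed.stdout, completed.returncode
        except FileNotFoundError:
            output, status = "", "tool not installed"
        out_file = f"step_{step}_output.txt"
        with open(out_file, "w") as fh:
            fh.write(output)
        context["prev_output_file"] = out_file  # available to the next command
        print(f"[step {step}] {command} -> {status}")

# The tester supplies only the engagement-specific values, e.g. the target address.
run_generated_sequence(generated_templates, {"target": "192.0.2.10"})
```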
Initial Data Capture and Processing
The raw log data, such as auditd data containing terminal commands, is captured by the tester system and parsed from the raw format to a format which can be used to create tables for further analysis or modeling. In the disclosed invention, sequences of full user commands (including parameters such as flags and arguments) which a tester issues during an engagement are extracted by the system.
On the tester VMs, auditd is configured by the system such that terminal commands and commands from within other applications such as Metasploit are available. This is an important feature of the system, as a full sequence of user commands cannot be obtained if logging is not enabled and such commands are not integrated with the Kali terminal commands.
One example of the system's initial extract-transform-load (ETL) process is as follows:
1. Audit raw data is in key-value pairs written into the auditd log.
2. Data is parsed into an interchange format (a parsing sketch follows this list).
3. Data is posted to a location where data scientists can query the data and write scripts to further process the data into the format they need and perform feature engineering for modeling.
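A minimal illustrative sketch of steps 1 and 2 follows, assuming the standard space-separated key=value layout of auditd records; the sample record line and field values are invented for the example.

```python
import json
import re

# Illustrative raw auditd line (not taken from the disclosure); real records
# contain many more fields, but all follow the key=value convention.
raw_line = ('type=SYSCALL msg=audit(1571500000.123:42): syscall=59 success=yes '
            'exit=0 pid=4321 ppid=1234 comm="nmap" exe="/usr/bin/nmap"')

def parse_audit_line(line):
    """Parse one auditd record into a dictionary (the interchange format)."""
    record = {}
    for match in re.finditer(r'(\w+)=("[^"]*"|\S+)', line):
        key, value = match.group(1), match.group(2).strip('"')
        record[key] = value
    # The msg field carries the timestamp and audit ID used later for merging.
    ts_id = re.search(r'audit\(([\d.]+):(\d+)\)', record.get("msg", line))
    if ts_id:
        record["timestamp"], record["audit_id"] = ts_id.group(1), ts_id.group(2)
    return record

print(json.dumps(parse_audit_line(raw_line), indent=2))
```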
Details of the Model within the System
The system model uses the captured auditd data, which contains the terminal commands.
The system further collects data from testers who have run through some training engagements of a certain type and have labeled their sessions as such, so that these labels appear as a field in the data. The system uses the labeled data to train the model.
The summarized system process starting from parsed auditd data to information that can help a tester is as follows:
Obtain processed auditd data from database.
Run scripts for post-processing and feature engineering (a feature engineering sketch follows this list).
Build the model.
Deploy the model.
Incorporate trained model into the overall tester platform. Aid less experienced testers via the GUI on the tester VM.
Receive feedback and provide more model supervision for improving the model.
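As an illustration of the post-processing and feature-engineering step only, the following sketch converts command sequences into simple unigram/bigram count features and fits a classifier; the sessions, labels, and choice of classifier are assumptions for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical post-processed sessions: each is the space-joined command sequence
# pulled from the parsed auditd data, with the tester-supplied engagement label.
sessions = ["nmap nikto sqlmap", "nmap hydra john",
            "nmap nikto searchsploit", "hydra john hashcat"]
labels = ["web", "password", "web", "password"]

# Feature engineering: unigram and bigram counts over the command stream.
vectorizer = CountVectorizer(ngram_range=(1, 2), token_pattern=r"[^ ]+")
X = vectorizer.fit_transform(sessions)

model = LogisticRegression(max_iter=1000).fit(X, labels)
print(model.predict(vectorizer.transform(["nmap sqlmap"])))  # e.g. ['web']
```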
In accordance with other aspects of the invention, embodiments of systems and methods of the invention transform lines of raw audit records into graphs (having vertices and edges). This representation of the data then allows querying and traversing of the graphs to compute features which are used by a predictive model to classify/label the type of tool(s) that the testers (i.e., pen testers) are using during an engagement. For example, the predictive model could classify the tool(s) into categories/labels such as: information gathering, sniffing and spoofing, vulnerability analysis, password cracking, etc., as further explained below.
The penetration testers target the target system 102 using one or more system-generated tester virtual machines (VMs) 106. These tester VMs 106 may be supported or implemented via one or more servers or the like of the tester system and are preferably instrumented to capture syslog, audit records, terminal commands, and network traffic (pcap) data as the penetration testers work. Regardless of how many instances of tester VMs are running and where they are being used, the raw log data from all of these VMs is captured and stored for processing (as described below) in order to provide the specific training session data needed to classify the type of tools used by a tester in the disclosed system and methods. In one embodiment, the log data is stored in one or more databases 108 or memories associated with at least one processing server 110 of the tester system (which may be the same server(s) which supports the tester VMs or may be one or more different servers). The server may be, for example, a supercomputer which provides high performance data processing and includes a machine-learning function. Of course, the one or more servers or other computing devices of the tester system may have various configurations. In general, these devices include at least one processor or controller for executing machine-readable code or “software”, at least one memory for storing the machine-readable code, one or more communication interfaces, and one or more input/output devices.
One aspect of the invention is machine-readable code, such as stored in a memory associated with the testing system server, which is configured to implement the functionality/methods described below.
While the penetration tester system (including the tester VMs) may be configured to capture different types of raw data (log data) as described above, in one embodiment, the data may be focused on tester terminal commands, i.e. the commands typed in by a human tester. This data is preferably captured by the tester VMs 106 and then stored in the one or more databases associated with the tester system server 108.
While the preferred embodiments of the system and methods focus on engagements which can be captured almost entirely with terminal commands, an alternative embodiment in the form of a subsystem or system modules may further be integrated into the preferred embodiment to handle other types of attacks, e.g., an application with which the tester interacts using mouse clicks rather than typed commands, or an application whose input is not entirely captured through terminal commands.
The audit records, such as auditd records containing terminal commands, are captured by the tester system. The raw data is merged according to its type and the audit bundle in which it arrives. The audit records capture operating system calls in key-value format. Records generated by the same audit event are bundled together; membership in the same audit event is indicated by sharing a timestamp and audit ID. Then, relationships between events that precede and succeed the event in question are created.
For example, the following are three audit records that comprise a single audit event, and become merged together. Each audit record consists of several fields separated by a comma and represented as key value pairs. All audit records start with the type field, which determines the other fields the record contains. Audit records also contain a msg field, which has a timestamp and audit ID. Having the same timestamp and audit ID indicates the audit records are from the same system event, and thus these will be merged together.
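The example records themselves are not reproduced here. The sketch below illustrates, under the assumption that records have already been parsed into key-value dictionaries, how records sharing a timestamp and audit ID could be bundled into one merged event; the field values are hypothetical.

```python
from collections import defaultdict

# Hypothetical parsed audit records (abbreviated); records sharing the same
# timestamp and audit ID belong to one audit event and are merged together.
parsed_records = [
    {"type": "SYSCALL", "timestamp": "1571500000.123", "audit_id": "42",
     "comm": "nmap", "exe": "/usr/bin/nmap", "pid": "4321", "ppid": "1234"},
    {"type": "CWD", "timestamp": "1571500000.123", "audit_id": "42",
     "cwd": "/home/tester"},
    {"type": "SOCKADDR", "timestamp": "1571500000.123", "audit_id": "42",
     "saddr": "hex-encoded-socket-address"},
]

def merge_audit_records(records):
    """Bundle records from the same audit event into one merged record."""
    events = defaultdict(dict)
    for rec in records:
        key = (rec["timestamp"], rec["audit_id"])
        # Prefix each field with its record type so fields from different
        # record types within the same event do not collide.
        for field, value in rec.items():
            events[key][f'{rec["type"]}.{field}'] = value
    return dict(events)

for event_key, merged in merge_audit_records(parsed_records).items():
    print(event_key, sorted(merged))
```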
Embodiments of the systems and methods of the invention then use a script which can be run by a processor in computer 110, which script is configured to parse the merged audit records and transform the parsed data into a graph data model, which can be stored in a graph database. Transformation to a graph model consists first of identifying the actors, actions, and resources in these merged audit records, and secondly of associating properties with these actors, actions, and resources. Actors take actions that cause events to happen, and actors may utilize resources. In a graph data model, actors and resources are nodes; actions are edges between these nodes. Actions connect an actor to another actor or resource (but never one resource to another). Additionally, these nodes and edges have properties associated with them. Since the audit records are deterministically emitted by auditd according to the system call that generated them, we can create another deterministic methodology for converting audit records into the actors, actions, and resources of interest. This deterministic methodology is informed by the domain and problem at hand; all or fewer of the audit record fields may be included in the transformation to satisfy the processing speed and space constraints of the system. The methodology must be defined for each audit record type that is of interest.
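The following sketch illustrates one possible form of this transformation using a simple in-memory structure in place of a graph database. The particular edge names and the choice of which actors connect to which resources are illustrative assumptions consistent with, but not copied from, the description above and below.

```python
# Minimal in-memory graph standing in for the graph database; nodes keyed by
# (node_type, identity), edges stored as (source, action, target, properties).
nodes, edges = {}, []

def add_node(node_type, identity, **props):
    key = (node_type, identity)
    nodes.setdefault(key, {}).update(props)
    return key

def transform_event(merged, host):
    """Map one merged SYSCALL event to actors, resources, and action edges."""
    command = add_node("command", (host, merged["SYSCALL.comm"], merged["SYSCALL.pid"]),
                       comm=merged["SYSCALL.comm"], exe=merged.get("SYSCALL.exe"))
    process = add_node("process", (host, merged["SYSCALL.pid"]), pid=merged["SYSCALL.pid"])
    parent = add_node("process", (host, merged["SYSCALL.ppid"]), pid=merged["SYSCALL.ppid"])
    edges.append((parent, "spawned", process, {}))
    edges.append((command, "invoked", process,
                  {"exit": merged.get("SYSCALL.exit"), "success": merged.get("SYSCALL.success")}))
    if "SOCKADDR.saddr" in merged:
        socket = add_node("socket", (host, merged["SOCKADDR.saddr"]), saddr=merged["SOCKADDR.saddr"])
        edges.append((process, "connected_to", socket, {}))
    if "CWD.cwd" in merged:
        cwd = add_node("working_dir", (host, merged["CWD.cwd"]), cwd=merged["CWD.cwd"])
        edges.append((process, "ran_in", cwd, {}))

# Example use with a merged event like the one from the previous sketch; the host
# is included as a defining property of every actor and resource, as described below.
transform_event({"SYSCALL.comm": "nmap", "SYSCALL.exe": "/usr/bin/nmap",
                 "SYSCALL.pid": "4321", "SYSCALL.ppid": "1234",
                 "SYSCALL.exit": "0", "SYSCALL.success": "yes",
                 "CWD.cwd": "/home/tester"}, host="vm-01")
print(len(nodes), "nodes,", len(edges), "edges")
```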
The following is an example of four audit records, merged together as described above:
We can identify three actors in this example: a command and executable indicated by the comm and exe fields on the SYSCALL record; the process invoked by this command, which is indicated by the pid field on the SYSCALL record; and the parent process of this command, indicated by the ppid field on the SYSCALL record. We can identify two resources in this example: a socket, indicated by the saddr field on the SOCKADDR record; and a working directory indicated by the cwd field on the CWD record.
The actions connecting these actors and resources yield the following edges between nodes:
The saddr field of the SOCKADDR audit record defines the address of the socket resource; a different saddr would indicate a separate resource. Thus, saddr is intrinsically part of a socket resource and becomes a property of the socket resource node. The same holds for the cwd field of the CWD audit record: it defines the resource and thus becomes a property of the working directory resource node. Likewise, the comm and exe fields are properties of the command actor node; the pid field is a property of the process actor node; the ppid field is a property of the parent process actor node.
The exit and success fields pertain to a single invocation and thus are properties of the action edge connecting the command actor and the process actor.
As more audit records are processed, actors, resources and actions are added to the graph. Actors and resources will occur in multiple audit record events and thus appear in multiple merged audit records. An actor or resource that appears in multiple audit record events is represented by a single node in the graph.
In the case of a system that supports testers on multiple computers, actors and resources from different computers are never the same. That is, a working directory resource of “/var/spool/cron” from machine A is a different resource node from an audit event with that same working directory but generated by machine B. Thus, the host computer is a defining property of any resource or actor in a collection system with multiple computers.
An actor exists within a temporal context. Operating systems define processes by their process IDs, yet these process IDs are reused over time. As new processes are created on the computer, they are assigned sequential, increasing process IDs. When the process ID reaches a limit defined by the computer, the assigned process IDs wrap around to 1. Further, the process audit records include the ses field, which defines the session from which the process was invoked. These behaviors of the computer lead to the following situations in which a single process ID refers to a different actor:
The context of Command actors includes their associated process actor. An audit event with the same command but a different process actor refers to a different command actor.
In a simplified case, two resources are the same if they have the same properties and are from the same computer that generated the audit event.
Temporal context can be added to resources as well. For example, we may wish to model that socket resources can change over time. We should then define the parameters in which a socket resource is considered consistent, e.g., we expect any socket address observed within the same day to refer to the same resource. Under this definition, a socket resource is a new node in the graph if its audit record timestamp is more than 24 hours away from a socket resource node with the same address and host.
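A small illustration of this 24-hour consistency rule, under the assumption that socket nodes are keyed by host and address, follows; the values are hypothetical.

```python
# Illustrative sketch of the 24-hour consistency rule described above: a socket
# observation reuses an existing node only if one with the same address and host
# was seen within the last 24 hours; otherwise a new node is created.
SECONDS_PER_DAY = 24 * 60 * 60
socket_nodes = []  # each entry: {"host": ..., "saddr": ..., "last_seen": float}

def resolve_socket_node(host, saddr, timestamp):
    for node in socket_nodes:
        if (node["host"] == host and node["saddr"] == saddr
                and abs(timestamp - node["last_seen"]) <= SECONDS_PER_DAY):
            node["last_seen"] = timestamp
            return node
    node = {"host": host, "saddr": saddr, "last_seen": timestamp}
    socket_nodes.append(node)
    return node

a = resolve_socket_node("vm-01", "addr-1", 1_000_000.0)
b = resolve_socket_node("vm-01", "addr-1", 1_000_000.0 + 3600)                 # same node
c = resolve_socket_node("vm-01", "addr-1", 1_000_000.0 + 3 * SECONDS_PER_DAY)  # new node
print(a is b, a is c)  # True False
```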
When merged audit records refer to actors and resources already in the graph, new edges containing the properties of the associated actions are created between the existing nodes. The transformation process to generate these edges is the same as if these were never-before-seen actors and resources.
Properties may be added to actor nodes as more audit records are processed. This may be because more event information is available later, e.g., an audit record for a process ending would add a termination timestamp property to the process actor.
The merged audit records may not be processed in the order in which they were generated by the operating system if the merged audit records are processed in a distributed or multithreaded environment.
The command actor nodes are classified into a category of penetration testing tools. The tool type category could be one or more of the following:
The data represented in the graph model is transformed into a feature vector to be used as input to the predictive model that classifies the penetration testing tool. The features generated may change in order to improve model performance. Features that are not useful in one setting may no longer be calculated. If more data is able to be collected, then new features may be based on that new data. The feature vector contains information from the following feature family categories:
Examples of the features are from the above feature families:
These example features are drawn from the families above and are described/calculated from the nmap command in the included examples (an illustrative feature-assembly sketch follows the list below).
1. Properties of the command node:
2. Properties of edges of the command node:
3. Properties of adjacent nodes:
4. Properties of reachable nodes:
5. Properties of prior commands from operator:
6. Properties of future commands:
7. Properties of this session:
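The individual features within each family are not reproduced here. Purely as an illustration, the sketch below assembles a hypothetical feature vector for one command actor node from a tiny example graph; the specific features are invented within the families listed above.

```python
# Tiny hypothetical graph in the same (node, edge) form as the earlier sketch.
nodes = {
    ("command", ("vm-01", "nmap", "4321")): {"comm": "nmap", "exe": "/usr/bin/nmap"},
    ("process", ("vm-01", "4321")): {"pid": "4321"},
    ("socket", ("vm-01", "addr-1")): {"saddr": "addr-1"},
}
edges = [
    (("command", ("vm-01", "nmap", "4321")), "invoked", ("process", ("vm-01", "4321")),
     {"exit": "0", "success": "yes"}),
    (("process", ("vm-01", "4321")), "connected_to", ("socket", ("vm-01", "addr-1")), {}),
]

def command_node_features(nodes, edges, command_key):
    """Assemble a hypothetical feature vector for one command actor node."""
    props = nodes[command_key]
    out_edges = [e for e in edges if e[0] == command_key]
    adjacent = {e[2] for e in out_edges}
    reachable = adjacent | {e[2] for e in edges if e[0] in adjacent}
    return {
        # 1. Properties of the command node itself
        "exe_in_usr_bin": int(str(props.get("exe", "")).startswith("/usr/bin")),
        # 2. Properties of edges of the command node
        "invocation_count": len(out_edges),
        "failed_invocations": sum(1 for e in out_edges if e[3].get("success") == "no"),
        # 3. Properties of adjacent nodes
        "adjacent_process_count": sum(1 for k in adjacent if k[0] == "process"),
        # 4. Properties of reachable nodes
        "reachable_socket_count": sum(1 for k in reachable if k[0] == "socket"),
    }

print(command_node_features(nodes, edges, ("command", ("vm-01", "nmap", "4321"))))
```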
The script further identifies actions connecting the nodes to yield the edges: edge [e:232] between nodes [n:115] and [n:232], edge [e:235] between nodes [n:232] and [n:235], and edge [e:242] between nodes [n:115] and [n:242]. Then the script identifies properties that it associates with each edge and node as follows, which may be included in graph 602, although not shown in
Additional socket nodes [n:246] and [n:250] (although not shown in
In step 706, features of the nodes are determined from the graph. In step 708, the nodes of the graph are classified into one or more of a plurality of testing tool type categories used in the penetration testing based on the determined features of the nodes.
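As an illustration of this classification step only, the following sketch fits a classifier on hypothetical per-node feature vectors and labels; the feature values, labels, and use of a random forest are assumptions rather than the disclosed predictive model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical training examples: per-command-node feature dicts with the
# tool-type label a tester (or prior model) assigned to that command.
feature_dicts = [
    {"adjacent_socket_count": 12, "invocation_count": 1, "exe_in_usr_bin": 1},
    {"adjacent_socket_count": 0, "invocation_count": 40, "exe_in_usr_bin": 1},
    {"adjacent_socket_count": 9, "invocation_count": 2, "exe_in_usr_bin": 1},
    {"adjacent_socket_count": 1, "invocation_count": 55, "exe_in_usr_bin": 0},
]
labels = ["information-gathering", "password-cracking",
          "information-gathering", "password-cracking"]

vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(feature_dicts)
classifier = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Classify a previously unseen (possibly custom) tool from its graph features.
unknown = vectorizer.transform([{"adjacent_socket_count": 15, "invocation_count": 3,
                                 "exe_in_usr_bin": 0}])
print(classifier.predict(unknown))  # e.g. ['information-gathering']
```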
The systems and methods of the embodiments of the invention provide automatic classification of the unknown type of tool used by a penetration tester. This is especially useful when the penetration tester is using a non-standard or custom penetration tool, because the system can still classify even such a non-standard penetration tool.
It will be understood that the above-described arrangements, systems, and methods are merely illustrative of applications of the principles of this invention, and many other embodiments and modifications may be made without departing from the spirit and scope of the invention as defined in the claims.
This application is a non-provisional of and claims priority to U.S. Provisional Application Ser. No. 62/574,637, filed Oct. 19, 2017. Said prior application is incorporated by reference herein in its entirety.