The disclosure relates in general to a method and a system for establishing application whitelisting.
Network security has recently become increasingly important. With the growing number of distributed applications hosted in data centers, the need for automatic malware and intrusion detection is growing. Application whitelisting has so far been mostly human-defined, whereas for distributed applications consisting of thousands of nodes, an automatic system for creating such rules is needed.
A distributed application is software that is executed or run on multiple computers within a network. These distributed applications interact in order to achieve a specific goal or task. Traditional applications relied on a single system to run them. Even in the client-server model, the application software had to run on either the client, or on the server that the client was accessing.
A whitelist is a list of items that are granted access to a certain system or protocol. When a whitelist is used, all entities are denied access, except those included in the whitelist. Traditionally, whitelists are defined by the system administrator. While this works well for small systems and small distributed applications, as the number of nodes increases it becomes much easier to make a mistake or miss one of the rules, which leads to the application malfunctioning.
The disclosure is directed to a method and a system for distributed application whitelisting using topology information.
According to one embodiment, a method for establishing application whitelisting includes: collecting inter-thread traffic logs sent from at least one server, wherein a plurality of distributed applications are hosted in the at least one server; discovering topology information in a green room environment based on the inter-thread traffic logs; creating a set of whitelisting rules based on the topology information; and enforcing the set of whitelisting rules.
According to another embodiment, a system for establishing application whitelisting includes: at least one server, wherein a plurality of distributed applications are hosted in the at least one server; and an analytic engine coupled to the at least one server for collecting inter-thread traffic logs sent from the at least one server. The analytic engine is configured for: discovering topology information in a green room environment based on the inter-thread traffic logs; creating a set of whitelisting rules based on the topology information; and enforcing the set of whitelisting rules.
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In other instances, well-known structures and devices are schematically shown in order to simplify the drawing.
Technical terms of the disclosure are based on their general definitions in the technical field of the disclosure. If the disclosure describes or explains one or more terms, the definition of those terms is based on the description or explanation in the disclosure. Each of the disclosed embodiments has one or more technical features. In possible implementations, a person skilled in the art may selectively implement part or all of the technical features of any embodiment of the disclosure, or selectively combine part or all of the technical features of the embodiments of the disclosure.
In embodiments of the application, the method and the system relate to an automatic approach for defining whitelisting rules and threat levels for a distributed application system. In embodiments of the application, the method and the system relate to discovering a distributed application dependency map. In embodiments of the application, the method and the system relate to converting the dependency map into a set of whitelisting rules. In embodiments of the application, the method and the system relate to enforcing the whitelisting rules with a focus on reducing false positives.
The analytic engine 110 collects inter-thread traffic logs sent from the servers 120 and 130. The inter-thread traffic logs record thread traffic generated by execution of the applications 141, 142 and 143.
In one embodiment of the application, the analytic engine 110 analyzes the inter-thread traffic logs to execute a three-stage process: discovering topology information (for example but not limited to application dependency mapping (ADM)) in the green room environment based on the inter-thread traffic logs; creating a set of whitelisting rules based on the topology information or the green room ADM; and enforcing the set of whitelisting rules while minimizing false-positive alarms. The green room environment denotes an isolated and secured working space with access control. The space is clean, that is, free from malware and virus attacks. In this space, the nominal behaviors of applications can be collected to establish the ground truth for application whitelisting.
Application dependency mapping (ADM) creates relationships between interdependent applications. ADM identifies: a plurality of devices (for example, the servers 120 and 130) that are communicating with one another; the TCP/IP ports these devices use for communication; and the processes that are running on these devices.
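The three kinds of information identified by ADM (communicating devices, ports, and processes) can be illustrated with a minimal sketch. The log record layout and the field names (`src_host`, `src_process`, `dst_host`, `dst_process`, `dst_port`) are assumptions made for illustration only; the disclosure does not specify a log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmRecord:
    """One dependency: a process on one device talks to a process on another."""
    src_host: str
    src_process: str
    dst_host: str
    dst_process: str
    dst_port: int


def build_adm(log_records):
    """Collect the distinct dependencies observed in the traffic logs.

    `log_records` is a list of dicts using the hypothetical field names above.
    """
    return {
        AdmRecord(
            rec["src_host"], rec["src_process"],
            rec["dst_host"], rec["dst_process"], rec["dst_port"],
        )
        for rec in log_records
    }


# Hypothetical logs: the same dependency observed twice, plus a second one.
logs = [
    {"src_host": "server120", "src_process": "app1",
     "dst_host": "server130", "dst_process": "app2", "dst_port": 8080},
    {"src_host": "server120", "src_process": "app1",
     "dst_host": "server130", "dst_process": "app2", "dst_port": 8080},
    {"src_host": "server130", "src_process": "app2",
     "dst_host": "server130", "dst_process": "app3", "dst_port": 5432},
]
adm = build_adm(logs)
devices = {r.src_host for r in adm} | {r.dst_host for r in adm}
ports = {r.dst_port for r in adm}
```

Using a set of immutable records means repeated observations of the same dependency collapse into a single ADM entry.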
One approach in one embodiment of the application looks into the thread-level execution of the connections. The interception at system call enables detection and deployment of changes. Logging the traffic at inter-thread level ensures the generation of accurate application dependencies.
The following explains how, in one embodiment of the application, the ADM is converted into a set of whitelisting rules. For each record in the application dependency map, one embodiment of the application creates a firewall rule (the firewall rules together forming the set of whitelisting rules) including a plurality of nodes, each having attributes including application name information and destination port information.
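The conversion described above, one rule per ADM record with application name and destination port attributes, might be sketched as follows. The record fields (`src_app`, `dst_app`, `dst_port`) and the helper names `adm_to_rules` and `is_allowed` are hypothetical and are not taken from the disclosure.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A whitelisting rule: src_app may connect to dst_app on dst_port."""
    src_app: str
    dst_app: str
    dst_port: int


def adm_to_rules(adm_records):
    """Create one whitelisting rule per record in the dependency map."""
    return {Rule(r["src_app"], r["dst_app"], r["dst_port"]) for r in adm_records}


def is_allowed(rules, src_app, dst_app, dst_port):
    """Default deny: only connections matching a rule are allowed."""
    return Rule(src_app, dst_app, dst_port) in rules


# Hypothetical green room ADM with two dependencies.
green_room_adm = [
    {"src_app": "app1", "dst_app": "app2", "dst_port": 8080},
    {"src_app": "app2", "dst_app": "app3", "dst_port": 5432},
]
rules = adm_to_rules(green_room_adm)
ok = is_allowed(rules, "app1", "app2", 8080)
blocked = not is_allowed(rules, "app1", "app3", 5432)
```

Keying the rules on application names rather than addresses is one possible design choice; it lets the same rules be re-bound to different hosts when the application moves from the green room to the production environment.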
As shown in
Regarding enforcement of the whitelisting rules, after the original whitelisting rules are modified to match the distributed application in the production environment (the real operation), the embodiment of the application starts blocking each connection that is not on the whitelist. When a connection is blocked, there are two possible cases. In the first case, the connection is trustworthy but was not seen during the green room environment observation.
This could be a rarely occurring event, e.g. a monthly backup. In the second case, the connection is not trustworthy; such cases can occur when malware is present in the system.
In step 510, a full graph matching is performed by comparing the green room ADM with the real operation ADM. In step 515, based on the comparison result, it is determined whether the green room ADM is matched with the real operation ADM or not.
For example, by comparing the green room ADM in
In detail, in comparing the green room ADM with the real operation ADM, each node in the ADM is compared. In comparing the green room ADM in
On the contrary, in comparing the green room ADM in
In step 515, when it is determined that the green room ADM is matched with the real operation ADM, the flow determines in step 520 that the green room ADM and the real operation ADM are equivalent (i.e. no false positives). In this case, no false-positive errors and no false-negative errors occur in the embodiment of the application. In the application, a false-positive error is an event that the system in one embodiment of the application identifies as an attack when in fact it is not; and a false-negative error is an event that the system in one embodiment of the application identifies as legitimate when in fact it is not.
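The full graph matching of steps 510 to 520 can be illustrated with a minimal sketch. It assumes that each ADM is represented as a set of nodes, identified by application name and destination port, and a set of directed edges between application names; this representation and the function name `full_graph_match` are assumptions, not the disclosed implementation.

```python
def full_graph_match(green_adm, real_adm):
    """Return True when both ADMs have identical node sets and edge sets."""
    return (green_adm["nodes"] == real_adm["nodes"]
            and green_adm["edges"] == real_adm["edges"])


# Hypothetical ADMs. Nodes are (application name, destination port) pairs,
# edges are (source application, destination application) pairs.
green_adm = {
    "nodes": {("app1", 80), ("app2", 8080), ("app3", 5432)},
    "edges": {("app1", "app2")},
}
same_real_adm = {
    "nodes": {("app1", 80), ("app2", 8080), ("app3", 5432)},
    "edges": {("app1", "app2")},
}
extra_edge_real_adm = {
    "nodes": {("app1", 80), ("app2", 8080), ("app3", 5432)},
    "edges": {("app1", "app2"), ("app2", "app3")},
}

equivalent = full_graph_match(green_adm, same_real_adm)
mismatch = not full_graph_match(green_adm, extra_edge_real_adm)
```

When the match fails, as in the second comparison, the flow would proceed to the sub-graph matching of step 525.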
In step 515, when it is determined that the green room ADM is not matched with the real operation ADM, the flow goes to step 525. In step 525, a sub-graph matching is performed on the green room ADM and the real operation ADM to find any incomplete edge of the real operation ADM. For example, in step 525, the sub-graph matching is performed on the green room ADM in
In step 530, it is determined whether the green room ADM and the real operation ADM are equivalent by determining whether the incomplete edge is legitimate or not.
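A minimal sketch of the sub-graph matching in step 525 follows. It assumes that an incomplete edge is an edge present in the real operation ADM that has no counterpart in the green room ADM; this reading, the edge representation, and the function name `find_incomplete_edges` are assumptions made for illustration.

```python
def find_incomplete_edges(green_edges, real_edges):
    """Return the real-operation edges that the green room ADM does not contain."""
    return set(real_edges) - set(green_edges)


# Hypothetical edge sets: (source application, destination application).
green_edges = {("app1", "app2")}
real_edges = {("app1", "app2"), ("app2", "app3")}

incomplete = find_incomplete_edges(green_edges, real_edges)
# Each edge in `incomplete` would then be checked for legitimacy (step 530).
```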
As shown in
That is to say, in one embodiment of the application, even though a connection request (for example, from the application app2 to the application app3) is not on the original topology (for example but not limited to the green room ADM), the connection is allowed if it is made on the same thread in the application app2 after that thread receives a connection request (for example, from the application app1 to the application app2). Thus, whether the connection request is allowed or not is based on whether the connection is made on the same thread or not.
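The same-thread criterion can be sketched as follows. The event layout (`app`, `thread_id`, `direction`, `peer`, `seq`) is a hypothetical log format, and the sketch additionally assumes that the inbound request must itself be on the whitelist; neither detail is specified in the disclosure.

```python
def is_legitimate_edge(edge, events, whitelist):
    """Check whether an unlisted edge (src, dst) was opened on a thread of
    `src` after that same thread received a whitelisted inbound connection."""
    src, dst = edge
    # Earliest whitelisted inbound connection seen on each thread of `src`.
    first_trusted_in = {}
    for e in events:
        if (e["app"] == src and e["direction"] == "in"
                and (e["peer"], src) in whitelist):
            t = e["thread_id"]
            first_trusted_in[t] = min(e["seq"], first_trusted_in.get(t, e["seq"]))
    # The outbound connection must follow it on the same thread.
    for e in events:
        if e["app"] == src and e["direction"] == "out" and e["peer"] == dst:
            start = first_trusted_in.get(e["thread_id"])
            if start is not None and e["seq"] > start:
                return True
    return False


whitelist = {("app1", "app2")}
events = [
    # Thread 7 of app2 receives a request from app1, then connects to app3.
    {"app": "app2", "thread_id": 7, "direction": "in", "peer": "app1", "seq": 1},
    {"app": "app2", "thread_id": 7, "direction": "out", "peer": "app3", "seq": 2},
    # Thread 9 of app2 connects to app4 without any preceding request.
    {"app": "app2", "thread_id": 9, "direction": "out", "peer": "app4", "seq": 3},
]
allowed = is_legitimate_edge(("app2", "app3"), events, whitelist)
suspicious = not is_legitimate_edge(("app2", "app4"), events, whitelist)
```

In this example the edge from app2 to app3 is treated as legitimate because it follows a whitelisted request on thread 7, while the edge from app2 to app4 is not.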
When it is determined in step 530 that the incomplete edge is not legitimate, and thus that the green room ADM and the real operation ADM are not equivalent, the flow goes to step 535 to decide that the green room ADM and the real operation ADM are not equivalent (i.e. the real operation ADM is not legitimate).
On the contrary, when it is determined in step 530 that the incomplete edge is legitimate, and thus that the green room ADM and the real operation ADM are equivalent, the flow goes to step 540 to perform incomplete edge handling, in which the green room ADM is updated based on the legitimate incomplete edge and intelligent distributed application whitelisting is performed based on the green room ADM.
In step 545, whether it is an attack is determined.
On the contrary, when it is determined in step 545 that the connection is not an attack, the flow goes to step 555 to identify the connection as legitimate, and the green room ADM is updated.
One embodiment of the application allows some communications outside of the whitelist to go through and later confirms their validity by determining whether they are on the same thread, e.g., if a seemingly illegitimate communication from the application app1 to the application app2 is followed by a legitimate communication from the application app2 to the application app3.
The purpose of embodiments of the application is to provide an automatic security system that allows network connections that are considered legal, while other network connections are examined first and, depending on the threat level, are blocked, allowed, or cause an alarm to be triggered. The main focus of embodiments of the application is to reduce both human interaction with the system and false-positive errors.
In brief, in embodiments of the application, a distributed application is software that runs across multiple computers within a network at the same time and can be stored on servers or with cloud computing. A distributed application is first examined in the green room environment to determine the relationship between each node of the applications. The topology and the application dependency map (ADM) are formed using the gathered information. Using the application dependency map (ADM), a set of whitelisting rules is formed so that only valid connections are permitted. This information is later used when the distributed application is placed in the real environment. The application dependency map (ADM) is used to identify each node of the distributed application. After each node is identified, the set of whitelisting rules is modified to match the new environment (the real operation). When there is a new connection that was not originally discovered in the green room environment, the application dependency map (ADM) is used to measure its validity. If the new connection is determined to be valid, the new connection is used to update the green room ADM.
The application introduces an automatic system for both whitelisting rules creation and enforcement. The application not only automates whitelisting rules creation but also introduces smart whitelisting rules enforcement, in which a connection outside of the whitelist is not simply blocked but is examined first so that its threat level can be identified.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments. It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims and their equivalents.