This application claims the benefit of Korean Patent Application No. 10-2009-0126884, filed with the Korean Intellectual Property Office on Dec. 18, 2009, and Korean Patent Application No. 10-2009-0126905, filed with the Korean Intellectual Property Office on Dec. 18, 2009, the disclosures of which are incorporated herein by reference in their entirety.
1. Technical Field
The present invention relates to a system and method for modeling activity patterns of network traffic to detect botnets, more particularly to a method and system that can classify the communication activities for each client to model network activity by differentiating the protocols of the collected network traffic based on destination and patterning the subgroups for the respective protocols.
2. Description of the Related Art
A bot, which is short for robot, refers to a personal computer (PC) that is infected by malicious software. A botnet refers to a form of network in which many such computers infected by bots are connected together. A botnet may be remotely manipulated by a bot master to be used in various malicious activities such as DDoS attacks, theft of personal information, phishing, distributing malicious code, dispatching spam mail, etc. A botnet can be classified according to the protocol used by the botnet.
Attacks incurred through botnets are continuously increasing, and the methods employed for such attacks are increasing in variety. Instead of triggering errors in an Internet service through a DDoS attack, some bots may trigger errors in a personal system or may illegally acquire personal information. There is no lack of examples in which the illegal acquisition of user information, such as IDs and passwords, banking information, etc., was used in cybercrimes. Moreover, whereas a hacking attack of the past may have been for a hacker to show off one's capabilities or to compete with other hackers in a community, a hacking attack using a botnet may be used repeatedly by a group of hackers in a cooperative manner for monetary gains.
However, as botnets employ cutting edge technology, such as regular updates, runtime packer technology, self-modifying code, command channel encryption, etc., it is becoming more difficult to detect and avoid botnets. What makes the problem more serious is that the source code for botnets is open to the public, so that thousands of variations have been created, and the code for a botnet can easily be generated or controlled through a user interface, so that people who do not have professional knowledge or technical expertise may make and misuse botnets. Bot zombies which compose a botnet may be distributed across networks of Internet service providers all over the world, and even the bot C&C (command and control) server that controls the bot zombies can be relocated to different networks.
As such, there are currently many research efforts that focus on the serious problems caused by botnets. However, it is difficult to identify the overall composition and distribution of a botnet simply by detecting the botnet as found in the network of a particular Internet service provider, and considering the great number of variations, etc., there is a need for a method for detecting a botnet more easily.
An aspect of the invention is to provide a system and a method for modeling activity patterns of network traffic that can effectively detect a botnet.
To achieve the objective above, an aspect of the invention provides a system for modeling activity patterns of network traffic to detect botnets that includes: a botnet traffic collector sensor configured to collect traffic within a network and classify the traffic according to destination; and a botnet detector system configured to detect a botnet based on botnet traffic collected by the botnet traffic collector sensor. The botnet detector system can arrange the traffic classified according to destination into groups for different time periods and can detect a botnet group having a particular access pattern exceeding a threshold number. The botnet traffic collector sensor can include: a traffic information collector module configured to collect traffic by capturing packets of a monitored network according to a collecting policy using a packet capturing tool; a traffic information manager module configured to classify information received from the traffic information collector module, receive and parse traffic information, process group data, and store/manage the traffic information in a database; a traffic information transmitter module configured to differentiate the traffic information parsed at the traffic information manager module into a transmission header and transmission data, package the data, and transmit the data by way of a transmission channel; and a sensor policy manager module configured to transmit settings/status information of a classification tool, a traffic information manager tool, and data transmission cycle information to the traffic information collector module, the traffic information manager module, and the traffic information transmitter module. The traffic information manager module can classify patterns of the collected network traffic into transmission control protocols (TCP) and user datagram protocols (UDP). 
The traffic information manager module can classify the transmission control protocols (TCP) into hypertext transfer protocols (HTTP), simple mail transfer protocols (SMTP), and other transmission control protocols besides the hypertext transfer protocols and the simple mail transfer protocols, and can classify the hypertext transfer protocols into “requests” for pages and “responses” from servers to user requests. For a simple mail transfer protocol (SMTP), the simple mail transfer protocol communication itself can be used as the pattern data, and for a user datagram protocol (UDP), the user datagram protocol communication itself can be determined as the pattern data. The “request” can be classified into a host portion, which is the domain of the target of the request for a web server resource, a page portion, which includes information on a particular page desired by the host, and a referrer portion, which includes information on steps preceding a website currently accessed. The traffic information manager module can classify the user datagram protocols (UDP) into domain name system (DNS) protocols and other user datagram protocols besides the domain name system protocols.
Another aspect of the invention provides a method for modeling activity patterns of network traffic to detect botnets that includes: collecting traffic; classifying protocols of the collected traffic; and modeling activities for the classified traffic. The operation of classifying the collected traffic can include: arranging the collected traffic into client sets according to destination; and extracting feature elements of the traffic arranged into client sets according to destination. The operation of arranging the collected traffic into client sets according to destination can include: storing access records of the collected traffic; and arranging the collected traffic into client sets according to destination.
Yet another aspect of the invention provides a method for modeling activity patterns of network traffic to detect botnets that includes: collecting traffic; generating group information for the collected traffic; and determining a botnet group based on the group information, where the group information includes group data and a group matrix, the group data including information on a plurality of sources for a single destination, and the group matrix including stored data obtained after analyzing an IP count according to an access activity pattern occurring in the group data. Here, the operation of generating the group information for the collected traffic can include: classifying the collected traffic according to protocol. The operation of classifying the collected traffic according to protocol can include: arranging the collected traffic into client sets according to destination. The operation of determining the botnet group based on the group information can include: managing group matrices; and, if a particular access pattern exceeds a threshold number for each of the group matrices, selecting the corresponding group as an analysis target group. The operation of managing the group matrices can include: generating a group matrix if the group matrix does not exist; updating the group matrix if the group matrix does exist; and deleting the group matrix if the group matrix has not been updated for a particular duration or by a particular proportion. The method can further include an operation of analyzing client similarity with respect to a particular access pattern for the group matrices selected as analysis targets. 
The operation of analyzing client similarity can include, if the client similarity with respect to a particular access pattern for the group matrices is greater than a particular value for the group matrices of which the similarity is compared, among the group matrices selected as analysis targets, then determining that the group matrices of which the similarity is compared belong to a same botnet group.
Additional aspects and advantages of the present invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
A detailed description of certain embodiments of the invention will be provided below with reference to the appended drawings. However, the invention is not limited to the embodiments disclosed below and can be implemented in various forms, as the embodiments are intended simply for complete disclosure of the invention and for complete understanding of the invention by those of ordinary skill in the art. In the appended drawings, like numerals refer to like components.
As illustrated in
As illustrated in
The traffic information collector module, as illustrated in
The traffic information manager module, as illustrated in
Table 1 illustrates network traffic pattern data for a system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention. Also,
Referring to Table 1, an embodiment of the invention may classify network traffic patterns mainly into transmission control protocols (hereinafter abbreviated as “TCP”), by which a transmitting side and a receiving side can communicate with each other, and user datagram protocols (hereinafter abbreviated as “UDP”), by which data is transferred in one direction when information is exchanged. Also, referring to Table 1 and
Table 2 illustrates a basis for access pattern modeling in a system for modeling activity patterns of network traffic to detect botnets.
Referring to Table 2, an embodiment of the invention may further differentiate the protocols classified in Table 1 according to network traffic pattern. A fixed indicator, such as T1, T2, U1, etc., may be given for the main categories, and patterns may be expressed for the sub-categories correspondingly. The sub-categories for TCP's HTTP “Request”, which may be used to analyze the patterns of traffic for HTTP “Requests”, can include a host portion, which is the domain of the target of a request for a web server resource, a page portion, which includes information on a particular page desired by the host, and a referrer portion, which includes information on the preceding steps of a website currently accessed. Accordingly, there may be three data fields, namely Host ID, Page ID, and Referrer. For the TCP's HTTP “Responses”, the traffic patterning may be performed using the reply codes for the corresponding servers. The patterning for UDP's DNS queries may be performed using the domain names, while the patterning for the UDP's DNS answers may be performed using the IP addresses received as replies.
Table 3 illustrates a pattern element data table for sub-categories in a system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention.
Referring to Table 3, since it is likely that the host domain data for HTTP accesses and the domain data for DNS queries may overlap, the two types of data may share a single table. A host list is inserted as essential data in response to an HTTP request and may include domain names. A domain list is data included in a question regarding a DNS query and may include names of domains to which questions may be directed.
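As a minimal sketch of such a shared table, assuming a simple in-memory mapping, host names taken from HTTP requests and domain names taken from DNS questions could draw their IDs from one table. The class name `PatternTable`, its method `id_for`, the lowercase normalization, and the example domains are all hypothetical and are not part of the embodiment.

```python
class PatternTable:
    """Hypothetical shared table that maps a pattern element to a numeric ID."""

    def __init__(self):
        self._ids = {}

    def id_for(self, element: str) -> int:
        # Assumption: names are compared case-insensitively.
        key = element.strip().lower()
        if key not in self._ids:
            # IDs are handed out in order of first appearance, starting at 1.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


# One table serves both the HTTP host list and the DNS domain list.
host_domain_table = PatternTable()
http_host_id = host_domain_table.id_for("www.example.com")   # from an HTTP request
dns_query_id = host_domain_table.id_for("WWW.EXAMPLE.COM")   # from a DNS question
other_id = host_domain_table.id_for("mail.example.org")
```

Because both sources share one table, the same domain receives the same ID whether it was seen in an HTTP request or in a DNS query.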
Table 4 is a page list in a system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention.
Referring to Table 4, a page list may be expressed according to an HTTP request. The page list may include file names identifying the specific pages, that is, which server resources are being requested from the corresponding domain (host).
Table 5 illustrates a referrer list in a system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention.
Referring to Table 5, a referrer list may include information regarding which links an object followed before arriving at the current page, with reference to an HTTP request. The referrer list may include uniform resource locator (hereinafter abbreviated as “URL”) information.
Table 6 illustrates a status code list in a system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention.
Referring to Table 6, status codes may include pattern data regarding an HTTP response and may be response codes indicating how the corresponding server processed a user's request for web server resources. As response codes, the status codes can also reveal the service status of the server. While various response codes can be implemented, this embodiment has been illustrated using an example in which codes for just the first digit, from among three-digit numbers, are stored and used as pattern data.
Table 7 illustrates a query IP list in a system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention.
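The first-digit reduction can be sketched as follows. This is an illustration only: the function name `response_pattern` is hypothetical, and the use of the indicator “T2” for HTTP responses is inferred from the examples given elsewhere in this description.

```python
def response_pattern(status_code: int) -> str:
    """Reduce a three-digit HTTP status code to its first digit as pattern data."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a three-digit HTTP status code: {status_code}")
    # Integer division by 100 keeps only the first digit (the status class).
    return f"T2.{status_code // 100}"


redirect = response_pattern(302)       # redirection class
server_error = response_pattern(503)   # server error class
```

Keeping only the first digit groups all codes of the same class together, so that, for instance, 301 and 302 yield the same pattern element.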
Referring to Table 7, a query IP list may include data regarding responses to DNS queries, i.e. to “Answer” traffic patterns. The query IP list may include information on the IP of the domains to which the questions are directed.
Using the indicators and IDs described above, an embodiment of the invention can model the activity patterns of the network traffic. For example, “T1.2.1” may represent an action of accessing Daum by directly inputting the address, while “T1.1.2.2” may represent an action of accessing Naver by searching on Google and clicking a search result. Further, “T2.3” may represent a redirection connection, and “T2.5” may represent a server access error.
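One way such pattern strings for HTTP requests might be assembled is sketched below. The field order (indicator, host ID, page ID, optional referrer ID) is an assumption read off the two request examples above, and the function name `request_pattern` and the specific ID values are hypothetical.

```python
from typing import Optional


def request_pattern(host_id: int, page_id: int,
                    referrer_id: Optional[int] = None) -> str:
    """Join the T1 indicator with the host, page, and optional referrer IDs."""
    parts = ["T1", str(host_id), str(page_id)]
    if referrer_id is not None:
        # A referrer ID is appended only when the access followed a link.
        parts.append(str(referrer_id))
    return ".".join(parts)


# Assumed IDs: host 2 and page 1 for a direct access; host 1, page 2,
# and referrer 2 for an access reached through a search engine.
direct_access = request_pattern(host_id=2, page_id=1)
via_search = request_pattern(host_id=1, page_id=2, referrer_id=2)
```

A direct access then yields a three-part string and a referred access a four-part string.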
As illustrated in
As illustrated in
The botnet detection system may be provided within the network of an Internet service provider to detect botnets that are active within the network of the Internet service provider, based on the traffic information collected by the traffic collector sensors. More than one of such botnet detection systems can be included in the Internet service provider's network. Also, as illustrated in
As illustrated in
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
The detection information generation module may generate information regarding a botnet group determined by the suspected group comparative analysis module. Here, the information regarding the botnet group can include the IP of the clients, the activity of the botnet, etc.
The botnet composition analyzer module (BCA), as illustrated in
The botnet activity analyzer module (BAA) may analyze the attack activity of botnet groups and whether or not there was proliferation or migration of the botnet groups.
The detection log management module (DLM) may manage the logs of the composition information and activity information of the botnet groups and may include a composition information database and an activity information database for botnet groups.
The policy management module (PM) may establish the policies for the modules executed within the botnet monitoring/security management system. Also, the policy management module (PM) may establish a detection policy for the botnet detection system registered in the botnet monitoring/security management system. It may also establish a traffic information collector sensor policy by way of the registered botnet detection system.
The botnet monitoring/security management system may exchange various settings and status information with a monitoring system, and may receive group activity information and peer bot information, perform traffic classification, perform composition and activity analysis, and then store the results in a database. The composition and activity analysis information stored in the database may be transmitted back to the monitoring system.
As described above, an aspect of the invention can provide a system for modeling activity patterns of network traffic to detect botnets, where the system can classify the communication activities for each client to model network activity by differentiating the protocols of the collected network traffic based on destination and patterning the subgroups for the respective protocols. Also, an aspect of the invention can provide a system that can classify those servers that are estimated to be C&C servers into download and upload, spam servers and command control servers, within a botnet group detected by modeling network activity, i.e. analyzing network-based activity patterns. Furthermore, an aspect of the invention can provide a system that can detect botnet groups by way of a group information management function, for generating an activity pattern-based group matrix based on group data, and a mutual similarity analysis, performed on groups suspected to be botnets from the group information.
A description will now be provided of a method for modeling activity patterns of network traffic to detect botnets according to a first disclosed embodiment of the invention, with reference to the drawings. In the descriptions that follow, those descriptions that are redundant from the description of the system for modeling activity patterns of network traffic to detect botnets set forth above may be omitted or abridged.
As illustrated in
In the operation of collecting traffic (S1), the traffic data of a network may be collected according to a collection policy using a packet capturing tool. For this, traffic information collector sensors may be included in multiple networks, collecting traffic information according to a traffic collection policy established by a botnet monitoring and security management system.
In the operation of classifying protocols (S2), the traffic collected in the operation of collecting traffic may be classified according to protocol. The operation of classifying protocols may include arranging the collected traffic into client sets according to destination (S2-1) and extracting feature elements of the traffic (S2-2).
In the operation of arranging into client sets according to destination (S2-1), the protocols collected in the operation of collecting traffic may be analyzed and arranged into client sets having the same destination. This operation of arranging into client sets according to destination (S2-1) may include storing the collected access records (S2-1-1) and arranging into client sets (S2-1-2).
In the operation of storing the collected access records (S2-1-1), the access records collected by the traffic information collector sensors may be stored, such that the access records collected over a certain time segment are stored together.
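A minimal sketch of storing access records by time segment is given below. The record layout (timestamp in seconds, source IP, destination), the 60-second default segment length, and the function name `bucket_by_segment` are assumptions made for illustration.

```python
from collections import defaultdict


def bucket_by_segment(records, segment_seconds=60):
    """Group access records by the time segment in which they were collected.

    records: iterable of (timestamp_seconds, source_ip, destination) tuples.
    Returns a dict mapping a segment index to the list of records in it.
    """
    segments = defaultdict(list)
    for timestamp, source_ip, destination in records:
        index = int(timestamp // segment_seconds)
        segments[index].append((timestamp, source_ip, destination))
    return dict(segments)


access_records = [
    (0.0, "192.0.2.1", ("203.0.113.5", 80)),
    (59.9, "192.0.2.2", ("203.0.113.5", 80)),
    (60.0, "192.0.2.1", ("203.0.113.9", 53)),
]
stored = bucket_by_segment(access_records)
```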
In the operation of arranging into client sets (S2-1-2), the collected traffic information may be analyzed and differentiated according to protocol, and then arranged into client sets. As described above with reference to the system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention, the protocols can be classified mainly into TCP and UDP, where the TCP may be classified into HTTP, SMTP, and other TCP. Also, the UDP may be classified into DNS and other UDP. In analyzing the protocols, the actual contents of the traffic may be analyzed and differentiated, and the group data may be arranged based on IP and port, i.e. the address of the destination.
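The classification and grouping described above can be sketched as follows. Note the simplification: the description states that the actual contents of the traffic may be analyzed, whereas this sketch decides the protocol from the well-known destination port alone (80 for HTTP, 25 for SMTP, 53 for DNS). The function names and the flow tuple layout are hypothetical.

```python
from collections import defaultdict


def classify(transport: str, dst_port: int) -> str:
    """Assign a flow to HTTP, SMTP, other TCP, DNS, or other UDP (port-based)."""
    transport = transport.upper()
    if transport == "TCP":
        if dst_port == 80:
            return "HTTP"
        if dst_port == 25:
            return "SMTP"
        return "OTHER_TCP"
    if transport == "UDP":
        return "DNS" if dst_port == 53 else "OTHER_UDP"
    raise ValueError(f"unsupported transport: {transport}")


def client_sets(flows):
    """Arrange flows into client sets keyed by protocol and destination.

    flows: iterable of (src_ip, dst_ip, dst_port, transport) tuples.
    """
    groups = defaultdict(set)
    for src_ip, dst_ip, dst_port, transport in flows:
        key = (classify(transport, dst_port), dst_ip, dst_port)
        groups[key].add(src_ip)  # all clients sharing one destination
    return dict(groups)


flows = [
    ("192.0.2.1", "203.0.113.5", 80, "tcp"),
    ("192.0.2.2", "203.0.113.5", 80, "tcp"),
    ("192.0.2.1", "203.0.113.9", 53, "udp"),
]
groups = client_sets(flows)
```

Each key is a destination (IP and port) together with its protocol class, and each value is the set of clients that contacted that destination.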
In the operation of extracting feature characteristics of the traffic (S2-2), the header and contents of the classified protocol packets may be analyzed to extract feature characteristics of the traffic.
In the operation of modeling the activities for the traffic (S3), the headers of the TCP/IP layer and the IPv4 header from among the extracted feature characteristics of the traffic may be analyzed, to model the activities for the traffic. Afterwards, the modeled activity information for the traffic can be used in detecting botnets.
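As an illustration of reading the IPv4 header, the sketch below unpacks the fixed 20-byte portion of the header with Python's standard `struct` module. Which fields the embodiment actually uses is not stated here, so the selection of fields and the function name `parse_ipv4_header` are assumptions.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract a few fields from the fixed 20-byte part of an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (version_ihl, _tos, total_length, _ident, _flags_fragment,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# A hand-built header for demonstration: version 4, IHL 5, TTL 64, TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"),
                     socket.inet_aton("198.51.100.7"))
fields = parse_ipv4_header(sample)
```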
As described above, this embodiment of the invention can provide a method for modeling activity patterns of network traffic to detect botnets, where the method can classify the communication activities for each client to model network activity by differentiating the protocols of the collected network traffic based on destination and patterning the subgroups for the respective protocols. The embodiment can also provide a method that can classify those servers that are estimated to be C&C servers into download and upload, spam servers and command control servers, within a botnet group detected by modeling network activity, i.e. analyzing network-based activity patterns. Furthermore, the embodiment can provide a method that can detect botnet groups by way of a group information management function, for generating an activity pattern-based group matrix based on group data, and a mutual similarity analysis, performed on groups suspected to be botnets from the group information.
A description will now be provided of a method for modeling activity patterns of network traffic to detect botnets according to a second disclosed embodiment of the invention, with reference to the drawings. In the descriptions that follow, those descriptions that are redundant from the description of the method for modeling activity patterns of network traffic to detect botnets according to the first disclosed embodiment of the invention set forth above may be omitted or abridged.
As illustrated in
In the operation of collecting traffic (S1), the traffic data of a network may be collected according to a collection policy using a packet capturing tool. For this, traffic information collector sensors may be included in multiple networks, collecting traffic information according to a traffic collection policy established by a botnet monitoring and security management system.
In the operation of generating group information (S2), the collected traffic may be grouped. For this, the operation of generating group information (S2) may include classifying protocols (S2-1).
In the operation of classifying protocols (S2-1), the traffic collected in the operation of collecting traffic may be classified according to protocol. The operation of classifying protocols may include arranging the collected traffic into client sets according to destination (S2-1-1).
In the operation of arranging into client sets according to destination (S2-1-1), the protocols collected in the operation of collecting traffic may be analyzed and arranged into client sets having the same destination. This operation of arranging into client sets according to destination (S2-1-1) may include storing the collected access records (S2-1-1-1) and arranging into client sets (S2-1-1-2).
In the operation of storing the collected access records (S2-1-1-1), the access records collected by the traffic information collector sensors may be stored, such that the access records collected over a certain time segment are stored together.
In the operation of arranging into client sets (S2-1-1-2), the collected traffic information may be analyzed and differentiated according to protocol, and then arranged into client sets. As described above with reference to the system for modeling activity patterns of network traffic to detect botnets according to an embodiment of the invention, the protocols can be classified mainly into TCP and UDP, where the TCP may be classified into HTTP, SMTP, and other TCP. Also, the UDP may be classified into DNS and other UDP. In analyzing the protocols, the actual contents of the traffic may be analyzed and differentiated, and the group data may be arranged based on IP and port, i.e. the address of the destination.
In the operation of determining botnet groups (S3), the groups classified as suspected groups may be analyzed with respect to similarity, to determine botnet groups. This operation of determining botnet groups may include managing group matrices (S3-1), selecting analysis targets (S3-2), and analyzing group similarity (S3-3).
In the operation of managing group matrices (S3-1), the matrices for the group data transmitted from the traffic information collector module, i.e. the group matrices, may be managed. Here, managing group matrices refers to generating, updating, and deleting group matrices, and thus the operation of managing group matrices may include operations for generating group matrices (S3-1-1), updating group matrices (S3-1-2), and deleting group matrices (S3-1-3).
In the operation of generating group matrices (S3-1-1), group matrices may be generated for new groups. That is, for a new group that did not exist before, there is no group matrix, and thus a new group matrix may be generated.
In the operation of updating group matrices (S3-1-2), if a group did exist before, the matrix for the existing group may be updated. In the operation of deleting group matrices (S3-1-3), if there are no actions by a group's clients for a certain amount of time, then the group matrix may be deleted, according to the group matrix management algorithm.
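The generate, update, and delete operations can be sketched as below. The layout of a group matrix (a mapping from access pattern to the set of client IPs that showed it, per destination), the idle-timeout deletion rule, and the names `GroupMatrixStore`, `record`, and `expire` are assumptions for illustration; the group matrix management algorithm itself is not reproduced here.

```python
class GroupMatrixStore:
    """Hypothetical store of group matrices, one per destination."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.matrices = {}     # destination -> {pattern: set of client IPs}
        self.last_update = {}  # destination -> time of the latest client action

    def record(self, destination, client_ip, pattern, now):
        # Generate the matrix if the group is new; otherwise update it.
        matrix = self.matrices.setdefault(destination, {})
        matrix.setdefault(pattern, set()).add(client_ip)
        self.last_update[destination] = now

    def expire(self, now):
        # Delete matrices whose clients were inactive for longer than the timeout.
        stale = [d for d, t in self.last_update.items()
                 if now - t > self.idle_timeout]
        for destination in stale:
            del self.matrices[destination]
            del self.last_update[destination]
        return stale


store = GroupMatrixStore(idle_timeout=300)
store.record(("203.0.113.5", 80), "192.0.2.1", "T1.2.1", now=0)
store.record(("203.0.113.5", 80), "192.0.2.2", "T1.2.1", now=100)
store.record(("203.0.113.9", 53), "192.0.2.3", "U1.4", now=10)
removed = store.expire(now=350)
```

Storing a set of client IPs per pattern makes the IP count for that pattern simply the size of the set.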
In the operation of selecting analysis targets (S3-2), after the group matrices are updated, if a particular access pattern exceeds a threshold number for each of the group matrices, then the corresponding group may be selected as an analysis target group.
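A sketch of this selection step follows, assuming that “exceeds a threshold number” means the number of distinct client IPs showing a given access pattern is greater than the threshold. The function name and the matrix layout (pattern mapped to a set of client IPs) are hypothetical.

```python
def select_analysis_targets(matrices, threshold):
    """Return the groups having at least one access pattern above the threshold.

    matrices: {destination: {pattern: set of client IPs}}
    Returns {destination: {pattern: set of client IPs}} with only the
    patterns whose client count exceeds the threshold.
    """
    targets = {}
    for destination, matrix in matrices.items():
        frequent = {pattern: clients for pattern, clients in matrix.items()
                    if len(clients) > threshold}
        if frequent:
            targets[destination] = frequent
    return targets


matrices = {
    "server-a": {"T1.2.1": {"192.0.2.1", "192.0.2.2", "192.0.2.3"}},
    "server-b": {"T1.1.2.2": {"192.0.2.9"}},
}
targets = select_analysis_targets(matrices, threshold=2)
```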
In the operation of analyzing similarity (S3-3), the similarity of the clients may be analyzed for the aggregate of groups selected as analysis targets. If the similarity is above a certain level, for example, 80%, then the similarity may be analyzed for the detailed client list with reference to a particular, characteristic access pattern. Also, if the client similarity to the particular access pattern is above a certain level, for example, 80%, then the two corresponding groups may be determined to be of the same botnet.
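The two-stage comparison can be sketched as follows. The description does not name the similarity measure, so the Jaccard index (size of the intersection divided by size of the union) is used here purely as an assumption; the function names and the group layout (pattern mapped to a set of client IPs) are likewise hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_botnet(group_a: dict, group_b: dict, pattern: str,
                level: float = 0.8) -> bool:
    """Two-stage check: overall client similarity, then per-pattern similarity."""
    clients_a = set().union(*group_a.values())
    clients_b = set().union(*group_b.values())
    # Stage 1: the overall client sets must be similar above the level.
    if not jaccard(clients_a, clients_b) > level:
        return False
    # Stage 2: the clients showing the characteristic pattern must also match.
    return jaccard(group_a.get(pattern, set()),
                   group_b.get(pattern, set())) > level


ten = {f"192.0.2.{i}" for i in range(1, 11)}
nine = {f"192.0.2.{i}" for i in range(1, 10)}
five = {f"192.0.2.{i}" for i in range(1, 6)}
group_a = {"T1.2.1": ten}
group_b = {"T1.2.1": nine}
group_c = {"T1.2.1": five}
match = same_botnet(group_a, group_b, "T1.2.1")
no_match = same_botnet(group_a, group_c, "T1.2.1")
```

With the 80% level given as an example above, groups sharing nine of ten clients are treated as the same botnet, while groups sharing five of ten are not.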
As described above, this embodiment can provide a method that can detect botnet groups by way of a group information management function, for generating an activity pattern-based group matrix based on group data, and a mutual similarity analysis, performed on groups suspected to be botnets from the group information.
While the present invention has been described above with reference to particular drawings and embodiments, those skilled in the art will understand that numerous variations and modifications can be conceived without departing from the spirit of the present invention as disclosed by the scope of claims appended below.
Number | Date | Country | Kind |
---|---|---|---|
10-2009-0126884 | Dec 2009 | KR | national |
10-2009-0126905 | Dec 2009 | KR | national |