The present invention claims priority to Korean Patent Application No. 10-2011-0023391, filed on Mar. 16, 2011, which is incorporated herein by reference.
The present invention relates to the representation and detection of malicious code, and more particularly, to an apparatus and a method for detecting malicious code by visualizing the form, structure and characteristics of a malicious file as a graph, visualizing a specific executable file as a graph, and then measuring the similarity between the graphs to determine whether the executable file contains malicious code.
Computer viruses have evolved into various types, from file-infecting viruses to worms that use a network for rapid spreading and Trojan horses that leak data. The threat posed by such malicious code is increasing year by year. From a technical perspective as well, the risk posed by malicious code keeps growing, which makes computer users uneasy. To address this problem, various approaches to protecting computer systems from new malicious code are being actively studied.
Most anti-virus software known to date uses file-based diagnosis, which relies on a signature in a specific format and is therefore called a signature-based or string scanning method. Because signature-based diagnosis scans only a specific or unique portion of a file classified as malicious code, mis-detection and non-detection can be minimized. Further, comparing only specific portions of files allows for fast scanning. However, this method can handle only malicious code that is already known, and thus cannot cope with new, previously unknown forms of malicious code.
One detection method developed to overcome the limitation of signature-based diagnosis is the heuristic detection technique. This technique designates instructions typical of malicious code, e.g., writing a file to a specific folder or changing a specific registry entry, as heuristic signatures and compares them with the instructions of the files to be scanned. Heuristic detection is classified into a method that actually executes the file in a virtual operating system and a method that scans and compares the files themselves without execution.
In addition, a method that compares operation code (opcode) instructions in a code section common to malicious files is often used. These methods can detect even unknown malicious code, but they must collect information about the instructions within files in advance, which tends to load the system during execution. Thus, an analysis technique that minimizes this load while efficiently detecting unknown malicious code is required.
In view of the above, the present invention provides an apparatus and a method for detecting a malicious code in which a malicious code visualization device visualizes a form, a structure and a characteristic of a malicious file to generate a graph thereof, a malicious code determination device visualizes a specific executable file to generate a graph thereof, and similarities between the graphs are then measured to determine whether the executable file has a malicious code.
In accordance with an aspect of the present invention, there is provided a malicious code visualization device including: a string extracting unit for unpacking a file containing a malicious code depending on whether or not the file is in a packed status, and extracting at least two strings from the file; an entropy calculating unit for calculating an entropy for each of the extracted strings; and a graph generating unit for setting the strings to nodes, respectively, setting directionalities of the nodes based on a connection among the respective strings, and setting colors of the nodes based on the entropies for the strings to generate a graph for the file.
In accordance with another aspect of the present invention, there is provided a malicious code determination device using a malicious code database that stores graphs for files containing malicious codes. The device includes: a data extracting unit for extracting strings from a certain executable file and calculating entropies for the strings; a data indicating unit for setting the strings to nodes, setting directionalities of the nodes based on a connection among the respective strings, and setting colors of the nodes based on the entropies for the strings to generate a graph for the executable file; and an analyzing unit for comparing the graph for the executable file with the graphs stored in the malicious code database to determine whether or not the executable file has a malicious code.
In accordance with still another aspect of the present invention, there is provided an apparatus for detecting a malicious code including: a malicious code visualization device for generating a graph for a malicious file by using strings in the malicious file, a connection among the strings and entropies for the strings, and establishing a malicious code database with the generated graph for the malicious file; and a malicious code determination device for generating a graph for a specific executable file and comparing the graph for the executable file with graphs for malicious files stored in the malicious code database to detect a malicious code in the executable file.
In accordance with still another aspect of the present invention, there is provided a method for detecting a malicious code including: generating a graph for a malicious file by using strings in the malicious file, a connection among the strings and entropies for the strings, and establishing a malicious code database with the generated graph for the malicious file; and generating a graph for a specific executable file and comparing the graph for the executable file with graphs for malicious files stored in the malicious code database to detect a malicious code in the executable file.
The objects and features of the present invention will become apparent from the following description of embodiments, given in conjunction with the accompanying drawings, in which:
Hereinafter, an apparatus and a method for detecting malicious code in accordance with embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The malicious code detecting apparatus 10 includes a malicious code visualization device 100, a malicious code database 200 and a malicious code determination device 300.
The malicious code visualization device 100 visualizes an executable file having a malicious code (i.e., a malicious file) as a graph and establishes the malicious code database 200 by storing the graph therein.
The malicious code determination device 300 generates a graph of an executable file that is to be checked for a malicious code and compares the graph of the executable file with the graphs stored in the malicious code database 200, thereby determining whether or not the executable file has a malicious code.
Hereinafter, detailed configurations of the malicious code visualization device 100 and the malicious code determination device 300 will be described.
As shown in
When an executable file containing a malicious code is in a packed state, the string extracting unit 102 first unpacks the executable file. The string extracting unit 102 then extracts at least two strings from the file. Herein, the strings include instructions for executing the executable file and show a sequence thereof. The strings extracted by the string extracting unit 102 are provided to the entropy calculating unit 104.
The entropy calculating unit 104 calculates an entropy for each string and forwards it to the graph generating unit 106. The entropy may include a length, a pattern, a frequency or the like of each string.
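As a minimal sketch of the string extraction described above, the following assumes that a string is a run of at least four printable ASCII bytes and that the file has already been unpacked. The embodiment fixes neither point, and the name `extract_strings` and the parameter `min_len` are hypothetical.

```python
import re

def extract_strings(data, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, in file order.

    Assumption: a "string" is a printable ASCII run. Unpacking of packed
    files is not handled here.
    """
    # Bytes 0x20-0x7e are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

if __name__ == "__main__":
    sample = b"\x00\x01LoadLibraryA\x00ab\x00GetProcAddress\xff"
    print(extract_strings(sample))
```

Because the runs are returned in file order, the sequence of the strings is preserved for later use as the connection among nodes.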
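The embodiment does not prescribe a formula for the entropy. One possible reading, sketched below under that assumption, is the Shannon entropy of the character frequencies of a string, reported together with its length. The names `string_entropy` and `string_features` are hypothetical.

```python
import math
from collections import Counter

def string_entropy(s):
    """Shannon entropy, in bits per character, of the characters in s."""
    if not s:
        return 0.0
    n = len(s)
    # Sum p * log2(1/p) over the relative frequency p of each character.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())

def string_features(s):
    """Bundle the per-string values forwarded to graph generation (assumed set)."""
    return {"length": len(s), "entropy": string_entropy(s)}

if __name__ == "__main__":
    for text in ("aaaa", "abab", "abcd"):
        print(text, string_features(text))
```

A string of one repeated character yields an entropy of zero, while a string whose characters are all distinct yields the maximum for its length.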
The graph generating unit 106 sets the strings to nodes, sets directionalities thereof based on a connection among the strings and determines a color of each node based on the entropy to generate a graph. In other words, as shown in
The graph thus generated for each malicious executable file is stored in the malicious code database 200.
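The graph generation performed by the graph generating unit 106 can be sketched as follows. The sketch assumes that the connection among strings is their order of extraction and that a color is chosen by bucketing the entropy at the arbitrary thresholds 2.0 and 4.0. Neither assumption comes from the embodiment, and `StringGraph`, `color_for` and `build_graph` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class StringGraph:
    colors: dict = field(default_factory=dict)  # node (string) -> color
    edges: set = field(default_factory=set)     # directed (source, target) pairs

def color_for(entropy):
    """Map an entropy value to a node color (thresholds are assumptions)."""
    if entropy < 2.0:
        return "green"
    if entropy < 4.0:
        return "yellow"
    return "red"

def build_graph(strings, entropies):
    """Set each string to a node, color it, and link consecutive strings."""
    graph = StringGraph()
    for s, e in zip(strings, entropies):
        graph.colors[s] = color_for(e)
    # Directionality: an arrow from each string to the one extracted after it.
    for src, dst in zip(strings, strings[1:]):
        graph.edges.add((src, dst))
    return graph

if __name__ == "__main__":
    g = build_graph(["push", "call", "ret"], [1.0, 3.0, 5.0])
    print(g.colors)
    print(sorted(g.edges))
```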
In accordance with the embodiment of the present invention, the executable files containing malicious codes can be expressed by visualizing a form, a structure, a characteristic or the like thereof, thereby facilitating indication of a structure, a form, a behavior or the like of the malicious executable files for easy understanding.
As shown in
When a certain executable file is in a packed state, the data extracting unit 302 unpacks the executable file before extracting strings from it. The data extracting unit 302 then calculates entropies for the respective extracted strings. Herein, the entropy includes a length, a pattern, a frequency or the like of each string.
The data indicating unit 304, as shown in
The data extracting unit 302 and the data indicating unit 304 may be implemented by the malicious code visualization device 100 as shown in
The analyzing unit 306 compares the graph generated by the data indicating unit 304 with the data (graphs) stored in the malicious code database 200. When the malicious code database 200 contains a graph whose similarity to the graph for the executable file exceeds a preset threshold value, the analyzing unit 306 determines that the executable file has a malicious code. Thus, the analyzing unit 306 can detect the presence of a malicious code in the executable file.
Further, when it is detected that the malicious code is present in the executable file, the analyzing unit 306 updates the data stored in the malicious code database 200 by using the graph for the executable file. In other words, the analyzing unit 306 either updates the matching graph within the malicious code database 200 (i.e., the graph whose similarity to the graph for the executable file exceeds the threshold value) by using the graph for the executable file, or adds the graph for the executable file to the malicious code database 200.
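The embodiment does not name a particular similarity measure. As one hedged illustration, the sketch below uses the Jaccard index over colored nodes and directed edges, and represents a graph as a `(colors, edges)` pair. The measure, the representation, the threshold of 0.8 and the names `graph_features`, `graph_similarity` and `find_match` are all assumptions.

```python
def graph_features(colors, edges):
    """Flatten a graph into a set of comparable features."""
    nodes = {("node", name, color) for name, color in colors.items()}
    links = {("edge", src, dst) for src, dst in edges}
    return nodes | links

def graph_similarity(a, b):
    """Jaccard index of two graphs; two empty graphs score 0.0 by convention."""
    fa, fb = graph_features(*a), graph_features(*b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0

def find_match(candidate, database, threshold=0.8):
    """Return the first stored graph whose similarity exceeds the threshold."""
    for known in database:
        if graph_similarity(candidate, known) > threshold:
            return known
    return None

if __name__ == "__main__":
    known = ({"a": "red", "b": "green"}, {("a", "b")})
    partial = ({"a": "red", "b": "yellow"}, {("a", "b")})
    print(graph_similarity(known, partial))
    print(find_match(partial, [known]))
```

The comparison is strict (`>`), matching the description of a similarity that is more than the preset threshold value.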
Hereinafter, a process in which the malicious code detecting apparatus 10 with the foregoing configuration detects a malicious code and updates the malicious code database will be described with reference to
First, the malicious code visualization device 100 generates graphs for executable files containing malicious codes and establishes the malicious code database 200 by using the generated graphs in step S400.
Upon receipt of an executable file in step S402, the data extracting unit 302 in the malicious code determination device 300 extracts strings from the executable file and calculates entropies for the extracted strings in steps S404 and S406. Here, when the executable file is in a packed state, the data extracting unit 302 extracts the strings after unpacking the packed executable file, and calculates the entropies, such as a length, a pattern, a frequency or the like, of the strings. The calculated entropies and the strings may be forwarded to the data indicating unit 304.
The data indicating unit 304 sets the strings to nodes, sets directionalities (arrows) of the nodes based on a connection among the strings, determines a color of each node based on the entropy and generates a graph for the executable file in step S408. The generated graph is provided to the analyzing unit 306.
Thereafter, the analyzing unit 306 compares the graph for the executable file with malicious code graphs stored in the malicious code database 200 to calculate similarities therebetween in step S410.
Next, the analyzing unit 306 determines, in step S412, whether or not the malicious code database 200 contains a graph whose similarity to the graph for the executable file exceeds a preset threshold value.
If such a graph is found as a result of the determination in step S412, the analyzing unit 306 determines that the executable file has a malicious code and updates the malicious code database 200 by using the graph for the executable file in step S414. With this, the malicious code in the executable file is detected.
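Steps S400 to S414 can be sketched end to end as follows. This is a compact illustration under stated assumptions, not the claimed implementation: the entropy is taken to be Shannon entropy, the connection among strings is their order, a node color is the integer entropy bucket, similarity is the Jaccard index, the threshold of 0.7 is arbitrary, and the database update simply adds the new graph. All function names and the sample strings are hypothetical.

```python
import math
from collections import Counter

def entropy(s):
    """Shannon entropy of the characters in s (assumed entropy measure)."""
    n = len(s)
    return sum(c / n * math.log2(n / c) for c in Counter(s).values()) if n else 0.0

def to_graph(strings):
    """S404-S408: colored nodes plus directed edges between consecutive strings."""
    nodes = {("node", s, min(int(entropy(s)), 4)) for s in strings}
    edges = {("edge", a, b) for a, b in zip(strings, strings[1:])}
    return frozenset(nodes | edges)

def similarity(g1, g2):
    """S410: Jaccard index over the node and edge features of two graphs."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0

def scan(strings, database, threshold=0.7):
    """S412-S414: flag the file and add its graph if any stored graph matches."""
    graph = to_graph(strings)
    if any(similarity(graph, known) > threshold for known in database):
        database.add(graph)
        return True
    return False

if __name__ == "__main__":
    # S400: seed the database with the graph of a known malicious file.
    malicious = ["CreateFileA", "WriteFile", "RegSetValueExA", "WinExec"]
    db = {to_graph(malicious)}
    print(scan(malicious + ["ExitProcess"], db))    # close variant of the sample
    print(scan(["printf", "malloc", "free"], db))  # unrelated strings
```

Adding the graph of each newly detected file lets later variants be matched against it as well as against the original sample.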
In accordance with the malicious code detecting method of the embodiment of the present invention, information regarding an executable file can be visualized, and similarities between the graph for the executable file and the graphs for malicious files stored in the malicious code database 200 can be measured based on the visualized information to detect a malicious code, which facilitates the determination of malicious code patterns.
In addition, in accordance with the present invention, executable files containing malicious codes can be expressed by visualizing a form, a structure, a characteristic or the like of the executable files, thereby facilitating indication of a structure, a form, a behavior or the like of the malicious executable files.
While the invention has been shown and described with respect to the specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2011-0023391 | Mar 2011 | KR | national |