While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
Referring to
Processor 104 is representative of any of various types of processors such as an x86 processor, a PowerPC processor or a SPARC processor. Similarly, main memory 102 is representative of any of various types of memory, including DRAM, SRAM, EDO RAM, Rambus RAM, etc.
I/O interface 112 is operational to transfer data between processor 104 and/or main memory 102 and one or more internal or external components such as hard disk drive 114, network interface 116 and removable storage 118, as desired. For example, I/O interface 112 may embody a PCI bridge operable to transfer data from processor 104 and/or main memory 102 to one or more PCI devices. I/O interface 112 may additionally or alternatively provide an interface to devices of other types, such as SCSI devices and/or Fibre channel devices.
Hard disk drive 114 may be a non-volatile memory such as a magnetic media. Network interface 116 may be any type of network adapter, such as Ethernet, fiber optic, or coaxial adapters. Removable storage 118 is representative of a disk drive, optical media drive, tape drive, or other type of storage media, as desired.
In addition to the various depicted hardware components, computer system 100 may additionally include various software components. For example,
As will be described in further detail below, malicious code detector 160 represents a software module configured to execute a method for detecting malicious code in the form of embedded machine code in a benign type data file. Application 170 represents one embodiment of an application program capable of opening or loading a data file according to the methods described herein. Computer system 100 may also include one or more data files 175, of which at least some may be benign type data files, in which malicious code may be embedded.
Referring to
Since a benign type of data file is a data file in which the presence of executable code is not expected under any normal circumstances, or in which executable code does not serve any logical purpose in relation to the data content of the file, the presence of any encoded executable code in a benign file type data file may be interpreted as an indication of the file being at least suspicious, if not malicious. The presence of encoded machine code instructions in a benign file type of data file, which, when executed by a microprocessor, would result in a transfer of process control, may also be interpreted as an indication of the file containing malicious code.
It is noted that there is a finite statistical probability for finding a single encoding 204 corresponding to a machine code instruction in a benign data file. However the probability of finding a set of encoded machine code instructions (including any associated operands) in a benign type data file that does not contain embedded malicious code can be assumed sufficiently small enough to preclude false positives in detecting malicious code.
As shown in
The methods described herein involve various embodiments for detecting malicious code by analyzing the contents of a data file. One aspect of an implementation includes checking a benign data file type for suspicious executable content. Another aspect of an implementation is checking the data file in a manner causing minimal performance impact, because some operations involved with a thorough analysis may require substantial computational processing power. One implementation that addresses each of these aspects is embodied by a two stage detection, as will be discussed in detail below.
One exemplary embodiment of a two-stage method for detecting malicious code is illustrated in flowchart form in
In a first detection stage, the benign data file type may be scanned for the presence of any binary encodings corresponding to a logical set of instructions. A logical set of instructions is a minimum defined set of consecutive instructions that make logical sense. In one instance, a logical set of instructions is defined by a reference table. In another case, a logical set is the presence of one or more instructions. If any encoding corresponding to a logical set of instructions is found in the benign data file type, then this serves as an indication that the file is at least suspicious, if not malicious. In this manner, all files that are not suspicious may be more easily and efficiently filtered, and allowed for further processing, storage, transmission as desired.
The first detection stage is implemented in steps 308 and 310 of
In step 310, a determination is made if any encodings corresponding to instructions of executable machine code have been detected in the data file, which could render the data file suspicious for containing malicious code. In one embodiment, the determination step 310 may be combined with the disassembly step 308, for example by terminating as soon as a valid encoded instruction, or a logical set of instructions as described above, is detected. The encoded machine code instructions may include operational codes, (representing individual commands) and their respective operands. In other embodiments, various specific implementations of individual method steps, or combinations of steps, for ascertaining that a data file is suspicious, i.e., potentially malicious, may be adopted for the first stage.
If in step 310 no encoding corresponding to machine code is found, then the method continues to step 316, where the application may be allowed to load the file. In various embodiments, as discussed above, step 316 may be replaced with or include other actions related to normal processing of the data file, such as informing a user of the result, certifying the file as clean, recording the performing of the scan, transferring the file over the network, etc. In one embodiment, a file that is not found to be suspicious (or malicious) according to the methods described herein may be certified as a benign data file.
The positive determination in step 310 marks the begin of the second stage, which may include a further, more rigorous analysis of the suspicious data file for determining if an indication of maliciousness is present in the file. Since the second stage may involve analyses that are more extensive and specific to a given situation (i.e., the combination of platform, system, application, network, microprocessor, etc.), the processing required in the second stage may consume more resources, such as time and computing power. Therefore, performing the second, more detailed stage only on suspicious data files detected in the first stage may improve the overall efficiency of the method. Other methods that divide the detection procedure between the first and second stages, or combine them in a single unified operation, may also be practiced in various embodiments.
As mentioned above, if an encoding corresponding to machine code is found in step 310, then the file may be considered suspicious. In this case, an additional analysis may be conducted in step 312. The analysis in step 312 may be a more detailed and specific analysis according to various embodiments of the described methods. For example, the detected logical sets of instructions may be compared with a reference table of machine code instructions, to determine if the code is malicious. The additional analyses in step 312 may also or alternatively ascertain whether a detected logical set of instructions would result in either a control transfer (like jmp, jz, call, etc.) or an invocation of an operating system API procedure, when executed by a microprocessor. The presence of encoding found in a benign file type corresponding to such logically executable code sections may indicate that, if execution control were to be transferred to this location in the file, then an exploit could be triggered. If such a potential result is indicated, then the suspicion level of the data file may be further raised to malicious.
The detected sets of instructions may or may not be complete exploit code and may refer to further code sections for loading additional machine code instructions required for the exploit to exhibit actual malicious behavior. However, a detailed analysis of such subsequent code sections is not necessarily required for detecting the exploit. In many cases of exploits discovered so far (e.g., for WMF vulnerability, JPEG vulnerability etc.), it has been found that the initially detected section of instructions completely contained the exploit code. It is noted that even if the malicious code is polymorphic, it could still be detected from encodings corresponding to any logical set of instructions, which are an inherent anomaly in a benign data file type. In one embodiment, detection of maliciousness may be optimized by accommodating a certain spatial coherency of the machine code instructions during a search of the entire file at a binary level. When encodings corresponding to a logically significant set of instructions are found at a location in the data file, a section before and after that location may be marked for further scrutiny.
Other attributes of the executable code may be determined and evaluated for their respective maliciousness. The analysis in step 312 may depend in complexity and duration upon the results of previous steps in the analysis, such as the number of encoded instructions found in step 310.
The method of
If in step 314 the decision is yes, the data file can be considered malicious and found to contain a serious threat of an exploit. In step 318, the file may be designated as malicious and thus subject to any action appropriate for malicious files, depending on the administration of the host system. Such actions may include quarantine, deletion, or destruction in the form of total erasure. The actions may also include user notification and acknowledgement of the status and specific malicious content found in the data file. Other actions commensurate with the handling of files containing a detected exploit may be performed in result of step 318, in various embodiments.
An additional result of step 314 may be the discovery and recording of newly discovered machine code instructions, either malicious or not malicious, that were detected in the data file. These newly discovered machine code instructions may be added to a reference table or some other body of knowledge, for example, to provide faster indication for future iterations of analysis 312 of potential maliciousness, if the same code instructions are detected again. Thus the method shown in
In addition to the aspect of a two stage analysis, as shown in
1. Restrict file type—As previously mentioned, the type of data files considered suitable (for example, benign type data files) for the detection methods described herein may be restricted. In one embodiment, only file types that are possibly loaded by certain applications for viewing and processing are selected.
2. Check files on the basis of application association and known application vulnerability—If a vulnerability has been found or disclosed in a given application, then the detection for malicious code may be further restricted to only those files that can be processed by a vulnerable application.
3. Check files on the basis of source vector/origin—If the file has arrived on the host via more reliable mediums like CD/Floppy the priority awarded to such files as compared to files that arrive via network transport could be raised or lowered, as required. In one example, data files hosted on an intranet network shared location are deemed less suspicious than files downloaded via the Internet.
4. Check files on the basis of age—Newer files created after a certain date may be more suspect than older files. The duration a file has been resident on the system relative to how many times it has been opened or accessed may provide a further indication of suspiciousness. For example, if the file has been residing on the system for long but has so far never been accessed/opened, it its more suspect than files that have previously been accessed.
5. Check streaming content by delayed buffering—In order to perform the detection on streaming content, a scan may be performed in the buffering logic such that the hosting application can only read sections that have been scanned clean.
6. Check files in transit at network layer—Any of the above mentioned checks being performed on the host system may be implemented as scans at the network level, only limited by the computing power of the network component and the rate of data in transit. In one embodiment, the detection method is implemented in an FPGA processing unit that is a component in a network device. The network device may be any device involved with the transmission of data files across a network. In the FPGA implementation, the detection procedure may be updated over a network interface to the FPGA from a remote location, which may also include updating the reference table of known logical sets of machine code instructions.
7. Check files in transit at gateway level—In applications such as email, FTP, file share etc. the detection methods described herein may be performed on a server before the files are made available for download to other users and clients. Other methods and restrictions may be applied for optimizing the performance of detection procedures in various embodiments.
It is noted that some key benefits of the approaches described above include the ability to detect malicious code irrespective of the fact that the target application (i.e., the application program that is going to load or process the file) is vulnerable or patched. Also in some cases embodiments of the described methods may detect both the malicious code and an unknown or undisclosed vulnerability in a target application. By detecting the malicious code, the mechanism of an undiscovered vulnerability in an application program may be documented, and may thus provide a basis for patching the vulnerability to that exploit.
It is further noted that any of the embodiments described above may further include receiving, sending or storing instructions and/or data that implement the operations described above in conjunction with
The embodiments described herein may also be implemented by an information handling system comprising a memory, a first processor, and computer-readable code stored on said memory and processable by said first processor. A system implementing the methods described herein may be configured in various embodiments to perform a detection scan in real-time, with fixed scan periods, in response to an event (such as receiving a data file), or may be scheduled to work in the background at periodic intervals.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.