In one sense, malware includes unwanted software that is installed on a computer. Malware may be hostile, intrusive, or annoying. It may be designed to infiltrate or damage a computer system without the owner's informed consent. Malware can be relatively benign or severely disruptive. Some malware can spread from computer to computer via networks or the use of removable computer-readable media. Some malware attempts to remain hidden from user inspection while other malware becomes obvious immediately.
The number of malware programs continues to grow at a phenomenal rate. Vendors that produce malware detection and removal products continually update the list of malware their products can detect and remove. Guarding against malware is an ongoing challenge.
Briefly, aspects of the subject matter described herein relate to malware detection using code analysis and behavior monitoring. In aspects, an anti-malware engine performs static analysis on program code and monitors behavior of the program code that is exhibited when the program code executes in a virtual and/or non-virtual environment. The anti-malware engine combines the results of both types of malware detection to determine whether the program code includes malware. The anti-malware engine may use feedback from one or more of the malware detection mechanisms to direct additional malware detection (e.g., static and/or behavior detection) for the program code.
This Summary is provided to briefly identify some aspects of the subject matter that is further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The phrase “subject matter described herein” refers to subject matter described in the Detailed Description unless the context clearly indicates otherwise. The term “aspects” is to be read as “at least one aspect.” Identifying aspects of the subject matter described in the Detailed Description is not intended to identify key or essential features of the claimed subject matter.
The aspects described above and other aspects of the subject matter described herein are illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Aspects of the subject matter described herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with aspects of the subject matter described herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microcontroller-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Aspects of the subject matter described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. Aspects of the subject matter described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 110 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 110. Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
As mentioned previously, malware is a significant problem to computer systems. In one embodiment, malware may include computer viruses, worms, Trojan horses, spyware, unwanted adware, other malicious or unwanted software, and the like. In another embodiment, malware may include software that presents material that is considered to be obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable.
The primary mechanism by which an anti-malware product (antivirus or antispyware) detects malware is matching the binary code of the malware against a “signature.” The signature may be as simple as a hash of the binary. However, malware authors may defeat this approach by modifying the binary, since even a one-byte change produces a different hash.
Malware authors also make it more difficult for anti-malware software to detect malware by packing (encoding) the binary, encrypting the binary, rearranging parts of the binary, some combination of the above, and the like. A packed binary may have millions or more variations and may be packed and/or encrypted multiple times. When encoded malware executes, it unpacks itself and then executes the malicious code. Anti-malware vendors may counter this technique with emulation, in which a computer runs the malware in a virtual environment so that the malware unpacks itself there. After the malware is unpacked, malware detection software may then use a signature to match the original malware.
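The weakness of the simplest signature, an exact hash of the whole binary, can be illustrated with a minimal Python sketch. The names `KNOWN_BAD_HASHES` and `matches_hash_signature` and the sample bytes are hypothetical, and SHA-256 is chosen only for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a binary as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature set: digests of known-malicious binaries.
original = b"\x4d\x5a example malicious payload"
KNOWN_BAD_HASHES = {sha256_hex(original)}

def matches_hash_signature(data: bytes) -> bool:
    """Exact-hash signature check: True only for byte-identical binaries."""
    return sha256_hex(data) in KNOWN_BAD_HASHES

# Flip one bit in the last byte: the code may behave the same,
# but the digest no longer matches the signature.
modified = original[:-1] + bytes([original[-1] ^ 0x01])
print(matches_hash_signature(original))
print(matches_hash_signature(modified))
```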
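The packing and emulation counter-measure described above can be sketched with a toy packer. This is an assumption-laden illustration: a single-byte XOR stands in for a real packer, and `emulate_unpack` merely runs the known decoder, whereas a real emulator would execute the binary's own unpacking stub in a virtual environment without knowing the key.

```python
import hashlib

KEY = 0x5A  # hypothetical single-byte XOR key used by the toy "packer"

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

payload = b"malicious code body"
# The signature is taken over the unpacked payload.
SIGNATURE = hashlib.sha256(payload).hexdigest()

# The packed binary on disk carries only the encoded payload.
packed = xor_bytes(payload, KEY)

def emulate_unpack(packed_image: bytes) -> bytes:
    """Stand-in for emulation: let the unpacking stub run in isolation and
    capture the decoded bytes. Here the 'stub' is simply the XOR decoder."""
    return xor_bytes(packed_image, KEY)

def matches(data: bytes) -> bool:
    """Compare a byte image against the signature of the unpacked payload."""
    return hashlib.sha256(data).hexdigest() == SIGNATURE

print(matches(packed))                  # the packed form hides the payload
print(matches(emulate_unpack(packed)))  # the signature matches after unpacking
```

Changing `KEY` yields a different packed image for the same payload, which is one way a single piece of malware can appear in many variations on disk.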
A virtual environment is an environment that is simulated or emulated by a computer. The virtual environment may simulate or emulate a physical machine. This machine that is simulated or emulated is sometimes called a virtual machine. A virtual machine is a machine that, to software executing on the virtual machine, appears to be a physical machine. The software may save files in a virtual storage device such as a virtual hard drive, a virtual floppy disk, and the like, may read files from a virtual CD, may communicate via a virtual network adapter, and so forth.
More than one virtual machine may be hosted on a single computer. That is, two or more virtual machines may execute on a single physical computer. To software executing in each virtual machine, the virtual machine appears to have its own hardware even though the virtual machines hosted on a single computer may physically share one or more physical devices with each other and with the hosting operating system.
Emulation has its limits. For example, it may not be possible to emulate an operating system environment perfectly. In addition, resource and time constraints may prevent anti-malware products from emulating every binary thoroughly.
The service 210 hosts an anti-malware engine 230 that determines whether a program (e.g., the program 215) is malware. In making this determination, the anti-malware engine 230 may use static properties of the program and behavior of the program. Static properties include properties of a program that can be determined without executing the program. Some exemplary static properties include the libraries to which a program links, the name of the program, its size, its version number, APIs a program imports, references to APIs (e.g., API calling code) included in the program, a hash of a portion of the program or of the entire program, an encryption algorithm, if any, by which the program has been encrypted, metadata about the program, a pattern included in the program, and the like.
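A record of a few of the static properties listed above might look like the following minimal sketch. The `StaticProperties` type and `extract_static_properties` function are hypothetical; parsing a real executable format for its version number and import table is out of scope, so the caller supplies those values.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable

@dataclass
class StaticProperties:
    """Illustrative subset of the static properties listed above."""
    name: str
    size: int
    sha256: str
    version: str
    imported_apis: frozenset
    matched_patterns: frozenset

def extract_static_properties(name: str, data: bytes, version: str,
                              imported_apis: Iterable[str],
                              patterns: Iterable[bytes]) -> StaticProperties:
    """Gather properties without executing the program."""
    return StaticProperties(
        name=name,
        size=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        version=version,
        imported_apis=frozenset(imported_apis),
        # Byte patterns that occur anywhere in the program image.
        matched_patterns=frozenset(p for p in patterns if p in data),
    )

props = extract_static_properties(
    "sample.exe", b"MZ...WriteProcessMemory...", "0.0.0.1",
    ["WriteProcessMemory", "CreateRemoteThread"],
    [b"WriteProcessMemory", b"Xyz"])
print(props.size, sorted(props.matched_patterns))
```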
For example, the encryption algorithm by which a program has been encrypted may increase the confidence that the program is malware if the encryption algorithm is often used for other malware. As another example, malware often has an irregular version number. Thus, having an irregular version number may increase the confidence that the program is malware.
The set of properties that defines a particular malware is sometimes referred to as a signature of the malware. A set of properties may also define a set of more than one malware. In this case, a match against the signature may indicate that one of the malware in the set is present. Malware signatures may be stored in the malware signature set 231, which may be updated periodically.
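Treating a signature as a set of properties that must all be present can be sketched as a subset test. The `Signature` type, the `(property, value)` pair encoding, and the family names are hypothetical; one entry may stand for a whole family of related malware.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Property = Tuple[str, str]  # (property name, value), e.g. ("packer", "xor")

@dataclass(frozen=True)
class Signature:
    """A signature as a set of properties that must all be observed."""
    name: str
    required: frozenset

def matching_signatures(observed: Set[Property],
                        signature_set: Iterable[Signature]) -> List[str]:
    """Return names of signatures whose required properties are all observed."""
    return [s.name for s in signature_set if s.required <= observed]

SIGNATURE_SET = [  # hypothetical entries; a real set would be updated periodically
    Signature("Family.A", frozenset({("packer", "xor"),
                                     ("imports_api", "CreateRemoteThread")})),
    Signature("Family.B", frozenset({("version", "0.0.0.1")})),
]
observed = {("packer", "xor"), ("imports_api", "CreateRemoteThread"),
            ("imports_api", "VirtualAlloc"), ("version", "1.2.3.4")}
print(matching_signatures(observed, SIGNATURE_SET))
```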
Behavior of a program includes what a program does when it is executed. Behavior may include injection into another process, sending data to the network, downloading other programs, modifying the registry (e.g., adding a class ID), modifying one or more files, creating one or more files, where the process creates and/or modifies files (e.g., files in a system directory), modifying locations in memory, and the like. Behavior may be monitored by executing the program in a virtual environment such as a virtual operating system and monitoring the program's behavior, by executing the program in the real operating system and monitoring the program's behavior, by a combination of the above, and the like. In one embodiment, the anti-malware engine 230 may not directly execute the program in the real operating system but may allow the operating system to execute the program after doing static and/or dynamic analysis.
The anti-malware engine 230 may use the static properties and/or behavior of the program to determine whether the program is malware. The anti-malware engine 230 may assign confidence levels to one or more properties and behaviors of the program and may combine confidence levels (e.g., according to rules) to determine whether the program is malware.
The anti-malware engine 230 may use feedback to determine that other actions are to be taken in determining whether the program is malware. For example, if the confidence level obtained via static property analysis is over a threshold, the anti-malware engine 230 may cause the program to be emulated more extensively in a virtual environment to determine whether the program is malware. If the behavior observed during emulation or real-time execution increases the confidence level above a threshold, the anti-malware engine 230 may cause more rigorous static analysis to be performed on the program.
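The text leaves the method of combining confidence levels open. As one possible choice, assumed here and not taken from the description, the following sketch uses a noisy-OR combination, in which each additional indicator raises the total without exceeding 1. The indicator names and their confidence values are hypothetical.

```python
from typing import Iterable

# Hypothetical per-indicator confidence levels in [0, 1].
CONFIDENCE = {
    "irregular_version": 0.3,        # static property
    "known_malware_cipher": 0.5,     # static property
    "adds_registry_class_id": 0.4,   # behavior
    "writes_system_directory": 0.6,  # behavior
}

def combined_confidence(indicators: Iterable[str]) -> float:
    """Noisy-OR: 1 - product of (1 - c_i) over the observed indicators.
    Unknown indicators contribute nothing."""
    remaining = 1.0
    for name in indicators:
        remaining *= 1.0 - CONFIDENCE.get(name, 0.0)
    return 1.0 - remaining

score = combined_confidence(["irregular_version", "adds_registry_class_id"])
print(round(score, 2))  # 1 - (0.7 * 0.6) = 0.58
```

A rule-driven engine could apply a threshold to this score, or weight static and behavioral indicators differently; the noisy-OR form is used only because it is simple and bounded.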
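The feedback described above, where suspicion from one kind of analysis escalates the other, can be sketched as a small loop. The `static_pass` and `behavior_pass` callbacks are hypothetical hooks that return a confidence for a given analysis level; the threshold and round limit are illustrative.

```python
from typing import Callable, List, Tuple

def analyze_with_feedback(static_pass: Callable[[int], float],
                          behavior_pass: Callable[[int], float],
                          threshold: float = 0.5,
                          max_rounds: int = 3) -> List[Tuple[str, int, float]]:
    """Start with a static pass, then let each result decide whether the
    other kind of analysis is escalated. Returns a trace of the passes run."""
    static_level, emulation_level = 1, 0
    confidence = static_pass(static_level)
    trace = [("static", static_level, confidence)]
    for _ in range(max_rounds):
        if confidence <= threshold:
            break  # not suspicious enough to justify deeper analysis
        if trace[-1][0] == "static":
            emulation_level += 1  # static suspicion -> emulate more extensively
            confidence = behavior_pass(emulation_level)
            trace.append(("behavior", emulation_level, confidence))
        else:
            static_level += 1     # behavioral suspicion -> more rigorous static pass
            confidence = static_pass(static_level)
            trace.append(("static", static_level, confidence))
    return trace

trace = analyze_with_feedback(lambda level: 0.6 if level == 1 else 0.9,
                              lambda level: 0.7)
print([kind for kind, _, _ in trace])
```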
The kernel driver 220 monitors changes to the resources 225-227 made by the program 215. The resources 225-227 may include, for example, a portion of a registry or other database, files or other objects of a file system, data sent to or received from a network, a portion of memory, and the like.
The kernel driver 220 may be configured to notify the real time input component 232 when predefined resources are accessed by the program 215. For example, if the program 215 adds a class ID to the registry 225, the kernel driver 220 may notify the real time input component 232 that a class ID has been created. The kernel driver 220 may also include additional information, if desired, such as which class ID was created or what registry values were changed.
If files within a system directory are changed by the program 215, the kernel driver 220 may notify the real time input component 232 and identify the changed files. If other monitored files (e.g., a partition table, boot information, other sensitive files, and the like) are changed by the program 215, the kernel driver 220 may notify the real time input component 232 of the change.
The kernel driver 220 may be configured to notify the real time input component 232 if the program 215 downloads certain binaries from the network. If the program 215 downloads one or more of these binaries, the kernel driver 220 may notify the real time input component 232 and indicate the binaries downloaded.
The kernel driver 220 may be configured to notify the real time input component 232 if the program 215 modifies or attempts to modify certain locations in memory. For example, the program 215 may attempt to modify memory to gain access to sensitive resources. If the program 215 modifies those locations in memory, the kernel driver 220 may notify the real time input component 232 and indicate the memory that has been modified.
The behaviors mentioned above that the kernel driver 220 may monitor are intended to be exemplary, not exhaustive. In some embodiments, the kernel driver 220 may monitor any designated behavior of the program 215. The behaviors and/or resources that the kernel driver 220 monitors may be configured via the real time input component 232.
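The configure-then-notify relationship between the kernel driver 220 and the real time input component 232 can be simulated in user mode as below. This is only a sketch of the control flow: `DriverMonitor`, `Event`, and the event kind strings are hypothetical, and a real driver would hook registry, file system, network, and memory operations in the kernel.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Event:
    kind: str    # e.g. "registry_class_id", "system_file", "download", "memory_write"
    detail: str  # e.g. the class ID created or the file changed

@dataclass
class DriverMonitor:
    """User-mode stand-in for the kernel driver: it forwards only the
    event kinds it has been configured to watch."""
    notify: Callable[[Event], None]
    watched: Set[str] = field(default_factory=set)

    def configure(self, kinds: Set[str]) -> None:
        # Configuration arrives from the real time input component.
        self.watched = set(kinds)

    def observe(self, event: Event) -> None:
        # Called for every resource access; unwatched kinds are dropped.
        if event.kind in self.watched:
            self.notify(event)

received: List[Event] = []
monitor = DriverMonitor(notify=received.append)
monitor.configure({"registry_class_id", "system_file"})
monitor.observe(Event("registry_class_id", "{hypothetical-clsid}"))
monitor.observe(Event("temp_file", "scratch.tmp"))
print([e.detail for e in received])
```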
If the anti-malware engine 230 determines that a program is, or is likely to be, malware, the service 210 may notify a user via the user interface 205. A user interacting with the user interface 205 may instruct the service 210 to perform various actions in response. In another embodiment, an administrative process may automatically take one or more actions without any user interaction. Such actions may include, for example, stopping the program, putting the program in quarantine, allowing the program to continue executing, other actions, and the like.
The program 215 may include one or more executables, libraries, scripts, processes, threads, and the like. In one embodiment, the program 215 comprises any thread, process, instructions, or the like that are capable of being executed by a computer (e.g., such as the computer 110 of
Although the entities illustrated in
The behavior detector(s) 315 may configure external detection components (e.g., the kernel driver 220) and determine the behavior that a program is exhibiting based on input received from the external detection components.
The static detectors 320 may analyze static properties of a program in an attempt to match the properties to a signature, for example.
The behavior monitoring component 305 may include pre-filtering, correlation, and post-filtering subcomponents. The pre-filtering component may filter out behaviors that are not deemed to be indicative of malware activity. The correlation component may correlate activity with malware activity. The post-filtering component may apply rules to determine when identified and correlated activity is not sufficient to be considered possible malware activity.
The malware decider 325 may take input from the static detectors 320 and the behavior monitoring component 305 and may make a determination as to whether a program is malware. In making this determination, the malware decider 325 may be driven by rules. These rules may specify conditions that must exist before a program is considered malware. For example, a rule may state that if the static detector has detected an irregular version number and the behavior monitoring component has detected registry modification, the program is suspected to be malware and is to be further emulated or sent back for more rigorous static detection. As another example, a rule may state that if the program was encrypted with a particular encryption algorithm and is attempting to download files from a particular server, the program is suspected to be malware and is to be further analyzed.
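The three behavior-monitoring stages can be sketched as a pipeline of plain functions. The benign-behavior list, the correlation table of malware patterns, and the "at least two matching behaviors" post-filter rule are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Set

BENIGN_KINDS = {"read_own_config", "temp_file"}  # hypothetical pre-filter list
MALWARE_PATTERNS = {                             # hypothetical correlation table
    "persistence": {"registry_class_id", "system_file"},
    "injection": {"memory_write", "remote_thread"},
}
MIN_MATCHED = 2  # hypothetical post-filter rule

def pre_filter(events: Iterable[str]) -> List[str]:
    """Drop behaviors not deemed indicative of malware activity."""
    return [e for e in events if e not in BENIGN_KINDS]

def correlate(events: List[str]) -> Dict[str, Set[str]]:
    """Map each known malware pattern to the observed behaviors it explains."""
    seen = set(events)
    return {name: kinds & seen
            for name, kinds in MALWARE_PATTERNS.items() if kinds & seen}

def post_filter(correlated: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Discard correlations too weak to count as possible malware activity."""
    return {name: hits for name, hits in correlated.items()
            if len(hits) >= MIN_MATCHED}

events = ["temp_file", "registry_class_id", "system_file", "memory_write"]
print(post_filter(correlate(pre_filter(events))))
```

In this run the single `memory_write` correlates with the injection pattern but is discarded by the post-filter, while the two persistence behaviors survive.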
A rule may also specify what additional activities are to be performed, if any, to determine whether a program is malware. These additional activities may be triggered when the conditions of the rule are met.
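Rules that pair conditions with follow-up activities, as in the two examples above, might be represented as follows. The `Rule` type, the `static:`/`behavior:` finding strings, and the action names are hypothetical; the sketch shows only how met conditions trigger additional detection work for the malware decider 325.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Rule:
    """A condition over combined findings plus follow-up activities."""
    description: str
    condition: Callable[[Set[str]], bool]
    actions: List[str]

RULES = [
    Rule("irregular version + registry modification",
         lambda f: {"static:irregular_version", "behavior:registry_modified"} <= f,
         ["emulate_further", "rigorous_static_detection"]),
    Rule("known cipher + download from watched server",
         lambda f: {"static:known_malware_cipher",
                    "behavior:download_watched_server"} <= f,
         ["analyze_further"]),
]

def decide(findings: Set[str]) -> List[str]:
    """Collect follow-up activities from every rule whose conditions hold."""
    triggered: List[str] = []
    for rule in RULES:
        if rule.condition(findings):
            for action in rule.actions:
                if action not in triggered:  # keep each activity once, in order
                    triggered.append(action)
    return triggered

print(decide({"static:irregular_version", "behavior:registry_modified"}))
```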
If a program is determined to be malware or likely to be malware, the anti-malware engine 230 may send notifications to users, system administrators, programs that have subscribed to be notified, and the like. A backend server may be notified if a program is determined to be malware. An anti-malware vendor may use this information to locate the malware and create a signature for the malware to use in updating the malware signature sets on one or more other machines.
The malware decider 325 may determine that additional real time and/or emulation monitoring is to be performed and/or that more static analysis is to be performed. In response, the anti-malware engine may continue or increase the level of real time monitoring, emulation, and/or static analysis.
The malware decider 325 (or the components doing the malware analysis) may direct the type of additional malware detection that is to be performed on a program. For example, the malware decider 325 may, based on various inputs from various modules, determine that static analysis that searches for certain types of API calls be performed, that certain types of network activities are to be further monitored, that other portions of a registry or file system are to be monitored for changes, and the like. The direction of the additional malware detection may be determined based on the rules.
A data structure that tracks what has been discovered about a program may be created and maintained. This data structure may be passed or otherwise made available to each of the components of the anti-malware engine 230. A component may use the data structure to modify its detection behaviors and may also update the data structure as additional information is discovered about the program.
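A shared record of findings that components read and update might look like the sketch below. `ProgramFindings`, its fields, and the `static_detector` behavior are hypothetical; the point is that a behavior discovered by one component can redirect the detection performed by another.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class ProgramFindings:
    """Shared record of what has been discovered about one program. Each
    component reads it to adjust its detection and adds what it learns."""
    program_id: str
    static_properties: Dict[str, str] = field(default_factory=dict)
    behaviors: Set[str] = field(default_factory=set)
    pending_checks: List[str] = field(default_factory=list)

    def record_behavior(self, behavior: str) -> None:
        self.behaviors.add(behavior)

    def request(self, check: str) -> None:
        # Queue an additional detection activity once.
        if check not in self.pending_checks:
            self.pending_checks.append(check)

def static_detector(findings: ProgramFindings) -> None:
    # Hypothetical: look for injection-related API calling code only if a
    # behavior monitor has already seen the program inject into a process.
    if "process_injection" in findings.behaviors:
        findings.request("scan_for_injection_api_calls")

findings = ProgramFindings("program-215")
findings.record_behavior("process_injection")
static_detector(findings)
print(findings.pending_checks)
```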
Turning to
At block 415 the program code is executed in a virtual and/or non-virtual environment. For example, referring to
At block 420, behavior of the executing program code is monitored. For example, referring to
At block 425, results obtained during static analysis and behavior monitoring are combined. For example, referring to
At block 430, a determination is made as to whether more static and/or behavior analysis is needed. If so, the actions continue at block 410 and/or block 435. For example, referring to
At block 435, the program code is executed in the appropriate environment for behavior monitoring if the program code is not already executing in that environment. For example, if the program code has been executing in a virtual environment and is now to be executed in a non-virtual environment, the program code is executed in the non-virtual environment. If the program code has been executing in a non-virtual environment and is now also to be executed in a virtual environment, the emulation component 310 may initialize the virtual environment, if needed, and execute the program code in the virtual environment.
At block 440, the actions end.
Note that the actions associated with blocks 410, 415, and 420 may be performed in parallel. Also note that the combination of results at block 425 may be performed at any time and that the actions associated with block 430 may also be performed at any stage in the method of
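The flow through blocks 410 to 440 can be sketched sequentially as a loop, assuming block 410 is the static analysis step that block 430 may return to. Although the blocks may run in parallel, this sketch runs them in order for simplicity. The callables `static_analysis`, `monitor_behavior`, and `needs_more`, and the environment names, are hypothetical.

```python
from typing import Callable, Set

def detect(static_analysis: Callable[[int], Set[str]],
           monitor_behavior: Callable[[str], Set[str]],
           needs_more: Callable[[Set[str]], str],
           max_iterations: int = 5) -> Set[str]:
    """needs_more(results) returns "static", "virtual", "non-virtual",
    or "" when no further analysis is needed."""
    depth = 1
    environment = "virtual"                    # block 415: initial environment
    results = set(static_analysis(depth))      # block 410
    results |= monitor_behavior(environment)   # blocks 415 and 420
    for _ in range(max_iterations):            # results are combined in one set (block 425)
        follow_up = needs_more(results)        # block 430
        if not follow_up:
            break                              # block 440: the actions end
        if follow_up == "static":
            depth += 1
            results |= static_analysis(depth)  # back to block 410
        else:
            environment = follow_up            # block 435: switch environment if needed
            results |= monitor_behavior(environment)
    return results

result = detect(
    static_analysis=lambda depth: {f"static:{depth}"},
    monitor_behavior=lambda env: {f"behavior:{env}"},
    needs_more=lambda r: "" if "behavior:non-virtual" in r else "non-virtual",
)
print(sorted(result))
```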
As can be seen from the foregoing detailed description, aspects have been described related to malware detection using code analysis and behavior monitoring. While aspects of the subject matter described herein are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit aspects of the claimed subject matter to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of various aspects of the subject matter described herein.