The present invention relates in general to the field of computers, and more particularly, to a system and method for behavioral analysis of suspicious events and malicious applications in computer systems.
Without limiting the scope of the invention, its background is described in connection with suspicious events and malicious applications in computer systems.
Malicious software applications, or malware, are the primary source of many security problems. These intentionally manipulative applications aim to perform unauthorized activities on the host machines on behalf of their originators for various reasons, such as stealing advanced technologies and intellectual property, governmental acts of revenge, and tampering with sensitive information, to name a few. Malware applications are complex software programs that are often obfuscated to disguise their main intentions and thus deceive network administrators and the underlying intrusion detection systems. Although such obfuscations can be captured, reported, and maintained in a repository as a reference for building better detection mechanisms, newer malware programs are constantly developed by professional hackers, raising the challenging problem of zero-day malware detection [1]. As a result, in order to build an effective malware detection and defense system, it is crucial to understand each malware sample and comprehend its behavior through rigorous analysis.
There are two conventional approaches that are widely adopted for analyzing software programs: 1) static analysis by which the underlying software is parsed, and intermediate transformations of the underlying software are generated without actually executing the software program. For instance, a control-flow graph can be created to represent the execution control of the program under test; and 2) dynamic analysis by which the program under test is executed in a controlled environment (e.g., a sandbox) and the behavior of the program is observed under various environmental conditions for further analysis. For instance, a sandbox (e.g., Cuckoo [2]) can capture the processes that are created along with the files that are tampered with or modified during the execution of the program under test. These two conventional program analysis techniques (i.e., static and dynamic) are often complementary to each other, each targeting different types of faults or malicious activities in the program that is being analyzed. There is also a third “hybrid” approach that enables conducting both static and dynamic analysis of the program under test.
Although these conventional program analysis techniques have been shown to be effective in comprehending the static and dynamic features of the software under test, building a customized analysis platform is often time-consuming, labor-intensive, and technically challenging. Therefore, to ease such complex analyses, analysis techniques with a gentler learning curve that allow faster comprehension of the functionality of the software under test are needed. Visual analytics is one possible solution: it facilitates the analysis process and efficiently and effectively showcases the processes involved in malware analysis.
Accordingly, there is a need for a system and method for behavioral analysis of suspicious events and malicious applications in computer systems.
In one embodiment of the present disclosure, a method, comprising: receiving, at a computer system, data capturing a run time behavior of a binary object; extracting, with the computer system, the data from the run time behavior of the binary object; mapping, with the computer system, the extracted data into one or more interactive visual representations of the run time behavior of the binary object comprising a scalar representation of process calls, dependencies among processes and executable files, or time dependencies between the processes and the executable files; and classifying, with the computer system, the binary object as malicious or benign.
In one aspect, the method further comprises identifying an intention and a location of a malicious payload in the binary object. In another aspect, the method further comprises analyzing an unknown malware within the object code by recognizing one or more unusual signatures or behaviors. In another aspect, the method further comprises generating one or more rules and signatures for fully-automated malware detection systems. In another aspect, the method further comprises identifying one or more of the following: one or more indicators of compromise and malicious activities; one or more system components that are affected, tampered with, or damaged by the object code; how the object code functions and infects the computer system; a primary target of the object code; one or more suspicious events that occurred on a network; and an impact on the host system and its registry. In another aspect, the data capturing the run time behavior of the binary object comprises an interaction of the binary object with the host system with respect to one or more of a file system, a registry, a network, a process and a process profile. In another aspect, the method further comprises selecting the data capturing the run time behavior of the binary object. In another aspect, extracting the data from the run time behavior of the binary object comprises processing the data into time series data and dependency data. In another aspect, the one or more interactive visual representations of the run time behavior of the binary object further comprise: an activity overview; a network visualization; or library calls. In another aspect, the method further comprises zooming, with the computer system, one of the one or more interactive visual representations of the run time behavior of the binary object.
In another aspect, classifying, with the computer system, the binary object as malicious or benign further comprises classifying, with the computer system, the binary object as malicious, suspicious, undetected or harmless. In another aspect, the object code contains one or more of a remote access Trojan, a Trojan, a backdoor, a ransomware, an email flooder, a behavioral malware, and a hacktool malware. In another aspect, the method further comprises: executing the binary object on a host system; and logging the data capturing the run time behavior of the binary object during execution of the binary object into a file. In another aspect, the executing and logging steps are performed by a data provider. In another aspect, the host system comprises a sandboxed system. In another aspect, the data capturing the run time behavior of the binary object comprises one or more execution traces. In another aspect, the method further comprises: filtering out one or more first functions by default system operations; or filtering out one or more second functions that are not commonly encountered by malware. In another aspect, the method further comprises: receiving an output of an anti-virus tool using an application programming interface; and incorporating the output into the one or more interactive visual representations of the run time behavior of the binary object.
In another embodiment of the present disclosure, a non-transitory computer readable medium containing a set of instructions that, when executed by a processor, cause the processor to: receive data capturing a run time behavior of a binary object; extract the data from the run time behavior of the binary object; map the extracted data into one or more interactive visual representations of the run time behavior of the binary object comprising a scalar representation of process calls, dependencies among processes and executable files, or time dependencies between the processes and the executable files; and classify the binary object as malicious or benign.
In one aspect, the instructions further cause the processor to identify an intention and a location of a malicious payload in the binary object. In another aspect, the instructions further cause the processor to analyze an unknown malware within the object code by recognizing one or more unusual signatures or behaviors. In another aspect, the instructions further cause the processor to generate one or more rules and signatures for fully-automated malware detection systems. In another aspect, the instructions further cause the processor to identify one or more of the following: one or more indicators of compromise and malicious activities; one or more system components that are affected, tampered with, or damaged by the object code; how the object code functions and infects the computer system; a primary target of the object code; one or more suspicious events that occurred on a network; and an impact on the host system and its registry. In another aspect, the data capturing the run time behavior of the binary object comprises an interaction of the binary object with the host system with respect to one or more of a file system, a registry, a network, a process and a process profile. In another aspect, the instructions further cause the processor to select the data capturing the run time behavior of the binary object. In another aspect, extracting the data from the run time behavior of the binary object comprises processing the data into time series data and dependency data. In another aspect, the one or more interactive visual representations of the run time behavior of the binary object further comprise: an activity overview; a network visualization; or library calls. In another aspect, the instructions further cause the processor to zoom one of the one or more interactive visual representations of the run time behavior of the binary object.
In another aspect, causing the processor to classify the binary object as malicious or benign further comprises causing the processor to classify the binary object as malicious, suspicious, undetected or harmless. In another aspect, the object code contains one or more of a remote access Trojan, a Trojan, a backdoor, a ransomware, an email flooder, a behavioral malware, and a hacktool malware. In another aspect, the instructions further cause the processor to: execute the binary object on a host system; and log the data capturing the run time behavior of the binary object during execution of the binary object into a file. In another aspect, the executing and logging steps are performed by a data provider. In another aspect, the host system comprises a sandboxed system. In another aspect, the data capturing the run time behavior of the binary object comprises one or more execution traces. In another aspect, the instructions further cause the processor to: filter out one or more first functions by default system operations; or filter out one or more second functions that are not commonly encountered by malware. In another aspect, the instructions further cause the processor to: receive an output of an anti-virus tool using an application programming interface; and incorporate the output into the one or more interactive visual representations of the run time behavior of the binary object.
In another embodiment of the present disclosure, a system comprises: a database; a memory; and one or more processors communicably coupled to the database and the memory, wherein the one or more processors receive data capturing a run time behavior of a binary object, extract the data from the run time behavior of the binary object, map the extracted data into one or more interactive visual representations of the run time behavior of the binary object comprising a scalar representation of process calls, dependencies among processes and executable files, or time dependencies between the processes and the executable files, and classify the binary object as malicious or benign.
In one aspect, the one or more processors identify an intention and a location of a malicious payload in the binary object. In another aspect, the one or more processors analyze an unknown malware within the object code by recognizing one or more unusual signatures or behaviors. In another aspect, the one or more processors generate one or more rules and signatures for fully-automated malware detection systems. In another aspect, the one or more processors identify one or more of the following: one or more indicators of compromise and malicious activities; one or more system components that are affected, tampered with, or damaged by the object code; how the object code functions and infects the computer system; a primary target of the object code; one or more suspicious events that occurred on a network; and an impact on the host system and its registry. In another aspect, the data capturing the run time behavior of the binary object comprises an interaction of the binary object with the host system with respect to one or more of a file system, a registry, a network, a process and a process profile. In another aspect, the one or more processors select the data capturing the run time behavior of the binary object. In another aspect, the one or more processors extract the data from the run time behavior of the binary object by processing the data into time series data and dependency data. In another aspect, the one or more interactive visual representations of the run time behavior of the binary object further comprise: an activity overview; a network visualization; or library calls. In another aspect, the one or more processors zoom one of the one or more interactive visual representations of the run time behavior of the binary object.
In another aspect, the one or more processors classify the binary object as malicious or benign by classifying the binary object as malicious, suspicious, undetected or harmless. In another aspect, the object code contains one or more of a remote access Trojan, a Trojan, a backdoor, a ransomware, an email flooder, a behavioral malware, and a hacktool malware. In another aspect, the one or more processors: execute the binary object on a host system; and log the data capturing the run time behavior of the binary object during execution of the binary object into a file. In another aspect, the executing and logging steps are performed by a data provider. In another aspect, the host system comprises a sandboxed system. In another aspect, the data capturing the run time behavior of the binary object comprises one or more execution traces. In another aspect, the one or more processors: filter out one or more first functions by default system operations; or filter out one or more second functions that are not commonly encountered by malware. In another aspect, the one or more processors: receive an output of an anti-virus tool using an application programming interface; and incorporate the output into the one or more interactive visual representations of the run time behavior of the binary object.
Note that the invention is not limited to the embodiments described herein; instead, it has applicability beyond them. The brief and detailed descriptions of this disclosure are given below.
For a more complete understanding of the features and advantages of the present invention, reference is now made to the detailed description of the invention along with the accompanying figures and in which:
Table 1 depicts a feature-based comparison of Anyrun [39], Hybrid and MalView;
While the making and using of various embodiments of the present invention are discussed in detail below, it should be appreciated that the present invention provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed herein are merely illustrative of specific ways to make and use the invention and do not delimit the scope of the invention.
To facilitate the understanding of this invention, a number of terms are defined below. Terms defined herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a”, “an” and “the” are not intended to refer to only a singular entity but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not limit the invention, except as outlined in the claims.
Various methods are described below to provide an example of each claimed embodiment. They do not limit any claimed embodiment. Any claimed embodiment may cover methods that are different from those described above and below. The drawings and descriptions are for illustrative, rather than restrictive, purposes.
Malicious applications are usually comprehended through two major techniques, namely static and dynamic analyses. Through static analysis, a given malicious program is parsed, and some representative artifacts (e.g., control-flow graphs) are produced without any execution, whereas the given malicious application needs to be executed when conducting dynamic analysis. These two mainstream techniques for analyzing software are effective in detecting certain classes of malware. More specifically, static analysis exposes the patterns and signatures of the malware, helping to detect any known malicious payload hidden in or injected into the code. Dynamic analysis, on the other hand, explores the behavioral and run-time execution patterns of software. To ease the analysis process, a third approach, the visual representation of the artifacts created by both static and dynamic analysis tools, can serve as a supplementary asset for malware experts. This disclosure introduces MalView, an interactive visualization platform for malware analysis in which pattern matching techniques on both signature-based and behavioral analysis artifacts can be used to: 1) classify malware, 2) identify the intention and location of the malicious payload in the artifacts, 3) analyze unknown malware (i.e., zero-day malware) by recognizing any unusual signature or behavior, and 4) explore the time dependencies and thus the system components affected or tampered with by the underlying malware. The results of several case studies show that MalView offers more features and information than some other visualization tools, facilitating the malware analysis process.
MalView is an analysis-oriented extension of a previously created malware visualization tool [3]. MalView emphasizes comprehensive understanding through visual analytics, with in-depth, multi-faceted exploration of malware behavior and scalability to multiple malware families. The results of malware triage and analysis are significantly enhanced if the provenance of software artifacts can be identified, especially when specific attributes of suspected malware are used to identify similarities to a set of known malware artifacts, as shown by Casey et al. [4]. To improve malware analysis using malware artifacts, one embodiment provides a detailed graphical representation for malware analysis to identify: (1) indicators of compromise and malicious activities, (2) tampering, modification, and possible damage that occurred on the system, (3) the mechanics of how the malware functions and infects, (4) the primary target of the malware, (5) the suspicious events that occurred on the network, (6) the impact on the host and its registry, and, most notably, (7) the time and process dependencies that occurred while executing the malware, a key feature of MalView. An application and demonstration video of one embodiment of MalView can be accessed at malview.netlify.app and malview.netlify.app/video.
Malware visualization systems can be grouped into three categories: Malware Forensics, Malware Comparison, and Malware Summarization [5]. MalView falls under the Malware Forensics and Malware Comparison categories, assisting the understanding of the behavior of an individual malware sample for forensics. By exploring the characteristics of, and relationships between, processes and their dependencies and mapping them to visual features, MalView provides an interactive and intuitive platform for comprehending malware behavior, towards the ultimate goal of generating rules and signatures for fully-automated malware detection systems. To demonstrate the effectiveness of MalView in identifying and interpreting malicious and suspicious activities of malware, this disclosure reports the analysis of different families of malware, namely: Remote Access Trojans (RATs), Backdoor, Ransomware, Behavioral, Email Flooder, and Hacktool. The results show that, using MalView, it is possible to quickly understand the main functionalities of the underlying malware without delving into a complex analysis of the static and dynamic analysis reports. In some embodiments, the MalView visualization tool visualizes the output of several dynamic and static analysis tools. The tool also integrates the output of many anti-virus tools through their Application Programming Interfaces (APIs) to provide additional insights into each malware sample.
While conducting the case studies and inspecting some malware families, it was noticed that the same malware behaved differently on different operating system (OS) platforms. As a result, each malware sample was executed and inspected on three different Windows platforms, Windows XP, Windows 7, and Windows 10, in order to recognize the impact of environmental settings on malware comprehension and analysis. Even though each malware sample was executed in a controlled environment, the newer Windows platforms (e.g., Windows 10) were observed to create more system and kernel-level processes, making it harder to thoroughly inspect and analyze the exact flow of each malware sample on these recent platforms. As a remedy for this problem, it is suggested to apply additional filtering mechanisms so that each malware sample and its processes can be analyzed thoroughly.
Dynamic analysis aims at studying the behavior and actions of a malware sample when it is executed. This technique analyzes the malware and returns the collected information about such behavior and actions for further processing or analysis [5]. Assuming that the malicious sample does not employ any anti-forensics guards, this disclosure employs the Windows Sysinternals Process Monitor [6], or Procmon, to capture the run time behavior of malware during execution. MalView visualizes the outputs and traces produced by Procmon rather than executing a given malware sample directly. In other embodiments, MalView could execute the malware directly or indirectly through a data provider. During the dynamic analysis of malware execution, Procmon can capture five types of events through which Windows-based malware interacts with the host system: 1) file system, 2) registry, 3) network, 4) process, and 5) profiling.
While capturing the dynamic behavior of malware, it is important to apply proper Procmon filters to avoid capturing unnecessary information from the normal execution of the system. Furthermore, even when the underlying system is idle, numerous background processes are running that Procmon can capture. As a result, the authors filtered the activities by capturing only suspicious processes, represented by functions commonly encountered by malware analysts [7], [8]. They also excluded default system operations, such as those of Procmon, Autoruns, and Sysmon, from further visualization and analysis.
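The two filtering steps described above can be sketched as a small post-capture filter. This is a minimal Python sketch, not the authors' implementation: the exclusion set and the "commonly encountered" set below are hypothetical examples written with Procmon's operation names, and each event is assumed to be a dict with "Process Name" and "Operation" keys.

```python
# Hypothetical exclusion list: process images of the monitoring tools named in the text.
EXCLUDED_PROCESSES = {
    "procmon.exe", "procmon64.exe", "autoruns.exe", "autoruns64.exe",
    "sysmon.exe", "sysmon64.exe",
}

# Hypothetical subset of "commonly encountered" operations, using Procmon operation names.
COMMONLY_ENCOUNTERED = {
    "CreateFile", "WriteFile", "Load Image", "Process Create",
    "RegCreateKey", "RegSetValue", "TCP Connect", "TCP Send", "UDP Send", "UDP Receive",
}


def filter_events(events):
    """Drop events from monitoring tools and keep only commonly encountered operations."""
    return [
        e for e in events
        if e["Process Name"].lower() not in EXCLUDED_PROCESSES
        and e["Operation"] in COMMONLY_ENCOUNTERED
    ]
```

Applying the exclusion list before the operation filter is an arbitrary choice here; because both conditions are combined with `and`, the result is the same in either order.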
Procmon records Windows activity as low-level system events, generating thousands of events every minute. Procmon's standard Comma-Separated Values (CSV) output is used as the primary input for the visualization components in MalView. Each row in the CSV log file represents one specific event and comprises these major attributes [9]:
The log file output from Procmon contains five major types of process activities, which are color-coded in our framework.
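Loading such a log can be sketched with Python's standard `csv` module. This is a minimal sketch that assumes the default Procmon export columns ("Time of Day", "Process Name", "PID", "Operation", "Path", "Result", "Detail"). Exports with other columns enabled will differ, and the function names are hypothetical.

```python
import csv
import io

# Assumed default column set of a Procmon CSV export; other column configurations differ.
PROCMON_COLUMNS = ("Time of Day", "Process Name", "PID", "Operation", "Path", "Result", "Detail")


def parse_procmon_csv(text):
    """Parse Procmon CSV text into one dict per event (one per row)."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        event = {col: row.get(col) or "" for col in PROCMON_COLUMNS}
        # PID arrives as text; convert it so it can serve as a numeric key.
        event["PID"] = int(event["PID"]) if event["PID"] else None
        events.append(event)
    return events


def load_procmon_csv(path):
    """Read a Procmon CSV file; utf-8-sig strips a leading byte-order mark if present."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return parse_procmon_csv(f.read())
```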
MalView is aimed at accelerating malware analysis and integrating visual analytics to enable interactive data exploration and malware behavior comprehension.
The MalView prototype provides visual representations of system and malware activities captured by the Procmon [6] utility. In the context of malware analysis, four system-level activities 112 are of utmost importance and need to be captured, namely registry 114, file system 116, processes and threads 118, and network activities 120. These are the four major categories that InfoSec [7], [8] highlights as indications of malicious activity. The processes and events related to these four activities 112 are captured by Procmon 106, filtered, and then fed to MalView 102, where the data is preprocessed 122 into time series data 124 and dependency data 126. The processed data is then used in processes visualization 128, activity overview 130, network visualization 132, libraries calls 134, and analysis summary 136. The host system 108 may also provide data regarding connecting domains 138 for external analysis 140. The data from the external analysis 140 is provided to connecting domains analysis 142 via one or more APIs and is then used by the analysis summary 136.
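The preprocessing step 122 splits the filtered events into time series data 124 and dependency data 126. A minimal Python sketch of that split follows. It assumes events are dicts keyed by Procmon column names and that "Time of Day" uses the 12-hour "HH:MM:SS.fffffff AM/PM" form, which depends on the capture machine's locale. The function names are hypothetical, and captures that cross midnight are not handled.

```python
from collections import defaultdict


def time_of_day_seconds(text):
    """Convert e.g. '10:15:01.1234567 AM' to seconds since midnight (12-hour format assumed)."""
    clock, meridiem = text.split()
    hours, minutes, seconds = clock.split(":")
    hour = int(hours) % 12 + (12 if meridiem.upper() == "PM" else 0)
    return hour * 3600 + int(minutes) * 60 + float(seconds)


def preprocess(events):
    """Return (time_series, dependencies).

    time_series: process name -> time-ordered [(seconds since first event, operation)]
    dependencies: process name -> set of object paths (files, keys, DLLs, addresses) it touched
    """
    t0 = min(time_of_day_seconds(e["Time of Day"]) for e in events)
    time_series = defaultdict(list)
    dependencies = defaultdict(set)
    for e in events:
        t = time_of_day_seconds(e["Time of Day"]) - t0
        time_series[e["Process Name"]].append((t, e["Operation"]))
        if e["Path"]:
            dependencies[e["Process Name"]].add(e["Path"])
    for series in time_series.values():
        series.sort()  # order each process's events by relative time
    return dict(time_series), dict(dependencies)
```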
MalView 102 provides linked views 144 for analysis, with interactions such as filtering, highlighting, ordering, and details on demand that let users gain comprehensive insights into malware behaviors within the system. Details of the MalView visual components and their corresponding interactive features are described in the following section, MalView: Visual components.
The tool MalView is developed as a web-based application using JavaScript and the D3.js library created by M. Bostock et al. [10]. The primary goal of MalView is to provide an interactive visualization platform that demonstrates malware behaviors and interactions within the system. The captured events are presented from multiple perspectives: temporally, as processes and the function calls between them, and as a dependency graph between a process and the objects it operates on, including the registry, system files, network addresses, and dynamic-link libraries. The platform classifies connecting domains as malicious or benign and supports further analysis. To meet the primary goal, MalView implements the analysis tasks below, based on the analysis task types for employing information visualization systems [11]:
Taking into account the mapping to time, the associations between processes and dependencies, and guided by the designed tasks, the user interfaces of MalView were designed.
MalView provides the options for users to select input data from a default set of data samples or from their local machine, as depicted in
After the input file is uploaded, the operation overview panel A shows how the operations are categorized and allows users to observe the prevalence of event types (visualization task T3). Each event type 202 is represented as a rectangle, stacked horizontally by its category in a bar chart. Four color hues represent the four categories: yellow for File System, blue for Process and Thread, green for Registry, and red for Network. Each rectangle is colored by the category its event type belongs to and also reflects the total number of corresponding function calls during the monitored period. Each individual event type is interactive: it acts as a button that applies a filter upon mouse click, and the result is shown directly in the adjacent panel below, the processes activity panel B.
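The data behind panel A can be sketched as a per-category count of event types, using the four category colors given above. This is a minimal Python sketch. The operation-to-category table is a hypothetical, partial mapping written with Procmon operation names, not MalView's actual table.

```python
from collections import Counter

# Category colors as described for panel A.
CATEGORY_COLORS = {
    "File System": "yellow",
    "Process and Thread": "blue",
    "Registry": "green",
    "Network": "red",
}

# Hypothetical, partial operation-to-category table for illustration.
OPERATION_CATEGORY = {
    "CreateFile": "File System", "WriteFile": "File System", "ReadFile": "File System",
    "Process Create": "Process and Thread", "Thread Create": "Process and Thread",
    "Load Image": "Process and Thread",
    "RegCreateKey": "Registry", "RegSetValue": "Registry", "RegOpenKey": "Registry",
    "TCP Connect": "Network", "TCP Send": "Network", "UDP Send": "Network", "UDP Receive": "Network",
}


def operation_overview(events):
    """Return {category: [(operation, count), ...]} with counts in descending order."""
    counts = Counter(e["Operation"] for e in events if e["Operation"] in OPERATION_CATEGORY)
    overview = {category: [] for category in CATEGORY_COLORS}
    for operation, n in counts.most_common():
        overview[OPERATION_CATEGORY[operation]].append((operation, n))
    return overview
```

Each (operation, count) pair would become one rectangle colored with `CATEGORY_COLORS[category]`. Unmapped operations are silently dropped in this sketch.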
A special group of existing critical operations 204, defined as “Commonly encountered” by InfoSec [7], [8], is shown on the right of panel A, such as Create File, UDP Send, TCP Connect, and TCP Send. File System is mostly made up of Create File and Write File. Process and Thread is mostly made up of Load Image. Registry is mostly made up of Registry Create Key. Network is mostly made up of TCP Connect, UDP Receive, and UDP Send. All captured operations that match the commonly encountered criteria are presented. This list serves as a selection box for highlighting critical activities in both the operation overview and the processes activity panel (visualization task T5).
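The selection box described above needs two pieces of data: the critical operations that actually occur in the capture, and a highlight flag for each event once the user picks some of them. A minimal Python sketch follows. The `COMMONLY_ENCOUNTERED` tuple is a hypothetical stand-in for the InfoSec list, written with Procmon operation names.

```python
# Hypothetical stand-in for the "Commonly encountered" list, using Procmon operation names.
COMMONLY_ENCOUNTERED = (
    "CreateFile", "WriteFile", "Load Image", "RegCreateKey",
    "TCP Connect", "TCP Send", "UDP Send", "UDP Receive",
)


def critical_operations_present(events):
    """Commonly encountered operations that occur in this capture, in list order (selection-box options)."""
    seen = {e["Operation"] for e in events}
    return [op for op in COMMONLY_ENCOUNTERED if op in seen]


def highlight_mask(events, selected):
    """One boolean per event: True if its operation is among the user's selected critical operations."""
    selected = set(selected)
    return [e["Operation"] in selected for e in events]
```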
Building upon the visual information mantra by Shneiderman [12]: “Overview first, zoom and filter, then details-on-demand,” the process activity in
Each process is associated with an aligned set of events executed by the process itself. An individual event is represented by a thin vertical bar, color-coded by its event type as introduced in section II-B and presented in panel A. These thin bars are drawn with 50% transparency, so that if multiple events occur at nearly the same time, their colors add up on the display (visualization task T3). Users can therefore see where calls are dense, and these spots are candidates for anomaly detection (visualization task T7).
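The "colors add up" effect follows from standard "over" alpha compositing. When n bars of the same color, each with opacity α, overlap, the combined opacity is 1 - (1 - α)^n, so at α = 0.5 the values are 0.5, 0.75, 0.875, and so on. The Python sketch below uses fixed-width time bins as a hypothetical stand-in for "nearly the same time" and reports how dark each busy spot would appear.

```python
from collections import Counter


def stacked_opacity(n, alpha=0.5):
    """Opacity after drawing n overlapping same-color bars, each with opacity alpha."""
    return 1 - (1 - alpha) ** n


def busy_bins(times, bin_width):
    """Group event times into fixed-width bins; return {bin start: (event count, resulting opacity)}."""
    counts = Counter(int(t // bin_width) for t in times)
    return {b * bin_width: (c, stacked_opacity(c)) for b, c in sorted(counts.items())}
```

The bin width here is an assumed parameter. In the actual display, overlap depends on pixel width and zoom level rather than on a fixed time bin.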
The interaction arc starts from the parent process (the one that initiates the call, or the source) and ends at the child process (the destination, or the target of the call) (visualization task T4). The interactions here are the typical events of the process and thread category, as specified in section II-B; hence they have the blue color of that category. One of the most common events in this category is Process Create, in which a process creates a new process and its primary thread. Besides source-target interactions, MalView also supports visualizing call-to-self events (or loops). In this case, the process both initiates the call and is its target.
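The arcs can be derived from Process Create events. The Python sketch below assumes that each event's "Process Name" is the parent, that the child's image path is in "Path", and that a relative time "t" was added during preprocessing. It treats an arc whose source and target image names match as a call-to-self loop. The function names are hypothetical, and comparing image names (rather than PIDs) is a simplification.

```python
import ntpath


def interaction_arcs(events):
    """Return (source, target, t) arcs for Process Create events, target taken from the child image path."""
    arcs = []
    for e in events:
        if e["Operation"] != "Process Create":
            continue
        child = ntpath.basename(e["Path"])  # ntpath splits Windows paths on any host OS
        arcs.append((e["Process Name"], child, e["t"]))
    return arcs


def split_self_calls(arcs):
    """Separate call-to-self loops (source == target, case-insensitive) from source-target arcs."""
    loops = [a for a in arcs if a[0].lower() == a[1].lower()]
    others = [a for a in arcs if a[0].lower() != a[1].lower()]
    return loops, others
```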
MalView supports details-on-demand in terms of process details, event call details, filtering calls related to one specific process, and zooming in on a period (visualization task T2). The details of an event, including the process name, operation, event type, timestamp, process ID, and additional operation-specific information, can be shown in a tooltip by mousing over the corresponding bar, as shown in
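The per-process filter, period zoom, and tooltip can be sketched as three small functions over the preprocessed events. This is a minimal Python sketch that assumes dict events with "Process Name", "PID", "Operation", and a relative time "t" in seconds. The function names and the tooltip format are hypothetical.

```python
def calls_for_process(events, process_name):
    """Events issued by one process (case-insensitive match on the image name)."""
    target = process_name.lower()
    return [e for e in events if e["Process Name"].lower() == target]


def zoom_period(events, start, end):
    """Events whose relative time t lies in the closed interval [start, end]."""
    return [e for e in events if start <= e["t"] <= end]


def tooltip(e):
    """One-line event summary of the kind shown on mouse-over."""
    return f'{e["Process Name"]} (PID {e["PID"]}) {e["Operation"]} at t={e["t"]:.3f}s'
```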
The processes activity panel supports the detail view with a magnification feature called “Lensing” (visualization task T2). When this feature is enabled, hovering along the timeline expands the window around the current time step. For example, panel B1 in
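One way to realize such a lens is a piecewise-linear focus+context mapping from time to screen position. The Python sketch below is an assumed technique, not necessarily MalView's. A window around the hovered time is stretched by a magnification factor, and the remaining timeline is compressed uniformly so the axis still spans the full width. All names and parameters are hypothetical.

```python
def lens_position(t, total_time, width, focus, window, magnification):
    """Map time t in [0, total_time] to x in [0, width], magnifying [focus - window/2, focus + window/2]."""
    if magnification < 1:
        raise ValueError("magnification must be >= 1")
    a = max(0.0, focus - window / 2)
    b = min(total_time, focus + window / 2)
    span = b - a
    if magnification * span >= total_time:
        raise ValueError("lens too large: magnified window would not fit the axis")
    s_in = magnification * width / total_time           # pixels per second inside the lens
    s_out = (width - span * s_in) / (total_time - span)  # compressed scale outside the lens
    if t < a:
        return t * s_out
    if t <= b:
        return a * s_out + (t - a) * s_in
    return a * s_out + span * s_in + (t - b) * s_out
```

The mapping is continuous at both lens edges and strictly increasing, so the event order on the timeline is preserved while the hovered period gets more pixels.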
The API automatically scans a given malware sample and compares its patterns against more than 70 servers and databases. The virus classification result consists of four categories: malicious, suspicious, undetected, or harmless, each indicated by the number of detections found for the targeted domain. Spring et al. [13] noted that malicious domains are attempts to connect with a command and control server or dropbox and are expected to behave differently from a typical phishing or drive-by-download malicious site. In MalView, the list of connecting domains is ordered by the variety of outcomes for each domain (visualization task T6).
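Ordering domains by "variety of outcomes" can be sketched with one assumed definition. In this Python sketch, variety is the number of the four verdict categories with at least one detection, ties are broken by the malicious count and then alphabetically, and the input is a dict from domain to per-category counts. The function name and tie-breaking rules are assumptions.

```python
CATEGORIES = ("malicious", "suspicious", "undetected", "harmless")


def order_domains(results):
    """Sort domains by descending outcome variety, then descending malicious count, then name."""
    def key(item):
        domain, counts = item
        variety = sum(1 for c in CATEGORIES if counts.get(c, 0) > 0)
        return (-variety, -counts.get("malicious", 0), domain)

    return [domain for domain, _ in sorted(results.items(), key=key)]
```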
This process dependencies view in
In an effort to provide its users with a safe and productive experience, Microsoft publishes information online about malware and unwanted applications affecting its operating systems [18], with further details in its documentation platform [19]. Microsoft [19] classifies malware into 13 categories: 1) Backdoor, 2) Downloader, 3) Dropper, 4) Exploit, 5) Hacktool, 6) Macro virus, 7) Obfuscator, 8) Password Stealer, 9) Ransomware, 10) Rogue security software, 11) Trojan, 12) Trojan clicker, and 13) Worm. Microsoft also provides an online tool for searching current cyber threats, viruses, and malware, called the Microsoft Security Intelligence (MSI) platform [20].
MalView can be used in several settings: 1) when the objective is to comprehend malware functionality rather than to detect it; 2) when a new malware application (zero-day malware) has been developed and is not detectable by any tool due to a lack of profiles and signatures; 3) when the objective is to classify a family of malware and then apply a set of generic solutions and remedies to each class of malware; and 4) when new malware has been developed and the goal is to investigate whether it follows known malicious patterns (i.e., labeling the malware type). Accordingly, if an incident report describes a zero-day vulnerability for which no clear patch has been developed, MalView can help analyze and comprehend the malware exploiting that vulnerability and thus better identify patches or solutions. To demonstrate the usability of MalView for visually analyzing malware, a set of case studies was conducted in which the output and behavior of selected malware were captured. Due to space limitations, the processes involved in seven malware types were captured and presented, namely 1) Backdoor, 2) Remote Access, 3) Behavior, 4) Ransomware, 5) Email Flooder, 6) Hacktool, and 7) Trojan (Info stealer). The following sections demonstrate the application of MalView to several of these malware types.
The malware experimentation setup requires an isolated and controlled environment so that the malicious code does not propagate to or infect other entities in the network. This clean, isolated environment also helps identify changes and possible tampering in the system caused by the malicious activities of the malware specimen. For this work, three Windows systems were installed on Oracle VirtualBox: Windows XP, Windows 7, and Windows 10. Windows Defender, Windows security services, firewalls, and automatic security updates were disabled on each virtual OS to prevent any interruption during the execution of the malware sample and to capture all traces of its dynamic behavior. To capture the interaction between the malware and each host system, Procmon was installed in all environments. More specifically, all user applications on the virtual OS were closed, and the malware process name was added to the Procmon filter so that only the events of the malware executable were captured. The executable was then run for two minutes, after which the time-ordered system activities were saved from Procmon and fed to MalView.
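The process-name filter can also be applied offline to a saved trace. The following Python sketch assumes the trace was exported from Procmon as CSV with the columns Time of Day, Process Name, PID, Operation, Path, Result, and Detail (an assumed layout, to be checked against an actual export); the executable name `sample.exe` and the rows are hypothetical.

```python
import csv
import io


def filter_by_process(csv_text, process_name):
    """Keep rows of a Procmon CSV export whose Process Name matches process_name.

    Matching ignores case, since Windows image names are case-insensitive.
    Rows stay in their original (time-ordered) sequence.
    """
    target = process_name.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Process Name"].lower() == target]


# Hypothetical three-event excerpt in the assumed column layout.
sample = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"12:23:45.1","sample.exe","4100","Process Create","C:\Windows\System32\cmd.exe","SUCCESS",""
"12:23:45.2","lsass.exe","612","RegQueryValue","HKLM\SYSTEM\Setup","SUCCESS",""
"12:23:45.3","SAMPLE.EXE","4100","WriteFile","C:\Users\victim\tmp.exe","SUCCESS",""
'''
events = filter_by_process(sample, "sample.exe")
print([e["Operation"] for e in events])  # ['Process Create', 'WriteFile']
```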
Since MalView depends on the output of Procmon, the amount of information it visualizes depends on how long Procmon runs; a shorter capture yields less data. In our experience with MalView, larger and more complex Procmon traces make MalView less effective, since the visualization must then accommodate a vast number of processes and events. However, a key feature of MalView is that it offers different levels of abstraction and complexity. If the window width (interface size) is adjusted and the sample is rerun, the visual components readjust to fit the new window size. The execution time depends on how large the malware sample is, ranging from 0.3 s to several seconds.
Remote access tools are useful applications for providing administrative assistance to end users remotely. However, such software is increasingly abused by adversaries to gain control over target systems, in which case it is referred to as a Remote Access Trojan (RAT). RATs are distributed through email attachments or bundled with pirated software to infect the target and gain administrative control. Once a target machine is infected, a RAT has complete control over the victim system and can perform malicious activities such as password sniffing, keylogging, tracking file transfer information, accessing the webcam feed, controlling the system by issuing shell commands, or even propagating other malware or viruses. RATs are particularly hard to detect because they execute legitimate operating system processes that resemble the behavior of commercial remote access tools, and they usually do not show up as running tasks. In addition, tools exist that can obfuscate a given application and produce obfuscated malicious applications. Using various obfuscation methods while managing resource utilization, RATs can remain undetected. According to the October 2018 Global Threat Index published by Check Point, RATs ranked among the top 10 “most wanted” malware.
The run-time behavior of RATs was captured on different Windows versions and visualized using MalView. The live malware sample was downloaded from the public malware repository VirusShare [22]. According to a multi-scan report from VirusTotal [23], this sample has a community score of 66 out of 70, i.e., 66 of 70 detection engines identified it as a malicious executable.
Besides the malware-associated events, MalView also captures recurring periodic operations, such as those of the Local Security Authority Subsystem Service (lsass.exe) or VirtualBox's vboxservice.exe. The influence of the running platform is discussed further in Section VI.
A Trojan is a type of malware that poses as a benign program but, after installation, executes hidden code to perform malicious activities such as deleting or tampering with data, stealing information, running other scripts, and creating backdoors. In general, a Trojan gives the attacker access to the victim's system, but it cannot replicate itself [24].
A Trojan sample (MD5: b3eebe51ccc4a95815ddef3ef55604d2) was obtained from VirusShare [22]. The output file containing all the processes was created by running the malware in a controlled environment with Windows 7 as the platform.
By clicking the name of this malware in the MalView Process Activity window 502, one can observe the file dependencies in dependency graph 504 and see that this executable created two processes, cmd.exe and tmp.exe (shown as blue links). Clicking further on each child process retrieves the list of processes that cmd.exe and tmp.exe created in turn: the cmd.exe process created two child processes, reg.exe and timeout.exe, while the tmp.exe process did not create any child process. On request, the process networks of cmd.exe, tmp.exe, reg.exe, and timeout.exe are overlaid on the process timeline, as shown in dependency graphs 506, 508, 510, and 512, respectively.
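The parent-child structure exposed by these dependency graphs can be derived from Process Create events, where the process issuing the operation is the parent and the operation's path names the child image. The following Python sketch is a simplified illustration, not MalView's code: the event tuples are hypothetical, and `sample.exe` stands in for the Trojan executable, whose file name is not given in the text.

```python
import ntpath
from collections import defaultdict


def build_process_tree(events):
    """Map each parent image name to the ordered list of child image names it created.

    events: (process_name, operation, path) tuples in time order.
    Keyed by image name for readability; a full implementation would key by PID,
    since several instances of one image can run at the same time.
    """
    tree = defaultdict(list)
    for process, operation, path in events:
        if operation == "Process Create":
            tree[process].append(ntpath.basename(path))  # child image name
    return dict(tree)


def print_tree(tree, root, depth=0):
    """Print the process hierarchy below root as an indented outline."""
    print("  " * depth + root)
    for child in tree.get(root, []):
        print_tree(tree, child, depth + 1)


events = [
    ("sample.exe", "Process Create", r"C:\Windows\System32\cmd.exe"),
    ("sample.exe", "Process Create", r"C:\Users\victim\AppData\Local\Temp\tmp.exe"),
    ("cmd.exe", "Process Create", r"C:\Windows\System32\reg.exe"),
    ("cmd.exe", "Process Create", r"C:\Windows\System32\timeout.exe"),
    ("tmp.exe", "RegOpenKey", r"HKCU\Software"),
]
tree = build_process_tree(events)
print_tree(tree, "sample.exe")
```

`ntpath` is used instead of `os.path` so that Windows paths split correctly on any host OS.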
MultiInjector, classified as a Trojan, attempts to inject code into other processes in order to hide or execute its payload and to download and install other malware [25].
Exploring the details-on-demand by mousing over, as shown in panel (c), reveals that the first event in this sequence (window 606) is a Process Create from the malware to cmd.exe, which leads to the subsequent calls. Around 12:23:47, the second event in this sequence (window 608) consists of four consecutive Process Create calls from the malware to net.exe. The subsequent calls can be seen in panel (b) and, for a broader view, in panel (a). Finally, panel (d) makes the repeated event pattern associated with the malware clear: one Process Create event from the malware to cmd.exe, followed by four subsequent Process Create events to net.exe. This observed behavior aligns with the malware's characteristic of injecting code into other processes, and the visualization helps discern these low-level operations from the malware to other system processes.
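The repeated motif described above (one Process Create to cmd.exe followed by four Process Create calls to net.exe) can also be counted programmatically. In this Python sketch, which is illustrative and not part of MalView, the trace is reduced to the sequence of child images spawned by the malware, and non-overlapping occurrences of the motif are counted. The trace itself is hypothetical.

```python
def count_motif(sequence, motif):
    """Count non-overlapping occurrences of motif as a contiguous run in sequence."""
    count, i, m = 0, 0, len(motif)
    while i <= len(sequence) - m:
        if sequence[i:i + m] == motif:
            count += 1
            i += m  # jump past the match so occurrences do not overlap
        else:
            i += 1
    return count


# Child images spawned by the malware, in time order (hypothetical trace).
spawned = ["cmd.exe", "net.exe", "net.exe", "net.exe", "net.exe",
           "cmd.exe", "net.exe", "net.exe", "net.exe", "net.exe",
           "cmd.exe", "net.exe"]
motif = ["cmd.exe"] + ["net.exe"] * 4
print(count_motif(spawned, motif))  # 2
```

The trailing cmd.exe/net.exe pair is not counted because it does not complete the motif.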
A backdoor is a type of malware that provides unauthorized remote access to a compromised system by exploiting security vulnerabilities. The malware works in the background while hiding from the user, giving the attacker access to resources on the victim's computer, such as databases and file servers, as well as the ability to run system-level commands. A backdoor is usually installed in two stages: first, a small file called a dropper is installed; second, the dropper downloads the main malicious file from a remote location [26]. Note that Trojans and backdoors are not the same: a Trojan might contain a backdoor, but a backdoor can execute as a stand-alone program without being part of a Trojan.
The MalView visualization 700 of the execution of the Backdoor malware Androm on Windows 7 is presented in
The overlay dependency graphs in
A typical ransomware program encrypts the victim's computer files and demands a ransom to restore access to the data. A ransomware program may lock a system and display visual messages impersonating law enforcement to threaten the target. The ransomware scam has matured over time, using different methods to impair a computer. According to a report published by Symantec [27], the latest variants prevent the computer from functioning and block the user from gaining any access. At that stage, the system displays a message that claims to be from a local law enforcement organization, and the ransomware demands money in exchange for letting users regain access to their systems. In July 2021, Malwarebytes reported [28] a severe ransomware attack against the popular Remote Monitoring and Management software tool Kaseya VSA. The attack forced an immediate shutdown of the VSA servers, and Kaseya VSA was used to encrypt the systems of over 1,000 businesses. The attackers asked for $70M in exchange for a universal decryptor. Malwarebytes also reported [29] that 35% of small and medium-sized businesses had been attacked by ransomware, and such organizations often end up paying the ransom. According to a multi-scan report from VirusTotal, the ransomware sample studied here has a community score of 47 out of 72, i.e., 47 of 72 detection engines identified it as a malicious executable.
At the time of this writing, a search on the Microsoft Security Intelligence threat search platform [20] returned 500 malware entries of the Behavior type, with alert levels distributed as 400 severe, 38 high, 3 moderate, and 16 low. The Behavior type generally covers malware that exhibits suspicious activities but is not classified into a specific well-known malware category. This type of malware is difficult to detect because its activities can vary greatly depending on the intention of the malware and the current user context. Our study of several Behavior-type malware samples shows that these suspicious activities include 1) disabling system recovery, 2) deleting shadow copies, 3) hidden code execution, 4) creating files in the user's system, 5) changing registry keys so that the malware runs itself, and 6) invoking netsh.exe to modify the firewall configuration in a way that allows the malware to run at system startup. Examples of such behavioral malware include Bladabindi.gen [30], Vawtrak.A [31], and Teerac.B [32]. Furthermore, some behavioral malware (e.g., MultiInjector [33]) accesses the command prompt (CMD).
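Several of the suspicious activities listed above lend themselves to simple rule matching over a trace. The following Python sketch covers four of the six activities and is illustrative only: the rule names and string patterns are hypothetical heuristics built from commonly seen Windows artifacts (bcdedit.exe, vssadmin.exe, the CurrentVersion\Run registry key, netsh.exe), not MalView's or Microsoft's detection logic. Each event is a simplified (operation, path, detail) tuple.

```python
def _image(path):
    """Lower-cased file name at the end of a Windows path."""
    return path.lower().rsplit("\\", 1)[-1]


# Hypothetical heuristics: each rule receives (operation, path, detail) and returns a bool.
RULES = {
    "disable_system_recovery": lambda op, path, detail: (
        op == "Process Create" and _image(path) == "bcdedit.exe"
        and "recoveryenabled no" in detail.lower()),
    "delete_shadow_copies": lambda op, path, detail: (
        op == "Process Create" and _image(path) == "vssadmin.exe"
        and "delete shadows" in detail.lower()),
    "run_key_persistence": lambda op, path, detail: (
        op == "RegSetValue" and "\\currentversion\\run\\" in path.lower()),
    "firewall_via_netsh": lambda op, path, detail: (
        op == "Process Create" and _image(path) == "netsh.exe"),
}


def flag_indicators(events):
    """Return the sorted names of rules matched by at least one event."""
    hits = set()
    for op, path, detail in events:
        for name, rule in RULES.items():
            if rule(op, path, detail):
                hits.add(name)
    return sorted(hits)


events = [
    ("Process Create", r"C:\Windows\System32\vssadmin.exe",
     "PID: 2044, Command line: vssadmin.exe Delete Shadows /All /Quiet"),
    ("RegSetValue", r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\updater", "Type: REG_SZ"),
    ("ReadFile", r"C:\Users\victim\notes.txt", "Offset: 0"),
]
print(flag_indicators(events))  # ['delete_shadow_copies', 'run_key_persistence']
```

Hidden code execution and file creation are omitted because they lack a single distinctive string signature and would need context beyond one event.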
For instance,
A hacktool is a piece of software that malicious attackers use to gain unauthorized access to users' devices [18]. At the time of this writing, Microsoft lists 188 active Hacktool entries, of which 93 are severe, 80 high, and 15 moderate in terms of alert level [20]. A popular attack channel for hacktools is the insecure design of Universal Serial Bus (USB) communication together with the Windows AutoPlay feature [34]. Malicious activities of hacktools launched from USB include 1) changing registry settings, 2) installing a backdoor, 3) stealing confidential information, and 4) reading data encryption keys. Recently, Hacktool has also been the second most prevalent type of malware embedded in pirated software, after Trojans [35].
Determining whether the domains contacted in network activities are malicious or benign is important. The classification 1100 of the malicious connecting domains for the malware Mailpassview is shown in
To examine how malware behaves on different platforms, multiple malware samples were executed on Microsoft's mainstream Windows operating systems.
Traces of the ransomware samples were collected under different Windows platforms, and their behaviors were compared using MalView.
The “email flooder” malware was chosen to compare the visualizations of the same sample run on different platforms, including Windows XP in panel (a), Windows 7 in panel (b), and Windows 10 in panel (c), as depicted in
One of the key features and benefits of employing visualization tools is to perform pattern detection and classification visually prior to delving into analytical approaches. MalView captures key features that are indicators for profiling classes or families of malware.
More specifically, MalView can capture features such as the volume of processes, registry activities, file manipulations and accesses, and network activities. As described below, these features can reveal “behavioral” patterns in the set of malware studied and thus allow the malware to be classified according to its dynamic behavior. Rather than trying to generate patterns of interest, this study shows how analysis based on malware behavior tracing works, what kind of information it entails, and how the tool enables analysts to quickly study the interaction of malware with system internals using selection, focus and context techniques, and aggregation.
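A per-sample feature vector of this kind can be sketched by bucketing Procmon operations into process, registry, file, and network classes and counting them. This Python sketch assumes classification by operation name (registry operations beginning with `Reg`, network operations with `TCP` or `UDP`, and a small hand-picked set of process operations); everything else is treated as file activity, which is a simplification. The operation list in the example is hypothetical.

```python
from collections import Counter

# Assumed set of process-related Procmon operation names.
PROCESS_OPS = {"Process Create", "Process Start", "Process Exit",
               "Thread Create", "Thread Exit", "Load Image"}


def event_class(operation):
    """Assign a Procmon operation name to a coarse feature class."""
    if operation in PROCESS_OPS:
        return "process"
    if operation.startswith("Reg"):
        return "registry"
    if operation.startswith(("TCP", "UDP")):
        return "network"
    return "file"  # all remaining operations are treated as file-system activity


def feature_vector(operations):
    """Count events per class as a fixed-order (process, registry, file, network) tuple."""
    counts = Counter(event_class(op) for op in operations)
    return tuple(counts[c] for c in ("process", "registry", "file", "network"))


ops = ["Process Create", "RegOpenKey", "RegSetValue", "CreateFile",
       "WriteFile", "TCP Connect", "TCP Send", "Load Image"]
print(feature_vector(ops))  # (2, 2, 2, 2)
```

A fixed-order tuple makes vectors from different samples directly comparable, for example as input to a clustering step.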
MalView focuses on the interactions of the malware program with other internal system processes. While Procmon, as the data provider, records detailed information about each process running in the system, the captured interval and log activity may depend on the person performing the capture. To focus on the time interval in which the most malware activity with other internal system processes is witnessed, called the busy interval, a focus and context visualization technique was applied in MalView to support 1) a close-up view for individual malware analysis and 2) standardization for malware comparison. To preserve context around the focal point, the interval is selected so that it either places equal padding between the interval boundaries and the first and last interactions, or places equal padding on both sides of the peak of the area chart, where the largest number of interactions is witnessed.
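The two padding rules for the busy interval can be made concrete. The following Python sketch is one possible reading, not MalView's code. Timestamps are in seconds, and the interval `width` and `bin_size` are hypothetical parameters. Rule (a) pads the span from the first to the last interaction equally on both sides. Rule (b) centers the interval on the busiest bin of a histogram, which stands in for the area chart.

```python
from collections import Counter


def interval_by_extent(timestamps, width):
    """Rule (a): equal padding before the first and after the last interaction.

    If the interactions span more than width, the interval is just their extent.
    """
    first, last = min(timestamps), max(timestamps)
    pad = max(width - (last - first), 0) / 2
    return (first - pad, last + pad)


def interval_by_peak(timestamps, width, bin_size=1.0):
    """Rule (b): equal padding on both sides of the busiest histogram bin.

    Ties between equally busy bins are resolved in favor of the earliest bin.
    """
    bins = Counter(int(t // bin_size) for t in timestamps)
    peak_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    center = (peak_bin + 0.5) * bin_size
    return (center - width / 2, center + width / 2)


ts = [10, 11, 11.5, 11.7, 12, 20]  # hypothetical interaction timestamps
print(interval_by_extent(ts, 20))  # (5.0, 25.0)
print(interval_by_peak(ts, 20))    # (1.5, 21.5)
```

Using the same `width` for every sample is what enables the standardized side-by-side comparison mentioned above.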
This section compares the features offered by MalView with the ones offered by some other malware visualization tools, including Hybrid [38] and AnyRun [39]. First, each visualization tool is briefly reviewed and then its features are compared.
Founded in 2016 by the Russian security researcher Alexey Lapshin, AnyRun [39] offers a free “interactive” sandbox tool for dynamic analysis of malware. The tool allows users to upload a suspicious file and interact with the sandbox, and thus with the malware, to trigger some of its functionality or to execute macros embedded in the uploaded file. AnyRun offers several key features, as follows:
The tool generates a well-formatted report for publication and sharing. The professional-looking report includes supporting screenshots, Process Behavior Graphs, indicators of maliciousness or suspiciousness, and many other components.
Hybrid [38] is a free malware analysis tool that supports both static and dynamic analysis. It draws on several analysis reports and sandbox tools, including Falcon Sandbox [41], a dynamic analysis framework. In addition to the dynamic analysis offered by Falcon Sandbox, Hybrid integrates other security tools and services such as VirusTotal, OPSWAT Metadefender, SIEM systems, NSRL (i.e., whitelisting), Tor (e.g., for avoiding external IP fingerprinting), Phantom, Thug Honey Client (e.g., for URL exploit analysis), and Suricata (ETOpen/ETPro rules). The tool provides several useful analysis features, such as:
This section highlights the key features of AnyRun [39] and Hybrid [38] in comparison with the features offered by MalView. The comparison classifies features into 1) general features, 2) behavioral activities and dynamic analysis, 3) structure-based and static analysis, and 4) network-level analysis. Table 1 lists the features classified into these four groups.
MalView offers not only features comparable to those of the other tools but also features unique to MalView, namely 1) compliance with InfoSec classification with respect to malicious processes and indicators (Feature #4), and 2) simplification of the visualization through filtering and focusing on only a subset of processes for the analysis (Feature #5).
The features related to dynamic analysis are considerably diverse. As a result, each analysis tool offers its own set of unique features. Given the fact that MalView mostly visualizes the output of Procmon [6], it is primarily a dynamic analysis tool. Depending on how the underlying malware visualization tool is implemented, most of these tools are able to visualize the “basic” sets of dynamic data captured through Procmon or similar utilities. For instance, as Table 1 shows, most of the behavioral features are visualizable by these three tools.
One of the key features unique to MalView is the exploration of “time dependencies between processes” (Features #15 and #16). Visualizing time and process dependencies is an important part of malware analysis for comprehending the nature of the underlying malware.
As stated earlier, MalView is primarily a visual analytics tool based on the output of dynamic analysis of the underlying application or malware. As a result, it is less focused on visualizing static features of executable files. However, MalView is integrated with static analysis services, including VirusTotal, and can therefore retrieve and visualize static information. VirusTotal captures static information such as header sizes, file types, PE-specific details, and other static and signature-based features, and MalView uses the VirusTotal API to retrieve this information and visualize it accordingly.
Similar to signature-based and static analysis features, MalView is less focused on visualizing purely network-level features. However, given the strength of Procmon in capturing all related processes and events, MalView is capable of visualizing the network-level events and processes captured by Procmon and thus provides a process-level view on this network-level information.
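To illustrate this process-level view of network activity, the following Python sketch counts the remote endpoints that appear in Procmon TCP and UDP events. It assumes that the Path of a network event has the form `local_host:port -> remote_host:port` (an assumption to verify against a real capture); the trace is hypothetical.

```python
from collections import Counter


def remote_endpoints(events):
    """Count how often each remote endpoint appears in TCP/UDP events.

    events: (operation, path) tuples. The path of a network event is assumed
    to have the form "local_host:port -> remote_host:port".
    """
    counts = Counter()
    for operation, path in events:
        if operation.startswith(("TCP", "UDP")) and " -> " in path:
            remote = path.split(" -> ", 1)[1].strip()
            counts[remote] += 1
    return counts


# Hypothetical excerpt of a capture filtered to the malware process.
events = [
    ("TCP Connect", "WIN7-VM:49160 -> evil.example:http"),
    ("TCP Send", "WIN7-VM:49160 -> evil.example:http"),
    ("UDP Send", "WIN7-VM:53211 -> 8.8.8.8:domain"),
    ("RegOpenKey", r"HKLM\Software\Microsoft"),
]
for endpoint, hits in remote_endpoints(events).most_common():
    print(endpoint, hits)
```

The resulting endpoint list is the kind of input that could then be checked against domain-reputation services.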
Malware analysis methods can be broadly categorized into static and dynamic analysis [44]. Many of these approaches use visual representations that let analysts capture general malware-related activities from large numbers of data files or logs, which are infeasible to digest in text or binary format [45].
Panas [46] visualized software binaries in order to depict malware samples. In this approach, the file is first disassembled to obtain the Abstract Syntax Tree (AST), and an intermediate representation of the file is then produced using ROSE [47], an open-source compiler infrastructure. By visualizing the signatures of a set of different malware families, Panas was able to show the changes across different versions of a malware family. Also using visualization in dynamic malware analysis, Grégio et al. [45] proposed a solution with two interactive visualization tools: a timeline with a magnifier and a spiral view of the malicious activities. The first tool gives analysts views of the malware activities over time. While the use of time for the x-axis is similar to ours (and many others), the use of colors and what is presented on the y-axis differ: the y-axis represents activities, and colors represent the different processes or services involved in the malware execution. Each event (an activity of an involved process at a timestamp) is presented as a circle connected by a line, which represents changes over time. This presentation leads to visual clutter, especially when malware performs many different activities in a short time interval [48].
Gove et al. [49] presented Similarity Evidence Explorer for Malware (SEEM), a tool that compares a focal malware sample with other malicious samples in a database. The malware features are grouped into nine categories, and feature similarities are presented visually in three ways: 1) a histogram, 2) a Venn diagram list, and 3) a feature matrix. The histogram uses the Jaccard similarity between the features of the focal sample and those of the other samples. In contrast, the Venn diagram is more granular and shows overlapping, strict-subset, and disjoint features. The feature matrix highlights the specific features present in the analyzed sample. Long et al. [50] proposed a versatile and intuitive technique for identifying a given malware file from its image sets. The authors argued that desktop icons are an effective social engineering device employed by some malware developers: the victim clicks on the icon, which executes the malware. Hence, comparing a new malware sample's images with a database of previously known malware enables effective malware identification. The extracted grayscale malware images are shown using a force-directed graph [15], [17], which is essentially a similarity network of sample malware images computed with a nearest-neighbor index. The visualization tool shows hash values of the executables; clicking on a value draws the similarity network graph, and the tool provides zooming functionality over multiple hops from each node.
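The histogram and Venn-diagram views in SEEM rest on simple set operations. The following Python sketch computes the Jaccard similarity |A ∩ B| / |A ∪ B| between the feature sets of a focal sample and another sample, and labels their set relation. It is not SEEM's implementation, and the feature strings are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def venn_relation(focal, other):
    """Describe how two feature sets relate: equal, strict subset/superset, overlap, or disjoint."""
    focal, other = set(focal), set(other)
    if focal == other:
        return "equal"
    if focal < other:
        return "strict subset"
    if focal > other:
        return "strict superset"
    return "overlap" if focal & other else "disjoint"


focal = {"mutex:xyz", "dll:ws2_32", "reg:run"}
other = {"mutex:xyz", "dll:ws2_32", "file:drop.exe"}
print(jaccard(focal, other))        # 0.5
print(venn_relation(focal, other))  # overlap
```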
While static analysis is computationally efficient, its performance can be impaired by packed or encrypted malware. Dynamic analysis, on the other hand, examines the actual behavior of the malware while it is running, which makes it more effective [51]. Shaid and Maarof [52] proposed a method to generate images representing malware API calls. First, the API calls are monitored in a malware behavior capturing step. These calls are then sorted from most to least malicious. Finally, each API call is assigned a color according to its maliciousness level. Similarly, Kancherla et al. [51] proposed converting the malware into a grayscale image called a byteplot and then used machine learning (ML) methods (e.g., Support Vector Machines) to analyze low-level features (e.g., intensity and texture) extracted from the resulting images. Regarding ML approaches, LeDoux and Lakhotia [53] argued that ML is a natural fit for malware analysis, since ML can rapidly learn and discover inherent patterns and similarities in a corpus.
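A byteplot treats each byte of a file as one grayscale pixel (0 is black, 255 is white) and wraps the byte stream into rows of fixed width. The following minimal Python sketch illustrates this. The default width of 256 pixels and the zero-padding of the last row are assumptions for illustration, and the output is a plain list of pixel rows rather than an image file.

```python
def byteplot(data, width=256):
    """Reshape raw bytes into rows of grayscale pixel values (0-255).

    The last row is zero-padded to the full width.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # iterating bytes yields ints
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


pixels = byteplot(b"MZ\x90\x00\x03", width=4)
print(pixels)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

Low-level texture features for a classifier would then be computed from these rows, for example after writing them out with an imaging library.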
For image-based malware classification, O'Shaughnessy [54] used space-filling curves to formalize a scalable solution to classification ambiguity among anti-virus programs. Donahue [55] proposed using the Markov Byte Plot [56] to convert Portable Executable (PE) files into truecolor images (defined by red, green, and blue (RGB) color components) that help highlight the differences between packed and unpacked malware. Another common approach is to convert malware PE or binary files into images and analyze the resulting images; there are various ways to turn malware into images and different methods for analyzing the produced images. For instance, Han et al. [57] proposed a three-step approach in this direction. In Step 1, the opcode sequences of the malware are extracted. Step 2 generates an image whose width and height are both 2^n, where n is a user-defined number, and applies a hash function, such as SimHash [58], to each extracted opcode sequence to generate a pixel with a corresponding x-y position and RGB color. Finally, in Step 3, the similarities between the resulting images are calculated. In an extended version of this approach [44], they also incorporated dynamic analysis to filter for essential opcode sequences.
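Step 2 can be sketched in Python as follows, assuming an image side of 2^n pixels. The SimHash function is the standard construction: each feature's hash votes +1 or -1 on every bit position, and the sign of each total gives one fingerprint bit. The way the fingerprint bits are split into x, y, and RGB is a hypothetical layout for illustration, not necessarily the one used by Han et al. Using opcode bigrams as SimHash features, and MD5 as the per-feature hash, are also assumptions.

```python
import hashlib


def simhash(features, bits=64):
    """Standard SimHash: each feature's hash votes +1/-1 per bit; the sign of each total is kept."""
    totals = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")  # first 64 bits of MD5
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def opcode_pixel(opcodes, n=8):
    """Map one opcode sequence to (x, y, (r, g, b)) on a 2^n by 2^n image.

    Hypothetical bit layout: the low n bits give x, the next n bits give y,
    and the following 24 bits give the color.
    """
    if not 1 <= n <= 20:  # 2n position bits plus 24 color bits must fit in 64
        raise ValueError("n must be between 1 and 20")
    # Opcode bigrams serve as SimHash features (an assumption for this sketch).
    features = [f"{a} {b}" for a, b in zip(opcodes, opcodes[1:])] or list(opcodes)
    fp = simhash(features)
    mask = (1 << n) - 1
    x, y = fp & mask, (fp >> n) & mask
    r, g, b = ((fp >> (2 * n + 8 * k)) & 0xFF for k in range(3))
    return x, y, (r, g, b)


block = ["push", "mov", "call", "pop", "ret"]
print(opcode_pixel(block))
```

Because SimHash maps similar feature sets to fingerprints that differ in few bits, near-identical opcode sequences tend to produce similar pixels, which is what makes the Step 3 image comparison meaningful.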
Miles et al. [59] presented VirusBattle, a system equipped with intelligent navigation and visualization to automatically mine and discover interrelationships between malware instances. The system provides two primary analyses: 1) a program's dynamic trace tree and 2) a scalable method of discovering shared Computed Semantics artifacts among instances of malware. VirusBattle analyzes interrelationships over many types of malware artifacts, including the binary, code, code semantics, dynamic behaviors, and malware metadata. Shaid and Maarof [60] proposed presenting the behavioral pattern of malicious files using a Hot-to-Cold color ramp. As the malware runs, its user-mode API calls are captured, then ordered and grouped based on their maliciousness. This behavior-to-color map helps visualize when, and in which order, a malware sample performs malicious activities during execution.
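A hot-to-cold ramp of this kind amounts to mapping a maliciousness score onto a color gradient. The Python sketch below interpolates linearly from blue (cold, benign) to red (hot, malicious) and colors calls in execution order. The API names and their scores are hypothetical placeholders; the actual ranking used by Shaid and Maarof is not reproduced here.

```python
def ramp_color(score):
    """Map a score in [0, 1] to an RGB tuple from blue (0.0) to red (1.0)."""
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return (round(255 * score), 0, round(255 * (1 - score)))


# Hypothetical maliciousness scores for a few user-mode API calls.
SCORES = {"WriteProcessMemory": 1.0, "RegSetValueExW": 0.6, "GetTickCount": 0.1}


def color_trace(calls, default=0.0):
    """Color each captured API call in execution order; unknown calls get the default score."""
    return [(call, ramp_color(SCORES.get(call, default))) for call in calls]


trace = ["GetTickCount", "RegSetValueExW", "WriteProcessMemory"]
for call, rgb in color_trace(trace):
    print(call, rgb)
```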
Besides software systems, research on hardware advances has introduced approaches that support building transparent dynamic malware analysis systems. Using hardware virtualization extensions, Dinaburg et al. [61] proposed Ether, a system that remained transparent to the analyzed malware and defeated a large percentage of obfuscation tools. Later, Lengyel et al. [62] built DRAKVUF on a similar virtualization-extension approach, providing greater insight into system execution for tracing in malware analysis.
This visual analytics approach in MalView can benefit the branch prediction in dynamic environment analysis in GoldenEye by Xu et al. [63] and the analysis of sequences of API calls in VECG by Alaeiyan et al. [64]. The interactive visual representations can expedite the process of proactively detecting environment-sensitive and context-based behaviors, where “human-in-the-loop” can accelerate early stopping and quickly capture patterns that emerged from the API call sequences.
Most static approaches focus on comparing or clustering malware instances, or on classifying whether a new sample belongs to a known malware family. For example, Paturi et al. [65] used a Pythagoras tree to represent code similarities between malware. The similarity metric may be “Cosine similarity” or “Normalized Compression Distance.” The hierarchical structure of the Pythagoras tree is characterized by the distance between nodes shrinking by a factor of √2/2 at each deeper level of the tree. Thus, the tree groups malware with higher similarity into clusters, as leaf nodes with shorter distances stay close to one another.
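Of the two similarity metrics named above, the Normalized Compression Distance can be computed with any off-the-shelf compressor as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length. Values near 0 indicate similar inputs, and values near 1 indicate dissimilar ones. The Python sketch below uses zlib as the compressor. Both the choice of zlib and the random byte strings standing in for binaries are assumptions for illustration.

```python
import os
import zlib


def clen(data):
    """Compressed length of a byte string at maximum zlib compression."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized Compression Distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


base = os.urandom(2000)                  # stand-in for one malware binary
variant = base[:1900] + os.urandom(100)  # a close variant sharing most bytes
unrelated = os.urandom(2000)             # an unrelated binary
print(ncd(base, variant) < ncd(base, unrelated))  # True
```

The inputs here are kept well below zlib's 32 KB window, so the compressor can actually find the shared content. For larger binaries, a compressor with a bigger window would be needed for NCD to remain meaningful.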
Anderson et al. [66] presented a malware classification system based on a combination of static and dynamic features. For static feature extraction, they used three sources: 1) the binary file, 2) the disassembled binary, and 3) the control flow graph of the disassembled binary. For dynamic feature extraction, they used the dynamic instruction sequence and the dynamic call sequence. They tested their system on a large malware dataset and achieved 98.07% accuracy with the combined static and dynamic features, and 96.14% accuracy using only static features. Yoo [67] based a visualization on the premise that malicious content in an executable file has unique features that can be captured with a Self-Organizing Map (SOM). By computing the SOM of a specific executable file and visualizing it, the portion likely to contain malicious content can be determined, and by checking the generated pattern, the malware family can be detected.
Saxe et al. [68] developed an interactive visualization system for comparing malware samples in a dataset using extracted features. A similarity matrix for the malware dataset is generated based on the presence of system call sequences, and the system also provides a comparison view of malware samples based on their malicious activity. On mobile computing platforms such as Android devices, Jenkins and Cai [69], [70] explored Inter-Component Communication (ICC) through interactive visual exploration, supporting thorough ICC comprehension and security vulnerability inspection and revealing malicious behaviors that are normally hidden from users.
Analyzing malware through its visual behavior has been studied with the aim of observing the overall flow of a program, discovering malicious patterns, and quickly assessing the nature of a malware sample [5], [71]. Wagner et al. [72] proposed KAMAS, a knowledge-assisted visualization system for behavior-based malware analysis, which visualizes API call sequences gathered during the execution of malicious software. The Applicant's approach aligns with this direction, but the focus shifts toward the analysis side, covering different malware families and the influence of operating systems on malware behavior. In particular, the design decisions and techniques in MalView are applied in the malware analysis domain and derived from visualization principles for time-series data, i.e., collections of observations obtained through repeated measurements over time, including but not limited to numerical, geolocation, and text data [73], [74], [75]. Using Ether [61] as the monitoring platform, Quist and Liebrock [76] proposed a directed graph structure of all basic blocks of an executable with a navigable interface for exploring the code structure. Treemaps and thread graphs have also proven useful for detecting the maliciousness of software and for classifying malicious behavior [77]. Classification decisions can be supported by the visual analytics solution of Angelini et al. [78], which gives the user a better understanding of those decisions and the possibility of changing the classification results. A visual analytics approach, when combined with predictive analysis, can project potential threats or detect malicious attacks to help secure efficient manufacturing automation [79].
Conti et al. [80] designed a file analysis system with the following features: 1) analyzing undocumented file formats, 2) auditing files for vulnerabilities, 3) comparing files, 4) cracking, cryptanalysis, and forensic analysis, 5) identifying unknown file formats, 6) malware analysis, and 7) reporting. Their system extends a hex editor and provides both textual and graphical visualization. Quist and Liebrock [76] developed a tool called VERA (Visualization of Executables for Reversing and Analysis) for visualizing the structure and flow of an executable file, including memory reads and writes. Later, they extended their work [81] by adding more reverse engineering tools and providing more detailed case studies.
Trinius et al. [77] used two different approaches for malware visualization. They first generated an XML file containing dynamic analysis information about the malware sample using CWSandbox [82], including 1) loaded system libraries, 2) outgoing and incoming network connections, and 3) accessed or manipulated registry keys. From the XML output, they visualized the key features using two techniques: “treemaps” and “thread graphs”. They argue that these two methods are complementary and that, used together, the two visual representations can effectively help detect the malicious behavior of a given malware sample and identify its malware family. They tested the proposed approach on executable and non-executable (PDF) malware samples.
Grégio and Santos [83] developed an interactive timeline tool for visualizing dynamic malware behavior using various techniques [48]. They ran the given malware in a controlled environment and captured its behavior using a modified version of BehEMOT [84] (a malware behavior monitoring tool). They captured high-level activities such as file writes and deletions, process creation and termination, registry reads and writes, mutexes, network operations, and system calls, using System Service Dispatch Table (SSDT) hooking, which operates at the kernel level. In addition, they used identification labels provided by VirusTotal [23].
Zhou et al. [48] introduced Malware Vis, an interactive visualization tool for dynamic malware analysis that focuses on network traces, including the total number of packets, the size of the transmission, the number of streams, and the duration of the packet trace. They ran the malware in a controlled environment and captured these network traces with packet sniffer software; the resulting packet capture (PCAP) files were then used for visualization. They used table views and shape views to represent these features, allowing the user to browse, filter, and compare different types of malware. Cappers et al. [85] also use PCAP data to discover traffic patterns that reveal intrusive behavior arising from malware activity.
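The trace-level features that Malware Vis is described as visualizing (packet count, transmission size, stream count, and trace duration) reduce to simple aggregates over a parsed packet list. The sketch below is illustrative only and is not taken from [48]: it assumes packets have already been decoded from a PCAP file into a hypothetical `Packet` record, and it treats both directions of a conversation on the same protocol and endpoint pair as a single stream.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    """Hypothetical, already-parsed packet record (e.g., decoded from a PCAP file)."""
    timestamp: float  # seconds since capture start
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    length: int  # bytes on the wire


def stream_key(p: Packet) -> tuple:
    """Identify a bidirectional stream by ordering its two endpoints."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)


def trace_summary(packets: Iterable[Packet]) -> dict:
    """Compute packet count, total bytes, stream count, and duration of a trace."""
    pkts = list(packets)
    if not pkts:
        return {"packets": 0, "bytes": 0, "streams": 0, "duration": 0.0}
    times = [p.timestamp for p in pkts]
    return {
        "packets": len(pkts),
        "bytes": sum(p.length for p in pkts),
        "streams": len({stream_key(p) for p in pkts}),
        "duration": max(times) - min(times),
    }
```

Ordering the endpoints is a design choice: a request and its reply then count as one stream rather than two, which matches the usual notion of a TCP conversation.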
This disclosure introduces MalView, an interactive visualization platform for the hybrid analysis and diagnosis of malware. The approach first represents the behavioral properties of the major malware classes (such as Trojans or backdoors), aiming to capture the common visual signatures of these malicious applications. MalView is implemented as a web-based prototype that demonstrates this approach by analyzing 60 malware samples from seven different classes. The behavior of these malware files is captured using Process Monitor (i.e., Procmon [6]) on three different platforms (Windows XP, Windows 7, and Windows 10). The functionality and features offered by MalView were designed and developed based on a thorough literature review and a comparison with state-of-the-art malware analysis tools, including AnyRun and Hybrid Analysis. To give better insight into the features offered by MalView, a feature table is presented in which MalView is compared with the AnyRun and Hybrid Analysis tools across four classes of features. The feature table demonstrates that MalView implements most of the features offered by the other two tools. In addition, time and process dependencies, the key features of MalView, are implemented in the prototype, making the analysis more thorough. Given its ability to process, visualize, and analyze system activities and to put them into a comprehensive view, MalView can serve as an informative tool of potential interest to developers, engineers, and practitioners outside the laboratory.
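Procmon can save a capture as a CSV file. The following minimal sketch, which is an assumption rather than MalView's actual ingestion code, parses such an export with Python's standard `csv` module and tags each event with a coarse class (file system, registry, network, or process). It assumes the default Procmon column set (`Time of Day`, `Process Name`, `PID`, `Operation`, `Path`, `Result`, `Detail`); the name-prefix heuristic in the hypothetical `categorize` function is illustrative and not exhaustive.

```python
import csv
import io
from collections import Counter

# Procmon operation names for process/thread activity (heuristic, not exhaustive).
PROCESS_OPS = {"Process Start", "Process Exit", "Process Create",
               "Thread Create", "Thread Exit", "Load Image", "Process Profiling"}


def categorize(operation: str) -> str:
    """Map a Procmon operation name to a coarse event class (heuristic)."""
    if operation.startswith("Reg"):
        return "registry"
    if operation.startswith(("TCP", "UDP")):
        return "network"
    if operation in PROCESS_OPS:
        return "process"
    return "file"  # everything else is treated as file-system activity


def load_procmon_csv(text: str) -> list[dict]:
    """Parse a Procmon CSV export, assuming the default column set."""
    # An export may begin with a UTF-8 byte-order mark; strip it if present.
    rows = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))
    return [{**row, "Category": categorize(row["Operation"])} for row in rows]


def category_counts(events: list[dict]) -> Counter:
    """Count events per coarse class, e.g., for an activity overview."""
    return Counter(e["Category"] for e in events)
```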
Several lines of research can be explored through visual analytics when it is complemented by conventional static and dynamic analysis. The early detection of zero-day vulnerabilities and malware is a grand challenge, and several machine learning-based approaches have been proposed to address it [1], [53]. Because visual analytics can facilitate explainable machine learning [86], [87], applying visual analytics techniques to detect and analyze unknown and zero-day malware is a promising research direction that can be explored using MalView. One of the key features of MalView is its ability to display the time and process dependencies that occur during static and dynamic analysis. Additional applications may include modeling malware behavior with recurrent neural networks trained on the visual signatures, and then predicting malware behaviors or even classifying suspicious programs into a particular malware class. For example, malware samples could be modeled through genome alignment, and the malware classification or detection problem could then be framed as a deoxyribonucleic acid (DNA) or sequence matching problem. Sequence matching might be useful in capturing the core malicious functionality of obfuscated malware: the obfuscation techniques employed by obfuscating tools often follow similar patterns, so one would expect the control-flow graphs produced for these obfuscated malicious applications to share all or part of the same core. MalView offers a visual analytics approach to spotting these similar patterns in the execution traces. Once a section of the underlying execution trace is identified as obfuscated, the user of MalView can set it aside and focus on other parts of the malware in order to comprehend it.
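As a much simpler stand-in for the genome-alignment and DNA sequence matching ideas above, the sketch below compares two execution traces, represented as lists of operation names, using Python's standard `difflib.SequenceMatcher`. It returns the longest contiguous run of operations the two traces share (a candidate common core) together with an overall similarity ratio. The function name `shared_core` is hypothetical, and because `find_longest_match` only finds exact contiguous runs, this approach is far less robust to obfuscation than a true alignment algorithm would be.

```python
from difflib import SequenceMatcher


def shared_core(trace_a: list[str], trace_b: list[str]) -> tuple[list[str], float]:
    """Return the longest contiguous run of operations common to both traces,
    plus SequenceMatcher's similarity ratio (2 * matches / total length)."""
    # autojunk=False keeps frequent operations from being discarded as "junk".
    sm = SequenceMatcher(None, trace_a, trace_b, autojunk=False)
    m = sm.find_longest_match(0, len(trace_a), 0, len(trace_b))
    return trace_a[m.a:m.a + m.size], sm.ratio()
```

For example, two traces that both contain the run `RegOpenKey`, `RegSetValue`, `TCP Send` surrounded by unrelated operations would return that three-operation run as the shared core.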
A second approach would be to employ existing de-obfuscation tools to de-obfuscate the malware under investigation (MUI) and then let Procmon generate execution and process traces of the de-obfuscated sample.
Now referring to
In one aspect, the method further comprises identifying an intention and a location of a malicious payload in the binary object. In another aspect, the method further comprises analyzing an unknown malware within the object code by recognizing one or more unusual signatures or behaviors. In another aspect, the method further comprises generating one or more rules and signatures for fully-automated malware detection systems. In another aspect, the method further comprises identifying one or more of the following: one or more indicators of compromise and malicious activities; one or more system components that are affected, tampered with, or damaged by the object code; how the object code functions and infects the computer system; a primary target of the object code; one or more suspicious events that occurred on a network; and an impact on the host system and its registry. In another aspect, the data capturing the run time behavior of the binary object comprises an interaction of the binary object with the host system with respect to one or more of a file system, a registry, a network, a process, and a process profile. In another aspect, the method further comprises selecting the data capturing the run time behavior of the binary object. In another aspect, extracting the data from the run time behavior of the binary object comprises processing the data into time-series data and dependency data. In another aspect, the one or more interactive visual representations of the run time behavior of the binary object further comprise: an activity overview; a network visualization; or a library call visualization. In another aspect, the method further comprises zooming, with the computer system, into one of the one or more interactive visual representations of the run time behavior of the binary object.
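One way to process behavior data into time-series data and dependency data is sketched below. It is a hypothetical illustration, not the claimed implementation: events are assumed to be already normalized into an illustrative `Event` tuple with a timestamp relative to the start of the binary object, the time series counts each operation in fixed-width bins, and the dependency data records, for each process-to-executable edge (a child process creation, an image load, or a write of an `.exe`, `.dll`, or `.sys` file), the first time that edge was observed.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical normalized event; field names are illustrative."""
    t: float        # seconds since the binary object started
    process: str    # image name of the acting process
    operation: str  # e.g., "WriteFile", "Process Create", "Load Image"
    path: str


EXEC_SUFFIXES = (".exe", ".dll", ".sys")


def time_series(events, bin_width=1.0):
    """Count events per operation in fixed-width time bins: {operation: {bin: count}}."""
    series = defaultdict(lambda: defaultdict(int))
    for e in events:
        series[e.operation][int(e.t // bin_width)] += 1
    return {op: dict(bins) for op, bins in series.items()}


def dependency_edges(events):
    """Map (process, executable path, operation) edges to their first-seen time."""
    first_seen = {}
    for e in sorted(events, key=lambda ev: ev.t):
        is_exec_write = (e.operation == "WriteFile"
                         and e.path.lower().endswith(EXEC_SUFFIXES))
        if e.operation in ("Process Create", "Load Image") or is_exec_write:
            first_seen.setdefault((e.process, e.path, e.operation), e.t)
    return first_seen
```

For example, a dropper that writes `drop.exe` and later launches it yields two edges with the same target path, and their timestamps expose the order of the two steps.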
In another aspect, classifying, with the computer system, the binary object as malicious or benign further comprises classifying, with the computer system, the binary object as malicious, suspicious, undetected, or harmless. In another aspect, the object code contains one or more of a remote access Trojan, a Trojan, a backdoor, a ransomware, an email flooder, a behavioral malware, and a hacktool malware. In another aspect, the method further comprises: executing the binary object on a host system; and logging the data capturing the run time behavior of the binary object into a file during execution of the binary object. In another aspect, the executing and logging steps are performed by a data provider. In another aspect, the host system comprises a sandboxed system. In another aspect, the data capturing the run time behavior of the binary object comprises one or more execution traces. In another aspect, the method further comprises: filtering out one or more first functions associated with default system operations; or filtering out one or more second functions that are not commonly encountered in malware. In another aspect, the method further comprises: receiving an output of an anti-virus tool using an application programming interface; and incorporating the output into the one or more interactive visual representations of the run time behavior of the binary object.
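The filtering and labeling aspects above could be sketched as follows. Both the baseline set of default system processes and the allow-list of operations of interest are hypothetical placeholders that an analyst would tune. The verdict function assumes the anti-virus output has already been reduced to a VirusTotal-style mapping from verdict categories to engine counts (for example `{"malicious": 5, "undetected": 60}`); how that mapping is obtained is out of scope, and the precedence rule used here is an illustrative choice rather than the disclosed method.

```python
from typing import Iterable, Mapping

# Hypothetical baseline: processes whose activity is treated as default system noise.
DEFAULT_SYSTEM_PROCESSES = {"System", "svchost.exe", "SearchIndexer.exe", "MsMpEng.exe"}

# Hypothetical allow-list of operations considered relevant to malware behavior.
OPERATIONS_OF_INTEREST = {
    "WriteFile", "SetDispositionInformationFile", "RegSetValue", "RegDeleteValue",
    "Process Create", "Load Image", "TCP Connect", "TCP Send", "UDP Send",
}


def filter_events(events: Iterable[Mapping[str, str]]) -> list:
    """Drop default-system activity and operations outside the allow-list."""
    return [
        e for e in events
        if e["Process Name"] not in DEFAULT_SYSTEM_PROCESSES
        and e["Operation"] in OPERATIONS_OF_INTEREST
    ]


def verdict_from_stats(stats: Mapping[str, int], min_malicious: int = 1) -> str:
    """Collapse per-engine verdict counts into one of four labels.

    Precedence (malicious > suspicious > harmless > undetected) and the
    threshold are illustrative choices, not a documented rule.
    """
    if stats.get("malicious", 0) >= min_malicious:
        return "malicious"
    if stats.get("suspicious", 0) > 0:
        return "suspicious"
    if stats.get("harmless", 0) > 0:
        return "harmless"
    return "undetected"
```

Raising `min_malicious` trades sensitivity for fewer false positives when only one or two engines flag a sample.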
In another embodiment, the present disclosure provides a non-transitory computer-readable medium containing a set of instructions that, when executed by a processor, cause the processor to: receive data capturing a run time behavior of a binary object; extract the data from the run time behavior of the binary object; map the extracted data into one or more interactive visual representations of the run time behavior of the binary object comprising a scalar representation of process calls, dependencies among processes and executable files, or time dependencies between the processes and the executable files; and classify the binary object as malicious or benign.
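A scalar representation of process calls could take the form of a process-by-operation count matrix, in which each cell holds a single number that a front end might encode as color in a heatmap. The minimal sketch below assumes the events have already been reduced to `(process, operation)` pairs; `call_matrix` is a hypothetical name, and the disclosure does not specify this particular layout.

```python
from collections import Counter
from typing import Iterable, Tuple


def call_matrix(events: Iterable[Tuple[str, str]]):
    """Build a process x operation count matrix from (process, operation) pairs.

    Returns sorted row labels, sorted column labels, and a list of rows, where
    each cell is the number of times that process performed that operation.
    """
    counts = Counter(events)
    processes = sorted({p for p, _ in counts})
    operations = sorted({op for _, op in counts})
    matrix = [[counts[(p, op)] for op in operations] for p in processes]
    return processes, operations, matrix
```

Sorting the labels keeps the layout stable across samples, so two matrices built from different malware traces can be compared cell by cell when their label sets match.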
In one aspect, the instructions further cause the processor to identify an intention and a location of a malicious payload in the binary object. In another aspect, the instructions further cause the processor to analyze an unknown malware within the object code by recognizing one or more unusual signatures or behaviors. In another aspect, the instructions further cause the processor to generate one or more rules and signatures for fully-automated malware detection systems. In another aspect, the instructions further cause the processor to identify one or more of the following: one or more indicators of compromise and malicious activities; one or more system components that are affected, tampered with, or damaged by the object code; how the object code functions and infects the computer system; a primary target of the object code; one or more suspicious events that occurred on a network; and an impact on the host system and its registry. In another aspect, the data capturing the run time behavior of the binary object comprises an interaction of the binary object with the host system with respect to one or more of a file system, a registry, a network, a process, and a process profile. In another aspect, the instructions further cause the processor to select the data capturing the run time behavior of the binary object. In another aspect, extracting the data from the run time behavior of the binary object comprises processing the data into time-series data and dependency data. In another aspect, the one or more interactive visual representations of the run time behavior of the binary object further comprise: an activity overview; a network visualization; or a library call visualization. In another aspect, the instructions further cause the processor to zoom into one of the one or more interactive visual representations of the run time behavior of the binary object.
In another aspect, causing the processor to classify the binary object as malicious or benign further comprises causing the processor to classify the binary object as malicious, suspicious, undetected, or harmless. In another aspect, the object code contains one or more of a remote access Trojan, a Trojan, a backdoor, a ransomware, an email flooder, a behavioral malware, and a hacktool malware. In another aspect, the instructions further cause the processor to: execute the binary object on a host system; and log the data capturing the run time behavior of the binary object into a file during execution of the binary object. In another aspect, the executing and logging steps are performed by a data provider. In another aspect, the host system comprises a sandboxed system. In another aspect, the data capturing the run time behavior of the binary object comprises one or more execution traces. In another aspect, the instructions further cause the processor to: filter out one or more first functions associated with default system operations; or filter out one or more second functions that are not commonly encountered in malware. In another aspect, the instructions further cause the processor to: receive an output of an anti-virus tool using an application programming interface; and incorporate the output into the one or more interactive visual representations of the run time behavior of the binary object.
Now referring to
In one aspect, the one or more processors identify an intention and a location of a malicious payload in the binary object. In another aspect, the one or more processors analyze an unknown malware within the object code by recognizing one or more unusual signatures or behaviors. In another aspect, the one or more processors generate one or more rules and signatures for fully-automated malware detection systems. In another aspect, the one or more processors identify one or more of the following: one or more indicators of compromise and malicious activities; one or more system components that are affected, tampered with, or damaged by the object code; how the object code functions and infects the computer system; a primary target of the object code; one or more suspicious events that occurred on a network; and an impact on the host system and its registry. In another aspect, the data capturing the run time behavior of the binary object comprises an interaction of the binary object with the host system with respect to one or more of a file system, a registry, a network, a process, and a process profile. In another aspect, the one or more processors select the data capturing the run time behavior of the binary object. In another aspect, to extract the data from the run time behavior of the binary object, the one or more processors process the data into time-series data and dependency data. In another aspect, the one or more interactive visual representations of the run time behavior of the binary object further comprise: an activity overview; a network visualization; or a library call visualization. In another aspect, the one or more processors zoom into one of the one or more interactive visual representations of the run time behavior of the binary object.
In another aspect, to classify the binary object as malicious or benign, the one or more processors classify the binary object as malicious, suspicious, undetected, or harmless. In another aspect, the object code contains one or more of a remote access Trojan, a Trojan, a backdoor, a ransomware, an email flooder, a behavioral malware, and a hacktool malware. In another aspect, the one or more processors: execute the binary object on a host system; and log the data capturing the run time behavior of the binary object into a file during execution of the binary object. In another aspect, the executing and logging steps are performed by a data provider. In another aspect, the host system comprises a sandboxed system. In another aspect, the data capturing the run time behavior of the binary object comprises one or more execution traces. In another aspect, the one or more processors: filter out one or more first functions associated with default system operations; or filter out one or more second functions that are not commonly encountered in malware. In another aspect, the one or more processors: receive an output of an anti-virus tool using an application programming interface; and incorporate the output into the one or more interactive visual representations of the run time behavior of the binary object.
It is understood that particular embodiments described herein are shown by way of illustration and not as limitations of the invention. The principal features of this invention can be employed in various embodiments without departing from the scope of the invention. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific procedures described herein. Such equivalents are considered to be within the scope of this invention and are covered by the claims.
All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, or the variation that exists among the study subjects.
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited features, elements, components, groups, integers, and/or steps. In embodiments of any of the compositions and methods provided herein, “comprising” may be replaced with “consisting essentially of” or “consisting of”. As used herein, the term “consisting” is used to indicate the presence of the recited integer (e.g., a feature, an element, a characteristic, a property, a method/process step or a limitation) or group of integers (e.g., feature(s), element(s), characteristic(s), property(ies), method/process steps or limitation(s)) only. As used herein, the phrase “consisting essentially of” requires the specified features, elements, components, groups, integers, and/or steps, but does not exclude the presence of other unstated features, elements, components, groups, integers and/or steps as well as those that do not materially affect the basic and novel characteristic(s) and/or function of the claimed invention.
The term “or combinations thereof” as used herein refers to all permutations and combinations of the listed items preceding the term. For example, “A, B, C, or combinations thereof” is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, CBA, BCA, ACB, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.
As used herein, words of approximation such as, without limitation, “about”, “substantial” or “substantially” refer to a condition that when so modified is understood to not necessarily be absolute or perfect but would be considered close enough to those of ordinary skill in the art to warrant designating the condition as being present. The extent to which the description may vary will depend on how great a change can be instituted and still have one of ordinary skill in the art recognize the modified feature as still having the required characteristics and capabilities of the unmodified feature. In general, but subject to the preceding discussion, a numerical value herein that is modified by a word of approximation such as “about” may vary from the stated value by at least ±1, 2, 3, 4, 5, 6, 7, 10, 12 or 15%.
All of the compositions and/or methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
To aid the Patent Office, and any readers of any patent issued on this application in interpreting the claims appended hereto, applicants wish to note that they do not intend any of the appended claims to invoke paragraph 6 of 35 U.S.C. § 112, 35 U.S.C. § 112(f), or equivalent, as it exists on the date of filing hereof unless the words “means for” or “step for” are explicitly used in the particular claim.
For each of the claims, each dependent claim can depend both from the independent claim and from each of the prior dependent claims for each and every claim so long as the prior claim provides a proper antecedent basis for a claim term or element.
This application claims priority to U.S. Provisional Application Ser. No. 63/583,737, filed Sep. 19, 2023, the entire contents of which are incorporated herein by reference.
This invention was made with government support under Grant/Contract No. 1821560 awarded by the U.S. National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63583737 | Sep 2023 | US