Embodiments of the present disclosure relate generally to data processing and, more particularly, but not by way of limitation, to methods and systems for protecting computer systems from malware (a portmanteau for malicious computer software).
“Ransomware” is a type of malware from cryptovirology that attacks a victim's computer by encrypting their files or otherwise denying file access until a ransom is paid. Ransomware attackers may also threaten to publish the victim's data unless a ransom is paid. In the computer context, a file is a virtual component stored in memory to represent e.g. an image, text, video, or a computer program. A ransomware process, an instance of a ransomware program executing on a computer, can overwrite and delete files using calls to the computer's operating system.
Anti-virus (AV) and Next-generation Anti-virus (Nextgen AV) software detect malware using techniques like signature-based detection, traffic-based detection, and behavioral detection. Signature-based detection requires malware with a known signature and is thus ineffective against recent strains or targeted attacks. Traffic-based detection looks for communication patterns common to malware to detect recent strains but is relatively slow and inefficient. Behavior-based detection evaluates a process's actions or intended actions for suspicious behavior. Though promising, behavioral detection techniques suffer from high false-positive rates and concomitant productivity loss. There is therefore a need for improved malware detection and mitigation.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
User endpoint 105 includes a raw-data queue 120 that relays incoming events to an event reader 125 for interpretation. A detection engine 130 scans an event sequence corresponding to a given file to check whether it indicates a ransomware threat. A high-priority queue 135 acts as a buffer to store the events deemed threatening before passing them on to a response engine 140, which performs any one of three responses as configured by the system administrator. The three responses are 1) Detect, 2) Detect and kill, and 3) Detect, kill, and back-up recovery. Response engine 140 notifies server 110 about events posing ransomware threats and about the configured response it performed.
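The three configurable responses can be pictured as a simple dispatch. This is a minimal Python sketch under stated assumptions, not the embodiment's implementation: the enum names and the action strings (`kill_process`, `restore_from_backup`, `notify_server`) are hypothetical labels for the behaviors described above.

```python
from enum import Enum
from typing import List


class ResponseMode(Enum):
    """Administrator-configurable responses of response engine 140 (names hypothetical)."""
    DETECT = 1
    DETECT_AND_KILL = 2
    DETECT_KILL_AND_RECOVER = 3


def plan_response(mode: ResponseMode) -> List[str]:
    """Return the ordered actions taken after a threat has been detected.

    The action strings are hypothetical placeholders, not real API calls.
    """
    actions: List[str] = []
    if mode in (ResponseMode.DETECT_AND_KILL, ResponseMode.DETECT_KILL_AND_RECOVER):
        actions.append("kill_process")          # terminate the offending process
    if mode is ResponseMode.DETECT_KILL_AND_RECOVER:
        actions.append("restore_from_backup")   # recover affected files from backups
    actions.append("notify_server")             # report threat and response to server 110
    return actions
```

For example, `plan_response(ResponseMode.DETECT_AND_KILL)` returns `["kill_process", "notify_server"]`. Notification is placed last because the server is told which response was performed.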
Events from event reader 125 also flow through a filtering unit 145, which extracts necessary events and moves them to a low-priority queue 150 on their way to a storage unit 155 for future reference in case of any investigation. Examples of necessary events are File_Write, File_Rename, and File_Delete, which are necessary to note because they impact stored files. Unnecessary events, such as File_Read and Dir_Traversed, are filtered out to save space because they are not needed to recall changes. Server 110 may also require that all events posing ransomware threats to endpoint 105 be stored for future investigation.
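A minimal sketch of filtering unit 145, assuming each event arrives as a simple record holding an event-type name, a process ID, and a path. The `Event` record, the `NECESSARY_EVENTS` set, and the `filter_events` function are hypothetical names chosen for illustration; only the event-type names come from the description above.

```python
from collections import deque
from typing import Deque, Iterable, NamedTuple

# Event types that change stored files and are therefore kept for investigation.
NECESSARY_EVENTS = frozenset({"File_Write", "File_Rename", "File_Delete"})


class Event(NamedTuple):
    kind: str   # event type, e.g. "File_Write", "File_Read", "Dir_Traversed"
    pid: int    # ID of the process that generated the event
    path: str   # file or folder the event touched


def filter_events(events: Iterable[Event]) -> Deque[Event]:
    """Return a low-priority queue holding only the file-changing events.

    Events such as File_Read and Dir_Traversed are dropped to save space.
    """
    return deque(ev for ev in events if ev.kind in NECESSARY_EVENTS)
```

A deque is used because queue 150 is consumed in arrival order on its way to storage unit 155.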
Behavioral detection unit 200 identifies behavioral patterns in event sequences for each process under consideration. These behavioral patterns include:
The behavioral pattern can also be one of the following based on the process and corresponding binary file:
Behavioral detection unit 200 passes files whose event sequences have been determined to be a ransomware threat to file-traversal-information detection unit 205, which determines the information corresponding to the file traversal pattern using Application Program Interface (API)-based traversal and/or New Technology File System (NTFS)-based traversal. Detection unit 205 supports an algorithm that detects path-traversal attacks, which exploit security flaws associated with user-supplied file names. Algorithms for preventing path-traversal attacks are well known, so a detailed discussion is omitted.
The file traversal pattern alone may be insufficient to determine whether one or more files are affected by ransomware because a genuine process may traverse files in a similar manner. If the file traversal pattern is suspect, then the process under consideration is passed to true-root-path detection unit 210 for further consideration.
Files are typically organized in a file system, a data structure that the operating system uses to control how files are stored in memory. The file system separates data into named pieces, giving each piece a file name and location in memory. File systems typically allow files to be grouped in folders, also called directories, that can themselves include subfolders (subdirectories) with their own files and subfolders. The folder/subfolder relationship is analogous to parent/child, with the uppermost folder called the root.
The location of a file within a file system is specified as a string of characters called a “folder path” that specifies the folders and subfolders that contain the file. For example, the path for a text file Notes.txt stored on a disk drive designated with a drive letter “C” might be expressed as C:\Users\Test\Documents\Notes.txt. In this example, root folder C contains subfolder Users, which contains subfolder Test, which in turn contains subfolder Documents. Folder and file names are delineated using colon and backslash characters in this embodiment, but other characters and formats can be used.
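Python's standard `pathlib.PureWindowsPath` can split such a path into its folder levels without touching the disk, which is a convenient way to illustrate the root-to-subfolder hierarchy. The helper name `folder_levels` is hypothetical.

```python
from pathlib import PureWindowsPath
from typing import List


def folder_levels(file_path: str) -> List[str]:
    """List the folders containing a file, from the root down to its direct parent."""
    p = PureWindowsPath(file_path)
    # p.parents runs from the direct parent up to the drive root; reverse it.
    return [str(folder) for folder in list(p.parents)[::-1]]
```

For `C:\Users\Test\Documents\Notes.txt`, this returns the four levels `C:\`, `C:\Users`, `C:\Users\Test`, and `C:\Users\Test\Documents`.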
Ransomware tends to modify many disparate files across a file system, among myriad directories and subdirectories, and this pattern of action can aid in ransomware detection. However, many genuine processes also modify disparate files across large numbers of directories and subdirectories, “genuine” here referring to processes that are intended by the user or consistent with the user's goals. True-root-path detection unit 210 distinguishes between the file-modification patterns of ransomware and genuine processes to improve ransomware detection and reduce false positives.
Whenever a file is modified by a process, true-root-path detection unit 210 looks up the path of the folder containing the file and compares this path with those of other files modified by the same process. The popular CHROME web browser developed by Google LLC exemplifies a genuine process: its executable, chrome.exe, can modify files within many folders. The following Table 1 lists twenty-nine paths associated with an instance of process chrome.exe, each path identifying a folder that can include one or more files that can be acted upon by the process.
True-root-path detection unit 210 reviews the file paths associated with a given process to find one or more “true root folders” for the process. True root folders contain all the files a process acts upon and tend to be at lower levels in the file system hierarchy. In the example of Table 1, the folders “Google”, “Chrome”, and “UserData” are all specified by relatively low-level, subfolder paths and contain all the files chrome.exe is acting upon. True-root-path detection unit 210 narrows these down to identify the path to folder “Chrome” (C:\Users\test\AppData\Local\Google\Chrome) as the true root folder using an algorithm that distinguishes folder levels based on folder creation times. Genuine processes may have more than one true root folder, but the number will likely be low.
Ransomware processes tend to attack files across file hierarchies. Examining the files acted upon by a ransomware process thus yields a relatively large number of true root folders. True-root-path detection unit 210 counts the number of true root folders for each process under consideration and compares the number to a folder threshold. Processes that produce a number of true root folders above the threshold are flagged as ransomware or potential ransomware. In one embodiment, the true root folder threshold is three.
The following is an exemplary algorithm true-root-path detection unit 210 deploys to find the true root folder(s) for a given process. Other embodiments can work differently.
True-root-path detection unit 210 finds the true root folder(s) for every file accessed by a process. The entire directory path is fed as input. The foregoing algorithm considers each folder's creation time, which the operating system records when the folder is created. Detection unit 210 fetches the creation times of folders at, e.g., levels N and N-1 in a directory path using API calls and compares the times. If the time difference exceeds a level threshold, or time threshold, then folder level N is considered a stopping point, and the path to the folder at level N is returned as the true root folder. If the time difference does not exceed the level threshold, detection unit 210 continues along the directory path until the true root folder is found. Other embodiments use different path characteristics to detect ransomware.
In step 305, detection unit 210 gets the path for a file F. Next, in steps 310 and 315, detection unit 210 gets the file's parent folders F1 and F2, the latter being the direct parent of the former. The folder creation times for folders F1 and F2 are read (steps 320 and 325) and the time difference is calculated (step 330). Per decision 335, if the time difference falls below a level threshold, the folder level is incremented (step 340) and the loop repeats for the next level in the folder hierarchy of the file's path. If the time difference exceeds the threshold, detection unit 210 outputs the path associated with the parent folder as the subfolder path that is the true root folder for the file (step 345). The method is then complete for that file and is repeated for each file associated with the process under consideration.
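The per-file walk of flowchart 300 can be sketched as follows. This is an illustrative sketch under assumptions, not the embodiment's code: it walks upward from the file's direct parent and, following the description of levels N and N-1, returns the deeper folder (level N) once the creation-time gap to its parent exceeds the level threshold. Creation times are supplied through a hypothetical `creation_time` callable so that the logic can be exercised without a live file system; on an endpoint they would come from the operating system.

```python
from pathlib import PureWindowsPath
from typing import Callable


def true_root_folder(file_path: str,
                     creation_time: Callable[[str], float],
                     level_threshold: float) -> str:
    """Return the true root folder of one file (sketch of flowchart 300).

    creation_time maps a folder path to its creation time in seconds
    (a hypothetical stand-in for the operating-system query).
    """
    folder = PureWindowsPath(file_path).parent      # level N: the file's direct parent
    while folder.parent != folder:                  # stop at the drive root, e.g. "C:\\"
        parent = folder.parent                      # level N-1
        gap = abs(creation_time(str(folder)) - creation_time(str(parent)))
        if gap > level_threshold:                   # large gap: level N is the stopping point
            return str(folder)
        folder = parent                             # small gap: move one level up
    return str(folder)                              # no large gap found: fall back to the root
```

As a usage example with hypothetical creation times, suppose the Chrome folder was created long after its parent Google folder and its User Data, Default, and Cache subfolders were created within seconds of Chrome. With a one-hour (3600 s) level threshold, a file under `...\Chrome\User Data\Default\Cache` then resolves to `C:\Users\test\AppData\Local\Google\Chrome`.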
Identical subfolder paths for different files from flowchart 300 are consolidated into one true-root path. In the example of Table 1, all files acted upon by chrome.exe share the same true-root path C:\Users\test\AppData\Local\Google\Chrome, so the true-root-path count for process chrome.exe is one. True-root-path detection unit 210 compares the true-root-path count with a root-path threshold. If the count is greater than the root-path threshold, then the files associated with the process under consideration are suspected to be malware or to be infected by malware. Chrome.exe, having a count of just one, would not raise suspicion.
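Consolidation and the threshold comparison can be sketched as below, taking as input the per-file true-root paths produced by flowchart 300. The function names are hypothetical, and the default threshold of three follows the one embodiment described earlier. Comparing `PureWindowsPath` objects makes the consolidation case-insensitive, matching how Windows treats folder names.

```python
from pathlib import PureWindowsPath
from typing import Iterable


def true_root_path_count(root_paths: Iterable[str]) -> int:
    """Consolidate identical true-root paths and count the distinct ones."""
    # PureWindowsPath equality and hashing ignore case, as Windows does.
    return len({PureWindowsPath(p) for p in root_paths})


def is_suspect(root_paths: Iterable[str], root_path_threshold: int = 3) -> bool:
    """Flag a process whose true-root-path count exceeds the threshold."""
    return true_root_path_count(root_paths) > root_path_threshold
```

Twenty-nine files that all resolve to the Chrome folder give a count of one and are not flagged, while files spread across four unrelated roots exceed a threshold of three.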
Files of suspect processes can be subjected to additional inspection to avoid false positives. Because ransomware threatens to encrypt the affected files, detection engine 130 can feed files associated with a suspect process to neural-network-based classifier 215, a linear neural-network model with five layers of neurons in this embodiment. Classifier 215 applies a machine-learning algorithm to classify each file as either encrypted or not encrypted. Encrypted files further suggest a ransomware threat. The sequence of events corresponding to the suspect file (such as “File_Delete”, “File_Modified”, “File_Rename”) is put into high-priority queue 135 for further consideration.
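The description does not specify the inputs of classifier 215. As an illustration only, one feature commonly used to separate encrypted from unencrypted content is the Shannon entropy of a file's bytes, which approaches 8 bits per byte for well-encrypted data. The sketch below computes that feature; it is not the neural-network classifier itself, and the function name `byte_entropy` is hypothetical.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum -p * log2(p) over the observed byte values.
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```

Compressed formats such as ZIP or JPEG also have high byte entropy, which is one reason a learned classifier is preferable to a fixed entropy cutoff.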
Returning to
A simple pattern for a ransomware process is to read and modify a file or files. This pattern is also exhibited by genuine processes. In the foregoing example, the process chrome.exe reads and modifies cookie and cache files for webpages visited by a user. Such actions can make a genuine application appear to be a ransomware threat, thereby increasing the false-positive rate. The increased false-positive rate poses a major challenge. Some embodiments deploy filters to further reduce false positives. These filters include:
Per decision 420, detection engine 130 ignores processes that yield fewer true root folders than a folder threshold (e.g. three) and passes the files of the remaining, suspicious processes to decision 425. A neural network is applied at decision 425 to detect whether a suspicious file is encrypted. If not, the event sequence is ignored. If so, detection engine 130 signals ransomware detection (step 430) and performs the configured response to address the threat (step 435).
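Decisions 420 and 425 can be sketched as a short function. This is a minimal sketch under assumptions: the true-root count is already computed, and a hypothetical `is_encrypted` callable stands in for classifier 215. As worded for decision 420, only counts strictly below the folder threshold are ignored.

```python
from typing import Callable, Iterable


def evaluate_process(true_root_count: int,
                     suspicious_files: Iterable[str],
                     is_encrypted: Callable[[str], bool],
                     folder_threshold: int = 3) -> bool:
    """Return True when ransomware detection should be signaled (step 430)."""
    if true_root_count < folder_threshold:    # decision 420: too few true root folders
        return False
    # Decision 425: signal detection if any suspicious file is classified as encrypted.
    return any(is_encrypted(f) for f in suspicious_files)
```

When this returns True, the configured response of step 435 would follow.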
Ransomware often attempts to delete backup files. Windows OS includes backup and recovery software called Volume Snapshot Service (VSS). System 100 periodically requests that VSS save snapshots of files (shadow copies), including when the files are in use. The latest snapshots can be used to recover corrupted or lost files, which is helpful in the event of a ransomware attack. A major issue in relying on behavioral ransomware detection is that by the time a threat is detected, some of the user's files may already have been encrypted. One embodiment protects backup storage using Component Object Model (COM) Application Programming Interface (API) hooking and a Kernel Input/Output Request Packet (IRP) filtering method, which together prevent ransomware from deleting shadow copies so that encrypted files can be recovered.
User layer 510 includes VSS 520, a volume-snapshot service tasked with saving shadow copies by invoking system calls to operating-system kernel 515 to access secondary storage 505. A malicious VSS operation requester can issue shadow-copy deletion requests to VSS 520 using, e.g., command-line instructions via the administrative process vssadmin.exe 525, code or script via the utility process wmic.exe 530, or directly via COM API 535.
Windows OS comes with a shadow-copy process vssvc.exe in place of VSS 520. Like VSS 520, process vssvc.exe includes a COM receiver 540 that can receive deletion requests and pass them to a shadow-copy-delete routine 545 that makes the call to kernel 515. VSS 520 is like shadow-copy process vssvc.exe but modified to include a hook 550 to intercept requests to delete shadow copies. COM receiver 540 implements a Windows interface called “COM API,” which refers to the Component Object Model Application Programming Interface. Hook 550 augments the COM API by intercepting function calls or messages.
There are many API hooking methods available. In one Windows example, a Dynamic Link Library (DLL) is injected into a VSS process from the kernel. During process creation, the injected DLL is loaded along with other DLLs. Because DLLs are injected from the kernel, it is difficult to control the DLL load order. When two DLLs are injected, for example, the first DLL is injected into the VSS process and hooks the NtQuerySystemInformation API; that hook in turn loads the second DLL, which hooks the DeleteSnapshots COM API. DeleteSnapshots is not an exposed API, so its address is calculated using the program database file vssvc.pdb. In general, to hook a non-exposed API, the address at which the API resides is calculated from the pdb file, and that address is used for hooking.
Kernel 515 includes a filter driver 555 that blocks shadow-copy delete or modification requests that bypass VSS 520, e.g. requests made through direct drive access (an IOCTL command) to file-system driver 560 or through a shadow-storage resize. Filter driver 555 prevents such requests from reaching file-system driver 560, a conventional component of the Windows OS, and thus foils efforts by ransomware to prevent recovery of encrypted or otherwise lost files.
Filter driver 555, in one embodiment, employs Kernel Input/Output Request Packet (IRP) filtering to protect shadow copies from deletion methods that bypass VSS 520, such as direct device access and shadow-storage resize. Filter driver 555 can load above or below a device driver to capture IOCTL control codes that are sent to a storage volume (e.g. secondary storage 505). Filter driver 555 can block delete requests from VSS 520 that would otherwise delete shadow copies. Hook 550 is nevertheless included so that VSS 520 maintains synchronization with kernel 515. VSS 520 can, for example, maintain a record of shadow copies in secondary storage 505. Hook 550 can alert VSS 520 that a request is to be ignored, leaving filter driver 555 to block the request. Alternatively, VSS 520 can itself block the request.
Variations of these embodiments, and variations in usage of these embodiments, including separate or combined embodiments in which features are used separately or in any combination, will be obvious to those of ordinary skill in the art. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description. In U.S. applications, only those claims specifically reciting “means for” or “step for” should be construed in the manner required under 35 U.S.C. § 112(f).
Number | Date | Country | Kind |
---|---|---|---|
202141041976 | Sep 2021 | IN | national |
Number | Date | Country |
---|---|---|
63280539 | Nov 2021 | US |