With each passing day, cyber-attacks are becoming increasingly sophisticated. Attacks are often targeted to exploit specific vulnerabilities in specific applications. Various methods and tools exist for identifying and protecting against these vulnerabilities in applications, but these existing methods and tools are inadequate.
Embodiments provide improved systems, methods, and computer program products to protect against cyber-attacks.
An example embodiment is directed to a computer-implemented method that, responsive to loading code of an executing application into memory, identifies one or more call instructions in the code. In turn, the method, based on the identified one or more call instructions, creates a list of one or more legitimate return instruction destinations. In this way, such an embodiment identifies legitimate code paths of the application.
Another embodiment causes execution of the application to pause prior to identifying the one or more call instructions. Such an embodiment may, after creating the list of one or more legitimate return instruction destinations, cause execution of the application to resume and control the execution of the application based on the created list of one or more legitimate return instruction destinations.
According to an embodiment, controlling the execution of the application based on the created list of one or more legitimate return instruction destinations includes, upon encountering a given return instruction, determining if a given destination of the given return instruction is approved or unapproved. In such an embodiment, execution of the application is allowed to continue responsive to the given destination being an approved destination. Further, responsive to the given destination being an unapproved destination, such an embodiment: (i) checks the given destination of the given return instruction against the list of one or more legitimate return instruction destinations and (ii) controls the execution of the application based on the checking. In an embodiment, an approved destination is a destination previously determined to be in the list of one or more legitimate return instruction destinations and an unapproved destination is a destination not previously determined to be in the list of one or more legitimate return instruction destinations.
According to an embodiment, controlling the execution based on the checking comprises, responsive to the given destination being in the list of one or more legitimate return instruction destinations, allowing execution of the application to continue and, responsive to the given destination not being in the list of legitimate return instruction destinations, declaring a security attack. Such an embodiment may further include, responsive to declaring the security attack, implementing a protection action. In embodiments, protection actions may include any such actions known to those of skill in the art. Non-limiting example protection actions include at least one of: blocking the given destination from being reached, terminating execution of the application, and logging the given destination. According to an embodiment, the security attack is a return-oriented programming attack or buffer overflow attack.
Further still, in an embodiment, the application is executed utilizing a Dynamic Binary Instrumentation (DBI) tool. According to an embodiment, utilizing a DBI tool enables embodiments to pause, resume, and otherwise control execution of the application.
Another example embodiment is directed to a computer system. The computer system includes a processor and a memory with computer code instructions stored thereon. In such an embodiment, the processor and the memory, with the computer code instructions, are configured to cause the system to implement any embodiment or combination of embodiments described herein.
Yet another embodiment is directed to a computer program product. The computer program product comprises one or more non-transitory computer-readable storage devices and program instructions stored on at least one of the one or more storage devices. The program instructions, when loaded and executed by a processor, cause an apparatus associated with the processor to implement any embodiment or combination of embodiments described herein.
The foregoing will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments.
A description of example embodiments follows. Embodiments provide improved functionality to identify legitimate application code paths and protect applications from attacks.
In recent years, there has been a shift in the security industry towards protecting application runtime. Many security tools today claim runtime protection, but offer runtime protection from outside of the application through mechanisms like an enhanced web application firewall. These external approaches to runtime protection are still failing to protect applications from attacks such as Return-Oriented Programming (ROP) that target the application's memory. There is an increased desire and need to monitor the application's runtime from within to provide more deterministic protection.
ROP attacks work by causing the application to diverge from its intended control flow. To defend against this, Control-Flow Integrity (CFI) is an approach that ensures an application only follows its intended paths. Theoretically, CFI is the perfect defense against these types of attacks and can deterministically ensure that an application is not hijacked during its runtime. This type of approach (e.g., CFI) is needed to provide a novel deterministic level of protection against these increasingly sophisticated attackers. However, traditional CFI, in practice, cannot be implemented in a way that meets the constraints of enterprise-grade applications.
There are two primary issues with CFI in practice. The first issue is determining/obtaining an application's legitimate code paths. The legitimate code paths of an application are an assembly instruction-level understanding of where each code transition is intended to go. Obtaining this information without access to source code, which is common in enterprise environments especially when third-party applications are in use, is a known hard problem for x86-based applications. Without accurate knowledge of legitimate code paths, CFI cannot be enforced properly. Even if the knowledge of code paths is slightly inaccurate, this results in false positives or false negatives. Second, enforcing CFI at runtime requires an instruction-level monitoring of an application during the application's runtime. CFI requires a notification every time the application has a code transition. Code transitions in an application are common, and intercepting each transition results in a prohibitive slowdown in the application's performance.
Attempts have been made to provide runtime protection, and these runtime application protection attempts have generally been handled differently by different providers. There are tools such as Signal Sciences[1] (bracketed numbers in this document refer to the enumerated list of references hereinbelow), Imperva[2], and Contrast Security[3] that offer runtime application protection, but do not guarantee protection against ROP-based attacks. Furthermore, the approaches of Signal Sciences[1], Imperva[2], and Contrast Security[3] are targeted towards web applications and do not monitor an application's internal control flow. Without this visibility, these tools (Signal Sciences[1], Imperva[2], and Contrast Security[3]) cannot offer the same kind of runtime memory protection that embodiments, which may be referred to herein as “Virsec Security Platform (VSP) memory,” offer.
There are also endpoint protection tools such as Palo Alto Traps[4], Crowdstrike[6], and Mcafee[6] that offer continuous monitoring and detection through a variety of methods such as Machine Learning and behavior analysis. These tools typically rely on peripheral information from the system and/or network to detect when an endpoint is compromised. Again, without instruction-level visibility, these tools cannot obtain the same deterministic and reliable detection against memory-based attacks that embodiments provide.
A few providers, such as Karambe[7] and Runsafe[8], offer runtime memory protection. However, Karambe[7] only claims protections for embedded systems, which are not typically x86 application binaries. Runsafe[8] offers runtime memory protection by transforming the application binary and changing how the application loads its code into memory. This is done either statically or at load-time when the application first starts. This approach (Runsafe[8]), while offering some protection against memory-based attacks, is not as comprehensive and deterministic as embodiments for several reasons.
First, the static transformation implemented by Runsafe[8] is only performed once. After the transformation is complete, the binary has no further protection and, thus, Runsafe[8] cannot guarantee that the application has not been hijacked after it is launched. Runsafe's[8] second option, which performs the static transformation at load time, at the beginning of each and every application start, still does not offer any actual runtime monitoring of the application. The Runsafe[8] tool starts and stops after the application is loaded into memory and has no mechanism for detecting or stopping memory-based attacks.
Embodiments solve these problems and provide runtime application protection. Embodiments, i.e., VSP memory, implement a modified version of CFI that is capable of detecting ROP and similar buffer overflow attacks with sufficiently low runtime overheads and no prior knowledge of the application code paths. Embodiments can detect attacks in real time and even stop attacks inline before the attacks succeed. Because embodiments are based on CFI, embodiments have near-zero false positives and offer a deterministic approach that is not learned and is instead based on core binary code tenets.
The method 550 begins at step 551 where, responsive to loading code of an executing application into memory, one or more call instructions in the code are identified. According to an embodiment, the call instructions are identified in the code through limited dynamic disassembly. When new code is being loaded into a code cache, such an embodiment has the opportunity to examine the code. During this examination, disassembly (the process of translating machine code/binary into human-readable assembly) is run. The code in assembly form is examined to identify the call instructions. In turn, at step 552, a list of one or more legitimate return instruction destinations is created based on the identified one or more call instructions (from step 551). In this way, the method 550 identifies legitimate code paths of the application. To illustrate, it is a tenet of instruction-level behavior that after a call, and executing instructions pursuant to the call, e.g., a function, an application returns to the address immediately after the call. Thus, for every call identified at step 551, a corresponding return destination, the address immediately after the call, is identified and added to the list of legitimate return instruction destinations at step 552. Advantageously, the method 550 identifies the legitimate code paths without access to the original source code of the application and without doing a full static disassembly of the application. This is a significant improvement over existing methods, which require source code access and a static disassembly to understand the code.
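By way of non-limiting illustration, steps 551 and 552 may be sketched as follows. The sketch assumes the DBI tool has already decoded a block of code into instructions; the `Instr` record and the `collect_return_destinations` function are hypothetical names introduced here for illustration only and are not part of any particular DBI tool.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Instr:
    """One already-decoded instruction (hypothetical shape, for illustration)."""
    address: int   # address of the first byte of the instruction
    length: int    # encoded length in bytes
    mnemonic: str  # e.g. "call", "mov", "ret"


def collect_return_destinations(block: Iterable[Instr], legit: Set[int]) -> Set[int]:
    """Add the address immediately after every call in `block` to `legit`."""
    for ins in block:
        if ins.mnemonic == "call":
            # The return destination is the address right after the call.
            legit.add(ins.address + ins.length)
    return legit


# Toy block: a 5-byte call at 0x1000 and a 2-byte call at 0x1008.
block = [
    Instr(0x1000, 5, "call"),
    Instr(0x1005, 3, "mov"),
    Instr(0x1008, 2, "call"),
    Instr(0x100A, 1, "ret"),
]
legit_returns = collect_return_destinations(block, set())
```

In this toy block, the resulting set holds 0x1005 and 0x100A, the two addresses that immediately follow a call.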
While the method 550 functionality of identifying legitimate code paths provides significant improvements over existing functionality, embodiments of the method 550 go further and utilize the created list of legitimate return instruction destinations to protect the application from malicious attacks. One such embodiment of the method 550 causes execution of the application to pause prior to identifying the one or more call instructions at step 551. Such an embodiment may, after creating the list of one or more legitimate return instruction destinations at step 552, cause execution of the application to resume and control the execution of the application based on the created list of one or more legitimate return instruction destinations (from step 552). In an embodiment of the method 550, the application is executed utilizing a Dynamic Binary Instrumentation (DBI) tool, such as DynamoRIO. According to an embodiment, utilizing a DBI tool enables embodiments of the method 550 to pause, resume, and otherwise control execution of the application.
According to an embodiment, controlling the execution of the application based on the created list of one or more legitimate return instruction destinations includes, upon encountering a given return instruction, determining if a given destination of the given return instruction is approved or unapproved. In such an embodiment, execution of the application is allowed to continue responsive to the given destination being an approved destination. Conversely, responsive to the given destination being an unapproved destination, such an embodiment: (i) checks the given destination of the given return instruction against the list of one or more legitimate return instruction destinations and (ii) controls the execution of the application based on the checking. In an embodiment of the method 550, an approved destination is a destination previously determined to be in the list of one or more legitimate return instruction destinations and an unapproved destination is a destination not previously determined to be in the list of one or more legitimate return instruction destinations. In other words, if a destination was previously evaluated and determined to be legitimate (i.e., in the list), the destination is considered “approved,” and if a destination was not previously evaluated, the destination is considered “unapproved” and is checked.
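By way of non-limiting illustration, the approved/unapproved determination described above may be sketched as a two-tier lookup. The function name `return_allowed` and the use of two sets are assumptions made for this sketch; an actual embodiment may represent the approved destinations and the list differently.

```python
from typing import Set


def return_allowed(dest: int, approved: Set[int], legit: Set[int]) -> bool:
    """Return True if a return to `dest` may proceed.

    `approved` holds destinations that already passed a check;
    `legit` is the list built from identified call instructions.
    """
    if dest in approved:      # approved: previously found in the list
        return True
    if dest in legit:         # unapproved: check against the list now
        approved.add(dest)    # remember the result for later returns
        return True
    return False              # not in the list: caller declares an attack


legit = {0x1005, 0x100A}
approved: Set[int] = set()
first = return_allowed(0x1005, approved, legit)   # checked against the list
second = return_allowed(0x1005, approved, legit)  # served from `approved`
bogus = return_allowed(0x4141, approved, legit)   # never followed a call
```

Here, the first return to 0x1005 is checked against the list, the second is treated as approved without a list check, and the return to 0x4141 is rejected.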
According to an embodiment of the method 550, controlling the execution based on the checking comprises, responsive to the given destination being in the list of one or more legitimate return instruction destinations, allowing execution of the application to continue and, responsive to the given destination not being in the list of legitimate return instruction destinations, declaring a security attack. Such an embodiment may further include, responsive to declaring the security attack, implementing a protection action. In embodiments, protection actions may include any such actions known to those of skill in the art. Example protection actions include at least one of: blocking the given destination from being reached, terminating execution of the application, and logging the given destination. In this way, embodiments of the method 550 protect the application from security attacks, such as return-oriented programming attacks and buffer overflow attacks, amongst other examples.
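By way of non-limiting illustration, the example protection actions may be sketched as a small policy handler. The `Action` enumeration, the `on_unlisted_return` function, and the log message format are hypothetical and introduced only for this sketch.

```python
from enum import Enum, auto
from typing import List, Set, Tuple


class Action(Enum):
    LOG = auto()        # log the given destination
    BLOCK = auto()      # block the destination from being reached
    TERMINATE = auto()  # terminate execution of the application


def on_unlisted_return(dest: int, actions: Set[Action], log: List[str]) -> Tuple[bool, bool]:
    """Apply the configured actions; return (allow_transfer, keep_running)."""
    allow_transfer, keep_running = True, True
    if Action.LOG in actions:
        log.append(f"security attack: return to unlisted destination {dest:#x}")
    if Action.BLOCK in actions:
        allow_transfer = False
    if Action.TERMINATE in actions:
        allow_transfer, keep_running = False, False
    return allow_transfer, keep_running


events: List[str] = []
result = on_unlisted_return(0x4141, {Action.LOG, Action.BLOCK}, events)
```

With logging and blocking configured, the transfer to 0x4141 is refused, the application keeps running, and one log entry is recorded.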
To implement the functionality described herein, an embodiment uses a code cache-based DBI tool to track execution of the application, e.g., 660. In an embodiment, as the application executes a portion of code for the first time, such an embodiment loads that new portion of code into the code cache 662.
Embodiments are able to build a reliable set of potential return instruction destinations because: (1) a call instruction is made prior to its corresponding return, (2) an application must have its basic block loaded into the code cache 662 to continue to execute, and (3) embodiments control execution of the application to provide an opportunity to inspect each basic block prior to loading it into the code cache 662. Further, embodiments, at least partially, maintain a low steady-state performance overhead because this loading of the basic blocks is typically only done once per application execution. As such, inspection to identify legitimate return destinations is only performed once per execution instance.
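By way of non-limiting illustration, the once-per-block inspection may be sketched with a toy code cache. The `CodeCache` class, its `fetch` method, and the tuple representation of instructions are assumptions made for this sketch and do not reflect the interface of DynamoRIO or any other DBI tool.

```python
from typing import Dict, List, Set, Tuple

# A basic block here is a list of (address, length, mnemonic) tuples (hypothetical).
Block = List[Tuple[int, int, str]]


class CodeCache:
    """Toy code cache: a block is inspected once, when it is first loaded."""

    def __init__(self, program: Dict[int, Block]) -> None:
        self.program = program         # block start address -> block
        self.cached: Set[int] = set()  # block starts already in the cache
        self.legit: Set[int] = set()   # legitimate return destinations
        self.inspections = 0           # how many times inspection ran

    def fetch(self, start: int) -> Block:
        if start not in self.cached:   # first execution of this block
            self.inspections += 1
            for addr, length, mnemonic in self.program[start]:
                if mnemonic == "call":
                    self.legit.add(addr + length)
            self.cached.add(start)
        return self.program[start]


program = {
    0x1000: [(0x1000, 5, "call")],                     # calls the block at 0x2000
    0x2000: [(0x2000, 3, "mov"), (0x2003, 1, "ret")],  # returns to 0x1005
    0x1005: [(0x1005, 2, "jmp")],                      # loops back to 0x1000
}
cache = CodeCache(program)
for _ in range(1000):                                  # 1000 loop iterations
    for start in (0x1000, 0x2000, 0x1005):
        cache.fetch(start)
```

Although the loop runs 1000 times, each of the three blocks is inspected only once, and the return destination 0x1005 is recorded when the block containing the call is loaded, which is before the block containing the return is ever fetched.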
Embodiments have low steady-state performance overhead because, for most applications in the steady-state, return instructions rarely go to new code locations. In the steady-state, embodiments do not require any runtime intervention or inspection while the application is running. This is because such an embodiment only receives a call-back to check a destination the first time a destination is encountered.
To map legitimate code paths, embodiments use a DBI tool to instrument any application. The DBI tool gives embodiments instruction-level information about each code segment that the application executes. The instructions that the application is going to execute are only loaded once, which minimizes the steady-state overhead of the analysis that embodiments perform.
With the opportunity to analyze the instructions of the application, embodiments look for two key instructions: the call instruction and the return instruction. A core tenet of application instruction-level behavior is that the return instruction jumps to an instruction immediately following a call instruction. This also means that a call instruction should always be executed prior to the return instruction executing. With this key fact, embodiments are able to pseudo-dynamically build a set of legitimate return instruction destinations while the application is executing. When a return instruction is observed, embodiments execute minimal code to detect where the return instruction is jumping. The destination is verified at runtime against the list of legitimate return destinations built during runtime. This check is done in real time, meaning embodiments can also stop attacks before they execute. Embodiments can ensure that all legitimate return destinations will be in the embodiments' return map because the corresponding call instruction must have executed prior.
With the knowledge of where returns should go, an embodiment maintains a sufficiently low latency overhead by only intercepting the application's runtime at two key points. First, such an embodiment examines the application code as it is loaded into memory, which typically only occurs once per execution. Such an embodiment intercepts the execution here in order to look for possible return destinations. Because application code is only loaded into memory once, this runtime cost is only incurred once. Second, such an embodiment intercepts the application's execution when a return instruction goes to code that has not yet been loaded into memory. This interception allows the embodiment to dynamically check that the return destination exists in its return map. This occurs often during the initialization of the application because the application has not loaded much of its code into the code cache. However, once the application reaches the steady state, return instructions rarely go to new locations, which means embodiments incur minimal steady state latency overhead.
Moreover, in contrast to most other CFI-based approaches, embodiments do not have to insert any code that is executed every time a return instruction is executed. Most other approaches insert some code after every return instruction to save the real-time destination of the return. As such, even if such an existing approach has perfect knowledge of an application's legitimate code paths, such an existing approach is, nonetheless, computationally expensive to execute. An embodiment optimizes this approach and only runs its return-analysis code when a return instruction is jumping to a location unseen by said embodiment. This means that in the steady state, once most of the code that is being used has already been executed, embodiments incur near-zero latency overheads.
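By way of non-limiting illustration, the difference between running analysis code on every return and running it only for previously unseen destinations may be sketched by counting checks over a trace of return destinations. The function `count_checks` and the trace values are hypothetical and chosen only for this sketch; they are not measurements.

```python
from typing import Iterable, Set, Tuple


def count_checks(trace: Iterable[int]) -> Tuple[int, int]:
    """Return (checks if every return is checked, checks if only new ones are)."""
    every_return = 0
    first_seen_only = 0
    seen: Set[int] = set()
    for dest in trace:
        every_return += 1          # per-return instrumentation always runs
        if dest not in seen:       # first-seen approach runs only for new destinations
            first_seen_only += 1
            seen.add(dest)
    return every_return, first_seen_only


# Toy steady-state trace: the same three return destinations, repeated.
trace = [0x1005, 0x100A, 0x2010] * 10_000
per_return, first_seen = count_checks(trace)
```

For this toy trace, checking every return performs 30,000 checks, whereas checking only unseen destinations performs 3.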
With this implementation, embodiments can detect when a return instruction jumps to a location that is an illegitimate call site. This is not perfect CFI; however, it does eliminate more than 98% of destinations within an application, significantly increasing the bar of difficulty to execute a buffer overflow attack [9]. Additionally, ROP and buffer overflow attacks typically rely on more than one errant return, which increases the chances of detection by embodiments.
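By way of non-limiting illustration, the share of destinations eliminated may be estimated with simple arithmetic, under the assumption that every byte address in the code is a candidate return destination and that only addresses immediately following a call remain permitted. The function `fraction_eliminated` and the numbers below are hypothetical and are not taken from reference [9].

```python
def fraction_eliminated(code_bytes: int, call_sites: int) -> float:
    """Share of byte addresses in the code that are not valid return destinations."""
    if code_bytes <= 0 or not 0 <= call_sites <= code_bytes:
        raise ValueError("need code_bytes > 0 and 0 <= call_sites <= code_bytes")
    return 1.0 - call_sites / code_bytes


# Hypothetical binary: 1,000,000 bytes of code containing 15,000 call instructions.
share = fraction_eliminated(1_000_000, 15_000)
```

Under these assumed numbers, 98.5% of addresses in the code are eliminated as return destinations.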
The approach of embodiments, i.e., VSP memory, offers three main advantages: deterministic detection, inline protection, and no prior knowledge requirement. Because embodiments are based on CFI and rely on core binary code tenets, the detection provided by embodiments is not probabilistic and is, rather, based on rules that all applications, no matter their origin or use case, need to follow. This offers near-zero false positives in comparison to other approaches. Embodiments can also offer inline protection because they can be run inside of the application and intercept the application's execution in real time. During this interception, embodiments can check the legitimacy of code transitions and have the ability to stop the application prior to illegitimate transitions. This level of accuracy and speed is unprecedented for this type of detection. Further, embodiments are able to build a map of intended paths as the application executes. Embodiments' novel approach to generating this map means that there is no source code requirement or pre-analysis required. Embodiments do not need to know about the application at all prior to launching the application. Embodiments can offer deterministic, inline detection and protection from the first time they launch an application.
Client computer(s)/devices 50 and server computer(s) 60 provide processing, storage, and input/output devices executing application programs and the like. The client computer(s)/devices 50 can also be linked through communications network 70 to other computing devices, including other client devices/processes 50 and server computer(s) 60. The communications network 70 can be part of a remote access network, a global network (e.g., the Internet), a worldwide collection of computers, local area or wide area networks, and gateways that currently use respective protocols (TCP/IP, Bluetooth®, etc.) to communicate with one another. Other electronic device/computer network architectures are suitable.
Client computers/devices 50 and/or servers 60 may be configured, alone or in combination, to implement the embodiments described herein, e.g., the method 550, amongst other examples. The server computers 60 may not be separate server computers but part of cloud network 70.
Embodiments or aspects thereof may be implemented in the form of hardware including but not limited to hardware circuitry, firmware, or software. If implemented in software, the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein.
Further, hardware, firmware, software, routines, or instructions may be described herein as performing certain actions and/or functions of the data processors. However, it should be appreciated that such descriptions contained herein are merely for convenience and that such actions in fact result from computing devices, processors, controllers, or other devices executing the firmware, software, routines, instructions, etc.
It should be understood that the flow diagrams, block diagrams, and network diagrams may include more or fewer elements, be arranged differently, or be represented differently. But it further should be understood that certain implementations may dictate that the block and network diagrams, and the number of block and network diagrams illustrating the execution of the embodiments, be implemented in a particular way.
Accordingly, further embodiments may also be implemented in a variety of computer architectures, physical, virtual, cloud computers, and/or some combination thereof, and, thus, the data processors described herein are intended for purposes of illustration only and not as a limitation of the embodiments.
The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 63/266,119, filed on Dec. 29, 2021. The entire teachings of the above application are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/082511 | 12/29/2022 | WO | |

Number | Date | Country |
---|---|---|
63266119 | Dec 2021 | US |