One of the main gateways for malicious software (generally referred to as “malware”) to enter and take control of a user's computer is to trick the computer into executing instructions that were not intended as part of the currently executing program. As those skilled in the art will appreciate, to execute an application, both executable instructions and data are loaded into memory. Buffer overrun attacks write malicious instructions into the data areas in memory and trick the computer into executing those instructions as if the data were a legitimate program. There are several ways to trick the computer which usually (but not always) involve corrupting a pointer value in the computer's memory. Some examples of this kind of malware attack include buffer overflow attacks, stack overflow attacks, data-as-instructions attacks, injected code attacks, format string attacks, integer overflow attacks, malicious string attacks, malicious code attacks, heap-smashing attacks, pointer-rewrite attacks, and worms. Of these, the stack overflow technique is likely the most common.
While the above attacks differ slightly, they almost always result in the same thing: a pointer to legitimate executable code is corrupted such that it points to malicious code that was surreptitiously loaded into a data area in memory. With the pointer corrupted, at some point during the normal execution of the program the corrupted pointer is followed and begins executing the malicious code.
As those skilled in the art will appreciate, several ways have been proposed for protecting the stack from being “smashed,” i.e., corrupted or otherwise overrun in order to carry out malicious code, which are extensively described in online information stores and encyclopedias such as Wikipedia. However, these techniques for protecting the system are inherently flawed: they try to protect the program from becoming corrupted, or try to detect corruption before the malicious code is executed, which means that these techniques are either too slow to be practicable or are easily circumventable.
In contrast to the above solutions, a desirable approach in protecting a computer against attacks would be to remove the ability of an attacker to inject executable instructions into the computer system in a form that could then be executed by the processor. By removing the ability to inject properly formed instructions into the computer system, the system is not compromised even when an attacker injects instructions (though not properly formed) and attempts to redirect execution on those instructions.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
According to aspects of the present invention, a method to execute instructions stored in a transformed state in a memory, is presented. The method comprises the following steps. As part of fetching a value from memory, restoring the value according to a context and a restore function if the fetch is for an instruction. Thereafter, the restored value is passed on for execution.
According to additional aspects of the present invention, a method to execute an application on a computing device, is presented. The method comprises the following steps. Loading the application into a memory for execution and selectively transforming the instructions of the loaded application according to a transform function and a context. As a transformed instruction is fetched from the memory for execution, the fetched instruction is restored using a restoration function and the context. Thereafter, restored instruction is passed on to the next stage of the processor for execution.
According to further aspects of the present invention, a computing device for protecting against overrun errors by fetching transformed instructions stored in a memory and restoring the instructions prior to their execution is presented. An exemplary computing device includes a processor and a restoration means. The restoration means is logically located on a data path between the processor's instruction decoder and a memory. The restoration means is configured to, upon the a fetch of a value from the memory, selectively restore the value using a context and a restore function if the fetch is for at least part of an instruction, and pass the restored instruction to the next stage of the processor for execution.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
In order to avoid confusion with regard to the usage of various terms, as used throughout this description, the term “transform” and its conjugations are used to refer to the encoding of instructions from the executable opcodes (that are typically decoded by a processor for execution) to an altered state, i.e., a “transformed” state. Indeed, the term “transform,” and its converse “restore,” are used to distinguish between the decoding that processors, interpreters, and/or virtual machines perform on an instruction/opcode in the typical execution thereof, and the transformation and restoration of the executable instructions to foil the various attacks described above in accordance with the present invention.
In a computing device, the execution of applications by a processor, whether that processor is part of a computer, personal digital assistant (PDA), intelligent appliance, mobile phone, or the like, follows a series of steps. On almost all but the simplest processor-embedded devices, when a process is initiated, a corresponding application is loaded into the computing device's memory. To execute, the processor repeatedly fetches instructions and data from memory, usually one or several bytes at a time. The processor then executes the fetched instructions one or several at a time.
As mentioned above, code injection and similar attacks trick the processor into executing one or more instructions of malicious code by writing the malicious code into the memory's data area, for example by overflowing a buffer. (In this paragraph the word “buffer” is used in a computer science context meaning “memory used to temporarily store data”.) The contents of the overflowed buffer in a code injection attack is a series of executable instructions written in such a way as to cause the computing device to perform the attacker's intended action. For example, the “Code Red” worm used a buffer overflow in a URL parsing function to take control of and infect computers running the Microsoft IIS Web Server. The worm defaced any web sites running on an infected server, and directed all infected servers to launch coordinated denial-of-service attacks on certain well known IP addresses, such as various government web sites. The particular details of the Code Red worm are well known and readily available on various Web sites.
To combat the types of attacks described above, and according to aspects of the present invention, when any application is loaded into memory for execution, the executable instructions of that application are transformed and stored in memory. Moreover, the executable instructions remain in their transformed state as long as they reside in memory. In contrast, the application's data is not transformed and, therefore, is simply placed in memory.
Clearly, transformed executable instructions must be restored to their executable form in order to be executed by the processor. Accordingly, the transformed instructions are restored as part of the fetch process for execution by the processor and remain restored only within the context of being executed by the processor. However, the executable instructions remain transformed in memory.
As those skilled in the art will recognize, the present invention does not address malware's ability to corrupt a return pointer on the stack to point to malicious instructions surreptitiously written into a data area according to the various attacks described above. However, the malicious instructions are placed in memory without being transformed and, thus, when the malicious instructions are fetched, the restoration process renders the instructions ineffective. Of course, attempting to execute the “restored” malicious instructions may result in a program or system crash. However, in almost all instances, a crash is preferable to the malicious results of the malware. Fortunately, some crashes may be averted or quickly detected and handled gracefully by the operating system.
To better set forth the various aspects of the present invention, reference is made to the accompanying drawings. More particularly,
As further shown in
As mentioned above, aspects of the present invention may be implemented on a variety of devices such as, but not limited to, mobile phones, PDAs, mini- and mainframe computers, tablet, notebook, and desktop personal computers, and the like, running nearly any type of processor, including either synchronous or asynchronous processors. However, while there are numerous configurations of computing devices in which the present invention may be implemented,
The storage device 106 also typically stores an operating system 108. When the computing device 200 is powered on, the operating system 108 is loaded (as illustrated by operating system 108') and executed as part of an overall computing system. Moreover, the operating system 108 typically includes an application loader 114 which is used to load applications 110-112 from the storage device 106 into memory. Of course, in a computing system 200 adapted according to the present invention, the various executable instructions of the operating system 108 are stored in memory 104 as transformed instructions.
Other items that may be stored on the storage device 106 include an interpreter, virtual machine, or processor emulator (none of which are shown) any of which may be configured according to aspects of the present invention to fetch, decode, and execute executable instructions loaded in memory 104. Of course, while aspects of the present invention may be implemented in an interpretive or emulative environment, in at least one embodiment these same aspects may be implemented in circuitry in or around the processor 102.
In order to more fully illustrate aspects of the present invention with regard to loading an application,
With the application loaded into memory 104, including the executable instructions being transformed, focus can now turn to the fetch and execute the processes. To this end,
With regard to the processor 102, the processor is connected to a data bus 412 from which it reads and writes data to and from memory 104. Of course, in the sense of information on the data bus 412, this “data” may include both instructions and data. The processor 102 is also shown as having two output lines: a read/write line 414 and a instruction/data line 416. These lines are binary output lines. The read/write line 414 outputs an indicator as to whether the operation on the data bus 412 is a read operation (i.e., from memory 104 to the processor 102) or a write operation (i.e., from the processor 102 to memory 104). As those skilled in the art will appreciate, the bar above “WRITE” for the read/write line 414 indicates the low (or zero) value is indicative of a write operation on the data bus 412. The code/data line 416 indicates whether the information requested (either read or written) is an instruction or data.
The exemplary labyrinth circuitry 402 includes a write buffer 404, a read buffer 406, a decode block 408, and a key unit 410. (In this and the following paragraphs the term “buffer” is used in the electrical engineering context meaning “an amplifying or isolating logic element”.) Additionally, the exemplary labyrinth circuitry 402 includes an inverter 418 connected to the read/write line 414; an AND gate 420 connected to both the read/write line 414 and the instruction/data line 416 with the instruction/data line inverted; and another AND gate 422 connected to both the read/write line and the instruction/data line. Still further, a key bus 424 connects the key unit 410 to the decode block 408. As can be seen, the write buffer 404, the read buffer 406, and the decode block 408 are each connected to the data bus 412, and placed in such a way such that all information that flows to the processor 102 from memory 104 must pass through the labyrinth circuitry 402.
While each of the write buffer 404, read buffer 406, and decode block 408 (as well as the key unit 410) will be described below, it should be appreciated that according to the illustrated embodiment of
In this illustrated embodiment, the key unit 410 is configured to always output a key value over the key bus 424, which key value is used to restore transformed instructions in the decode block 408.
As can be seen, the OE input of the write buffer 404 is connected to the read/write line 414 via the inverter 418. Thus, when the read/write line 414 outputs a low value (0), the inverter 418 inverts the value which enables the write buffer 404. As can be seen, the write buffer is enabled and outputs a value on the data bus 412 when the read/write line 414 is low (0), but is disabled when the read/write line is high (1).
The OE input of the read buffer 406 is connected to both the read/write line 414 and the instruction/data line 416 via an AND gate 420, with the value of the instruction/data line inverted. Thus, when the read/write line 414 is high (1), implying a read from memory 104 to the processor 102, and the instruction/data line 416 is low (0), implying that the requested information is data, the read buffer is enabled and transfers the data on the data bus 412 from memory 104 to the processor 102.
The OE input of the decode block 408 is connected to both the read/write line 414 and the instruction/data line 416 via an AND gate 422. In this configuration, when the read/write line 414 is high (1) and the instruction/data line 416 is also high (1), implying that the requested information is an instruction, the decode block decodes the instruction obtained from memory 104 using the key value on the key bus 424 and outputs the decoded instruction on the data bus 412 to the processor 102.
As a simplification of the above enabled/disabled states,
With regard to the configuration presented in
With further regard to the configuration presented in
It should be appreciated by those skilled in the art that there may be some instructions that span more than one discrete memory such that it requires more than a single fetch, i.e., a “single” instruction may span multiple bytes, or words, etc. Moreover, when this occurs, it may not be necessary to transform the “entire” instruction but only one or more portions. For example, assume that a single instruction is stored in two words, and each word must be fetched separately. It may be sufficient to foil an injection attack if only one of the words was transformed and subsequently restored, assuming also that the labyrinth circuitry was aware as to which word was transformed. This could logically be accomplished by using a key value in the transform and restore functions that represent the identity function. If the restore/transform functions used XOR, a key value of zero would leave a word (i.e., a portion of a multi-fetch instruction) unchanged. However, for purposes of the present discussion, whether all or a part of an instruction (that requires more than one fetch to retrieve) is transformed, it is viewed as the instruction, as a whole, is transformed and subsequently the instruction, as a whole, is restored.
While
The labyrinth circuitry 602 includes a write buffer 404 and a decode block 408, both of which are tri-state devices. The labyrinth circuitry 602 further includes a key unit 604, different from the key unit 410 described above in regard to
The write buffer 404 is the same as described above in regard to
The decode buffer 408, itself, is also configured the same as described above in regard to
For its part, the key unit 604, not being a tri-state device, always outputs a value on the key bus 424 to the decode block 408. However, the key unit 604 is configured with one or more Select lead(s) connected to the instruction/data line 416, and possibly other context information. In this embodiment, if the select value is high (1), as indicated on the instruction/data line 416, the appropriate key value to restore a transformed instruction is output to the decode block 408. Alternatively, if the select value is low (0) indicative of data, as indicated on the instruction/data line 416, an alternative key value is output such that when combined with data in the decode block 408, the data is unmodified. While there are a variety of means to implement selectively restoring transformed instructions while data remains unchanged, an efficient way to accomplish this is to use a logical XOR (exclusive or) operation as the transform/restore operation. Those skilled in the art know that the XOR operation provides a true “round trip,” i.e., that a value encoded (or transformed) with a key in an XOR operation is properly restored with a subsequent application of the XOR operation using the same key, as defined by the formula, v=XOR( XOR(v, k), k), where v is the value transformed and restored, and k represents a key value used by the XOR operation to transform and subsequently restore v. Moreover, when the select value is low (0) on the instruction/data line 416, indicating that the value read is data, the key unit 604 outputs a zero (0) thereby leaving the data unmodified, since v=XOR(v, 0) or the identity function.
While an XOR operation using a discrete key value is both efficient and effective, a more general mathematical description of the transform/restore operation is found in the formula, v=Restore(Transform(v, c), c), where v is the value to be transformed and restored, Restore represents the restoration function that restores a transformed v, Transform represents the transformation function, and c represent a context which is used by both the restoration function Restore and the transformation function Transform to “round trip” the value v. A context c may simply comprise a key value, but may also include, but not be limited to, one or more process privilege bits, address space numbers, a process identifier (ID), a user's password and/or biometric information, one or more lowest address lines, or any combination thereof. Moreover, when using the general mathematical formula described earlier, an identity context, ic, should also be available such that it satisfies the equation v=Restore(v, ic), and possibly, v=Transform(v, ic). In this manner, if all information—both instructions and data—were “restored” via the labyrinth circuitry (or its functional equivalent), an identity context could be provided when the fetched information corresponds to data.
With regard to a key value used by a key unit 604 or 410 to restore transformed instructions, while a suitable key value may be obtained from a variety of sources, a key value may be generated via a random number generator (or pseudo-random number generator) at boot of the computing device and used as the key value for each process running on the computing device, or in combination with a context for each process such that each process ultimately includes a unique key value. Moreover, a random number may be generated for each process launched on the computing device and used as the key value for the corresponding process, or in combination with the process's context. As an alternative to a random number, or pseudo-random number, a key value may be assigned to a particular computing device and/or processor, or derived from each computing device's serial number. Yet other alternatives include a nonce, a virtual machine ID, a security token, a hard to guess number, or any combination thereof. While not shown, in at least one embodiment, the processor 102 is configured to store one or more key values, irrespective of whether or not the key values are used in combination with a processes context. Alternatively, a software module, such as the application loader 114 or some other component of the operating system 108 may store the various key values corresponding to the executing processes. Still further, it should be appreciated that context switches from one process to another, and/or privilege escalations may require the use of different key values and/or contexts.
Returning again to
It should be appreciated that while an XOR operation is an efficient and relatively effective means for encoding instructions, the present invention should not be construed as limited to the use of XOR. Any suitable encoding/encryption algorithm may be used for transforming and restoring executable instructions. For example, other transform/restore functions may include, but are not limited to, a substitution box (s-box) technique, a Feistel network, and the like, each of which are well known in the art. Moreover, while symmetrical encoding techniques may be more efficient in storing a single encoding value, both symmetrical and asymmetrical encoding/decoding techniques may be used so long as the equation v=Restore(Transform (v, c), c) is preserved.
Turning now to
To enhance the security of the instructions, multiple key values may be used. In one embodiment, the key value necessary to transform and restore an executable instruction depends upon the storage/memory address of the instruction. More particularly, in one embodiment, the least significant bits of the memory address of a given instruction are used as an index to determine which key value to use in transforming and/or restoring the instruction. Accordingly, the number of key values should correspond to a power of two.
At control block 906, a looping process is begun to iterate through each instruction loaded into memory 104 for transformation. At block 908, an instruction is transformed using the obtained encoding/key value according to a predetermined transformation function (such as XOR). If multiple key values are used, the appropriate key value is selected for each instruction and used to transform that instruction, as described above in regard to
It should be appreciated that on some devices, particularly “simple” devices, applications are pre-loaded by the manufacturer. In these instances, the pre-loaded applications would be written to the device such that the instructions written to the device are already transformed using a predetermined encoding key, such as the device's serial number.
If the fetching operation is for an instruction, at block 1006 the fetched instruction is restored using the restoration function and the corresponding key value. Of course, while not shown, if multiple key values are used, the particular key value to be used must be also determined using the available context, such as one or more of the lowest significant address bits. After restoring the transformed instruction, or if the value was not an instruction but rather data, the value is then passed onto the next stage within the processor 102. Thereafter, the routine 1000 terminates.
With regard to the above described routines, it should be appreciated that these are logical steps and may not correspond to actual discrete steps in their respective processes. Similarly, those skilled in the art will appreciate that the above-described routines may include various steps that are not identified which have been omitted for purpose of clarity in describing more pertinent aspects of the present invention. Moreover, with regard to routine 900, whether the application is first loaded into the memory 104 before the instructions are transformed, or whether the instructions are transformed as they are retrieved from storage 106 and stored in memory 104 is not important, and both are anticipated as falling within the scope of the present invention.
As can be seen by the numbered icons, the labyrinth circuitry may be located in any number of locations with regard to the processor 1100. Moreover, icon 1 illustrates that the labyrinth circuitry may be placed between the memory and the memory controller 1110. Icon 2 illustrates that the labyrinth circuitry may be located within the memory controller 1110. The labyrinth circuitry may also be placed between the memory controller 1110 and the instruction fetcher 1106, as indicated by icon 3. Still further, the labyrinth circuitry may be placed in either the instruction fetcher 1108 as indicated by icon 4, between the instruction fetcher 1106 and the instruction decoder 1104 as indicated by icon 5, and in the instruction decoder 1104 as indicated by icon 6. It is worth noting that at positions 3, 4, 5, or 6 only instructions (no data) are received. If the labyrinth circuitry is added at any one of these locations, the decoder block, such as decoder block 408 of
The circuitry and/or various functionality of the present invention can be located at any one or any combination of these locations within a given processor 1100 while remaining true to the spirit and purpose of the invention. The spirit of the invention is present whenever instructions and data are stored in memory such that either instructions or data or both are encoded in such a way that data cannot be transparently read as executable instructions; plus the invention includes a decoder that restores transformed instructions or data, or both, as each instruction or chunk of data, or both, are read from memory.
Those skilled in the art will appreciate that at least some processors include a plurality of instruction decoder 1104. Accordingly, (while not shown) as an alternative to providing a labyrinth circuitry separate to an instruction decoder 1104, one or more instruction decoders 1104 could be particularly configured to process transformed instructions as part of its decoding functionality. However, this is viewed as being the logical equivalent of including a labyrinth circuitry within an instruction decoder 1104, as illustrated by icon 6, and is therefore anticipated as falling within the scope of the present invention.
It should be appreciated that the exemplary processor 1100 has been simplified in order to illustrate various locations in which the labyrinth circuitry may be placed, and an actual processor would typically include numerous other components not currently shown. One such component is a memory cache. In processors with one or more memory caches, it is simpler to not locate the labyrinth circuitry between a cache and memory 104, as cache consistency issues are thereby avoided. However, the present invention is not so limited. If the labyrinth circuitry is logically located between a cache and memory 104, care should be taken to ensure that stale cached instructions are refetched at appropriate times such that they are restored correctly.
While the above description has been generally made in regard to a software application loader 114 and a hardware implementation of labyrinth circuitry, it should be appreciated that the general principles may be equally and beneficially applied to a software implemented process and/or an application interpreter such as a virtual machine. In these embodiments, the steps of fetching an encoded instruction, decoding that instruction, and executing instructions should be implemented in software.
While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.
This application claims the benefit of the filing date of Provisional Patent Application No. 60/718,123, filed Sep. 17, 2005, the subject matter of which is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
60718123 | Sep 2005 | US |