1. Technical Field
The present disclosure relates to software protection and more specifically to detecting, during program execution, unauthorized modifications to computer programs.
2. Introduction
There are many examples of computer software that contain sensitive information, such as proprietary algorithms, security routines, or cryptographic keys. One prime example is Digital Rights Management software, which can contain a variety of highly sensitive information. Because of this, software is often the target of various attacks, such as software tampering or reverse engineering, in which a user modifies the software as part of an effort to dissect, analyze, and discover how it works or to alter its functionality. In the currently used computing model, software is particularly vulnerable to tampering and reverse engineering attacks because it is often executed in an open environment in which the user has full control. For example, when software is executed on a user's personal computer running a commonly available operating system, e.g., Mac OS, Microsoft Windows, Linux, etc., the user is able to closely monitor the execution of the program. Even in more closely managed operating systems, such as mobile device operating systems, tools are available for the user to monitor and tinker with program execution. Furthermore, with the right skills and tools, the user can even alter the execution of the program.
A variety of techniques have been employed to address these vulnerabilities, such as code obfuscation and tamper resistance techniques. Unfortunately, most of these techniques are vulnerable to the same attacks to which unprotected software is vulnerable.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.
Disclosed are systems, methods, and non-transitory computer-readable storage media for gathering statistics related to the execution of a program and, in some embodiments, using the statistics to detect unauthorized modifications to a computer program. The statistics can include actual statistics as well as anything sequentially derivable, such as a hash chain computed to detect unauthorized access. A general-purpose computing device can be configured to gather the statistics by including a number of special purpose statistics gathering hardware instructions, such as a reset instruction counter instruction, a start instruction counter instruction, a stop instruction counter instruction, a get instruction count instruction, and/or a get CPU cycle count instruction. When the computing device executes a program containing these instructions, it can perform the specific action associated with each statistics gathering instruction, such as resetting an instruction counter, starting or stopping instruction counting, or fetching the instruction count value. In some embodiments, the special purpose statistics gathering instructions can be instructions of a virtual machine instead of hardware instructions.
To gather statistics on a specific section of code, a software developer can insert statistics gathering instructions around the desired code section. For each instruction inside the identified code block, a counter can be incremented when the instruction is executed. If the developer is only interested in the count for that code section, a reset instruction can be inserted prior to the start instruction. A get instruction count instruction can be used to get the count in the counter. The get instruction count instruction can be inserted right after the stop instruction, or anywhere else in the code prior to a reset instruction. In some cases, the counter can be stored in a dedicated register.
In some embodiments, a software developer can use the statistics gathering features to detect unauthorized changes to a computer program. For example, if an attacker modifies a monitored section of code by increasing or decreasing the number of instructions, the instruction count can be used to detect the change. Alternatively, in some embodiments, the statistics gathering features can be used to measure performance of a specific section of code. For example, different compiler optimizations and/or code protection mechanisms can have different effects on a monitored section of code, and the statistics can be used to quantify those effects.
In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the disclosure.
The present disclosure addresses the need in the art for improved methods of gathering computer program statistics and detecting, during program execution, unauthorized changes to the computer program. The approaches set forth herein can also be used for profiling performance of particular portions of the computer program, or for comparing the performance of executables generated by different compilers, for example. The disclosure first sets forth a discussion of a basic general-purpose system or computing device in
With reference to
The system bus 110 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in ROM 140 or the like may provide the basic routine that helps to transfer information between elements within the computing device 100, such as during start-up. The computing device 100 further includes storage devices 160 such as a hard disk drive, a magnetic disk drive, an optical disk drive, a tape drive or the like. The storage device 160 can include software modules 162, 164, 166 for controlling the processor 120. Other hardware or software modules are contemplated. The storage device 160 is connected to the system bus 110 by a drive interface. The drives and the associated computer readable storage media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 100. In one aspect, a hardware module that performs a particular function includes the software component stored in a non-transitory computer-readable medium in connection with the necessary hardware components, such as the processor 120, bus 110, display 170, and so forth, to carry out the function. The basic components are known to those of skill in the art and appropriate variations are contemplated depending on the type of device, such as whether the device 100 is a small, handheld computing device, a desktop computer, or a computer server.
Although the exemplary embodiment described herein employs the hard disk 160, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs) 150, read only memory (ROM) 140, a cable or wireless signal containing a bit stream and the like, may also be used in the exemplary operating environment. Non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.
To enable user interaction with the computing device 100, an input device 190 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, and so forth. An output device 170 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 100. The communications interface 180 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be replaced by improved hardware or firmware arrangements as they are developed.
For clarity of explanation, the illustrative system embodiment is presented as including individual functional blocks including functional blocks labeled as a “processor” or processor 120. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software and hardware, such as a processor 120, that is purpose-built to operate as an equivalent to software executing on a general purpose processor. For example the functions of one or more processors presented in
The logical operations of the various embodiments are implemented as: (1) a sequence of computer implemented steps, operations, or procedures running on a programmable circuit within a general use computer, (2) a sequence of computer implemented steps, operations, or procedures running on a specific-use programmable circuit; and/or (3) interconnected machine modules or program engines within the programmable circuits. The system 100 shown in
Having disclosed some basic system components, the disclosure now turns to an introductory discussion of gathering execution statistics and a couple of exemplary uses of those statistics. One way to gather information pertaining to the execution of a program accurately and efficiently is to implement a specific set of CPU instructions for the desired statistics. Once the information has been gathered, developers can use the statistics in a number of ways, such as to aid in detecting unauthorized modifications to the software or to measure or compare performance. The present disclosure is directed to gathering and using statistics regarding the number of instructions executed and/or CPU cycles completed for a particular section of computer-executed code; however, the principles disclosed herein can also be used to gather other information.
The statistics gathering instruction set can be incorporated in a CPU, a crypto-processor, or any hardware chipset executing sequences of instructions. The instruction set does not have to be dependent on any particular CPU capabilities or specifications. For example, the instructions can be implemented for a 16-bit, 32-bit, 64-bit, or other architecture. Additionally, the instructions can be implemented for a computer architecture that supports fixed width or variable width instructions.
While the disclosure discusses implementations of the statistics gathering capabilities as part of a hardware instruction set, one skilled in the art will recognize that these capabilities can also be implemented as part of a virtual machine using the same principles disclosed for the hardware implementation.
To allow a developer to gather statistics regarding the number of instructions executed and/or CPU cycles completed for a particular section of code, the system can include a set of instructions including a resetInstCounter instruction, a startInstCounter instruction, a stopInstCounter instruction, a getInstCounter instruction, and a getCPUCycleCount instruction. Each instruction can be an atomic instruction. The resetInstCounter instruction can be used to reset the instruction counter and/or CPU cycle counter to a default value, such as zero. The startInstCounter instruction can be used to start instruction counting and/or CPU cycle counting. Further, the system can count other similar actions taken by the computing device or components of the computing device. For example, the system can count instructions processed by a graphics processing unit, the number of bytes written to or read from memory, the number or order of accesses to registers in the CPU, and so forth. While the examples below discuss CPU cycles and instructions, metrics other than CPU cycles, as well as combinations thereof, can be used.
Once a start instruction has been executed, the instruction counter can be incremented for each instruction executed after the start instruction. The stopInstCounter instruction can stop the instruction counting and/or CPU cycle counting. Once a stopInstCounter instruction has been executed, the instruction counter will not be incremented again until a start instruction is executed. The getInstCounter instruction can be used to fetch the current value of the instruction counter. The getCPUCycleCount instruction can be used to fetch the current value of the CPU cycle counter. In some embodiments, alternative instructions can be used, such as instructions that permit gathering statistics that differentiate between the number of instructions executed system wide and the number of instructions executed on a specific execution thread.
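The following Python sketch models the counter state that the five instructions operate on, roughly as a virtual machine implementation might. The class and method names, the `retire` hook, and the cycle costs are illustrative assumptions. The text does not settle whether the start and stop instructions count themselves, so this sketch counts only the ordinary instructions retired between them.

```python
from dataclasses import dataclass


@dataclass
class StatsCounters:
    """Toy model of the counter state behind the statistics instructions."""

    inst_count: int = 0
    cycle_count: int = 0
    counting: bool = False

    def reset_inst_counter(self) -> None:
        # resetInstCounter: return both counters to the default value, zero.
        self.inst_count = 0
        self.cycle_count = 0

    def start_inst_counter(self) -> None:
        # startInstCounter: begin counting instructions and CPU cycles.
        self.counting = True

    def stop_inst_counter(self) -> None:
        # stopInstCounter: freeze both counters until the next start.
        self.counting = False

    def get_inst_counter(self) -> int:
        # getInstCounter: read the instruction counter without changing it.
        return self.inst_count

    def get_cpu_cycle_count(self) -> int:
        # getCPUCycleCount: read the CPU cycle counter without changing it.
        return self.cycle_count

    def retire(self, cycles: int = 1) -> None:
        # Hypothetical hook: called once per ordinary instruction executed.
        if self.counting:
            self.inst_count += 1
            self.cycle_count += cycles


counters = StatsCounters()
counters.reset_inst_counter()
counters.start_inst_counter()
for _ in range(4):
    counters.retire(cycles=3)  # four counted instructions, 3 cycles each
counters.stop_inst_counter()
counters.retire(cycles=3)  # executed after the stop: not counted
print(counters.get_inst_counter(), counters.get_cpu_cycle_count())  # 4 12
```

Keeping the counters inside one object mirrors the idea of a dedicated register that only the statistics instructions can modify.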
A key to the statistics gathering is that at least one counter is maintained. The counter(s) can be implemented in a number of ways. For example, the counter(s) can be stored in one or more dedicated registers. Alternatively, the counter(s) can be stored in a dedicated section of memory. In some cases, it may be desirable to store a counter in a manner that is only modifiable by the dedicated hardware instructions or is at least difficult to modify by a user or other processes.
The statistics gathering instructions can be added to the software at various points in the software development. For example, in some cases, the developer can indicate the location and type of instruction that should be added to the compiled program by adding annotations or flags to the source code. These annotations can take the form of specially formed comments, preprocessor directives, macros, specific instructions, calls to a particular API, code formatting, and so forth.
Alternatively, in some cases, the source code can be translated into an intermediate representation prior to indicating the location and type of instruction that should be added to the compiled program. In both of these cases, a compiler can then be used to produce the executable program with the desired statistics gathering capabilities embedded. In other cases, the developer can add the desired instructions at the assembly level after the source code has been compiled to an executable program.
The statistics gathering instructions can be inserted in the software at various points depending on the desired goals of the software developer. For example, to obtain information on a specific section of code, a software developer can insert startInstCounter and stopInstCounter instructions around the desired code section. For each instruction inside the identified code block, the instruction counter is incremented when the instruction is executed. If the developer is only interested in the count for that code section, a resetInstCounter instruction can be inserted prior to the startInstCounter instruction. The getInstCounter instruction can be used to get the count in the counter. The getInstCounter instruction can be inserted right after the stopInstCounter instruction, or anywhere else in the code prior to a resetInstCounter instruction. Alternatively, the developer can insert a getInstCounter instruction prior to a stopInstCounter instruction. This allows a developer to gather information at various points during the execution of a section of code instead of just at the completion of the code section.
As mentioned briefly above, in some embodiments, the instruction set can be implemented to permit gathering statistics that differentiate between the number of instructions executed system wide and the number of instructions executed on a specific execution thread. In this case, multiple registers can be used to maintain the counters. One possible register configuration 400 is illustrated in
There are a number of different ways that a developer, a third party, and/or the system can use the statistics once gathered. These examples are discussed in terms of a developer, but a third party and/or the system can also perform these actions. In some cases, the statistics can be used to detect unauthorized code modifications. As part of a code protection mechanism, the developer can identify one or more sensitive sections of code. Once identified, the developer can insert appropriate statistics gathering instructions around the selected code sections. The developer can determine at compile time the number of instructions that should be executed in a selected code section. The developer can then incorporate this expectation in a protection mechanism of their choice, such as sending an error message to a remote computer or to a local display, corrupting key data, quitting program execution, requesting some additional user authentication, and so forth. By strategically placing the instructions, an attack that increases or decreases the number of instructions in a code section can be detected at run time. If the number of instructions executed does not match the expectation, the protection mechanism can react appropriately. For example, the instruction count may be useful in detecting a modification in which an attacker has tried to remove or bypass a security check by removing instructions. Alternatively, the instruction count may also be useful in detecting the presence of a debugger when an attacker has inserted a breakpoint in a monitored section of code.
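A minimal sketch of such a check follows. It uses a toy interpreter in which ordinary instructions are opaque named tuples; `load_key`, `verify_license`, `decrypt_block`, and `int3` are hypothetical.

- The expected count is fixed when the section is built.
- Removing the license check lowers the actual count.
- A software breakpoint raises the actual count. It is modeled here simply as one extra executed instruction, which is a simplification.

The protection mechanism in this sketch only reports the mismatch.

```python
from typing import Dict, List

STATS_OPS = {"resetInstCounter", "startInstCounter",
             "stopInstCounter", "getInstCounter"}


def run(program: List[tuple]) -> Dict[str, int]:
    """Toy interpreter: counts ordinary instructions while counting is on."""
    regs: Dict[str, int] = {}
    count, counting = 0, False
    for op, *args in program:
        if op == "resetInstCounter":
            count = 0
        elif op == "startInstCounter":
            counting = True
        elif op == "stopInstCounter":
            counting = False
        elif op == "getInstCounter":
            regs[args[0]] = count
        # Any other op is an ordinary instruction whose effect is not modeled.
        if counting and op not in STATS_OPS:
            count += 1
    return regs


EXPECTED_COUNT = 3  # determined when the protected section is built


def protected_section(body: List[tuple]) -> List[tuple]:
    """Wrap a code section with statistics gathering instructions."""
    return ([("resetInstCounter",), ("startInstCounter",)] + body
            + [("stopInstCounter",), ("getInstCounter", "r1")])


def is_untampered(body: List[tuple]) -> bool:
    regs = run(protected_section(body))
    # A real protection mechanism might instead quit, corrupt key data,
    # or notify a remote computer when the counts differ.
    return regs["r1"] == EXPECTED_COUNT


original_body = [("load_key",), ("verify_license",), ("decrypt_block",)]
patched_body = [("load_key",), ("decrypt_block",)]  # security check removed
debugged_body = [("load_key",), ("int3",), ("verify_license",),
                 ("decrypt_block",)]                # breakpoint inserted

print(is_untampered(original_body), is_untampered(patched_body),
      is_untampered(debugged_body))  # True False False
```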
Statistics gathering instructions can aid in measuring performance of one or more sections of code. A variety of factors can impact the performance of a program, such as compiler optimizations, use of external and uncontrolled APIs, etc. For example, some compiler optimizations can increase code size in exchange for a speed increase, decrease performance in favor of other priorities, or make changes that do not provide any real benefit. The statistics gathering instructions can be used to identify unexpected performance or size bottlenecks or simply to perform a performance audit of particular code sections or the program as a whole.
In the second example, the developer is interested in gathering an overall performance profile of the function 810. In this case, the developer starts the counter at the beginning of the function and makes periodic checks of the CPU cycle counter. This technique can be useful in identifying performance hotspots, monitoring code performance as a program evolves (such as between daily builds or between different release versions), etc. Other techniques of using the statistics gathering instructions to measure performance are also possible, such as incorporating the getInstCounter instruction. The CPU cycle counting can be incorporated into an existing command, such as a loop.
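The periodic-check technique can be sketched with a toy interpreter whose ordinary instructions have hypothetical cycle costs. Differences between successive getCPUCycleCount readings give the cost of each phase of the function, and the most expensive phase is a candidate hotspot. The opcodes, costs, and register names are illustrative assumptions. The sketch also assumes that statistics instructions cost zero cycles.

```python
from typing import Dict, List

# Hypothetical per-instruction cycle costs for a toy CPU.
CYCLE_COST = {"add": 1, "mul": 3, "load": 4}


def run(program: List[tuple]) -> Dict[str, int]:
    """Toy interpreter that maintains a CPU cycle counter."""
    regs: Dict[str, int] = {}
    cycles, counting = 0, False
    for op, *args in program:
        if op == "resetInstCounter":
            cycles = 0
        elif op == "startInstCounter":
            counting = True
        elif op == "stopInstCounter":
            counting = False
        elif op == "getCPUCycleCount":
            regs[args[0]] = cycles
        elif counting:
            # Ordinary instruction: charge its cost while counting is on.
            cycles += CYCLE_COST[op]
    return regs


profiled_function = [
    ("resetInstCounter",),
    ("startInstCounter",),
    ("getCPUCycleCount", "t0"),
    ("load",), ("add",),            # phase 1
    ("getCPUCycleCount", "t1"),
    ("mul",), ("mul",), ("load",),  # phase 2
    ("getCPUCycleCount", "t2"),
    ("add",),                       # phase 3
    ("getCPUCycleCount", "t3"),
    ("stopInstCounter",),
]

regs = run(profiled_function)
checkpoints = [regs[name] for name in ("t0", "t1", "t2", "t3")]
phase_costs = [b - a for a, b in zip(checkpoints, checkpoints[1:])]
print(phase_costs)  # [5, 10, 1]
```

Comparing such per-phase figures between builds is one way to track performance as a program evolves.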
Having disclosed some basic system components and concepts, the disclosure now turns to an exemplary method embodiment shown in
The exemplary method 900 in
After processing the instruction type and performing any necessary actions, the system 100 checks if the count instruction flag is set (922). If the flag is set, the system 100 increments the instruction count value (924). In some embodiments, additional or alternative counters can be incremented, e.g. CPU cycle counter, system wide instruction counter, thread specific instruction counter, etc. After incrementing the counter, or if the count instruction flag is not set, the system 100 returns to step 902 to see if there are more instructions to process.
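The loop described for method 900 can be sketched as follows. Only the steps named in the text (902, 922, 924) are labeled; the dispatch details and opcode names are assumptions. The count flag is checked after the instruction has been processed. As a result, this sketch counts the startInstCounter instruction itself and does not count stopInstCounter. An implementation could adopt a different convention.

```python
from typing import Dict, List


def execute(program: List[tuple]) -> Dict[str, int]:
    """Fetch each instruction, act on its type, then count it if flagged."""
    regs: Dict[str, int] = {}
    inst_count = 0
    count_flag = False
    pc = 0
    while pc < len(program):  # step 902: more instructions to process?
        op, *args = program[pc]
        pc += 1
        # Process the instruction type and perform any necessary action.
        if op == "resetInstCounter":
            inst_count = 0
        elif op == "startInstCounter":
            count_flag = True
        elif op == "stopInstCounter":
            count_flag = False
        elif op == "getInstCounter":
            regs[args[0]] = inst_count
        # Other ops are ordinary instructions with no modeled effect.
        if count_flag:        # step 922: is the count instruction flag set?
            inst_count += 1   # step 924: increment the instruction count
    return regs


program = [
    ("resetInstCounter",),
    ("startInstCounter",),
    ("work",), ("work",), ("work",),
    ("stopInstCounter",),
    ("getInstCounter", "r1"),
]
print(execute(program)["r1"])  # 4: the start instruction plus three "work" ops
```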
The exemplary method 1000 in
After identifying a code section, the system 100 can determine an expected number of instructions that are required to execute that code section (1004). In some embodiments, the system can do this by directly and/or statically analyzing the code. For example, if the code is in an executable format, the system may be able to simply count the number of instructions in the code section. Alternatively, the code may be in some other format, such as source code, in which case, the system 100 may need to transform the code prior to determining the number of instructions. For example, the system 100 may need to compile the code before determining the number of instructions.
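For straight-line code already in executable form, the static count in step 1004 can be as simple as the following sketch. It makes two assumptions:

- The section has exactly one startInstCounter/stopInstCounter pair.
- Statistics instructions do not count themselves.

Code with branches or loops would need path analysis instead, which this sketch does not attempt. The opcode names are hypothetical.

```python
from typing import List

STATS_OPS = {"resetInstCounter", "startInstCounter", "stopInstCounter",
             "getInstCounter", "getCPUCycleCount"}


def expected_count(program: List[tuple]) -> int:
    """Count ordinary instructions between startInstCounter and stopInstCounter."""
    ops = [instr[0] for instr in program]
    start = ops.index("startInstCounter")
    stop = ops.index("stopInstCounter", start)
    return sum(1 for op in ops[start + 1:stop] if op not in STATS_OPS)


section = [
    ("resetInstCounter",),
    ("startInstCounter",),
    ("load_key",),
    ("verify_license",),
    ("getInstCounter", "r0"),  # statistics instruction: not counted
    ("decrypt_block",),
    ("stopInstCounter",),
    ("getInstCounter", "r1"),
]
print(expected_count(section))  # 3
```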
After identifying the code section, the system 100 can insert instructions to count an actual number of instructions that are required to execute the code section (1006). In some embodiments, the system 100 can insert instruction counting instructions, such as the reset, start, stop, and getInstCounter instructions by identifying annotations that indicate the appropriate instruction type and location. The system 100 can also be configured to insert instruction counting instructions through other means, such as by analyzing the program to identify the appropriate instruction type and location.
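Annotation-driven insertion (step 1006) might look like the following sketch, which rewrites assembly-like text lines. The `@count_begin` and `@count_end` markers, the default register, and the assembly syntax are hypothetical. A compiler could instead perform the same substitution on an intermediate representation.

```python
from typing import List

# Hypothetical annotation syntax; the disclosure leaves the exact form open
# (comments, preprocessor directives, macros, API calls, and so forth).
EXPANSIONS = {
    "@count_begin": ["resetInstCounter", "startInstCounter"],
    "@count_end": ["stopInstCounter", "getInstCounter {reg}"],
}


def insert_counting_instructions(lines: List[str]) -> List[str]:
    """Replace counting annotations with the instructions they stand for."""
    out: List[str] = []
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] in EXPANSIONS:
            reg = tokens[1] if len(tokens) > 1 else "r0"  # default register
            out.extend(instr.format(reg=reg) for instr in EXPANSIONS[tokens[0]])
        else:
            out.append(line)
    return out


source = [
    "mov r0, #0",
    "@count_begin",
    "add r0, r0, #1",
    "xor r3, r3, r2",
    "@count_end r1",
]
for line in insert_counting_instructions(source):
    print(line)
```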
The system 100 can also insert a protection mechanism that makes use of the expected and actual instruction count values (1008). The expected instruction count value can be one value, multiple values, and/or a range of valid values. In some embodiments, the system 100 can randomly choose a protection mechanism to insert from a set of predefined protection mechanisms. The system can also be configured to analyze the code and select a protection mechanism based on a predefined set of criteria. Alternatively, in some embodiments, the system can be configured to insert a particular protection mechanism. For example, the system can receive a command indicating the protection mechanism to insert or the code can contain an annotation or other notation that indicates the protection mechanism the system 100 should insert.
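Step 1008 allows two flexible choices:

- The expected value can be a single value, several values, or a range.
- The protection mechanism can be chosen at random from a predefined set.

A minimal sketch of both ideas follows. The mechanism names and the strings they return are placeholders for real reactions such as those listed earlier: error reporting, corrupting key data, quitting, and re-authentication.

```python
import random
from typing import Callable, Dict

# Placeholder reactions; a real mechanism would act rather than return a string.
PROTECTIONS: Dict[str, Callable[[], str]] = {
    "report": lambda: "send error message",
    "corrupt": lambda: "corrupt key data",
    "quit": lambda: "quit program execution",
    "reauth": lambda: "request additional user authentication",
}


def count_is_valid(actual: int, expected) -> bool:
    """Accept a single int, a collection of ints, or a range."""
    if isinstance(expected, int):
        return actual == expected
    return actual in expected


def choose_protection(rng: random.Random) -> str:
    # Random selection from the predefined set, done once at insertion time.
    return rng.choice(sorted(PROTECTIONS))


def guard(actual: int, expected, protection: str) -> str:
    if count_is_valid(actual, expected):
        return "ok"
    return PROTECTIONS[protection]()


chosen = choose_protection(random.Random(2024))
print(guard(3, 3, chosen))            # ok
print(guard(4, {3, 4}, chosen))       # ok
print(guard(5, range(2, 5), chosen))  # reaction of the chosen mechanism
```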
In some embodiments, a developer or a computer program, such as a compiler, can perform method 1000. For example, the developer can visually inspect the program to identify a section of code to protect using instruction counting (1002). After identifying a section, the developer can determine the number of computer instructions required to execute the code section (1004). The expected number of instructions can be determined in a number of ways. For example, the developer can compile the program, identify the code section in the executable, and count the number of instructions with an automated tool. After identifying a code section, the developer can also insert instruction counting instructions at various points in the program. The type of instructions inserted and the insertion location can vary depending on the protection goals of the developer. In some embodiments, the developer can insert annotations that represent the desired instructions and are later replaced by the actual instructions by another tool, such as a compiler. Alternatively, in some embodiments, the developer can manipulate the program at the assembly level and can directly insert the appropriate assembly instructions. The developer can also insert a protection mechanism that makes use of the expected and run-time instruction count values (1008). In some cases, the protection mechanism can take the form of a comparison, such as that used in function 710 in
Embodiments within the scope of the present disclosure may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon. Such non-transitory computer-readable storage media can be any available media that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as discussed above. By way of example, and not limitation, such non-transitory computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, data structures, or processor chip design. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.
Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
Those of skill in the art will appreciate that other embodiments of the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
The various embodiments described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. Those skilled in the art will readily recognize various modifications and changes that may be made to the principles described herein without following the example embodiments and applications illustrated and described herein, and without departing from the spirit and scope of the disclosure.