Embodiments of the present invention are understood by referring to the figures in the attached drawings, as provided below.
Features, elements, and aspects of the invention that are referenced by the same numerals in different figures represent the same, equivalent, or similar features, elements, or aspects, in accordance with one or more embodiments.
The present disclosure is directed to systems and corresponding methods that facilitate the identification of a program code based on the sequential arrangement of the program code's basic blocks.
In the following, numerous specific details are set forth to provide a thorough description of various embodiments of the invention. Certain embodiments of the invention may be practiced without these specific details or with some variations in detail. In some instances, certain features are described in less detail so as not to obscure other aspects of the invention. The level of detail associated with each of the elements or features should not be construed to qualify the novelty or importance of one feature over the others.
Referring to
To identify the program code, a unique identifier is associated with the program code. In one embodiment, the unique identifier is constructed or detected based on the order in which the basic blocks are arranged in the program code. A basic block is a straight-line segment of logic code without any jumps in the middle. That is, each basic block comprises a sequence of instructions, where the instruction in each position dominates or executes before other instructions positioned in subsequent portions of the logic code, such that no other instruction executes between two instructions in a sequence. For example, referring back to
To control the flow of execution between the basic blocks, branch instructions may be added at the end of each basic block. The blocks to which control may transfer after reaching the end of a block are that block's successors. The blocks from which control may have come when entering a block are that block's predecessors. Referring back to
Referring to
It is noteworthy that the number of the basic blocks in the selected subset need not be less than the number of the basic blocks in the original program code. In other words, the selected subset may, in certain embodiments, comprise all the basic blocks in the original program code (e.g., {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}). The selected basic blocks are, preferably, the basic blocks that are not frequently executed. We refer to the less frequently executed basic blocks in the original program code as cold basic blocks, and to one or more other basic blocks that are more frequently executed as the main basic blocks, for example.
In accordance with one embodiment, the basic blocks in the original program code are rearranged to construct a copy of the original program code. We refer to the newly constructed copy of the original program code as the target program code. During the rearrangement process, preferably, the cold basic blocks in the original program code are rearranged in a different sequential order, while the sequential order of the main basic blocks remains unchanged. Advantageously, rearranging the cold basic blocks while maintaining the original order of the main basic blocks is less likely to adversely affect the execution efficiency of the target program code.
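The cold-block rearrangement described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: basic blocks are represented only by their indices, the function name `rearrange_cold_blocks` is hypothetical, and "rearranging" is interpreted as refilling the positions that held cold blocks with a permutation of those same blocks. The cold subset {1, 2, 7, 8, 9} is the example subset mentioned elsewhere in this description.

```python
def rearrange_cold_blocks(layout, cold, new_cold_order):
    """Return a new layout in which the positions that held cold blocks are
    refilled in new_cold_order, while main blocks keep their positions."""
    cold_set = set(cold)
    if sorted(new_cold_order) != sorted(cold_set):
        raise ValueError("new_cold_order must be a permutation of the cold blocks")
    replacement = iter(new_cold_order)
    # Main blocks pass through unchanged; each cold slot takes the next cold block.
    return [next(replacement) if block in cold_set else block for block in layout]


original_layout = list(range(10))   # blocks 0..9 in their original order
cold_blocks = [1, 2, 7, 8, 9]       # assumed cold subset, for illustration only
target_layout = rearrange_cold_blocks(original_layout, cold_blocks, [8, 1, 9, 2, 7])
print(target_layout)                # [0, 8, 1, 3, 4, 5, 6, 9, 2, 7]
```

In this sketch the main blocks 0, 3, 4, 5, and 6 remain in their original positions, so the frequently executed path through the code is laid out as before.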
In some embodiments, the rearrangement enhances the execution efficiency of the target program code. It is noteworthy, however, that in alternative embodiments, the rearranging process is not limited to the cold basic blocks. Thus, in one or more embodiments, once a subset of the basic blocks is selected, the selected basic blocks are rearranged, regardless of the execution frequency (S230). As shown in
The new sequential order in the target program code may be used to generate a unique identification key (S240). The unique identification key can be used, for example, to identify the target program code. If the original program code comprises N basic blocks, then N*(N-1)*(N-2)* . . . *3*2*1, or N!, unique rearrangements of the original program code can be generated. That is, N! unique target program codes can be generated from the original program code. Since each arrangement is unique, N! unique identification keys can therefore be generated to identify N! target program codes for N! licensees or end users.
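One way to turn a block order into a compact key is to rank the permutation as an integer between 0 and N!-1. The Python sketch below uses the Lehmer code (factorial number system) for this purpose. The encoding and the function names `permutation_to_key` and `key_to_permutation` are assumptions made for illustration; the description does not prescribe how the key is encoded.

```python
from math import factorial


def permutation_to_key(order):
    """Rank a permutation of 0..N-1 as an integer in [0, N!) (Lehmer code)."""
    n = len(order)
    if sorted(order) != list(range(n)):
        raise ValueError("order must be a permutation of 0..N-1")
    remaining = list(range(n))
    key = 0
    for position, block in enumerate(order):
        index = remaining.index(block)          # rank among blocks not yet placed
        key += index * factorial(n - 1 - position)
        remaining.pop(index)
    return key


def key_to_permutation(key, n):
    """Inverse of permutation_to_key: recover the block order from the key."""
    if not 0 <= key < factorial(n):
        raise ValueError("key out of range for N blocks")
    remaining = list(range(n))
    order = []
    for position in range(n):
        index, key = divmod(key, factorial(n - 1 - position))
        order.append(remaining.pop(index))
    return order


key = permutation_to_key([2, 1, 0])
print(key, key_to_permutation(key, 3))   # 5 [2, 1, 0]
```

Because the mapping is a bijection, each of the N! arrangements receives a distinct key, and the arrangement can be recovered from the key alone.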
In other embodiments, other reordering schemes may be utilized to generate one or more unique identification keys. For example, in one embodiment, a derangement scheme may be used. A derangement is a permutation in which none of the members of a set or subset appear in their “natural” (i.e., ordered) place. For example, the derangements of {1, 2, 3} are {2, 3, 1} and {3, 1, 2}, so that !3 = 2. The function giving the number of distinct derangements on n elements is called the subfactorial !n and is calculated as follows:
$$!n = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}$$
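The derangement count and the example for {1, 2, 3} can be checked with a short Python sketch. It computes the subfactorial with the standard recurrence !n = (n-1)(!(n-1) + !(n-2)), with !0 = 1 and !1 = 0, which is an equivalent formulation chosen here for illustration. The function names are hypothetical, and the brute-force enumeration is only practical for small sets.

```python
from itertools import permutations


def subfactorial(n):
    """Number of derangements of n elements, via !n = (n-1) * (!(n-1) + !(n-2))."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0                      # !0 and !1
    for k in range(2, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1


def derangements(items):
    """Enumerate the permutations in which no element stays in its original place."""
    items = list(items)
    return [list(p) for p in permutations(items)
            if all(moved != fixed for moved, fixed in zip(p, items))]


print(derangements([1, 2, 3]))   # [[2, 3, 1], [3, 1, 2]]
print(subfactorial(3))           # 2
```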
In yet other embodiments, additional unique identification keys may be generated by reordering a subset of the basic blocks in the original program code and randomly selecting M of the basic blocks to construct the unique identification key. For example, referring to
In yet another embodiment, a second subset of the cold basic blocks (e.g., {1, 5, 8, 9}) can be randomly selected from the subset {1, 2, 7, 8, 9} to construct a unique identification key. The sequential order of the randomly selected cold basic blocks may be rearranged to construct a unique identification key (e.g., {5, 9, 1, 8}), as shown in
In some embodiments, one or more optimization tools may be used for rearranging the order of the basic blocks as provided above. For example, an optimization tool configured for tuning the output of a compiler or maximizing the efficiency of an executable program may be used to rearrange the order of the basic blocks in the original program code. The following publications, the entire content of which is incorporated by reference herein, disclose exemplary optimization tools or methods that may be utilized to implement the rearrangement process disclosed here.
I. Nahshon and D. Bernstein, “FDPR - A Post-Pass Object Code Optimization Tool”, Proc. Poster Session of the International Conference on Compiler Construction, pp. 97-104, April 1996; G. Haber, E. A. Henis, and V. Eisenberg, “Reliable Post-link Optimizations Based on Partial Information”, Proc. Feedback Directed and Dynamic Optimizations 3 Workshop, December 2000; E. A. Henis, G. Haber, M. Klausner and A. Warshavsky, “Feedback Based Post-link Optimization for Large Subsystems”, Second Workshop on Feedback Directed Optimization, pp. 13-20, November 1999; R. Cohn, D. Goodwin, and P. G. Lowney, “Optimizing Alpha Executables on Windows NT with Spike”, Digital Technical Journal, vol. 9, no. 4, Digital Equipment Corporation, 1997, pp. 3-20; T. Romer, G. Voelker, D. Lee, A. Wolman, W. Wong, H. Levy, B. Bershad and B. Chen, “Instrumentation and Optimization of Win32/Intel Executables Using Etch”, Proceedings of the USENIX Windows NT Workshop, August 1997, pp. 1-7.
In one embodiment, the above noted optimization tools or other control flow management tools may be used to add the needed control flows (e.g., branch instructions) to maintain the control transition between basic blocks as it is in the original program code. For example, referring to
Accordingly, when the target program is constructed, the target program will comprise the basic blocks of the original program code in a new sequence that is unique with reference to the initial order of the basic blocks in the original program code. Thus, if someone makes an unauthorized copy of the target program code, the unique position attributes associated with the plurality of basic blocks in the target program code are also transferred to the copy of the target program code.
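The role of the added branch instructions can be illustrated with a simplified Python sketch. It assumes a toy model that is not part of the disclosure: each basic block is a labeled list of pseudo-instructions, the `fallthrough` table records which block originally followed a block when control fell off its end, and `jmp` stands in for an unconditional branch. The function name `layout_with_fixups` and the instruction strings are hypothetical.

```python
def layout_with_fixups(blocks, fallthrough, new_order, entry):
    """Emit blocks in new_order, appending an explicit jump wherever the
    original fall-through successor is no longer the next block in memory."""
    emitted = []
    if new_order[0] != entry:
        emitted.append(f"jmp {entry}")       # execution must still start at the entry block
    for position, label in enumerate(new_order):
        emitted.append(f"{label}:")
        emitted.extend(blocks[label])
        successor = fallthrough.get(label)
        is_last = position + 1 == len(new_order)
        next_label = None if is_last else new_order[position + 1]
        if successor is not None and successor != next_label:
            emitted.append(f"jmp {successor}")
    return emitted


blocks = {"B0": ["load r1"], "B1": ["add r1, 1"], "B2": ["ret"]}
fallthrough = {"B0": "B1", "B1": "B2", "B2": None}
for line in layout_with_fixups(blocks, fallthrough, ["B0", "B2", "B1"], entry="B0"):
    print(line)
```

In the reordered layout, B0 gains a jump to B1 and B1 gains a jump to B2, so control still passes through B0, B1, and B2 in that order even though the blocks are stored as B0, B2, B1.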
Referring to
For example, as shown in
In some embodiments, if the copy is determined to be illegitimate, then the legitimate owner of the target program code may be determined by mapping the unique identification key to the entity to which the unique identification key was issued or assigned. In this manner, the source of an illegitimate copy can be identified and further action may be taken to determine how to respond to the unauthorized copying of the program code.
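A minimal Python sketch of this lookup follows. It assumes, purely for illustration, that each basic block can be recognized by a SHA-256 fingerprint of its unchanged contents and that the issuer keeps a table mapping each issued block order to a licensee. In practice, added branch instructions would alter block bytes, so a real matcher would need a more tolerant comparison. The names `extract_order`, `identify_licensee`, and the licensee labels are hypothetical.

```python
import hashlib


def fingerprint(block_bytes):
    """Content hash used to recognize a basic block (assumes unchanged bytes)."""
    return hashlib.sha256(block_bytes).hexdigest()


def extract_order(original_blocks, copy_blocks):
    """Recover the block order of a copy relative to the original program code."""
    index_of = {fingerprint(b): i for i, b in enumerate(original_blocks)}
    return tuple(index_of[fingerprint(b)] for b in copy_blocks)


def identify_licensee(original_blocks, copy_blocks, issued):
    """Return the entity the observed order was issued to, or None if unknown."""
    try:
        order = extract_order(original_blocks, copy_blocks)
    except KeyError:
        return None                           # copy contains an unrecognized block
    return issued.get(order)


original = [b"block-A", b"block-B", b"block-C"]
issued = {(2, 0, 1): "licensee-1", (1, 2, 0): "licensee-2"}
suspect_copy = [b"block-C", b"block-A", b"block-B"]
print(identify_licensee(original, suspect_copy, issued))   # licensee-1
```

Note that the extraction step requires the original block order, which is consistent with the description's point that the key is not recoverable from the target program code alone.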
The advantage of using different permutations of basic blocks in a program code to generate a corresponding unique identification key is that a hacker, by looking at the target program code, will be unable to determine whether the basic blocks have been rearranged. Therefore, unless the hacker knows the sequential arrangement of the basic blocks in the original program code, the hacker will be unable to determine how the basic blocks in the target program code have been rearranged, and therefore cannot extract or remove the unique identification key.
Thus, the rearrangement of the basic blocks creates a watermark for the program code that is invisible to the hacker without knowledge of the original order of the basic blocks. As such, in contrast to other watermarking methods that embed a specific character string in the program code as the identification key, a hacker will be unable to search for an embedded identification key. Since it is nearly impossible for an outsider to know the original order of the basic blocks, finding the unique identification key or rearranging the basic blocks to their initial state would be very difficult.
In different embodiments, the invention can be implemented either entirely in the form of hardware or entirely in the form of software, or a combination of both hardware and software elements. For example, one or more computing systems in conjunction with one or more software environments may be used to identify and rearrange the basic blocks in a program code or construct and extract the unique identification key. The computing systems and software environments may comprise a controlled computing system environment that can be presented largely in terms of hardware components and software code executed to perform processes that achieve the results contemplated by the system of the present invention.
Referring to
As provided here, the software elements that are executed on the illustrated hardware elements are described in terms of specific logical/functional relationships. It should be noted, however, that the respective methods implemented in software may also be implemented in hardware by way of configured and programmed processors, ASICs (application specific integrated circuits), FPGAs (field programmable gate arrays) and DSPs (digital signal processors), for example.
Software environment 1120 is divided into two major classes comprising system software 1121 and application software 1122. System software 1121 comprises control programs, such as the operating system (OS) and information management systems that instruct the hardware how to function and process information.
In a preferred embodiment, a software application is implemented as application software 1122 executed on one or more hardware environments to rearrange the basic blocks of an original program code to generate a target program code and a unique key from the rearranged basic blocks or to extract a unique key from the rearranged basic blocks. Application software 1122 may comprise but is not limited to program code, data structures, firmware, resident software, microcode or any other form of information or routine that may be read, analyzed or executed by a microcontroller.
In an alternative embodiment, the invention may be implemented as a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device.
The computer-readable medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-RW) and digital video disk (DVD).
Referring to
A user interface device 1105 (e.g., keyboard, pointing device, etc.) and a display screen 1107 can be coupled to the computing system either directly or through an intervening I/O controller 1103, for example. A communication interface unit 1108, such as a network adapter, may also be coupled to the computing system to enable the data processing system to communicate with other data processing systems or remote printers or storage devices through intervening private or public networks. Wired or wireless modems and Ethernet cards are a few of the exemplary types of network adapters.
In one or more embodiments, hardware environment 1110 may not include all the above components, or may comprise other components for additional functionality or utility. For example, hardware environment 1110 can be a laptop computer or other portable computing device embodied in an embedded system such as a set-top box, a personal data assistant (PDA), a mobile communication unit (e.g., a wireless phone), or other similar hardware platforms that have information processing and/or data storage and communication capabilities.
In some embodiments of the system, communication interface 1108 communicates with other systems by sending and receiving electrical, electromagnetic or optical signals that carry digital data streams representing various types of information including program code. The communication may be established by way of a remote network (e.g., the Internet), or alternatively by way of transmission over a carrier wave.
Referring to
Software environment 1120 may also comprise browser software 1126 for accessing data available over local or remote computing networks. Further, software environment 1120 may comprise a user interface 1124 (e.g., a Graphical User Interface (GUI)) for receiving user commands and data. Please note that the hardware and software architectures and environments described above are for purposes of example, and one or more embodiments of the invention may be implemented over any type of system architecture or processing environment.
It should also be understood that the logic code, programs, modules, processes, methods and the order in which the respective steps of each method are performed are purely exemplary. Depending on implementation, the steps can be performed in any order or in parallel, unless indicated otherwise in the present disclosure. Further, the logic code is not related or limited to any particular programming language, and may comprise one or more modules that execute on one or more processors in a distributed, non-distributed or multiprocessing environment.
The present invention has been described above with reference to preferred features and embodiments. Those skilled in the art will recognize, however, that changes and modifications may be made in these preferred embodiments without departing from the scope of the present invention. These and various other adaptations and combinations of the embodiments disclosed are within the scope of the invention and are further defined by the claims and their full scope of equivalents.