In the current design of computers, it is possible to inspect code before and during execution. The information obtained can be used to reverse engineer, modify, or attack the code. Obfuscation is a technique for mitigating this issue. Obfuscation transforms original code into obfuscated code. The obfuscated code is less intelligible, but behaves like the original. Various obfuscation techniques have been proposed. Some apply to source code, some apply to compiled code, and others apply to both. Obfuscation techniques integrated into the compiler have also been proposed. A disadvantage of existing obfuscators is that the original code must exist in order for the obfuscator to produce obfuscated code. Moreover, the obfuscator must parse and preprocess the original code in order to produce the obfuscated code. Systems using obfuscation have also been proposed. In these systems, either all users receive the same obfuscated instance, or each user receives a different obfuscated instance coupled with a program that makes the instance operable. Due to the complexity of implementing unique instances per user, existing obfuscation systems do not take full advantage of the benefits of obfuscation.
Embodiments are provided for code obfuscation. In one embodiment, a code representation is obtained when a code template is applied to data. A code host selects a location for the code representation and returns a reference. The reference can be used to replace the data and thus may be used for code obfuscation. The original code may not be required. In another embodiment, unique obfuscated instances are provided when requests are received.
The following figures illustrate the embodiments by way of example. They do not limit their scope.
This section includes detailed examples, particular embodiments, and specific terminology. These are not meant to limit the scope. They are intended to provide a clear and thorough understanding and to cover alternatives, modifications, and equivalents.
Obfuscation is a transformation from code in one domain to another code in the same or another domain. The code may be in source form or in binary form. Binary form describes any code that is not source code. It includes, but is not limited to, object form, machine code, and microcode. The transformed code is intended to be less intelligible than the original code, while preserving the original code behavior.
A parsing obfuscation is an obfuscation that requires the original code in order to produce transformed code. It parses the original code. A referencing obfuscation is an obfuscation that does not require parsing of the original code and may not require the original code at all. A referencing obfuscation creates new code from existing code templates. The output of a referencing obfuscation is called a reference construct. The reference construct includes a reference to a function that would execute the code template. The code template may return a value, possibly void. The changes that would need to be made to the original code in order to incorporate the reference and the code it depends on may be included in the reference construct or they can be included elsewhere. Transformed code is created when these changes are applied. The changes may add new code that may or may not reference the original code, or they may modify a copy of the original code, or both.
int x=1000+234;
return x;
The selection logic 102 may select a code template unit in any way, including random and fixed selection. The data contained in the input 100 can be of any type permitted by the code, including “void” and user-defined types.
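The interaction between the selection logic and a code template unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names split_template, identity_template, UNITS, and select_unit are hypothetical, the data is assumed to be an integer, and the rule of splitting at a multiple of 1000 is chosen only to reproduce the template "int x=1000+234; return x;".

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical code template unit: applies a code template to integer
   data and writes the resulting code representation into `out`.
   Returns the number of characters written (as snprintf does). */
typedef int (*template_unit_fn)(int value, char *out, size_t out_len);

/* Template "int x=A+B; return x;" with A + B equal to the input value.
   The split rule (A is the multiple-of-1000 part) is an assumption. */
static int split_template(int value, char *out, size_t out_len) {
    int a = value - value % 1000;   /* e.g. 1234 -> 1000 */
    int b = value - a;              /* e.g. 1234 -> 234  */
    return snprintf(out, out_len, "int x=%d+%d;\nreturn x;", a, b);
}

/* Trivial template that returns the data unchanged. */
static int identity_template(int value, char *out, size_t out_len) {
    return snprintf(out, out_len, "return %d;", value);
}

static const template_unit_fn UNITS[] = { split_template, identity_template };
#define UNIT_COUNT (sizeof UNITS / sizeof UNITS[0])

/* Selection logic: a non-negative index requests a fixed selection,
   a negative index requests a random one. */
static template_unit_fn select_unit(int fixed_index) {
    size_t i = fixed_index >= 0 ? (size_t)fixed_index % UNIT_COUNT
                                : (size_t)rand() % UNIT_COUNT;
    return UNITS[i];
}
```

Calling select_unit(0) and applying the returned unit to the integer 1234 yields a code representation whose two addends sum to the original data. Calling select_unit(-1) leaves the choice of unit to the random number generator.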
A code host 106 takes the code representation 104, selects a location for the code representation, and returns a reference construct 108 as output. A reference construct contains a reference to the code representation, and may optionally include a description of the changes that would need to be made to the original code in order to incorporate the reference and the code the reference depends on. The location selected by the code host 106 can be a new file or it can be selected from a list of files, or both. Furthermore, the selection can be random or fixed and the locations used by the code host 106 may represent files that do not exist. For example, suppose that the exemplary code template unit mentioned earlier is used to obfuscate the integer 1234 in the code below:
Further, suppose that the code host selects the same file to be the location for the code representation and that it uses “a1” as a reference. The obfuscated code may then be as below.
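As a purely hypothetical illustration of such a before-and-after pair, the sketch below places an assumed original function, the code representation, and the obfuscated function in one file. The function names original_value, obfuscated_value, and check_equivalence are illustrative assumptions; only the reference name "a1", the integer 1234, and the template body come from the description above.

```c
#include <assert.h>

/* Hypothetical original code: the integer 1234 appears as a literal. */
int original_value(void) {
    return 1234;
}

/* Code representation placed by the code host in the same file.
   It is reachable through the reference "a1". */
int a1(void) {
    int x=1000+234;
    return x;
}

/* Hypothetical obfuscated code: the literal is replaced by the reference. */
int obfuscated_value(void) {
    return a1();
}

/* Behaviour check: the obfuscated code must behave like the original. */
void check_equivalence(void) {
    assert(original_value() == obfuscated_value());
}
```

The literal 1234 no longer appears in the obfuscated function, yet both functions return the same value.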
A description of the changes that would need to be made to the original code in order to incorporate the reference may be included in the reference construct 108. This description can be used to produce the obfuscated code, but neither the application of these changes nor the existence of the original code is required by the method.
In another embodiment, the input 100 may also contain an iteration counter, and a code template unit may invoke obfuscators on elements of its code template and substitute the elements with reference constructs obtained from the obfuscators. Such code template units may be referred to as recursive. The selection logic 102 would choose a recursive code template unit only if the counter has not reached a threshold. As an example, using the above code, a code template unit whose code template contains the line “int x=1000+234” may produce a code representation containing the code “int x=a2( )+a3( )”, where “a2( )” and “a3( )” are references obtained by obfuscating the integers 1000 and 234, respectively.
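A recursive code template unit guarded by an iteration counter can be sketched as below. This is an illustrative sketch only: the function obfuscate_int, the variable next_ref, the THRESHOLD value of 1, and the non-recursive fallback template "return N;" are assumptions introduced for the example. The generated code is appended to a caller-supplied text buffer rather than written to a location chosen by a code host.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define THRESHOLD 1          /* assumed limit for the iteration counter */

static int next_ref = 1;     /* next reference number: a1, a2, a3, ... */

/* Appends code representations for `value` to the string in `out`
   (capacity `cap`) and returns the number N of the reference "aN". */
static int obfuscate_int(int value, int counter, char *out, size_t cap) {
    int ref = next_ref++;
    size_t used;
    if (counter < THRESHOLD) {
        /* Recursive unit: obfuscate both elements of "int x=A+B". */
        int high = value - value % 1000;
        int low = value % 1000;
        int r1 = obfuscate_int(high, counter + 1, out, cap);
        int r2 = obfuscate_int(low, counter + 1, out, cap);
        used = strlen(out);
        assert(used < cap);
        snprintf(out + used, cap - used,
                 "int a%d(void){int x=a%d()+a%d(); return x;}\n", ref, r1, r2);
    } else {
        /* Threshold reached: fall back to a non-recursive unit. */
        used = strlen(out);
        assert(used < cap);
        snprintf(out + used, cap - used,
                 "int a%d(void){return %d;}\n", ref, value);
    }
    return ref;
}
```

Starting from an empty buffer with the counter at 0, obfuscating 1234 appends definitions for a2 (returning 1000) and a3 (returning 234), followed by a1, whose body contains "int x=a2()+a3();". Raising THRESHOLD deepens the nesting, because each element is then obfuscated recursively in turn.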
The obfuscator may use random or different inputs so that a unique obfuscated instance 306 is created for each request 300. However, it can also be configured to use the same input. The original code may also have files containing data. The data files may contain, for example, a unique identifier and a password. They may also contain cryptographic material, which includes, but is not limited to, a description of keys and algorithms used in encryption, signatures, and other cryptographic algorithms. The data may be included in the instance request 300 or determined by the method, or both. The data may be identical or different for each request. Thus, the obfuscated instance 308 may have unique code and unique data files. As an example, a user making three instance requests from the same device may be provided with three unique obfuscated instances, and the data files of each instance may have a combination of similar data, such as a username, and unique data, such as the cryptographic material chosen by the method.
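The effect of varying the obfuscator input per request can be sketched as follows. This is a hedged illustration, not the described system: the function render_instance and its nonce parameter are hypothetical. The nonce stands in for whatever random or per-request input the obfuscator is given, and a real deployment would draw it from a suitable random source.

```c
#include <stdio.h>

/* Renders a code representation of `value` for one instance request.
   `nonce` is an assumed per-request input; different nonces give
   different code text, while the computed result stays `value`. */
static int render_instance(int value, unsigned nonce, char *out, size_t cap) {
    int a = (int)(nonce % 1000u);   /* request-specific first addend */
    int b = value - a;              /* second addend restores the value */
    return snprintf(out, cap, "int a1(void){int x=%d+%d; return x;}", a, b);
}
```

Two requests with different nonces receive textually different instances that nonetheless compute the same result. Supplying the same nonce for every request corresponds to configuring the obfuscator to use the same input.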
The method can use any obfuscator, regardless of whether it applies to source code or binary code. If the obfuscator produces as output a description of the changes that would need to be made to the original code in order to create an obfuscated instance, then the changes are applied prior to compilation 304. If the obfuscator modifies the original code, then the obfuscation should be applied to a copy of the original code. If the obfuscator creates a new program that only references the original code, then no changes are made to the original code, and the method does not require the original code to exist. Thus, the method can be used with only a compiled version of the original code, and a source version of the original code may not be required.
The specific embodiments and specific terminology used above should not be construed as limiting the scope of the embodiments. These details have been presented for purposes of illustration and are not intended to be exhaustive. Many modifications and uses are possible. The scope of the embodiments is defined by the Claims appended hereto and their equivalents.