The present invention relates to an obfuscation device, an obfuscation method, and an obfuscation program.
In the related art, analyzing which application programming interfaces (APIs) the binary data under analysis calls, and how it calls them (each such call may be referred to as an “API call” below), has played an important role in program analysis. In addition, in order to analyze an API call, it is necessary to discover the boundary between the program to be analyzed and the API (which may be referred to as a “hook point” below). Because a hook point is determined at the time of linking, it must be found afterward. Analyzing API calls yields insight into the internal structure of the program, and thus the author of the program has a motivation to obfuscate the API calls.
A number of API call obfuscation techniques have been proposed so far. These include, for example, an API location obfuscation technique, a Dynamic Link Library (DLL) location obfuscation technique, and the like. These techniques complicate references to the dynamically linked library files and to the APIs in those library files.
[NPL 1] Yuhei Kawkova, 2019, “Taint-based analysis Technologies against Evasive Malware,” Waseda University.
However, in the related art, an API call using binary data cannot be sufficiently obfuscated. For example, the related art described above is not robust with respect to an analysis method using an approach based on taint propagation or the like (for example, refer to NPL 1). The reason for this is that, in the related art, dynamic linking is generally used, binary data is referred to when a library file is to be executed, which enables the library file to be tracked, and thus a hook point can be searched for even when an API call is obfuscated.
To solve the above-described problems, the present invention includes an analyzing unit that converts first binary data output as an executable file into a first intermediate representation, a rewriting unit that inserts a predetermined code called when the first binary data is output into the first intermediate representation acquired from the analyzing unit and rewrites the first intermediate representation into a second intermediate representation, and an output unit that reads the predetermined code inserted by the rewriting unit, converts the second intermediate representation into executable second binary data, and outputs the second binary data when the second intermediate representation is to be converted into binary data.
The present invention can sufficiently obfuscate an API call using binary data.
An embodiment of an obfuscation device, an obfuscation method, and an obfuscation program according to the present invention will be described below in detail based on the drawings. Further, the present invention is not limited by the embodiment described below.
Hereinafter, a configuration of an obfuscation device according to the present embodiment, processing by an analyzing unit, processing by a rewriting unit, processing by an output unit, and a flow of an obfuscation process will be described in order, and the effects of the present embodiment will be described at the end.
A configuration of an obfuscation device 10 according to the present embodiment will be described using
The input unit 11 inputs various types of information to the obfuscation device 10. The input unit 11 includes input devices, for example, a touch panel, a voice input device, a keyboard, a mouse, and the like. The display unit 12 outputs various types of information from the obfuscation device 10. The display unit 12 includes, for example, a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like.
The communication unit 13 performs data communication with other devices, for example, other communication devices. Further, the communication unit 13 can perform data communication with a terminal of an operator, which is not illustrated.
The control unit 14 controls the entirety of the obfuscation device 10. The control unit 14 includes an analyzing unit 141, a rewriting unit 142, and an output unit 143. Here, the control unit 14 is, for example, an electronic circuit such as a central processing unit (CPU) or a micro processing unit (MPU), or an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
The analyzing unit 141 converts first binary data 20 output as an executable file into a first intermediate representation 22. For example, the analyzing unit 141 converts the first binary data 20 into the first intermediate representation 22 by using an inverse assembler 141a. In addition, the analyzing unit 141 converts an API called when the first binary data 20 is output into a third intermediate representation 32 by using the inverse assembler 141a. The processing by the analyzing unit 141 will be described in detail later.
Further, the above-described binary data may be a file in which binary data is stored, that is, a binary file, and is not particularly limited. In addition, although the above-described predetermined code is, for example, an original code of an API included in a library file referred to at the time of linking, or the like, it is not particularly limited.
The rewriting unit 142 inserts the predetermined code called when the first binary data 20 is output into the first intermediate representation 22 acquired from the analyzing unit 141, and rewrites the first intermediate representation into a second intermediate representation 40. For example, the rewriting unit 142 inserts the API called when the first binary data 20 is output into the first intermediate representation 22 acquired from the analyzing unit 141 by means of in-line expansion, and rewrites the first intermediate representation into the second intermediate representation 40. In addition, the rewriting unit 142 inserts an API called based on dynamic linking when the first binary data 20 is output by means of in-line expansion and rewrites the API into the second intermediate representation 40 which is eligible for static linking. The processing by the rewriting unit 142 will be described in detail later.
When the second intermediate representation 40 is to be converted into binary data, the output unit 143 reads the predetermined code inserted by the rewriting unit 142, converts the second intermediate representation 40 into executable second binary data 70, and outputs the second binary data 70. For example, the output unit 143 converts the second intermediate representation 40 into the second binary data 70 based on static linking and outputs the second binary data 70. In addition, the output unit 143 calls an API different from the API inserted into the first intermediate representation 22 based on dynamic linking. The processing by the output unit 143 will be described in detail later.
The storage unit 15 stores various types of information referred to by the control unit 14 for operations and various types of information acquired by the control unit 14 while it is operating. Here, the storage unit 15 is, for example, a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk or an optical disc. Further, although the storage unit 15 is installed inside the obfuscation device 10 in the example of
Next, a general obfuscation process and the like will be described, and then a flow of each process of the obfuscation device 10 according to the present embodiment will be described using
The general obfuscation process is performed as one of various optimization operations for intermediate representation in the course of the process of inputting a source code and outputting an object file (compiling) using a compiler. For example, a source code is converted into a token sequence by a lexical analyzer, the token sequence is converted into an abstract syntax tree by a syntax analyzer, the abstract syntax tree is converted into an intermediate representation by a semantic analyzer, the intermediate representation is optimized using an optimization path, and the optimized intermediate representation is converted into an object file by a code generator. In this course, a method for enhancing the efficiency of a program and a method for applying obfuscation to the program can be applied to the intermediate representation.
In addition, a general linking process is a process of inputting an object file and a library file and outputting binary data as an executable file using a linker. The library file includes, as a function provided by an operating system, an API which is a function of file input/output or the like.
Here, there are two types of linking which are static linking and dynamic linking, and dynamic linking is generally used. Static linking is a linking system in which an API itself included in a library file is embedded in binary data, and the API can be executed even in an environment with no library files. On the other hand, dynamic linking is a linking system in which an external reference for a library file is embedded in binary data, the name of the library file and the name of the API desired to be used are embedded in the external reference, and a program is executed after the library file and the API are automatically retrieved for program execution.
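The difference between the two linking systems can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not an actual linker: the library name "libio", the API name "write_file", the instruction names, and the Binary structure are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical library file: API name -> instruction list (the API's original code).
LIBRARIES = {"libio": {"write_file": ["open", "write", "close"]}}


@dataclass
class Binary:
    code: list                                          # instructions embedded in the binary
    external_refs: list = field(default_factory=list)   # (library name, API name) pairs


def link_static(program, library_name, api_name, libraries):
    """Embed the API itself in the binary, so no external reference remains."""
    return Binary(code=list(program) + list(libraries[library_name][api_name]))


def link_dynamic(program, library_name, api_name):
    """Embed only an external reference naming the library file and the API."""
    return Binary(code=list(program), external_refs=[(library_name, api_name)])


def run(binary, available_libraries):
    """Retrieve externally referenced APIs at execution time and return the trace."""
    trace = list(binary.code)
    for library_name, api_name in binary.external_refs:
        # Raises KeyError when the library file is absent from the environment.
        trace += available_libraries[library_name][api_name]
    return trace


static_binary = link_static(["main"], "libio", "write_file", LIBRARIES)
dynamic_binary = link_dynamic(["main"], "libio", "write_file")
```

In this model, run(static_binary, {}) succeeds even when no library file is available, whereas dynamic_binary keeps the pair ("libio", "write_file") visible in its external reference, which is the kind of information an analyst can use when searching for a hook point.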
In addition, binary rewriting is a process of inputting binary data and outputting binary data in which some of the functions have been rewritten. Specifically, when an intermediate representation is restored from binary data by a binary analysis tool or the like using an inverse assembler or the like, and the restored intermediate representation is given to an optimization path of a compiler, a program optimization method or an obfuscation method can be applied in the same way as in a compiler. After processing such as optimization, the binary data is output using a linker.
With respect to the above-described obfuscation process, the existing techniques are based on dynamic linking, which loads the original code of the API included in the library file at the time of execution, and are thus vulnerable to program analysis or the like. For this reason, the obfuscation device 10 according to the present embodiment remakes the data using static linking, which does not load the original code of the API from the library file at the time of execution. Specifically, the obfuscation device 10 uses a binary rewriting technique to transplant the original code of the dynamically linked API from the library file into the binary data that calls it, and remakes the data into binary data to which static linking has been applied.
Firstly, the flow of processing by the analyzing unit according to the present embodiment will be described using
Then, the first binary data 20 and the API 30 are converted into an assembly instruction sequence A 21 and an assembly instruction sequence B 31 through the inverse assembler 141a. Finally, the assembly instruction sequence A 21 and the assembly instruction sequence B 31 are converted into an intermediate representation A (a first intermediate representation) 22 and an intermediate representation B 32 (a third intermediate representation), respectively, via a lifter 141b, and transmitted to the rewriting unit 142.
Here, all APIs included in the library file may be converted into an intermediate representation B through processing by the inverse assembler 141a and the lifter 141b. In addition, some of the APIs included in the library file may be converted into the intermediate representation B, or the APIs may not be converted into the intermediate representation B.
Secondly, the flow of processing by the rewriting unit according to the present embodiment will be described in detail using
Next, the intermediate representation A 22 and the intermediate representation B 32 are rewritten to be an intermediate representation C 40 which is an optimized intermediate representation via an optimization path 142a of the rewriting unit 142. Here, the intermediate representation B 32 based on the API 30 is inserted into the intermediate representation A 22 based on the first binary data 20 by means of in-line expansion, which will be described later, and thereby an intermediate representation C (a second intermediate representation) 40 is generated. Finally, the generated intermediate representation C 40 is transmitted to the output unit 143.
Further, the processing performed for the generation of the intermediate representation C 40 is not limited to in-line expansion. A method for enhancing the efficiency of a program other than in-line expansion or a method for applying obfuscation to the program may be performed as an optimization process through the optimization path 142a.
Furthermore, in-line expansion according to the present embodiment will be described in detail using
In a flow using a normal function, the function B required for executing the certain function A is outside the function A, and a process of calling the function B is performed when the function A is executed or the like (see the normal function in
In the present embodiment, the intermediate representation C 40 which does not require an API call to the outside at the time of execution is generated by describing the intermediate representation B 32 based on the API 30 in the intermediate representation A 22 based on the first binary data 20.
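In-line expansion at the level of the intermediate representation can be sketched as follows. The tuple-based representation, the API names "write_file" and "get_time", and the "sys" operations are hypothetical; the sketch also assumes that each API body ends with a single return and takes no arguments, which a real implementation could not assume.

```python
def inline_expand(caller_ir, api_bodies):
    """Replace each call whose API body is available with a copy of that body."""
    expanded = []
    for op in caller_ir:
        if op[0] == "call" and op[1] in api_bodies:
            body = api_bodies[op[1]]
            # Drop the API's final return so that execution continues in the caller.
            if body and body[-1] == ("return",):
                body = body[:-1]
            expanded.extend(body)
        else:
            # Calls without an available body are left untouched.
            expanded.append(op)
    return expanded


# Toy intermediate representation A (caller) and intermediate representation B (API body).
ir_a = [("const", 7), ("call", "write_file"), ("call", "get_time"), ("return",)]
ir_b = {"write_file": [("sys", "open"), ("sys", "write"), ("sys", "close"), ("return",)]}
ir_c = inline_expand(ir_a, ir_b)
```

After expansion, ir_c no longer contains a call to "write_file", because the body of that API now sits where the call was. The call to "get_time", for which no body was supplied, remains an ordinary external call.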
Thirdly, the flow of processing by the output unit according to the present embodiment will be described using
Next, the intermediate representation C 40 is converted into an object file 50 through a code generator 143a of the output unit 143. Finally, the object file 50 is output as binary data (second binary data) 70 that is an executable file based on the inserted API 30 through a linker 143b, and is processed for display through the display unit 12.
Further, the object file 50 may refer to a library file 60 that includes an API that is called when the first binary data 20 is output and that is different from the inserted API, and may then be output through the linker 143b as the binary data 70, which is an executable file.
An example of a procedure of the obfuscation process according to the present embodiment will be described using
Next, the inverse assembler 141a of the analyzing unit 141 generates an assembly instruction sequence A 21 from the executable file 20. In addition, the inverse assembler 141a of the analyzing unit 141 generates an assembly instruction sequence B 31 from the library file 30 (step S102). Further, the assembly instruction sequence A 21 and the assembly instruction sequence B 31 may be generated simultaneously. In addition, the generation of the assembly instruction sequence A 21 may be performed prior to the generation of the assembly instruction sequence B 31, or the generation of the assembly instruction sequence B 31 may be performed prior to the generation of the assembly instruction sequence A 21.
Then, the lifter 141b of the analyzing unit 141 generates an intermediate representation A 22 from the assembly instruction sequence A 21. In addition, the lifter 141b of the analyzing unit 141 generates an intermediate representation B 32 from the assembly instruction sequence B 31 (step S103). Further, the intermediate representation A 22 and the intermediate representation B 32 may be generated simultaneously. Furthermore, the generation of the intermediate representation A 22 may be performed prior to the generation of the intermediate representation B 32, or the generation of the intermediate representation B 32 may be performed prior to the generation of the intermediate representation A 22.
Subsequently, the optimization path 142a of the rewriting unit 142 inserts the intermediate representation B 32 acquired from the analyzing unit 141 into the intermediate representation A 22 acquired from the analyzing unit 141 by means of in-line expansion to generate the optimized intermediate representation C 40 (step S104).
Next, the code generator 143a of the output unit 143 generates an object file 50 from the intermediate representation C 40 acquired from the rewriting unit 142 (step S105). Finally, the linker 143b of the output unit 143 generates an executable file 70 including binary data from the object file 50 (step S106), and the process ends.
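Steps S102 to S106 can be strung together in a toy end-to-end sketch. All data formats here are hypothetical: the "binary data" is plain text with one instruction per line, and the object file and executable file are text as well. The sketch only illustrates the order of the steps and the effect of in-line expansion, and is not a working binary rewriter.

```python
def inverse_assemble(data):
    """S102: toy binary data (text) -> assembly instruction sequence."""
    return [line.split() for line in data.splitlines() if line.strip()]


def lift(assembly):
    """S103: assembly instruction sequence -> intermediate representation."""
    return [tuple(instruction) for instruction in assembly]


def inline_expand(ir_a, ir_b_by_api):
    """S104: insert the API bodies into intermediate representation A."""
    ir_c = []
    for op in ir_a:
        if op[0] == "call" and op[1] in ir_b_by_api:
            ir_c.extend(ir_b_by_api[op[1]])
        else:
            ir_c.append(op)
    return ir_c


def generate_object(ir_c):
    """S105: intermediate representation C -> object file (a list of text lines)."""
    return [" ".join(op) for op in ir_c]


def link(object_file):
    """S106: object file -> executable file."""
    return "\n".join(object_file)


def obfuscate(executable, library):
    """Run the steps in order on a toy executable file and a toy library file."""
    ir_a = lift(inverse_assemble(executable))
    ir_b = {name: lift(inverse_assemble(body)) for name, body in library.items()}
    return link(generate_object(inline_expand(ir_a, ir_b)))


executable_20 = "push 7\ncall write_file\nhalt"
library_30 = {"write_file": "open\nwrite\nclose"}
executable_70 = obfuscate(executable_20, library_30)
```

In this toy run, executable_70 contains the instructions of "write_file" in place of the call, so no reference to the library file remains in the output.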
Firstly, the obfuscation device 10 according to the present embodiment described above converts the first binary data 20 output as an executable file into the first intermediate representation 22, inserts a predetermined code called when the first binary data 20 is output into the acquired first intermediate representation 22, and rewrites the first intermediate representation 22 into the second intermediate representation 40. When the second intermediate representation 40 is to be converted into binary data, the obfuscation device 10 reads the inserted predetermined code, converts the second intermediate representation 40 into executable second binary data 70, and outputs the second binary data 70. Thus, calling of the predetermined code using the binary data can be sufficiently obfuscated.
Secondly, the obfuscation device 10 inserts the API called when the first binary data 20 is output into the acquired first intermediate representation 22 by means of in-line expansion, and rewrites the API into the second intermediate representation 40. Thus, the API call using the binary data can be sufficiently obfuscated.
Thirdly, the obfuscation device 10 inserts the API called based on dynamic linking when the first binary data 20 is output by means of in-line expansion, rewrites the API into the second intermediate representation 40 that is eligible for static linking, converts the second intermediate representation 40 into the second binary data 70 based on the static linking, and outputs the second binary data 70. Thus, it becomes difficult to find a hook point, an API call using the binary data can be further obfuscated, and illegal copying of original logic included in software and use of software without a valid license can be prevented.
Fourthly, the obfuscation device 10 converts the first binary data 20 into the first intermediate representation 22 using the inverse assembler. Thus, it becomes more difficult to find a hook point, and an API call using the binary data can be further obfuscated.
Fifthly, the obfuscation device 10 converts an API called when the first binary data 20 is output into the third intermediate representation 32 using the inverse assembler. Thus, it becomes more difficult to find a hook point, and an API call using the binary data can be further obfuscated.
Sixthly, the obfuscation device 10 calls an API different from the API inserted into the first intermediate representation 22 using dynamic linking. Thus, the API that is not eligible for static linking can be called.
Each component of each device illustrated according to the above-described embodiment is a functional concept and need not necessarily be physically configured as shown. That is, the specific forms of distribution and integration of the devices are not limited to the forms illustrated in the figure, and all or part of them can be configured by functionally or physically distributing and integrating them in any unit according to various loads, use situations, or the like. In addition, all or any part of the processing functions performed by the devices may be achieved by a CPU and programs analyzed and executed by the CPU or achieved as the wired logic of hardware.
Also, among the processes described in the present embodiment, all or some processes that are described as being automatically executed may also be manually executed, or all or some of processes that are described as being manually executed may also be automatically executed using a known method. In addition, the processing procedure, the control procedure, specific names, information including various data and parameters that are shown in the above document and drawings may be arbitrarily changed unless otherwise described.
In addition, a program that describes the processes executed by the obfuscation device 10 described in the above embodiment in a computer-executable language may be created. In this case, the same effects as in the above embodiment may be exhibited by a computer executing the program. Furthermore, processes similar to those of the foregoing embodiment may also be realized by recording the program in a computer-readable recording medium and causing a computer to load and execute the program recorded in the recording medium.
The memory 1010 includes a read only memory (ROM) 1011 and a RAM 1012 as illustrated in
Here, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094 as illustrated in
Moreover, various types of data described in the foregoing embodiment may be stored, as program data, in the memory 1010 or the hard disk drive 1090, for example. In addition, the CPU 1020 reads the program module 1093 or the program data 1094 stored in the memory 1010 or the hard disk drive 1090 onto the RAM 1012 as needed, and executes various types of processing procedures.
Further, the program module 1093 and the program data 1094 related to the program need not be stored in the hard disk drive 1090, and may also be stored in, for example, a removable storage medium and loaded by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and the program data 1094 related to the program may also be stored in another computer that is connected via a network (a local area network (LAN), a wide area network (WAN), or the like) and loaded by the CPU 1020 via the network interface 1070.
The above-described embodiment and modification thereof are included in the technology disclosed by the present application, as well as in the scope of the invention described in the claims and the equivalent range.
10 Obfuscation device
11 Input unit
12 Display unit
13 Communication unit
14 Control unit
141 Analyzing unit
141a Inverse assembler
141b Lifter
142 Rewriting unit
142a Optimization path
143 Output unit
143a Code generator
143b Linker
15 Storage unit
20, 70 Executable file
21 Assembly instruction sequence A
22 Intermediate representation A
30, 60 Library files
31 Assembly instruction sequence B
32 Intermediate representation B
40 Intermediate representation C
50 Object file
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/034332 | 9/10/2020 | WO |