The present invention relates to an apparatus and to a method to remanipulate instructions to be processed in a processor and especially to a manipulation of a digital stream of instructions.
In modern communication technology, it has become increasingly important to provide a high level of security for sensitive data. Since most data is nowadays processed by computers or processors, it is increasingly important to protect the data handling of the computer. This includes, in particular, protecting the processing of data or instructions by the Central Processing Unit (CPU). Unauthorized persons should not get access to the data and should be unable to understand how the data is processed by the processor. Applications such as pay-TV, or other applications whose content should be accessible only to a limited number of authorized customers, rely on an efficient way to conceal the manner in which the processor handles data. This is basically no problem for proprietary processor systems, which are, however, costly. At the same time, the pressure for cost efficiency puts constraints on the hardware to be used in communication devices. The aim is to use as many standard components as possible, which are widely available at an affordable cost. These standard components are, however, an easy target for an attack, for example, to uncover the data handling. Therefore, there is a need to conceal, as efficiently as possible, the data handling of standard components like standard processors or standard CPUs.
Embodiments of the present invention relate to an apparatus for modifying instructions of a machine readable program according to remanipulation rules. The apparatus includes a remanipulation unit, which is configured to identify a manipulated instruction and to remanipulate the manipulated instruction according to the remanipulation rules. The apparatus further comprises a processor unit configured to process a predetermined instruction set, wherein the predetermined instruction set includes manipulated instructions and remanipulated instructions.
The features of the embodiments of the invention will be more readily appreciated and better understood by reference to the following detailed description, which should be considered with reference to the accompanying drawings, in which:
Before embodiments of the present invention are explained in more detail below with reference to the drawings, it is to be noted that equal elements, or those operating in an equal manner are provided with same or similar reference numerals in the figures, and that a repeated description of these elements is omitted.
A processor to be used in a computer, for example, can handle a number of processor-specific instructions (instruction set) and a (computer) program comprises in general a stream of instructions to be processed by the processor. Hence, the set of instructions comprising the program should be processor compatible in order to be processed by the processor. In case the program as a sequence of instructions is known for a given type of processor, the program can be executed on any other instruction compatible processor. It is, moreover, possible to uncover and obtain the functionality of the program and to realize it on another processor.
In order to avoid this for critical applications, for example pay-TV cards, processors are usually used that support only a fixed set of instructions which, moreover, are not supported by so-called off-the-shelf processors.
This offers the following advantages:
(a) a potential attack could only be made possible by a significant effort to obtain the instructions of the program from the byte stream (digital data stream) and thereby understand the meaning and/or functioning of the program. This task is made more difficult, for example, by the fact that the data book is usually distributed only under a non-disclosure agreement;
(b) simply copying the program and executing it on a computer using a standard processor is thereby made impossible.
The embodiments of the present invention relate to the possibility that the instruction set in the byte stream of the program can be manipulated and then remanipulated directly before the byte stream is input into the processor or the central processing unit (CPU). The manipulated instruction set makes it difficult to understand the program code and to recreate the program. The manipulation rules, which are used to manipulate the instructions in the first place and to remanipulate them directly before they are input into the CPU, can in theory be made arbitrarily complex, so that it becomes practically impossible to uncover the instruction set during a possible attack. Hence, the complete functionality of the program can only be understood with knowledge of the manipulation rules.
Embodiments of the present invention comprise an apparatus to modify instructions of a machine readable program according to a remanipulation rule, the apparatus comprising a remanipulation unit and a processor unit. The remanipulation unit is configured to identify a manipulated instruction within the instruction set and, moreover, to remanipulate the manipulated instruction according to the remanipulation rules. The processor unit processes a predetermined instruction set comprising the manipulated instructions as well as the remanipulated instructions. Therefore, the manipulation of the instructions does not change the instructions themselves, but changes or modifies the meaning of a given instruction according to the manipulation rules (adding instead of subtracting certain register content, for example).
Further embodiments comprise also an apparatus to manipulate the instruction set according to a manipulation rule. The apparatus comprises a manipulation unit configured to manipulate instructions and to output the manipulated instructions. The apparatus to manipulate instructions can, for example, be part of a compiler used to generate the byte stream from a program source code.
According to embodiments, the instructions of a program are manipulated or remanipulated before they are input into the processor unit. The manner of manipulation can be quite flexible and is determined by the programmer. In the following, three examples are explained in more detail.
Modifying the meaning. The simplest way to manipulate an instruction set is to change the meaning of a given instruction. For example, instruction 1 can be given the meaning of instruction 2 and vice-versa. By this, the whole set of instructions can be mapped on another set of instructions in the manner that each instruction is mapped onto a new instruction having a different meaning. A simple example is given by changing operation codes (op-codes) which refer to an addition or subtraction of register entries:
This mapping can comprise all instructions or only some of them, e.g., instructions that are used regularly. The remaining instructions can be left unmodified.
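As a purely illustrative sketch, which is not part of the described embodiments, the exchange of the meanings of an addition and a subtraction can be expressed as a lookup table. The mnemonics `ADD`, `SUB` and `MOV`, the register names and the function `apply_rules` are assumptions chosen for illustration; an actual byte stream would carry binary op-codes.

```python
# Hypothetical mnemonics; a real instruction stream would hold binary op-codes.
MANIPULATION_RULES = {"ADD": "SUB", "SUB": "ADD"}

# For a one-to-one mapping, the remanipulation rules are simply the inverse.
REMANIPULATION_RULES = {new: old for old, new in MANIPULATION_RULES.items()}

def apply_rules(stream, rules):
    """Replace each op-code that has a rule; leave all other op-codes as they are."""
    return [(rules.get(op, op), *args) for op, *args in stream]

original = [("ADD", "r1", "r2"), ("MOV", "r3", "r1"), ("SUB", "r3", "r2")]
manipulated = apply_rules(original, MANIPULATION_RULES)
restored = apply_rules(manipulated, REMANIPULATION_RULES)
```

In this sketch only `ADD` and `SUB` are covered by the rules, so `MOV` passes through unmodified, which corresponds to a mapping that comprises only some of the instructions.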
Mapping more than one op-code onto a single instruction. This means that for example an instruction 1 and an instruction 2 are both mapped onto an instruction 3 or a plurality of instructions are mapped onto a single instruction. For example, two different operations can be mapped on NOPs (NOP=No Operation):
By mapping two instructions onto one instruction, the instruction set effectively loses one instruction, and hence the mapping is not uniquely reversible. This, however, can be tolerated, as many programs use only a reduced set of instructions and the unused instructions can be used for manipulation purposes. For example, in the manipulation process the instruction 3 is randomly mapped onto the instruction 1 or the instruction 2, and both are mapped back to instruction 3 in the remanipulation process.
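The many-to-one case can be sketched as follows, under the assumption that the program itself never uses the instructions serving as aliases. The names `I1` to `I4`, the table `ALIASES` and both functions are hypothetical and serve only to illustrate the idea.

```python
import random

# Hypothetical instruction names. I3 is used by the program; I1 and I2 are
# assumed to be unused by the program and therefore free to serve as aliases.
ALIASES = {"I3": ("I1", "I2")}

# Remanipulation rules: every alias is mapped back to its single target.
REMANIPULATION_RULES = {
    alias: target for target, aliases in ALIASES.items() for alias in aliases
}

def manipulate(stream, rng):
    """Replace each aliased instruction by a randomly chosen alias."""
    if any(i in REMANIPULATION_RULES for i in stream):
        raise ValueError("program uses an instruction reserved as an alias")
    return [rng.choice(ALIASES[i]) if i in ALIASES else i for i in stream]

def remanipulate(stream):
    """Map every alias back to its target; this step is many-to-one."""
    return [REMANIPULATION_RULES.get(i, i) for i in stream]

program = ["I3", "I4", "I3", "I3"]
manipulated = manipulate(program, random.Random(0))
restored = remanipulate(manipulated)
```

The check in `manipulate` reflects the consistency condition: if the program needed `I1` or `I2` in their own right, the remanipulation would wrongly turn them into `I3`.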
A further example is not to map a given instruction onto a new instruction but instead to map a sequence of instructions onto a new instruction or onto a new sequence of instructions. For example, an instruction 1 followed by an instruction 2 can be mapped onto an instruction 3 followed by an instruction 4. Another example would be that the instruction 1 followed by the instruction 2 can be mapped onto an instruction 4. Also here, a whole plurality of subsequent instructions can be mapped onto another plurality of subsequent instructions, but only one plurality of subsequent instructions can be executed by the processor in a sensible way.
For example, a mapping of a sequence of op-codes onto another sequence of op-codes comprises:
and therefore an addition followed by a subtraction is remanipulated into two NOP instructions.
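A minimal sketch of such a sequence mapping is given below. It assumes a single hypothetical rule that turns an `ADD` directly followed by a `SUB` into two `NOP` instructions; the mnemonics and the function name `remanipulate_sequences` are illustrative and not taken from the embodiments.

```python
# Hypothetical rule table: a sequence of op-codes is mapped onto another sequence.
SEQUENCE_RULES = {("ADD", "SUB"): ("NOP", "NOP")}

def remanipulate_sequences(opcodes, rules=SEQUENCE_RULES):
    """Scan the stream left to right and replace every matching sequence."""
    out = []
    i = 0
    while i < len(opcodes):
        for pattern, replacement in rules.items():
            if tuple(opcodes[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)  # skip the whole matched sequence
                break
        else:
            # No rule matched at this position: pass the op-code through.
            out.append(opcodes[i])
            i += 1
    return out

result = remanipulate_sequences(["MOV", "ADD", "SUB", "ADD"])
```

An isolated `ADD` that is not followed by a `SUB` is left unchanged, so the meaning of an op-code depends on its neighbours in the stream.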
Thus, embodiments of the present invention also relate to a method for (re)manipulating the instruction byte stream before it is input into the instruction decoder of a CPU. The manipulation rules can be made as flexible or as involved as desired and can even change during operation. This means that the instructions are not always (re)manipulated in the same way: the manipulation rules can, for example, be time dependent (depending, for example, on the current day) or depend on where an instruction occurs within the program or within a subroutine of the program.
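A time-dependent rule can be sketched, for instance, by selecting one of several rule tables from a day number. The two tables, the selection by a modulo operation and the function names are assumptions made for this illustration only; the embodiments do not prescribe how the rules are chosen.

```python
# Hypothetical rule tables; which one is valid depends on the day.
RULE_TABLES = [
    {"ADD": "SUB", "SUB": "ADD"},
    {"ADD": "MOV", "MOV": "ADD"},
]

def rules_for_day(day_number):
    """Select the rule table for a given day (assumed scheme: day modulo table count)."""
    return RULE_TABLES[day_number % len(RULE_TABLES)]

def remanipulate(opcodes, day_number):
    """Remanipulate the op-codes with the rules valid on the given day."""
    rules = rules_for_day(day_number)
    return [rules.get(op, op) for op in opcodes]

even_day = remanipulate(["ADD", "MOV"], 0)
odd_day = remanipulate(["ADD", "MOV"], 1)
```

The same manipulated stream is therefore interpreted differently on different days, so the manipulation unit and the remanipulation unit would have to agree on the day number, for example one derived from `datetime.date.today().toordinal()`.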
The manipulation rules can, for example, be input into the remanipulation unit by a separate input and hence, can be stored separately from the program code, which makes the system much more secure, since for a potential attacker it is difficult to understand the meaning of the program code and to reengineer a program.
Embodiments of the present invention comprise numerous advantages over conventional methods. For example, even if it were possible to read the program, it would nevertheless be very difficult to obtain the meaning or functionality of the program, since the manipulation rules are still needed for this. It is also advantageous that standard processors, for example ARM processors (ARM = Acorn RISC Machine), can be used without risking a successful attack. Without instruction manipulation, this would be impossible, as it would be simple to uncover the program structure and the instruction stream. It is, moreover, an advantage that the manipulation or the remanipulation can optionally be done on-the-fly. This means that one and the same piece of code could have two meanings, for example, to add or to subtract data. An example of this is a manipulation rule in which the meaning of the manipulated instruction depends on the previous instruction or on a sequence of previous instructions.
Hence, the byte stream of instructions is input into the remanipulation unit 120 and, after the manipulated instruction 112 has been identified and remanipulated, the set of instructions is sent to the processor unit 130. The remanipulation rules 114 can be input separately in order to support a preferred embodiment, wherein the remanipulation rules 114 and the program code are stored separately. This makes it more difficult to understand and comprehend the meaning of the program code.
In this embodiment, the first, third and fourth instruction 410, 430, 440 comprise unmodified instructions 113 and hence, the detector 210 in the remanipulation unit 120 sends these instructions directly to the output interface 216, whereas the second instruction 420 will be identified by the detector 210 as a manipulated instruction 112 and sent to the remanipulator 220 in order to be remanipulated, and subsequently sent to the output interface 216. In order to identify a manipulated instruction 112, the detector 210 reads the remanipulation rules 114, which identify the instructions which have been manipulated.
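A rule in which the meaning depends on the previous instruction can be sketched as follows. The ambiguous op-code `OPX` and the choice that it means `ADD` after a `MOV` and `SUB` otherwise are hypothetical and serve only to illustrate that one code piece can have two meanings.

```python
def remanipulate_with_context(opcodes):
    """Resolve the hypothetical op-code OPX depending on its predecessor."""
    out = []
    previous = None  # previous op-code as it appears in the manipulated stream
    for op in opcodes:
        if op == "OPX":
            # Assumed rule: OPX means ADD directly after a MOV, otherwise SUB.
            out.append("ADD" if previous == "MOV" else "SUB")
        else:
            out.append(op)
        previous = op
    return out

resolved = remanipulate_with_context(["MOV", "OPX", "OPX"])
```

In this sketch the context is taken from the manipulated stream rather than from the remanipulated one; either choice would work as long as manipulation and remanipulation agree on it.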
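The routing between detector and remanipulator can be sketched in software as follows, although the embodiments place this function in a hardware unit in front of the processor. The class name, its methods and the mnemonics are hypothetical; the rules are passed in separately to mirror their separate storage from the program code.

```python
class RemanipulationUnit:
    """Software sketch of a unit with a detector and a remanipulator."""

    def __init__(self, rules):
        # The rules are supplied separately from the program code.
        self.rules = dict(rules)

    def detect(self, opcode):
        """Detector: an op-code counts as manipulated if a rule exists for it."""
        return opcode in self.rules

    def remanipulate(self, opcode):
        """Remanipulator: look up the op-code the processor should execute."""
        return self.rules[opcode]

    def process(self, stream):
        """Pass unmodified op-codes through and remanipulate the others."""
        for opcode in stream:
            yield self.remanipulate(opcode) if self.detect(opcode) else opcode

unit = RemanipulationUnit({"SUB": "ADD"})
output = list(unit.process(["MOV", "SUB", "MOV", "NOP"]))
```

Here only the second op-code is detected as manipulated and replaced, while the first, third and fourth are forwarded unchanged.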
In the process of manipulating original instructions to obtain the manipulated instructions 112, the sixth instruction I6 was mapped onto the seventh instruction I7 and also onto the eighth instruction I8. Since both instructions are remanipulated by the remanipulation unit 120 to the same instruction I6, the manipulation unit can map the instruction I6, for example, randomly (or using a certain system) to the instructions I7 and I8. Since the instructions I7 and I8 are absent from the stream of instructions after remanipulation, this is only consistent if, for example, the program does not need the instructions I7 or I8. Otherwise, the mapping or remapping would be inconsistent. Typically, this does not represent a problem, since most programs use only a restricted set of instructions anyway.
Hence, embodiments change only the mapping of given instructions to certain operations, but leave the instruction set of a given processor unit 130 unchanged; the instruction set of the processor unit 130 is neither configured nor changed. This is especially important in order to use standard CPUs and to put all the manipulation or remanipulation into a manipulation unit 120′ or remanipulation unit 120.
The remanipulation rules 114 only need to change the operation code (in the first group 510) so that the operation done by this instruction is changed. The remaining values in the second, third and fourth group 520, 530, 540 of the instruction I can remain unchanged. The operation code is typically encoded by a binary number, so that in the process of remanipulation one binary number is changed into another binary number. Thus, embodiments change only the operation code, but not the whole instruction, that means the second, third and fourth group 520, 530, 540 in the instruction I are not changed. Manipulating instructions refers therefore to a manipulation of the operation to be performed and does not refer to changes in the arguments (or objects) of the operation.
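Changing only the operation code while leaving the remaining groups untouched can be sketched with bit masks. The 32-bit instruction word, the 8-bit op-code in the most significant byte and the op-code values `0x01` and `0x02` are assumptions for this illustration and do not describe the encoding of any particular processor.

```python
# Assumed layout: 8-bit op-code in bits 31..24, three 8-bit operand groups below.
OPCODE_SHIFT = 24
OPCODE_MASK = 0xFF << OPCODE_SHIFT
OPERAND_MASK = 0xFFFFFFFF & ~OPCODE_MASK  # the lower 24 bits

# Hypothetical binary op-codes whose meanings are exchanged.
REMANIPULATION_RULES = {0x01: 0x02, 0x02: 0x01}

def remanipulate_word(word):
    """Replace the op-code field of one instruction word; keep the operands."""
    opcode = (word & OPCODE_MASK) >> OPCODE_SHIFT
    new_opcode = REMANIPULATION_RULES.get(opcode, opcode)
    return (new_opcode << OPCODE_SHIFT) | (word & OPERAND_MASK)

changed = remanipulate_word(0x01A1B2C3)
unchanged = remanipulate_word(0x7FA1B2C3)
```

Only the byte holding the op-code differs between input and output, which corresponds to changing one binary number into another while the arguments of the operation stay as they are.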
The manipulation unit 120′ can for example be combined with a compiler, which is used for compiling the program from the source code into a stream of executable instructions. Further embodiments use a different compiler for compiling the program, so that from the source code, a different or manipulated machine code is generated and the remanipulation unit identifies the “errors” and modifies the instruction code in a manner that it becomes executable in a sensible way by the CPU (processor unit 130).
The instructions 116′ and 116 are real instructions to be processed by the processor unit 130 in a sensible way (so that the program fulfills its purpose). The stream of instructions 112 comprises a byte stream in which at least some of the instructions are manipulated. The stream of instructions 112 also consists of valid instructions and can therefore be executed by the processor unit 130, but the program will crash or generate meaningless output.
The (re)manipulation rules 114′, 114 can also be changed dynamically, so that it becomes impossible to infer the manipulation rules valid in the future from the manipulation rules 114′ valid for a given time period. This makes it more difficult for a potential attacker to understand and to retrieve the functionality of the program.
The main advantage of embodiments is that standard CPUs can be used and only need to be connected to a one-dimensional matcher (remanipulation unit 120), which, for example, just interchanges op-codes, such as adding with subtracting, or moving one storage cell with moving a different storage cell. The (re)manipulation rules 114, 114′ can, for example, be stored on a separate chip, which is difficult to read out and hence improves the security even further. The dynamic change of the (re)manipulation rules 114, 114′ can, for example, also comprise the possibility that different manipulation rules are used for different program parts. The different program parts can, for example, comprise different subroutines or multi-stage algorithms (AIS), and different manipulation or remanipulation rules can be used for each of them.
This can, moreover, be combined with a change in the manipulation rules during the operation time. The illustrated examples for manipulating instructions (see table in
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.