This invention relates to data security, cryptography and obfuscation.
In the field of data security, there is a need for fast and secure encryption. This is why the AES (Advanced Encryption Standard) cipher was designed and standardized to replace the DES (Data Encryption Standard) cipher. Cryptographic algorithms are widely used for encryption and decryption of messages, authentication, digital signatures and identification. AES is a well known symmetric block cipher. Block ciphers operate on blocks of plaintext and ciphertext, usually of 64 or 128 bits length but sometimes longer. Stream ciphers are the other main type of cipher and operate on streams of plaintext and ciphertext 1 bit or byte (sometimes one word) at a time. There are modes of operation (notably ECB, electronic codebook) in which a given plaintext block is always encrypted to the same ciphertext block. This issue is addressed by more advanced modes of operation, e.g. CBC (cipher block chaining), where a chaining value is used to break this 1-to-1 mapping.
AES is approved as an encryption standard by the U.S. Government. Unlike its predecessor DES (Data Encryption Standard), it is a substitution permutation network (SPN). AES is fast to execute in both computer software and hardware implementations, relatively easy to implement, and requires little memory. AES has a fixed block size of 128 bits and a key size of 128, 192 or 256 bits. Due to the fixed block size of 128 bits, AES operates on a 4×4 array of bytes. It uses key expansion and, like most block ciphers, a set of encryption and decryption rounds (iterations). Block ciphers of this type use substitution boxes (S-boxes) in each round. This operation provides non-linearity in the cipher and significantly enhances security.
Note that these block ciphers are symmetric ciphers, meaning the same key is used for encryption and decryption. As is typical in most modern ciphers, security rests with the (secret) key rather than the algorithm. The S-boxes accept an n-bit input and provide an m-bit output. The values of m and n vary with the cipher and the S-box itself. The input bits specify an entry in the S-box in a particular manner well known in the field.
Many encryption algorithms are primarily concerned with producing encrypted data that is resistant to decrypting by an attacker who can interact with the encryption algorithm only as a “Black Box” (input-output) model, and cannot observe internal workings of the algorithm or memory contents, etc. due to lack of system access. The Black Box model is appropriate for applications where trusted parties control the computing systems for both encoding and decoding ciphered materials.
However, many applications of encryption do not allow for the assumption that an attacker cannot access internal workings of the algorithm. For example, encrypted digital media often needs to be decrypted on computing systems that are completely controlled by an adversary (attacker). There are many degrees to which the Black Box model can be relaxed. An extreme relaxation is called the “White Box” model. In a White Box model, it is presumed that an attacker has total access to the system performing an encryption (or decryption), including being able to observe directly a state of memory, program execution, modifying an execution, etc. In such a model, an encryption key can be observed in or extracted from memory, and so ways to conceal operations indicative of a secret key are important.
Classically, software implementations of cryptographic building blocks are insecure in the White Box threat model where the attacker controls the execution process. The attacker can easily lift the secret key from memory by just observing the operations acting on the secret key. For example, the attacker can learn the secret key of an AES software implementation by observing the execution of the key schedule algorithm.
Hence there are two basic principles in the implementation of secure computer applications (software). The Black Box model implicitly supposes that the user does not have access to the computer code nor any cryptographic keys themselves. The computer code security is based on the tamper resistance of the platform on which the application is running, as is typically the case with SmartCards. For the White Box model, it is assumed the (hostile) user has partial or full access to the implemented code algorithms, including the cryptographic keys themselves. It is assumed the user can also become an attacker and can try to modify or duplicate the code since he has full access to it in a binary (object code) form. White Box implementations are widely used, in particular in content protection applications to protect e.g. audio and video content.
Straightforward software implementations of cryptographic building blocks are insecure in the White Box threat model where the attacker controls the computer execution process. The attacker can easily extract the (secret) key from the memory by just observing the operations acting on the secret key. For instance, the attacker can learn the secret key of an AES cipher software implementation by passively monitoring the execution of the key schedule algorithm. Also, the attacker could be able to retrieve partial cryptographic result and use it in another context (using in a standalone code, or injecting it in another program, as an example).
Content protection applications such as for audio and video data are one instance where it is desired to keep the attacker from finding the secret key even though the attacker has complete control of the execution process. The publication “White-Box Cryptography and an AES Implementation”, Lecture Notes in Computer Science Vol. 2595, Revised Papers from the 9th Annual International Workshop on Selected Areas in Cryptography, pp. 250-270 (2002), by Chow et al. discloses implementations of AES that obscure the operations performed during AES by using table lookups (also referred to as TLUs) to obscure the secret key within the table lookups, and obscure intermediate state information that would otherwise be available in arithmetic implementations of AES. In the computer field, a table lookup is an operation consisting of looking in a table (also called an array) stored in a computer memory at a given index position in the table.
Chow et al. (for their White Box implementation, where the key is known at the computer code compilation time) use 160 separate tables to implement the 11 AddRoundKey operations and 10 SubByte operations (10 rounds, with 16 tables per round, where each table is for 1 byte of the 16 byte long, i.e. 128 bit, AES block). These 160 tables embed a particular AES key, such that output from lookups involving these tables embeds data that would normally result from the AddRoundKey and SubByte operations of the AES algorithm, except that this data includes input/output permutations that make it more difficult to determine what parts of these tables represent round key information derived from the AES key. Chow et al. provide a construction of the AES algorithm for such a White Box model. The security of this construction resides in the use of table lookups and permutations applied to the input and output of the table lookups. The input and output masks applied to this data are never removed during the process. In this solution, the key value must be known at compilation time, or at least the tables must be derivable from the original key in a secure environment or in a secure way.
The conventional implementation of a block cipher in the White Box model is carried out by creating a set of table lookups. Given a dedicated cipher key, the goal is to store in a table the results for all the possible input messages. This principle is applied for each basic operation of the block cipher. In the case of the AES cipher, these are the ShiftRow, AddRoundKey, SubByte and MixColumn operations.
So software implementations of cryptographic building blocks (operations) are insecure in the White Box threat model where as explained above the attacker controls the execution process. The attacker can easily lift the secret key from computer memory by just observing the operations acting on the secret key. For example, the attacker can learn the secret key of an AES software implementation by observing the execution of the key schedule algorithm.
AES and other ciphers need the lookup tables stored in memory as explained above to be executed at a reasonable speed. However, the tables in the White Box implementation may be computed from the AES S-box and use important information such as keys, masks, etc. If an attacker is able to obtain a copy of these tables, he will gain an advantage in understanding the code.
Making these tables hard to understand is an important point for White Box implementation. This disclosure is directed to new ways to protect the lookup tables, by applying non-trivial transformations to them. These transformations can be corrected with execution of a few instructions and furthermore, they may be seen by an attacker as a dynamic XOR mask.
See the NIST AES standard for a more detailed description of the AES cipher: Specification for the ADVANCED ENCRYPTION STANDARD (AES), NIST, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf. The following is a summary of the well known AES cipher. The AES cipher uses a 16 byte cipher key, and has 10 rounds (final round plus 9 others). The AES encryption algorithm has the following operations as depicted graphically in prior art
11 AddRoundKey Operations
10 SubByte Operations
10 ShiftRow Operations
9 MixColumn Operations
AES is computed using a 16-byte buffer (computer memory) referred to as the AES “state” in this disclosure and shown in
To summarize,
Preliminarily to the encryption itself, in the initial round in
The following explains AES decryption round by round. For the corresponding encryption (see
Expressed schematically, AES decryption round-by-round is as follows:
ARK (K10)
ISR
ISB
ARK (K9)
IMC
ISR
ISB
ARK (K8)
IMC
ISR
ISB
ARK (K7)
IMC
ISR
ISB
ARK (K6)
IMC
ISR
ISB
ARK (K5)
IMC
ISR
ISB
ARK (K4)
IMC
ISR
ISB
ARK (K3)
IMC
ISR
ISB
ARK (K2)
IMC
ISR
ISB
ARK (K1)
IMC
ISR
ISB
ARK (K0)
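The listing above follows a regular pattern, which the following Python sketch reproduces. The function name `aes_decryption_schedule` is hypothetical and introduced only for illustration; the sketch generates the operation names in order and does not perform any decryption.

```python
def aes_decryption_schedule(rounds: int = 10) -> list[str]:
    """Return the operation names in the order listed above (hypothetical helper)."""
    # First decryption round: key addition, then ISR and ISB, with no IMC.
    ops = [f"ARK (K{rounds})", "ISR", "ISB"]
    # Remaining rounds, from round key K(rounds-1) down to K1.
    for r in range(rounds - 1, 0, -1):
        ops += [f"ARK (K{r})", "IMC", "ISR", "ISB"]
    # Final key addition with round key K0.
    ops.append("ARK (K0)")
    return ops


schedule = aes_decryption_schedule()
counts = {name: sum(op.startswith(name) for op in schedule)
          for name in ("ARK", "ISB", "ISR", "IMC")}
print(counts)
```

For 10 rounds the counts agree with the 11 AddRoundKey, 10 SubByte, 10 ShiftRow and 9 MixColumn operations summarized earlier.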
It is evident that the method in accordance with the invention can be used for decryption, encryption (see
AES is considered very efficient in terms of execution on many different computer architectures since it can be executed only with table lookups (TLU) and the exclusive-or (XOR) operation. It is known that the AES state can be handled as a 4×4 square of bytes. As a square, it can be seen as 4 columns of 4 bytes each.
As described above, AES decryption is a succession of basic operations: ISB for the inverse of SubByte, IMC (for the inverse of MixColumn) and ISR (for the inverse of ShiftRow). The ISR operation modifies the state by shifting each row of the square. This operation does not modify the bytes themselves but only their respective positions. The ISB operation is a permutation from [0, 255] to [0, 255], which can be implemented by a table look-up.
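As a minimal illustration of the statement that ISR only moves bytes, the following Python sketch applies the inverse ShiftRow to a 16-byte state. It assumes the column-by-column byte layout of FIPS-197 (the byte in row r, column c sits at index r + 4c); the function names are hypothetical.

```python
def shift_rows(state: bytes) -> bytes:
    """ShiftRow: row r of the 4x4 state is rotated left by r columns."""
    out = bytearray(16)
    for c in range(4):
        for r in range(4):
            out[r + 4 * ((c - r) % 4)] = state[r + 4 * c]
    return bytes(out)


def inv_shift_rows(state: bytes) -> bytes:
    """ISR: row r is rotated right by r columns; byte values are untouched."""
    out = bytearray(16)
    for c in range(4):
        for r in range(4):
            out[r + 4 * ((c + r) % 4)] = state[r + 4 * c]
    return bytes(out)


state = bytes(range(16))
moved = inv_shift_rows(state)
print(list(moved))
```

Only positions change: the output contains the same 16 byte values, and applying `shift_rows` to it restores the original state.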
The symbol $\oplus$ here denotes the Boolean logic “exclusive OR” (XOR operation), which is a binary operator over two values.
If b denotes a bijection, $b^{-1}$ is its inverse bijection, i.e. the unique function such that for all values x, $b^{-1}(b(x)) = x$.
If T denotes a table, T[x] denotes the x-th value in this table.
“<<” (respectively “>>”) is the conventional binary left (respectively right) bit shift operation.
“<<<” (respectively “>>>”) is the conventional binary left (respectively right) bit rotation operation, which means that the i most significant bits of x become the i least significant bits of (x<<<i), and the (w−i) least significant bits of x become the (w−i) most significant bits of (x<<<i), where w is the size (in bits) of x.
So to explain the present method, let G: x->G(x) denote a bijection function (on bytes) defined as:
$G(x) = x \oplus F(x)$ (1)
where F is a given function.
Preliminarily, assume there are numerous such functions. It is shown below how to generate such functions efficiently.
Let G1 and G2 denote two such functions, where:
$G_1(x) = x \oplus F_1(x)$
$G_2(x) = x \oplus F_2(x)$
Let T denote a given table.
T′ then denotes the masked version of table T, defined as:
$T'[y] = G_2^{-1}(T[G_1^{-1}(y)])$ (2)
This means, apply the inverse of G1 to the input value to the table (i.e., the value y) and then apply the inverse of G2 to the output value of the table, i.e., the T[something] value.
Given the table T′, recovering the original (non-obfuscated) table entry T[x] from T′ and x is carried out by executing the following steps:
1. Compute $y = x \oplus F_1(x) = G_1(x)$
2. Compute z=T′[y]
3. Return $z \oplus F_2(z) = G_2(z)$, which is equal to T[x]
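The masking of equation (2) and the three recovery steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the disclosure: the table T is a stand-in random byte permutation, F1 and F2 are borrowed from the example G functions given later in this disclosure, and the helper `inverse_table` (a hypothetical name) inverts G1 and G2 by exhaustive tabulation, which is only practical for small widths such as bytes.

```python
import random

MASK = 0xFF  # tables are indexed by bytes in this sketch


def F1(x: int) -> int:
    return (2 * x * x) & MASK          # example F, assumed for illustration


def F2(x: int) -> int:
    return (x & (x >> 1)) >> 1         # example F, assumed for illustration


def G1(x: int) -> int:
    return x ^ F1(x)


def G2(x: int) -> int:
    return x ^ F2(x)


def inverse_table(g) -> list[int]:
    """Tabulate the inverse of a byte bijection g (hypothetical helper)."""
    assert len({g(x) for x in range(256)}) == 256, "g must be a bijection"
    inv = [0] * 256
    for x in range(256):
        inv[g(x)] = x
    return inv


G1_inv = inverse_table(G1)
G2_inv = inverse_table(G2)

# Stand-in for a secret table T; any 256-entry byte table would do.
T = list(range(256))
random.Random(0).shuffle(T)

# Equation (2): T'[y] = G2^-1(T[G1^-1(y)]), computed once at obfuscation time.
T_masked = [G2_inv[T[G1_inv[y]]] for y in range(256)]


def lookup(x: int) -> int:
    """Recover T[x] using only T_masked and the instructions for F1, F2."""
    y = x ^ F1(x)          # step 1: y = G1(x)
    z = T_masked[y]        # step 2: z = T'[y]
    return z ^ F2(z)       # step 3: G2(z) = T[x]


print(all(lookup(x) == T[x] for x in range(256)))
```

At run time only `T_masked` and the arithmetic for F1 and F2 are needed; T and the inverse tables exist only while the masked table is being generated.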
In the field of code (software) obfuscation, data are masked with various known methods. One of them is called “Boolean masking” and replaces the use of a data value d by the value $d \oplus r$, where r is a random value (chosen at the source code compilation time or at runtime); the original data d thereby does not appear in computer memory and is always replaced by the value $d \oplus r$. With the previous result, the values y and T[x] are computed using a final XOR operation. Indeed, $y = x \oplus F_1(x)$ and $T[x] = z \oplus F_2(z)$. Then, during the code (cipher process) execution, values F1(x) and F2(z) are applied as Boolean masks on values x and z respectively. Since F1(x) and F2(z) depend respectively on x and z, they misleadingly appear to be dynamic masks to an attacker. This enhances security.
When functions F1 and F2 are executed by computer processor instructions such as +, −, *, <<<, <<, >>, >>>, / (division), etc. (as opposed to being executed by data lookup tables stored in memory), then in order to retrieve the values within table T, an attacker has to retrieve T′ and also has to isolate and reverse the part of the code dedicated to computing functions F1 and F2. This enables one to protect tables which are stored in memory, by mixing tables (data) and processor instructions.
As shown above, this is an efficient solution, since the mask value is a function of the value x, contrary to most known masking techniques. It will be complicated for an attacker to recover the functions F1 and F2 from the code, since they are mixed in with the rest of the code (by the present obfuscation process), and so it will be complicated to recover the original table T, with its hidden secrets.
Assume that integers are represented in base 2. On a computer, each integer number h (expressed in bit form) has a width, denoted here w, usually having a value of 8, 16, 32, 64 or 128. This width is the maximum number of bits that defines an integer. Since integers have width, it is possible to define some special operations on these representations that are not classical integer operations. The operations here are bit shifts and bit rotations. As explained above, right shift (respectively left shift) shifts each bit of an integer to the right (respectively left) by a specified number of positions and discards the bits shifted out past the rightmost (respectively leftmost) position. These operations are denoted here >> and <<. Other such known operations are left and right bit rotations, denoted here <<< and >>>. These rotations are rotations on the bit representation of the integer.
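The four width-dependent operations can be sketched in Python as follows. Python integers have no fixed width, so the sketch masks every result to W bits explicitly; the width W = 8 and the helper names `shl`, `shr`, `rotl` and `rotr` are assumptions made for the example.

```python
W = 8                    # word width in bits (assumed for the example)
MASK = (1 << W) - 1


def shl(x: int, i: int) -> int:
    """x << i on W bits: bits pushed past the top position are discarded."""
    return (x << i) & MASK


def shr(x: int, i: int) -> int:
    """x >> i on W bits: bits pushed past the bottom position are discarded."""
    return (x & MASK) >> i


def rotl(x: int, i: int) -> int:
    """x <<< i: the i top bits of x re-enter as the i bottom bits."""
    i %= W
    x &= MASK
    return ((x << i) | (x >> (W - i))) & MASK


def rotr(x: int, i: int) -> int:
    """x >>> i: the i bottom bits of x re-enter as the i top bits."""
    i %= W
    x &= MASK
    return ((x >> i) | (x << (W - i))) & MASK


x = 0b10010110
print(f"{shl(x, 3):08b} {shr(x, 3):08b} {rotl(x, 3):08b} {rotr(x, 3):08b}")
```

A shift loses information while a rotation does not, which is why `rotr(rotl(x, i), i)` always returns x but `shr(shl(x, i), i)` in general does not.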
Let F be a function with the following property:
For all x input, for all i in 1 to w,
(F(x)<<(i−1))>>(i−1)=F((x<<i)>>i) (3)
This particularly implies that F(0)=0.
Given a function F with the above property, function G defined as in equation (1) above is a bijection.
To prove this, one can construct an algorithm that inverts function G. As above, w denotes the width of the integer representation.
The bitwise algorithm to generate x from y is:
Input y=G(x)
Output $x = G^{-1}(y)$
For bit (w−1) of x:
For each subsequent bit (w−(i−1)) of x:
$(x << (w-(i-1))) >> (w-(i-1)) = \dots$
$(x << (w-i)) >> (w-i) = ((y << (w-i)) >> (w-i)) \oplus F((x << (w-(i-1))) >> (w-(i-1)))$
$(x << (w-(i+1))) >> (w-(i+1)) = \dots$
So the result is:
$x = y \oplus F((x << 1) >> 1)$
The main ideas in this algorithm are:
1. F(0)=0=F((x<<w)>>w), since any left or right shift of all w bits necessarily results in value 0.
2. $((x << (w-1)) >> (w-1)) \oplus ((F(x) << (w-1)) >> (w-1)) = ((x \oplus F(x)) << (w-1)) >> (w-1)$, because XOR is a bitwise operator.
So this is a proof that also enables one to compute the final x using w steps. This proves that functions F verifying equation (3) above allow one to generate invertible G functions.
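The w-step inversion can be sketched in Python as follows. The sketch reads property (3) as meaning that the k+1 least significant bits of F(x) depend only on the k least significant bits of x; that reading, the width W = 8, the example F(x) = 2*x*x and the name `invert_G` are assumptions made for illustration.

```python
W = 8
MASK = (1 << W) - 1


def F(x: int) -> int:
    # Example F, assumed here: low k+1 bits depend only on the low k bits of x.
    return (2 * x * x) & MASK


def G(x: int) -> int:
    return x ^ F(x)


def invert_G(y: int) -> int:
    """Recover x from y = G(x), one more low-order bit per step."""
    x = 0                               # zero bits of x are known at the start
    for k in range(1, W + 1):
        low = (1 << k) - 1
        # x holds the k-1 known low bits; they fix the k low bits of F(x),
        # so the k low bits of y XOR F(x) are the k low bits of the true x.
        x = (y ^ F(x)) & low
    return x


print(all(invert_G(G(x)) == x for x in range(1 << W)))
```

The last loop iteration computes y XOR F applied to the w−1 low bits of x, which corresponds to the final result stated above.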
Equation (3) above uses right and left bit shifts. Their roles can be inverted, resulting in:
(F(x)>>(i−1))<<(i−1)=F((x>>i)<<i) (4)
To prove the symmetric role of left and right bit shifts, the same algorithm as above can easily be constructed.
So one can use F such that for all x input, for all i in 1 to w,
(F(x)<<(i−1))>>(i−1)=F((x<<i)>>i) (3)
to construct
$G(x) = x \oplus F(x)$
Alternatively, one could use F such that for all x inputs, for all i in 1 to w,
(F(x)>>(i−1))<<(i−1)=F((x>>i)<<i) (4)
to construct
$G(x) = x \oplus F(x)$
In the following, simple functions F and G are presented.
Let P be a function made up of the conventional arithmetic and logical operations +, −, $\oplus$ (XOR), *, & (AND), | (OR), <<, plus some constants; then the F function satisfying the following equation:
F(x)=2*P(x)=P(x)<<1
verifies equation (3).
The most difficult part to compute is equations (5) and (6). This is however a classical result for conventional logical bitwise or arithmetic operations (+, −, $\oplus$, *, &, |, <<) that:
$P(x) \bmod 2^{w-i} = P(x \bmod 2^{w-i})$.
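The classical result can be checked exhaustively for a small width. In the Python sketch below, the function P is a hypothetical example built only from +, *, XOR and constants, and both sides of the identity are reduced modulo the same power of two, which is how this sketch reads the identity; the second check restates the consequence for F(x) = P(x) << 1 in the same terms.

```python
W = 8
MASK = (1 << W) - 1


def P(x: int) -> int:
    # Hypothetical P using only +, *, XOR and constants.
    return ((x * x + 3 * x) ^ 0x5A) & MASK


def F(x: int) -> int:
    return (P(x) << 1) & MASK


def low(v: int, k: int) -> int:
    """The k least significant bits of v, i.e. v mod 2^k."""
    return v & ((1 << k) - 1)


# The k low bits of P(x) depend only on the k low bits of x.
p_ok = all(low(P(x), k) == low(P(low(x, k)), k)
           for x in range(1 << W) for k in range(W + 1))

# Hence the k+1 low bits of F(x) = P(x) << 1 depend only on the k low bits of x.
f_ok = all(low(F(x), k + 1) == low(F(low(x, k)), k + 1)
           for x in range(1 << W) for k in range(W))

print(p_ok, f_ok)
```

The extra bit gained by the left shift is what allows the bit-by-bit inversion of G to proceed from the least significant bit upwards.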
Right-Shifting a Function
For equation (4), one can analogously define F functions as:
F(x)=P(x)/2=P(x)>>1
where P is a function made up of the operations $\oplus$, &, |, >>.
All in all, it is possible to construct F functions with the above equations.
For instance, the previous results show that
$x \oplus (2 * x * x)$, and
$x \oplus ((x \& (x >> 1)) >> 1)$
are examples of such invertible G functions.
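Both examples can be checked by exhaustive enumeration at a small width. The Python sketch below assumes W = 8 and reduces the product modulo 2^W; the function names are hypothetical, and `G_bad` is a counterexample added here for contrast, not taken from this disclosure.

```python
W = 8
MASK = (1 << W) - 1


def G_mul(x: int) -> int:
    # First example: F(x) = 2*x*x, a left-shifted function.
    return x ^ ((2 * x * x) & MASK)


def G_shift(x: int) -> int:
    # Second example: F(x) = (x & (x >> 1)) >> 1, a right-shifted function.
    return x ^ ((x & (x >> 1)) >> 1)


def G_bad(x: int) -> int:
    # Counterexample (assumption of this sketch): without the factor 2,
    # bit 0 of x*x equals bit 0 of x, so bit 0 of the output is always 0.
    return x ^ ((x * x) & MASK)


def is_bijection(g) -> bool:
    return len({g(x) for x in range(1 << W)}) == 1 << W


print(is_bijection(G_mul), is_bijection(G_shift), is_bijection(G_bad))
```

The counterexample shows why the shift by one position in F matters: it keeps each output bit of F from depending on the input bit it is XORed with.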
The inverse of such a G function can be computed easily following the algorithm described above; moreover, it can be computed at the time of code obfuscation, i.e. when the source code to be run is generated. This produces the table T′.
If P is a function as described above, then it is also possible to construct a function F as follows:
F(x)=P(x<<<i)>>>i (7)
Indeed, if $G(x) = x \oplus F(x)$, one obtains $G'(x) = x \oplus P(x)$ by computing:
Since function G′ is a bijection, function G is the composition of a bijection with two rotation functions, and thus, itself a bijection.
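The computation referred to above is not reproduced in this text, so the following Python sketch only checks one reading of the argument: rotating G(x) left by i positions gives G′ applied to x rotated left by i positions. The width W = 8, the rotation amount I = 3 and the inner function P(x) = 2*x*x (chosen so that G′ is a bijection) are assumptions made for illustration.

```python
W = 8
MASK = (1 << W) - 1
I = 3  # rotation amount, assumed for the example


def rotl(x: int, i: int) -> int:
    i %= W
    return ((x << i) | (x >> (W - i))) & MASK


def rotr(x: int, i: int) -> int:
    i %= W
    return ((x >> i) | (x << (W - i))) & MASK


def P(x: int) -> int:
    # Inner function, assumed such that x XOR P(x) is a bijection.
    return (2 * x * x) & MASK


def F(x: int) -> int:
    # Equation (7): F(x) = P(x <<< i) >>> i
    return rotr(P(rotl(x, I)), I)


def G(x: int) -> int:
    return x ^ F(x)


def G_prime(x: int) -> int:
    return x ^ P(x)


# Rotation distributes over XOR, so (G(x) <<< i) == G'(x <<< i).
conjugate_ok = all(rotl(G(x), I) == G_prime(rotl(x, I)) for x in range(1 << W))
g_bijective = len({G(x) for x in range(1 << W)}) == 1 << W
print(conjugate_ok, g_bijective)
```

Under this reading, G is G′ preceded by a left rotation and followed by the matching right rotation.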
The above considered the equation:
$G(x) = x \oplus F(x)$ (1)
However, for a function F verifying equation (3), one knows that the functions:
G(x)=x+F(x) (8)
and
G(x)=x−F(x) (9)
are also invertible functions. So one could use G functions with the + or − operations instead of the XOR operation. So generally, one can use G functions with $\oplus$, +, − and other arithmetic or logical bitwise operations that are invertible. The function F is chosen such that functions G defined by (8) and (9) above are invertible. An example of function F is given in equation (3), which is also a proof that functions F exist.
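The additive variants (8) and (9) can be checked in the same way as the XOR form. The Python sketch below assumes W = 8, arithmetic modulo 2^W and the example F(x) = 2*x*x; the inversion routines reuse the low-bit-first procedure with the XOR replaced by the opposite arithmetic operation, and their names are hypothetical.

```python
W = 8
MASK = (1 << W) - 1


def F(x: int) -> int:
    # Example left-shifted F, assumed for illustration.
    return (2 * x * x) & MASK


def G_add(x: int) -> int:
    return (x + F(x)) & MASK          # equation (8), modulo 2^W


def G_sub(x: int) -> int:
    return (x - F(x)) & MASK          # equation (9), modulo 2^W


def invert_add(y: int) -> int:
    x = 0
    for k in range(1, W + 1):
        # The k-1 known low bits of x fix the k low bits of F(x).
        x = (y - F(x)) & ((1 << k) - 1)
    return x


def invert_sub(y: int) -> int:
    x = 0
    for k in range(1, W + 1):
        x = (y + F(x)) & ((1 << k) - 1)
    return x


print(all(invert_add(G_add(x)) == x and invert_sub(G_sub(x)) == x
          for x in range(1 << W)))
```

The procedure works because the k low bits of a sum or difference modulo 2^W depend only on the k low bits of the operands.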
As readily understood by one skilled in the art, implementation of the above method first requires expressing a conventional (non-obfuscated) cryptographic (or similar) process as computer source code including table lookups, such as the conventional AES cipher encryption or decryption processes. Then one alters (transcodes) the conventional source code as described above so as to obfuscate the original TLUs to be instead a combination of obfuscated tables and logical and arithmetic processor instructions. This transcoded source code is then conventionally compiled into object code and executed, to carry out the cryptographic process. The above solutions thereby enable one to easily mix tables and operations in a cryptographic or similar process. Furthermore, the obfuscation would be seen by a hacker (erroneously) as being dynamic masking techniques. The above also shows how to implement these solutions with efficient formulas.
The computer code is conventionally stored in code memory (computer readable storage medium) 140 (as object code or source code) associated with conventional processor 138 for execution by processor 138. The incoming ciphertext (or plaintext) message (in digital form) is received at port 132 and stored in computer readable storage (memory) 136, which is coupled to processor 138. Processor 138 then conventionally partitions the message into suitably sized blocks at partitioning module 142. Another software (code) module in processor 138 is the decryption (or encryption) module 146, which carries out the decryption or encryption processes as set forth above, with associated computer readable storage (memory) 152.
Also coupled to processor 138 is a computer readable storage (memory) 158 for the resulting decrypted plaintext (or encrypted ciphertext) message. Storage locations 136, 140, 152, 158 may be in one or several conventional physical memory devices (such as semiconductor RAM or its variants or a hard disk drive). Electric signals conventionally are carried between the various elements of
Computing system 160 can also include a main memory 168 (equivalent of memories 136, 140, 152, and 158), such as random access memory (RAM) or other dynamic memory, for storing information and instructions to be executed by processor 164. Main memory 168 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 164. Computing system 160 may likewise include a read only memory (ROM) or other static storage device coupled to bus 162 for storing static information and instructions for processor 164.
Computing system 160 may also include information storage system 170, which may include, for example, a media drive 172 and a removable storage interface 180. The media drive 172 may include a drive or other mechanism to support fixed or removable storage media, such as flash memory, a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a compact disk (CD) or digital versatile disk (DVD) drive (R or RW), or other removable or fixed media drive. Storage media 178 may include, for example, a hard disk, floppy disk, magnetic tape, optical disk, CD or DVD, or other fixed or removable medium that is read by and written to by media drive 172. As these examples illustrate, the storage media 178 may include a computer-readable storage medium having stored therein particular computer software or data.
In alternative embodiments, information storage system 170 may include other similar components for allowing computer programs or other instructions or data to be loaded into computing system 160. Such components may include, for example, a removable storage unit 182 and an interface 180, such as a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, and other removable storage units 182 and interfaces 180 that allow software and data to be transferred from the removable storage unit 178 to computing system 160.
Computing system 160 can also include a communications interface 184 (equivalent to part 132 in
In this disclosure, the terms “computer program product,” “computer-readable medium” and the like may be used generally to refer to media such as, for example, memory 168, storage device 178, or storage unit 182. These and other forms of computer-readable media may store one or more instructions for use by processor 164, to cause the processor to perform specified operations. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system 160 to perform functions of embodiments of the invention. Note that the code may directly cause the processor to perform specified operations, be compiled to do so, and/or be combined with other software, hardware, and/or firmware elements (e.g., libraries for performing standard functions) to do so.
In an embodiment where the elements are implemented using software, the software may be stored in a computer-readable medium and loaded into computing system 160 using, for example, removable storage drive 174, drive 172 or communications interface 184. The control logic (in this example, software instructions or computer program code), when executed by the processor 164, causes the processor 164 to perform the functions of embodiments of the invention as described herein.
This disclosure is illustrative and not limiting. Further modifications will be apparent to those skilled in the art in light of this disclosure and are intended to fall within the scope of the appended claims.
This application claims priority to U.S. Provisional Application No. 61/530,355, filed Sep. 1, 2011, incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61530355 | Sep 2011 | US