The present disclosure relates to white-box security implementations and related white-box ciphers that can be applied to SM4 encoding schemes.
The term “cryptography” refers to techniques for secure communication in the presence of third parties, sometimes referred to as “adversaries” or “attackers.” Various protocols can be constructed that prevent third parties or the public from reading private messages. Applications of cryptography include electronic commerce, chip-based payment cards, digital currencies, computer passwords, and content streaming. Modern cryptographic algorithms are designed around computational hardness assumptions, making such algorithms hard to break in practice by any adversary. Breaking such a system is theoretically possible, but doing so should be impractical, which discourages potential attackers.
The “black-box” attack model of cryptography is based on the premise that the internal operation of a cipher and the key it uses are not accessible to an adversary, who only has access to its inputs and outputs. This model underlies the development and design of most modern cryptographic algorithms, including Data Encryption Standard (DES), Advanced Encryption Standard (AES) and many others. However, this premise of black box attack models, i.e. that the internal operations of devices are not accessible to an attacker, does not apply to many real-world security problems.
The “white-box” attack model assumes a greater level of visibility and control by the adversary. The white-box attack model can be applied in many more modern implementations, where the attacker may have full visibility and control of the operation, such as in a mobile phone or personal computer that may have debugging tools or installed malware. This presents a challenge to develop countermeasures to an adversary extracting information such as cryptographic keys or influencing the operation to produce undesired results in the white-box model. As a result, the study of countermeasures to subversion in the white-box model has become increasingly important.
Chow, S., Eisen, P., Johnson, H. and Van Oorschot, P. C., White-box cryptography and an AES implementation, International Workshop on Selected Areas in Cryptography, pp. 250-270, (Springer, Berlin, Heidelberg, August 2002) described countermeasures to white-box attacks and thus sensitized the academic community at large to the idea that countermeasures in the white-box scenario might be feasible. The general idea of a white-box AES implementation is to hide the secret key in the S-boxes of AES, break AES into several steps and insert secret random bijections to obfuscate every step. To keep the implementation functionally equivalent to AES, the inserted parts are canceled out in the end. However, AES white-box implementations have been successfully attacked. In 2004, Billet, Gilbert and Ech-Chatbi presented an efficient attack (referred to as the BGE attack) on an AES white-box implementation. The BGE attack extracted the embedded AES key with a work factor of $2^{30}$, and thus is a practical attack.
Many dedicated white-box implementations are known. Some implement a standard cipher in a white-box attack context and some focus on the design of non-standard ciphers that are expected to be secure under white-box attack. Such ciphers are referred to as “white-box ciphers.” However, research on cryptanalysis against white-box implementations has made significant progress. Some attacks are unique to the white-box model because they require detailed structure for analysis. These include algebraic attacks such as the BGE attack. Additionally, in some scenarios, a “lifting attack” is possible because the key does not need to be determined: the lifted algorithm may be used as an oracle, and sources of randomness can be overridden.
“Gray-box” attacks such as DPA (Differential Power Analysis) have been repurposed under the name DCA (Differential Computation Analysis), and the corresponding countermeasures can be reassessed for effectiveness in white-box scenarios. For example, a simple sharing scheme increases the number of traces that must be collected for analysis as a power of the number of shares, but this advantage is lost when there is no noise, as is the case in the white-box scenario. Some benefits of these countermeasures are retained. Any well-designed sharing scheme, including a threshold scheme applied for combating leakage due to hardware glitches, provides some advantage, since identifying information leakage is made more complex through the hiding of direct correlations.
Due to its prevalence, AES has been the focus of analysis and countermeasures, and several white-box implementations of AES have been published and analyzed in the literature. SM4 is a newer standardized cipher and has not been analyzed much in the white-box context. SM4 is a block cipher used in the Chinese National Standard for Wireless LAN WAPI. The SM4 algorithm was invented by Lu Shuwang and it became a national standard in China (GB/T 32907-2016) in August 2016. The SM4 cipher has a block size of 128 bits. It uses an 8-bit S-box and the key size is 128 bits. The only operations used are 32-bit bitwise XOR, 32-bit circular shifts and S-box applications. Encryption or decryption of one block of data is composed of 32 rounds. Each round updates a quarter (i.e., 32 bits) of the internal state. A non-linear key schedule is used to produce the round keys. Decryption uses the same round keys as for encryption, except that they are in reversed order. The round structure of SM4 has several similarities with AES, including an 8-bit S-box determined by inversion in a finite field followed by linear diffusion between the output of four S-boxes.
The first white-box SM4 implementation was proposed by Xiao, Y.; Lai, X. White-Box Cryptography and a White-Box Implementation of the SMS4 Algorithm; Shanghai Jiaotong University: Shanghai, China, pp. 24-34, 2009. The cipher of Xiao et al. was shown to be insecure against an attack similar to a BGE attack. Another white-box SM4 implementation was proposed in Shi, Y., Wei, W. and He, Z., Lightweight White-box Symmetric Encryption Algorithm Against Node Capture for WSNs, Sensors, 15(5), pp. 11928-11952, 2015. This implementation uses the concept of dual ciphers and a randomly selected nonsingular matrix to construct a functionally equivalent white-box encryption algorithm of SM4. Lin, Tingting, et al. Security Evaluation and Improvement of a White-Box SMS4 Implementation Based on Affine Equivalence Algorithm, The Computer Journal 61.12: 1783-1790, 2018, presented an analysis of this implementation and described how an affine equivalence algorithm could be used to extract the key.
The implementations described herein are white-box implementations that can be applied to SM4. Techniques are applied in a novel manner to create a practical implementation of fixed-key SM4 resistant to white-box attacks. The techniques include the use of composite fields made possible by the S-box structure, re-expressing the entire cipher in terms of 4-bit intermediate variables to reduce total table size, exclusive use of lookup tables, and the application of an (n,n) threshold scheme in which the shares are generated using other parts of the processing state. The white-box SM4 implementations described herein are resistant against known white-box attacks, such as affine equivalence attacks, BGE-like attacks and DCA-like attacks.
One implementation includes a method, apparatus or computer-readable media for implementing a white-box block cipher in a software application to create a secure software application having the same functionality as the software application, the method comprising creating an implementation of a block cipher by:
applying the block cipher to at least a portion of the software application to create the secure software application and thereby increase security of a computing platform executing the secure software application.
Implementations of the invention will be described below in connection with the attached drawing in which:
Before describing the novel aspects of the disclosed implementations, SM4 will be described in greater detail. SM4 was selected to be used in the WLAN Authentication and Privacy Infrastructure (WAPI) standard, is officially mandated in China and plays an important part in providing data confidentiality for WLAN products. SM4 has a 128-bit key size, a 128-bit block size and a 32-round unbalanced Feistel network structure. Let $(X_0, X_1, X_2, X_3) \in (GF(2^{32}))^4$ be the plaintext and $K_i \in GF(2^{32})$ $(i = 0, 1, 2, \ldots, 31)$ be the round keys. The SM4 encryption operation is then shown at 100 in
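The resulting output, the ciphertext, is:
$$(Y_0, Y_1, Y_2, Y_3) = (X_{35}, X_{34}, X_{33}, X_{32})$$
where:
$$L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24), \quad B \in GF(2^{32}),$$
$$S(A) = (\mathrm{Sbox}(\alpha_0), \mathrm{Sbox}(\alpha_1), \mathrm{Sbox}(\alpha_2), \mathrm{Sbox}(\alpha_3)), \quad A = (\alpha_0, \alpha_1, \alpha_2, \alpha_3), \quad S(A) \in (GF(2^8))^4.$$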
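The relations above can be sketched in Python as follows. This is a minimal illustration, assuming the standard SM4 round relation $X_{i+4} = X_i \oplus L(S(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus K_i))$. The helper names (`rotl32`, `linear_l`, `tau`, `encrypt_block`) are illustrative only, and the S-box table and round keys must be supplied by the caller; neither is reproduced here.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits (0 < shift < 32)."""
    value &= MASK32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def linear_l(word):
    """L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24)."""
    return (word ^ rotl32(word, 2) ^ rotl32(word, 10)
            ^ rotl32(word, 18) ^ rotl32(word, 24))


def tau(word, sbox):
    """Apply an 8-bit S-box table to each of the four bytes of a word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(word >> shift) & 0xFF] << shift
    return out


def encrypt_block(x, round_keys, sbox):
    """Run the round structure on four 32-bit words.

    `x` is (X0, X1, X2, X3); `round_keys` holds one 32-bit key per round.
    Returns (X35, X34, X33, X32) when 32 round keys are given.
    """
    state = list(x)
    for rk in round_keys:
        t = state[-3] ^ state[-2] ^ state[-1] ^ rk
        state.append(state[-4] ^ linear_l(tau(t, sbox)))
    return tuple(reversed(state[-4:]))
```

Because `linear_l` is built only from rotations and XOR, it is linear over GF(2), which is the property the later byte-wise and nibble-wise decompositions rely on.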
Composite fields are often used in implementations of Galois field arithmetic. A field $GF(2^k)$ is a composite field when $k$ is not prime and can be written as $k = mn$. The fields $GF(2^k)$ and $GF((2^m)^n)$ are isomorphic to each other. With an isomorphism, the elements and the operations can be mapped from one to the other.
For an implementation, the composite field for $GF(2^8)$ is used. The elements in $GF(2^8)$ are mapped to $GF((2^4)^2)$, where the arithmetic in $GF((2^4)^2)$ is constructed from the following field polynomials $P_1(x)$ and $P_2(x)$, both being irreducible:
$$P_1(x) = x^2 + tx + n, \quad \text{over } GF(2^4)$$
$$P_2(x) = x^4 + ux^3 + vx^2 + wx + N, \quad \text{over } GF(2).$$
A general element $g$ of $GF((2^4)^2)$ can be represented as a linear polynomial (in $Y$) over $GF(2^4)$, as $g = y_1 Y \oplus y_0$, with multiplication modulo the polynomial $P_1(x)$. All the coefficients are in the 4-bit subfield $GF(2^4)$. So the pair $(y_1, y_0)$ represents $g$ in terms of a polynomial basis $[Y, 1]$, where $Y$ is one root of $P_1(x)$.
The isomorphism mappings and the operation representations depend on the field polynomials and different bases. We are free to choose either type of basis at each level. The isomorphism mappings between $GF(2^8)$ and $GF((2^4)^2)$ can be found in a known manner, such as is taught by Rudra, A., Dubey, P. K., Jutla, C. S., Kumar, V., Rao, J. R. and Rohatgi, P. Efficient Rijndael Encryption Implementation with Composite Field Arithmetic, International Workshop on Cryptographic Hardware and Embedded Systems pp. 171-184. Springer, Berlin, Heidelberg, May, 2001; Paar, C. Efficient VLSI Architectures for Bit-parallel Computation in Galois fields. PhD Thesis, Inst. for Experimental Math., Univ. of Essen, 1994; and Wong, M. M., Wong, M. L. D., Hijzin, I. and Nandi, A. K. Composite field GF(((2^2)^2)^2) AES S-Box with Direct Computation in GF(2^4) Inversion, Information Technology in Asia (CITA 11), 2011 7th International Conference on, pp. 1-6. IEEE. The details of a known method for representing the operations, such as multiplication and multiplicative inverse, can be found in Canright, D., 2005, August, A Very Compact S-box for AES, International Workshop on Cryptographic Hardware and Embedded Systems pp. 441-455, Springer, Berlin, Heidelberg, July, 2011.
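As a hedged illustration of composite-field arithmetic, the following Python sketch represents an element of GF((2^4)^2) as a pair (g1, g0) standing for g1·Y ⊕ g0 and implements multiplication and inversion modulo a quadratic x^2 + t·x + n. The subfield polynomial x^4 + x + 1 and the particular (t, n) found by search are assumptions chosen for illustration; the disclosure does not fix them here. All function names are hypothetical.

```python
def gf16_mul(a, b, poly=0b10011):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1 (assumed)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= poly
    return result


def gf16_inv(a):
    """Inverse in GF(2^4) by exhaustive search; 0 has no inverse."""
    for candidate in range(1, 16):
        if gf16_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no multiplicative inverse")


def find_irreducible_quadratic():
    """Return (t, n) such that x^2 + t*x + n has no root in GF(2^4)."""
    for t in range(1, 16):
        for n in range(1, 16):
            if all(gf16_mul(x, x) ^ gf16_mul(t, x) ^ n for x in range(16)):
                return t, n
    raise RuntimeError("no irreducible quadratic found")


T, N = find_irreducible_quadratic()


def comp_mul(g, h):
    """(g1*Y + g0) * (h1*Y + h0), reduced with Y^2 = T*Y + N."""
    (g1, g0), (h1, h0) = g, h
    top = gf16_mul(g1, h1)
    return (gf16_mul(top, T) ^ gf16_mul(g1, h0) ^ gf16_mul(g0, h1),
            gf16_mul(top, N) ^ gf16_mul(g0, h0))


def comp_inv(g):
    """Invert g1*Y + g0 using only GF(2^4) operations."""
    g1, g0 = g
    delta = (gf16_mul(gf16_mul(g1, g1), N)
             ^ gf16_mul(gf16_mul(g1, g0), T)
             ^ gf16_mul(g0, g0))
    f = gf16_inv(delta)
    return (gf16_mul(f, g1), gf16_mul(f, gf16_mul(g1, T) ^ g0))
```

The point of the construction is that an 8-bit inversion reduces to a handful of 4-bit multiplications and one 4-bit inversion, each of which fits in a small lookup table.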
Threshold implementations are well known as a side-channel attack countermeasure, as taught by Nikova, S., Rechberger, C. and Rijmen, V., Threshold Implementations Against Side-channel Attacks and Glitches, International Conference on Information and Communications Security, pp. 529-545, Springer, Berlin, Heidelberg, December 2006. Such a countermeasure is based on secret sharing and multiparty computation. In a disclosed implementation, an (n,n) threshold system is used, which requires a set of n functions ƒi to compute the outputs of a function ƒ. The n outputs of the functions ƒi are called the output shares.
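Let $z = f(A, B, \ldots)$ denote a function. A variable $A$ is split into $n$ shares $A_i$ when $A = \sum_i A_i$. A secure threshold implementation can satisfy three properties. In the relations below, each output share $z_i$ is computed without the $i$-th share of any input, and the output shares sum to $z$:
$$z_1 = f_1(A_2, A_3, \ldots, A_n, B_2, B_3, \ldots, B_n, \ldots)$$
$$z_2 = f_2(A_1, A_3, \ldots, A_n, B_1, B_3, \ldots, B_n, \ldots)$$
$$\vdots$$
$$z_n = f_n(A_1, A_2, \ldots, A_{n-1}, B_1, B_2, \ldots, B_{n-1}, \ldots)$$
$$z = \sum_i z_i = \sum_i f_i.$$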
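For intuition, the following Python sketch shows the well-known three-share threshold implementation of a single-bit AND gate from the threshold-implementation literature. It is not the construction used in the disclosed implementation; it only illustrates how each output share omits one input share index while the shares still sum to the correct result. The function name `ti_and_shares` is illustrative.

```python
from itertools import product


def ti_and_shares(x, y):
    """Three-share threshold implementation of z = x AND y on single bits.

    `x` and `y` are tuples of three bit shares whose XOR equals the
    unshared value. Output share z_i never uses x_i or y_i.
    """
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)   # independent of x1, y1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)   # independent of x2, y2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)   # independent of x3, y3
    return z1, z2, z3


def check_correctness():
    """Exhaustively confirm that the output shares XOR to x AND y."""
    for bits in product((0, 1), repeat=6):
        x, y = bits[:3], bits[3:]
        z1, z2, z3 = ti_and_shares(x, y)
        if z1 ^ z2 ^ z3 != (x[0] ^ x[1] ^ x[2]) & (y[0] ^ y[1] ^ y[2]):
            return False
    return True
```

The nine cross terms x_i·y_j are distributed so that each appears in exactly one output share, which is why the XOR of the three shares reproduces the product.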
The number of input shares and output shares affects the properties of a threshold implementation. For example, uniformity is not guaranteed if a threshold implementation is applied with three shares for each input X, Y and Z, and three output shares, to the function F(X,Y,Z)=X+YZ. The details of how to create a threshold implementation for such functions are well known and not discussed further herein. For example, more details and constructions can be found in Bilgin B, Nikova S, Nikov V, Rijmen V, Stütz G, Threshold Implementations of All 3×3 and 4×4 S-Boxes, International Workshop on Cryptographic Hardware and Embedded Systems, pp. 76-91, Springer, Berlin, Heidelberg, September 2012; Nikova, S., Rechberger, C. and Rijmen, V., Threshold Implementations Against Side-channel Attacks and Glitches, International Conference on Information and Communications Security, pp. 529-545, Springer, Berlin, Heidelberg, December 2006; and Nikova, S., Rijmen, V. and Schläffer, M., Secure Hardware Implementation of Non-linear Functions in the Presence of Glitches, International Conference on Information Security and Cryptology, pp. 218-234, Springer, Berlin, Heidelberg, December 2008. The disclosed implementations use TI to indicate a threshold implementation.
Before describing the disclosed implementations in detail, several techniques are discussed below. Chow, S., Eisen, P., Johnson, H. and Van Oorschot, P.C., White-box cryptography and an AES implementation, International Workshop on Selected Areas in Cryptography, pp. 250-270. Springer, Berlin, Heidelberg. August 2002, teaches that the algebraic structure of an S-box can be represented as:
and the underlying irreducible polynomial for $GF(2^8)$ over $GF(2)$ is:
$$f(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1.$$
However, applicant has discovered that this does not match the S-box table of SM4. Therefore, the matrix A2 and the irreducible polynomial have been modified in a novel manner. Verification shows that the following parameters and irreducible polynomial will correctly generate the lookup table for the SM4 S-box:
and the underlying irreducible polynomial for $GF(2^8)$ over $GF(2)$ is:
$$f(x) = x^8 + x^6 + x^4 + x^3 + x^2 + x + 1.$$
It may be noted that the polynomial is simply the reciprocal polynomial (reversed coefficients), and C1, A1 and C2 are the same, but the matrix A2 is substantially different.
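The relationship between the two degree-8 polynomials quoted above can be checked mechanically. The Python sketch below encodes each polynomial as an integer bit mask (bit i is the coefficient of x^i), reverses the coefficients, and tests irreducibility over GF(2) by trial division. The bit-mask encoding and the function names are choices made for this illustration.

```python
FIRST_POLY = 0x1F5    # x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1
SECOND_POLY = 0x15F   # x^8 + x^6 + x^4 + x^3 + x^2 + x + 1


def reciprocal(poly, degree):
    """Reverse the coefficient order of a polynomial of the given degree."""
    return sum(((poly >> i) & 1) << (degree - i) for i in range(degree + 1))


def poly_mod(a, m):
    """Remainder of a divided by m, both polynomials over GF(2)."""
    while a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a


def is_irreducible(poly):
    """Trial division by every polynomial of degree 1 .. degree // 2."""
    degree = poly.bit_length() - 1
    return all(poly_mod(poly, d) != 0
               for d in range(2, 1 << (degree // 2 + 1)))
```

Calling `reciprocal(FIRST_POLY, 8)` returns `SECOND_POLY`, consistent with the statement that one polynomial is the coefficient reversal of the other.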
In each round, four 8-bit S-Boxes are applied in parallel and the linear transformation L can be expressed as a block matrix composed of 8×8 matrices. This allows the entire SM4 operation to be expressed in terms of byte-wide operations. The 32×32 linear transformation L is expressed as a 4×4 block matrix of 8×8 matrices:
The 32-bit inputs/outputs and round key in each round are represented as the concatenation of four bytes each:
$$X_t = x_{t0} \,\|\, x_{t1} \,\|\, x_{t2} \,\|\, x_{t3} \quad (t = 0, 1, \ldots, 35)$$
$$K_r = k_{r0} \,\|\, k_{r1} \,\|\, k_{r2} \,\|\, k_{r3} \quad (r = 0, 1, \ldots, 31)$$
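The round key can be embedded in the S-boxes $S_k$:
$$\begin{aligned}
x_{(r+4)j} = x_{rj} &\oplus l_{j0}\, S_k(x_{(r+1)0} \oplus x_{(r+2)0} \oplus x_{(r+3)0}) \\
&\oplus l_{j1}\, S_k(x_{(r+1)1} \oplus x_{(r+2)1} \oplus x_{(r+3)1}) \\
&\oplus l_{j2}\, S_k(x_{(r+1)2} \oplus x_{(r+2)2} \oplus x_{(r+3)2}) \\
&\oplus l_{j3}\, S_k(x_{(r+1)3} \oplus x_{(r+2)3} \oplus x_{(r+3)3})
\end{aligned}$$
where $r = 0, 1, 2, \ldots, 31$; $j = 0, 1, 2, 3$.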
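The byte-wise form of a round can be sketched in Python as follows. The sketch derives each 8×8 block of L by applying the 32-bit transformation to a word with a single nonzero byte, builds a fixed-key table for each key byte, and combines them. Byte 0 is taken to be the most significant byte, and the S-box is passed in as a 256-entry table; both are assumptions of this illustration, as are all function names.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits (0 < shift < 32)."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def linear_l(word):
    """The 32-bit linear transformation L of SM4."""
    return (word ^ rotl32(word, 2) ^ rotl32(word, 10)
            ^ rotl32(word, 18) ^ rotl32(word, 24))


def l_block(j, i, byte):
    """Apply block (j, i) of L: output byte j for an input in byte i."""
    word = byte << (24 - 8 * i)
    return (linear_l(word) >> (24 - 8 * j)) & 0xFF


def make_fixed_key_sbox(sbox, key_byte):
    """Table for S_k(x) = S(x XOR k), with the key byte folded in."""
    return [sbox[x ^ key_byte] for x in range(256)]


def next_word_bytewise(x_r, x_r1, x_r2, x_r3, round_key, sbox):
    """Compute the four bytes of X_{r+4} from byte lists of X_r..X_{r+3}."""
    tables = [make_fixed_key_sbox(sbox, round_key[i]) for i in range(4)]
    s_out = [tables[i][x_r1[i] ^ x_r2[i] ^ x_r3[i]] for i in range(4)]
    result = []
    for j in range(4):
        acc = x_r[j]
        for i in range(4):
            acc ^= l_block(j, i, s_out[i])
        result.append(acc)
    return result
```

Because L is linear, summing the sixteen block contributions gives the same result as applying L to the whole 32-bit word, which is what allows the round to be expressed entirely in byte-wide table lookups.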
For example, if the key byte is fixed in the S-box, the first byte of X4 can be calculated with the process shown at 200 in
Multiplications in $GF(2^4)$ can be defined with an irreducible polynomial of degree 4 over $GF(2)$. Only three irreducible polynomials of degree 4 exist, any one of which can be used for the definition:
$$f_1(x) = x^4 + x^3 + x^2 + x + 1,$$
$$f_2(x) = x^4 + x^3 + 1,$$
$$f_3(x) = x^4 + x + 1.$$
The field multiplication “*” is defined as:
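A minimal Python sketch of such a multiplication is given below, together with a trial-division search that enumerates the irreducible polynomials of degree 4 over GF(2). Polynomials are encoded as bit masks (bit i is the coefficient of x^i). The shift-and-add method and the function names are choices made for this illustration and are not taken from the disclosure.

```python
def poly_mod(a, m):
    """Remainder of a divided by m, both polynomials over GF(2)."""
    while a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a


def irreducible_quartics():
    """All degree-4 polynomials over GF(2) with no factor of degree 1 or 2."""
    return [p for p in range(0b10000, 0b100000)
            if all(poly_mod(p, d) != 0 for d in range(2, 8))]


def gf16_mul(a, b, poly):
    """Multiply a and b in GF(2^4) modulo the given degree-4 polynomial."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:          # degree reached 4: reduce
            a ^= poly
    return result
```

With this encoding, `irreducible_quartics()` returns `[0b10011, 0b11001, 0b11111]`, that is x^4 + x + 1, x^4 + x^3 + 1 and x^4 + x^3 + x^2 + x + 1.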
The number of input shares and output shares for a threshold implementation may differ. When a threshold implementation of m input shares and n output shares is applied on a function F(X,Y,Z, . . . ) (denoted as TI[F(X,Y,Z,...)]), the following steps generate a lookup table:
As described above, it is assumed that such threshold implementations exist, and TI is used to indicate that a threshold implementation is applied.
Steps 1 and 2 can be repeated for all the possible inputs X,Y,Z, . . . , to obtain corresponding output shares. The lookup table, shown in
To provide resistance to existing white-box attacks, such as affine equivalence attacks, BGE attacks, and DCA-like attacks, the disclosed implementations can be created based on the following rules:
In the disclosed implementation, each intermediate variable is split into four-bit nibbles. With the operations in SM4, such as XOR, shift, and S-box, being divided into several steps, the 4-bit data as well as masks will be processed, and corresponding lookup tables will be generated. The threshold implementations are used to extend the output space of some lookup tables and produce masks at the same time. An isomorphic map is applied on the 4-bit data in the S-box to obscure the inner operation. The whole encryption consists only of lookup tables; no other calculations are required.
In the first round, an example of implementing the first branch of
To compute x100⊕x200⊕x300 and x101⊕x201⊕x301, the additions are accomplished in several steps. In each step, some masks are canceled out and new masks are added. First, the function F(X,Y,Z)=X⊕X*Y⊕Y⊕Z is used to generate a lookup table. Using the lookup table, we get αr(i) and α′r(i), where r and i are the same as before. This data flow is depicted at 700 in
A threshold implementation of 2 shares of each input and 2 output shares (TI2) is then selected and applied to the function F(X,Y,Z)=X⊕Y⊕Z to generate a lookup table for TI2(X⊕Y⊕Z). Using the lookup table, we get br,j(i) from (α0(0),x300,ƒ0.1(0)), and b′r,j(i) from (α′0(0),x301,g0.1(0)), where r and i are the same as before, j=0,1. This data flow is shown at 800 in
As noted above, the algebraic structure of a fixed-key S-box can be represented as:
$$S_k(x) = S(x \oplus k_{rj}) = A_2 (A_1 x \oplus A_1 k_{rj} \oplus C_1)^{-1} \oplus C_2.$$
We compute “y=A1x” first.
where Aij is a 4×4 block matrix of A1, xi is a 4×1 vector.
The multiplication “A1x ” can be represented as:
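Six lookup tables can be generated for the functions:
$$F_0(X,Y,Z) = A_{00}X \oplus A_{01}Y \oplus A_{00}Z,$$
$$F_1(X,Y,Z) = X \oplus A_{01}Y \oplus A_{00}Z,$$
$$F_2(X,Y,Z) = X \oplus A_{01}Y \oplus A_{00}Z^2,$$
$$G_0(X,Y,Z) = A_{11}X \oplus A_{10}Y \oplus A_{10}Z,$$
$$G_1(X,Y,Z) = X \oplus A_{11}Y \oplus A_{10}Z,$$
$$G_2(X,Y,Z) = X \oplus A_{11}Y \oplus A_{10}Z^2.$$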
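As a sketch of how one such three-input nibble table could be materialized, the Python code below tabulates a function of the form F(X,Y,Z) = A·X ⊕ B·Y ⊕ C·Z for 4×4 binary matrices. The matrices are passed in as parameters because the actual blocks of A1 are not reproduced here; the row-mask encoding, the index layout X‖Y‖Z and the function names are assumptions of this illustration.

```python
def mat_vec4(rows, v):
    """Multiply a 4x4 GF(2) matrix by a 4-bit vector.

    `rows` holds four 4-bit row masks, most significant output bit first.
    """
    out = 0
    for row in rows:
        out = (out << 1) | (bin(row & v).count("1") & 1)
    return out


def build_table(a, b, c):
    """Tabulate F(X, Y, Z) = a*X ^ b*Y ^ c*Z over all 2^12 nibble triples."""
    return [mat_vec4(a, x) ^ mat_vec4(b, y) ^ mat_vec4(c, z)
            for x in range(16) for y in range(16) for z in range(16)]


def lookup(table, x, y, z):
    """Read the table at the 12-bit index formed by X || Y || Z."""
    return table[(x << 8) | (y << 4) | z]


IDENTITY4 = [0b1000, 0b0100, 0b0010, 0b0001]
```

Each such table has 2^12 entries of 4 bits, which matches the per-table size used later in the storage estimate.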
The data flow 1000 of
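$$d_{0,3}^{(0)} = A_{00}(x_{10}^{0} \oplus x_{20}^{0} \oplus x_{30}^{0}) \oplus A_{01}(x_{10}^{1} \oplus x_{20}^{1} \oplus x_{30}^{1}) \oplus A_{01}(b_{0,1}^{(0)})^2$$
and
$$d'^{(0)}_{0,3} = A_{10}(x_{10}^{0} \oplus x_{20}^{0} \oplus x_{30}^{0}) \oplus A_{11}(x_{10}^{1} \oplus x_{20}^{1} \oplus x_{30}^{1}) \oplus A_{11}(b_{0,1}^{(0)})^2,$$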
The remaining portions of the standard S-box can be computed as:
$$T_k(y) = A_2 \cdot (y \oplus A_1 \cdot k_{rj} \oplus C_1)^{-1} \oplus C_2.$$
Now a T-box can be created using the data flow 1100 shown in
$$(\gamma_1 Y + \gamma_0)^{-1} = [F\gamma_1]\,Y + [F(\gamma_1 \tau + \gamma_0)],$$
where $Y$ is one root of $P_1(x)$. We apply a threshold implementation (TI4) of 4 shares of each input and 3 output shares to the functions $F\gamma_1 + d'_3$ and $F(\gamma_1\tau + \gamma_0) + d_3$; their output shares are $(G_0, G_1, G_2)$ and $(N_0, N_1, N_2)$, respectively. In Step 6 of
where $T'_{ij}$ is a 4×4 block matrix of $T'$, $(C_{2\text{-}00}, C_{2\text{-}01}, C_{2\text{-}02})$ are three shares of the first 4-bit nibble of $C_2$, and $(C_{2\text{-}10}, C_{2\text{-}11}, C_{2\text{-}12})$ are three shares of the last 4-bit nibble of $C_2$. Compute $t_i^{(0)} = T'_{00}G_i \oplus T'_{01}N_i \oplus C_{2\text{-}0i}$ and $s_i^{(0)} = T'_{10}G_i \oplus T'_{11}N_i \oplus C_{2\text{-}1i}$, where $i = 0, 1, 2$. The matrix $T^{-1}$ is used to transform two 4-bit values of the composite field back to a standard 8-bit value; $A_2$ and $C_2$ are the linear and constant parts of the affine transformation of SM4, respectively. The result is six masked values $(t_{0,0}^{(0)}, t_{0,1}^{(0)}, t_{0,2}^{(0)}, s_{0,0}^{(0)}, s_{0,1}^{(0)}, s_{0,2}^{(0)})$ from the first branch. With the same method used in the above subsections, we get $(t_{0,0}^{(i)}, t_{0,1}^{(i)}, t_{0,2}^{(i)}, s_{0,0}^{(i)}, s_{0,1}^{(i)}, s_{0,2}^{(i)})$ $(i = 1, 2, 3)$ for the remaining three branches. To compute $x_{40}$, we multiply $l_{0i}$ with the outputs of each branch and add them together:
By adding part of the upper half part of each matrix, we get a masked value of x400. We can construct 16 lookup tables to implement this step. This data flow is depicted at 1200 in
It can be verified that
Let
where Pih-ju is a 4×4 block matrix, i=1,2,3; j=0,1; u=0,1.
The other three masked bytes of X4 are generated in a similar way. At this time, 16 lookup tables are constructed to compute
One can verify that
For a second round the first branch of this round is also used, as shown at 1300 in
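$$\bar{x}_{50}^{0} = x_{50}^{0} \oplus P_{03\text{-}01}\, s_{1,2}^{(3)}, \qquad \bar{x}_{50}^{1} = x_{50}^{1} \oplus P_{03\text{-}11}\, s_{1,2}^{(3)},$$
$$\bar{x}_{51}^{0} = x_{51}^{0} \oplus P_{13\text{-}01}\, s_{1,2}^{(3)}, \qquad \bar{x}_{51}^{1} = x_{51}^{1} \oplus P_{13\text{-}11}\, s_{1,2}^{(3)},$$
$$\bar{x}_{52}^{0} = x_{52}^{0} \oplus P_{23\text{-}01}\, s_{1,2}^{(3)}, \qquad \bar{x}_{52}^{1} = x_{52}^{1} \oplus P_{23\text{-}11}\, s_{1,2}^{(3)},$$
$$\bar{x}_{53}^{0} = x_{53}^{0} \oplus P_{33\text{-}01}\, s_{1,2}^{(3)}, \qquad \bar{x}_{53}^{1} = x_{53}^{1} \oplus P_{33\text{-}11}\, s_{1,2}^{(3)}.$$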
The result of this round is:
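$$\bar{x}_{60}^{0} = x_{60}^{0} \oplus P_{03\text{-}01}\, s_{2,2}^{(3)}, \qquad \bar{x}_{60}^{1} = x_{60}^{1} \oplus P_{03\text{-}11}\, s_{2,2}^{(3)},$$
$$\bar{x}_{61}^{0} = x_{61}^{0} \oplus P_{13\text{-}01}\, s_{2,2}^{(3)}, \qquad \bar{x}_{61}^{1} = x_{61}^{1} \oplus P_{13\text{-}11}\, s_{2,2}^{(3)},$$
$$\bar{x}_{62}^{0} = x_{62}^{0} \oplus P_{23\text{-}01}\, s_{2,2}^{(3)}, \qquad \bar{x}_{62}^{1} = x_{62}^{1} \oplus P_{23\text{-}11}\, s_{2,2}^{(3)},$$
$$\bar{x}_{63}^{0} = x_{63}^{0} \oplus P_{33\text{-}01}\, s_{2,2}^{(3)}, \qquad \bar{x}_{63}^{1} = x_{63}^{1} \oplus P_{33\text{-}11}\, s_{2,2}^{(3)}.$$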
The processing of the first branch of the fifth round is shown at 1700 of
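$$\bar{x}_{70}^{0} = x_{70}^{0} \oplus P_{03\text{-}01}\, s_{3,2}^{(3)}, \qquad \bar{x}_{70}^{1} = x_{70}^{1} \oplus P_{03\text{-}11}\, s_{3,2}^{(3)},$$
$$\bar{x}_{71}^{0} = x_{71}^{0} \oplus P_{13\text{-}01}\, s_{3,2}^{(3)}, \qquad \bar{x}_{71}^{1} = x_{71}^{1} \oplus P_{13\text{-}11}\, s_{3,2}^{(3)},$$
$$\bar{x}_{72}^{0} = x_{72}^{0} \oplus P_{23\text{-}01}\, s_{3,2}^{(3)}, \qquad \bar{x}_{72}^{1} = x_{72}^{1} \oplus P_{23\text{-}11}\, s_{3,2}^{(3)},$$
$$\bar{x}_{73}^{0} = x_{73}^{0} \oplus P_{33\text{-}01}\, s_{3,2}^{(3)}, \qquad \bar{x}_{73}^{1} = x_{73}^{1} \oplus P_{33\text{-}11}\, s_{3,2}^{(3)}.$$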
For the remaining rounds, the data flow is the same as in the fourth round except in Compute X40 and Remaining three bytes of X4. Since all inputs in the fifth round are masked data, the two-input lookup tables in Part V: Compute X40 can be changed to three-input lookup tables to cancel an additional mask, while the other lookup tables remain the same. The first nibble is used as an example and
After the last round, masked cipher texts (
Now eight lookup tables can be constructed for eight functions as shown at 2000 of
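$$F_0(X,Y) = X \oplus P_{03\text{-}01}Y, \qquad F_1(X,Y) = X \oplus P_{03\text{-}11}Y,$$
$$F_2(X,Y) = X \oplus P_{13\text{-}01}Y, \qquad F_3(X,Y) = X \oplus P_{13\text{-}11}Y,$$
$$F_4(X,Y) = X \oplus P_{23\text{-}01}Y, \qquad F_5(X,Y) = X \oplus P_{23\text{-}11}Y,$$
$$F_6(X,Y) = X \oplus P_{33\text{-}01}Y, \qquad F_7(X,Y) = X \oplus P_{33\text{-}11}Y.$$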
The result is the following cipher text:
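$$X_{32} = x_{(32)0}^{0} \,\|\, x_{(32)0}^{1} \,\|\, x_{(32)1}^{0} \,\|\, x_{(32)1}^{1} \,\|\, x_{(32)2}^{0} \,\|\, x_{(32)2}^{1} \,\|\, x_{(32)3}^{0} \,\|\, x_{(32)3}^{1}$$
$$X_{33} = x_{(33)0}^{0} \,\|\, x_{(33)0}^{1} \,\|\, x_{(33)1}^{0} \,\|\, x_{(33)1}^{1} \,\|\, x_{(33)2}^{0} \,\|\, x_{(33)2}^{1} \,\|\, x_{(33)3}^{0} \,\|\, x_{(33)3}^{1}$$
$$X_{34} = x_{(34)0}^{0} \,\|\, x_{(34)0}^{1} \,\|\, x_{(34)1}^{0} \,\|\, x_{(34)1}^{1} \,\|\, x_{(34)2}^{0} \,\|\, x_{(34)2}^{1} \,\|\, x_{(34)3}^{0} \,\|\, x_{(34)3}^{1}$$
$$X_{35} = x_{(35)0}^{0} \,\|\, x_{(35)0}^{1} \,\|\, x_{(35)1}^{0} \,\|\, x_{(35)1}^{1} \,\|\, x_{(35)2}^{0} \,\|\, x_{(35)2}^{1} \,\|\, x_{(35)3}^{0} \,\|\, x_{(35)3}^{1}$$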
One important criterion for any white-box implementation is performance. The performance of the above implementation is evaluated below. As a first step, storage cost of the lookup tables was evaluated through the following algorithm.
For 0≤r≤31:
In Part III: Compute A1x, there are six lookup tables for all rounds and all branches. The storage cost is $6 \times 2^{12} \times 4$ bits.
In Part IV: Create a T-box, since the round key is different for each round, the T-box must be generated in each round and each branch. The storage cost is $4 \times 2^{12} \times 24 \times 32$ bits.
In Part V: Compute X40 and Part VI: Remaining three bytes of X4, there are 32 lookup tables for one branch. After four rounds, eight lookup tables must be added.
The storage cost is $(15 \times 2^{12} \times 4 + 2^{8} \times 4) \times 8 + 8 \times 2^{12} \times 4$ bits = 1.5 MB. As noted above, the implementation has 276 lookup tables. The last two parts have the highest storage requirements. In Part IV: Create T-box, a T-box is generated for each key byte to keep the secret key in the only non-linear component of the algorithm. To reduce the storage requirement, the key byte can be moved from the T-box and embedded into a small lookup table. The effect is that only one T-box will exist for all rounds and all branches. Another way to reduce storage requirements is to use matrix operations instead of lookup tables in Part V: Compute X40 and Part VI: Remaining three bytes of X4. Both methods trade security for performance. The above storage cost analysis was conducted on a PC (CPU E3-1240 v5 @ 3.50 GHz, Memory: 16 GB). The experiment showed that the throughput of the implementation is 119 KB/s.
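The per-part figures quoted for Part III and Part IV can be converted to bytes with a few lines of Python. The sketch assumes the exponent reading 2^12 for the table index size (three 4-bit inputs) and uses binary units (1 KiB = 1024 bytes); the helper name `bits_to_kib` is illustrative.

```python
def bits_to_kib(bits):
    """Convert a size in bits to kibibytes (1 KiB = 1024 bytes)."""
    return bits / 8 / 1024


# Part III: six tables, 2^12 entries each, 4 bits per entry.
PART3_BITS = 6 * 2**12 * 4

# Part IV: four branches, 2^12 entries, 24 bits per entry, 32 rounds.
PART4_BITS = 4 * 2**12 * 24 * 32

PART3_KIB = bits_to_kib(PART3_BITS)   # 12.0 KiB
PART4_KIB = bits_to_kib(PART4_BITS)   # 1536.0 KiB, i.e. 1.5 MiB
```

Under these assumptions the key-dependent T-boxes of Part IV account for 1.5 MiB, which is consistent with the observation that moving the key byte out of the T-box is the main lever for reducing storage.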
It is axiomatic that a primary criterion for a block cipher is security. The three main cryptographic system attack models of interest include black-box, grey-box, and white-box attacks. Black-box is a traditional attack model in which an adversary only has access to the inputs and outputs of a cryptosystem. As an official encryption standard, SM4 has good performance in resisting classical attacks on block ciphers such as differential and linear attacks.
“Grey-box” is an attack model where the adversary can use leaked information to deploy side-channel cryptanalysis. Different leaked information can lead to different grey-box attacks. DCA (Differential computation analysis) is a powerful side-channel attack against white-box implementations of a cryptosystem. The main reason that DCA is successful is due to the nonnegligible correlation between the expected values (from the standard cipher specification) and the masked intermediate values (from the white-box implementation), which is caused by the linear imbalances in encodings used in white-box implementation. The disclosed implementation is DCA resistant because when the inputs are uniformly distributed, the outputs from the threshold implementation are uniformly distributed, so that the data correlations between SM4 and its white-box implementation are weakened.
In a white-box attack model, a practical symmetric encryption implementation usually lacks a strict security proof. Instead of reducing the white-box security to solving a computationally infeasible mathematical problem, the security of a white-box implementation is assessed by checking whether it is secure against known attacks. The security of the disclosed implementation against two well-known white-box attacks, the BGE attack and the affine equivalence attack, is evaluated below.
The earliest attack against a white-box implementation beyond grey-box attacks is the BGE attack. This attack was originally constructed to recover the round key of Chow et al.'s white-box AES implementation. The white-box AES implementation includes several phases (external input and output encodings are excluded):
$$T_{i,j}^{r}(x) = S(x \oplus k_{i,j}^{r}),$$
$$T_{i,j}^{10}(x) = S(x \oplus k_{i,j}^{10}) \oplus k_{i,j-i}^{11}.$$
A BGE attack process is summarized below.
The two cornerstones of the BGE attack are Phase 2 and Phase 3. The key point of Phase 3 is the existence of the affine relationship between yi(x0,x1,x2,x3) and yj(x0,x1, x2,x3). To apply a BGE-like attack on the disclosed implementation, the two phases are checked. Since non-linear encodings are not used in the implementation, Phase 2 can be skipped. For Phase 3, constructing a function set should be considered. The functions are key-dependent, so T-boxes must be included.
Consider the T-box on its own. Neither t nor s can be represented as a function of the inputs (d, b′, d′). The inputs and outputs of T-box 2500 are shown in
Lookup tables that follow the T-box are two sets of 16 tables generated in Part V above. To cancel out the effect of the threshold implementation, we can combine four T-boxes with 8×16 tables from four branches as shown at 2600 in
Affine equivalence attack resistance is now considered. For two S-boxes $S_1(x)$ and $S_2(x)$, the purpose of the affine equivalence algorithm is to test whether there exist two invertible n×n matrices $A_1$ and $A_2$ and two n-dimensional vectors $\alpha_1$ and $\alpha_2$ such that $S_2(x) = A_2 S_1(A_1 x \oplus \alpha_1) \oplus \alpha_2$. Several algorithms are presented to solve the affine equivalence problem in Biryukov, A., De Canniere, C., Braeken, A. et al., A Toolbox for Cryptanalysis: Linear and Affine Equivalence Algorithms, Advances in Cryptology-EUROCRYPT 2003, pp. 33-50, Springer, Berlin Heidelberg, May 2003. Usually, non-linear transformations or affine mappings are used to obscure an S-box. Since Phase 2 of a BGE attack can be used to remove the non-linear part of the transformation and leave the affine part, an obscured S-box (referred to as a T-box herein) can be affinely equivalent to the original S-box. Therefore, an affine equivalence algorithm is an efficient attack against most conventional white-box implementations. However, such affine equivalence attacks do not apply to the disclosed implementation because the input and output sizes of the T-box are 12 bits and 24 bits, respectively, which differ from those of the standard S-box, so the matrices $A_1$ and $A_2$ would be non-square.
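The affine equivalence relation can be illustrated on a toy scale. The Python sketch below builds an affinely equivalent copy of a 4-bit S-box from randomly chosen invertible matrices and checks a candidate equivalence. It does not implement the search algorithms of Biryukov et al.; the 4-bit size, the row-mask matrix encoding and all function names are assumptions made to keep the example small.

```python
import random


def mat_vec(rows, v):
    """Multiply a GF(2) matrix (list of row bit masks) by a bit vector."""
    out = 0
    for row in rows:
        out = (out << 1) | (bin(row & v).count("1") & 1)
    return out


def random_invertible(rng, n=4):
    """Draw random n x n GF(2) matrices until one is a bijection."""
    while True:
        rows = [rng.randrange(1 << n) for _ in range(n)]
        if len({mat_vec(rows, v) for v in range(1 << n)}) == 1 << n:
            return rows


def affine_image(s1, a1, c1, a2, c2):
    """Build S2(x) = A2 * S1(A1 * x ^ c1) ^ c2 as a lookup table."""
    return [mat_vec(a2, s1[mat_vec(a1, x) ^ c1]) ^ c2
            for x in range(len(s1))]


def is_affine_equivalent_with(s1, s2, a1, c1, a2, c2):
    """Check whether the given maps witness affine equivalence of s1, s2."""
    return affine_image(s1, a1, c1, a2, c2) == list(s2)
```

The relation requires square, invertible matrices on both sides, so it is only defined between tables whose input and output widths match those of the reference S-box.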
Conventional white-box implementations split SM4 into several steps and use affine transformations to protect each step. However, the disclosed implementation described above adopts a different approach. First, the S-box is split into two steps to obscure the boundary of the S-box and help protect the key when it is embedded in the S-box. Second, the elements and operations in $GF(2^8)$ are mapped to a composite field $GF((2^4)^2)$, which increases the difficulty of identifying the original operations. Third, threshold implementation techniques are used that weaken the correlation between the white-box implementation and standard SM4. These techniques work together to protect SM4 against BGE-like attacks, affine equivalence attacks and DCA attacks. Table 1 below compares the disclosed implementation with conventional implementations. While the disclosed implementation has larger storage requirements, it performs well in resisting popular white-box attacks.
The disclosed implementation is based on a composite field and a threshold implementation. The implementation works on 4-bit nibbles instead of on bytes. The threshold implementation makes the distribution of the masked values uniform and independent of the corresponding unmasked values. Operating in a smaller composite field reduces the size of the lookup tables. The disclosed implementation is resistant against traditional white-box attacks, such as affine equivalence attacks, BGE-like attacks and DCA-like attacks.
The above-described implementations may be accomplished by a computer or computing system 2700, as shown in
Processor(s) may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Processor(s) may be configured to execute modules by software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on processor(s). As used herein, the term “module” may refer to any component or set of components that perform a specified functionality attributed to the module. This may include one or more physical processors 2714 during execution of processor readable instructions, the processor readable instructions, circuitry, hardware, storage media, or any other components.
Although the present technology has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the technology is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present technology contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation. Further, features may be added or removed from the implementations to correspond to the specific application.