Various exemplary embodiments disclosed herein relate generally to cryptographic arithmetic methods and systems resistant to fault attacks.
Cryptographic systems perform various arithmetic calculations. While many asymmetric cryptosystems are known to be provably secure with respect to mathematical assumptions, such cryptosystems may be vulnerable to implementation attacks such as fault attacks.
Provided are embodiments that enable a cryptographic arithmetic method and system that is resistant to fault attacks.
A brief summary of various exemplary embodiments is presented. Some simplifications and omissions may be made in the following summary, which is intended to highlight and introduce some aspects of the various exemplary embodiments, but not to limit the scope of the invention. Detailed descriptions of a preferred exemplary embodiment adequate to allow those of ordinary skill in the art to make and use the inventive concepts will follow in the later sections.
Various embodiments may also relate to a method for integrity protected calculation of a cryptographic function including: performing an operation c=a∘b in a cryptographic function f(x1, x2, . . . , xn) defined over a commutative ring R; choosing a′ and b′ corresponding to a and b such that a′ and b′ are elements of a commutative ring R′; computing c′=a′∘′b′; computing a″=CRT(a, a′) and b″=CRT(b, b′), where CRT is the Chinese Remainder Theorem; computing c″=a″∘″b″; mapping c″ into R′; and determining if the mapping of c″ into R′ equals c′.
Various embodiments may also relate to a machine-readable storage medium encoded with instructions for performing an integrity protected calculation of a cryptographic function comprising: instructions for performing an operation c=a∘b in a cryptographic function f(x1, x2, . . . , xn) defined over a commutative ring R; instructions for choosing a′ and b′ corresponding to a and b such that a′ and b′ are elements of a commutative ring R′; instructions for computing c′=a′∘′b′; instructions for computing a″=CRT(a, a′) and b″=CRT(b, b′), where CRT is the Chinese Remainder Theorem; instructions for performing c″=a″∘″b″; mapping c″ into R′; and instructions for determining if the mapping of c″ into R′ equals c′.
In order to better understand various exemplary embodiments, reference is made to the accompanying drawings wherein:
Referring now to the drawings, in which like numerals refer to like components or steps, there are disclosed broad aspects of various exemplary embodiments.
Many asymmetric cryptosystems are known to be provably secure with respect to certain mathematical assumptions. Unfortunately, straightforward implementations of such cryptosystems are often vulnerable to implementation attacks, especially on power-limited devices. One type of attack of particular interest is the fault attack.
A fault attack injects faults into the cryptographic processing. The faults may target memories (e.g., RAM, registers, EEPROM) used by an algorithm or even entire operations (e.g., in the Arithmetic Logic Unit), with the goal of extracting secret information from a collection of correct and incorrect results. Fault attacks may also be combined with other types of attack such as side-channel attacks.
Therefore, there is a need for efficient countermeasures against fault attacks, especially in asymmetric cryptography using Finite Field Arithmetic (or more relaxed Ring Arithmetic) as a building block. Two widely used families of cryptosystems using such a building block include: 1) cryptosystems based on the Factoring Problem, i.e., doing operations over the Ring Zn, where n is the product of two or more primes, such as the RSA-cryptosystem or the Paillier-cryptosystem; and 2) cryptosystems based on the Discrete Logarithm (DL) Problem, where operations are done over a Group Gn of order n, such as the ElGamal-cryptosystem, the Cramer-Shoup cryptosystem, the Digital Signature Algorithm, or the Schnorr signature scheme. Gn can be, for instance, a multiplicative group Zp* of the field Zp, where p is a prime and hence n=p−1, or an Elliptic Curve Group of order n over a prime field or a binary extension field.
More advanced cryptographic techniques include schemes such as DL-based zero-knowledge proofs of knowledge (e.g., used for credential systems), DL-based verifiable secret sharing (e.g., used for access control to bank vaults or launching of nuclear missiles), DL-based secure multi-party computation (e.g., used to securely evaluate functions shared over several instances) or threshold cryptosystems (e.g., used to share decryption or signature generation over a set of instances).
By Gn we denote a group of order n. By ord(R) we denote the order of the multiplicative group of the ring R.
In the following, the Chinese Remainder Theorem (CRT) is discussed in a simplified form, i.e., for moduli being the product of two distinct primes (or two distinct irreducible polynomials if CRT is considered for polynomial rings).
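Let $n = pq$, where $p$ and $q$ are distinct primes. Then for given $x_q \in Z_q$ and $x_p \in Z_p$ there exists a unique solution $x_n \in Z_n$ such that $x_n \equiv x_p \pmod{p}$ and $x_n \equiv x_q \pmod{q}$. Given the inverses
$$q^{-1} \bmod p \quad \text{and} \quad p^{-1} \bmod q,$$
one can compute
$$x_n \equiv \left(x_p \cdot q \cdot (q^{-1} \bmod p) + x_q \cdot p \cdot (p^{-1} \bmod q)\right) \pmod{n},$$
having
$$q \cdot (q^{-1} \bmod p) \equiv 1 \pmod{p} \quad \text{and} \quad p \cdot (p^{-1} \bmod q) \equiv 1 \pmod{q}.$$
Analogously, the CRT works for $R = Z_2[X]/(N(X))$, where $N(X)$ is the reduction polynomial being the product of two distinct irreducible polynomials $Q(X)$ and $P(X)$. Henceforth, let $\mathrm{CRT}_{p,q}(a,b) = \left(a \cdot q \cdot (q^{-1} \bmod p) + b \cdot p \cdot (p^{-1} \bmod q)\right) \bmod pq$.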
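As a minimal sketch of this recombination for integer moduli, the following Python function computes the unique element of Z_pq from its two residues. The function name `crt` and the toy moduli are illustrative assumptions, not part of the described embodiments; `pow(x, -1, m)` requires Python 3.8 or later.

```python
def crt(a, b, p, q):
    """Return the unique x in Z_{pq} with x = a (mod p) and x = b (mod q).

    Assumes p and q are coprime (illustrative helper, not from the source).
    """
    q_inv_p = pow(q, -1, p)  # q^{-1} mod p
    p_inv_q = pow(p, -1, q)  # p^{-1} mod q
    return (a * q * q_inv_p + b * p * p_inv_q) % (p * q)


# Toy example: recombine the residues 2 (mod 5) and 3 (mod 7).
x = crt(2, 3, 5, 7)
print(x, x % 5, x % 7)  # prints: 17 2 3
```

Reducing the result modulo p or q returns the respective input residue, which is the property the integrity-protected domain described later relies on.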
Given two primes p and q it is possible to efficiently compute the product n=pq. This may be done in polynomial time. However, given any n, the best known algorithms to find the corresponding factorization run in sub-exponential time. This is known as the Factoring Problem. Note that the practical difficulty of solving this problem strongly depends on the choice of the lengths and types of the prime factors.
Let, without loss of generality, Gn be a multiplicative group of order n. Given g∈Gn and an integer k∈Zn, computing y=g^k can be performed in polynomial time (e.g., with square-and-multiply). However, given y, g, Gn and n, the best known generic method of finding k has exponential running time. This is known as the Discrete Logarithm Problem. Note that the practical difficulty of solving this problem strongly depends on the choice of Gn, g and n.
Let Gn be an Elliptic Curve group and P∈Gn a point of order n. The core operation of an Elliptic Curve cryptosystem is the scalar multiplication, i.e., given a secret integer k∈Zn, computing Q=kP. Given P and Q, finding k∈Zn is the Elliptic Curve Discrete Logarithm Problem.
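The polynomial-time exponentiation mentioned above can be sketched as follows for the case that the group is the multiplicative group modulo an integer. The function name and the choice of a right-to-left bit scan are assumptions made for illustration only.

```python
def square_and_multiply(g, k, modulus):
    """Compute g**k mod modulus with O(log k) modular multiplications."""
    result = 1
    base = g % modulus
    while k > 0:
        if k & 1:                      # current exponent bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus  # square for the next bit
        k >>= 1
    return result


print(square_and_multiply(5, 117, 1009) == pow(5, 117, 1009))  # prints: True
```

The loop runs once per bit of k, which is why the forward direction is cheap while recovering k from the result is believed to be hard for suitable groups.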
Efficient Elliptic Curve Arithmetic using binary extension fields has been proposed and is very attractive for implementation on security ICs. However, this solution is susceptible to fault attacks in the following manner: one could modify the input point P of the scalar multiplication to a point P′ which lies on the twist of the original curve, i.e., a curve with a different domain parameter a2 (notice that a2 is not used by the group operations). If the group defined over the twist includes several subgroups, then the difficulty of finding the scalar (i.e., solving the Discrete Logarithm Problem) is determined by the difficulty of solving the Discrete Logarithm Problem in the largest subgroup. It has been shown that only one of the NIST-recommended curves (namely K-283) has a twist curve with an order of the same strength. Others, such as B-163, which normally has a bit-security of approximately 80 bits, have a bit-security of only approximately 50 bits on their twist.
For example, it has also been shown that the Schnorr identification scheme can be broken with a few erroneous executions.
In the CRT-based RSA implementation, a message M∈{0,1}* may be signed in three steps:
1. s1=H(M)^d MOD p
2. s2=H(M)^d MOD q
3. s=CRTp,q(s1,s2)
Thereby, H:{0,1}*→Zn is a cryptographic hash-function. The factorization of n may be found with a single successfully injected fault:
1. s1=H(M)^d MOD p
2. s2′=H(M)^d MOD q (fault injected, such that s2′≠s2 (MOD q))
3. s′=CRTp,q(s1,s2′)
Given the correct signature s and the faulty signature s′, computing gcd(s−s′,n) yields the prime factor p, because s−s′ is divisible by p but not by q. The scheme may thus be broken, because the private key d may then be computed efficiently. Other successful fault attacks for non-CRT based RSA have also been shown.
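The attack can be reproduced with toy parameters. The sketch below is illustrative only: the small primes, the public exponent 17, the SHA-256 stand-in for H, and the way the fault is simulated (adding 1 to the half-signature modulo q) are all assumptions chosen to keep the example short.

```python
import hashlib
from math import gcd


def crt(a, b, p, q):
    """Unique x mod pq with x = a (mod p) and x = b (mod q)."""
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % (p * q)


p, q = 1009, 1013                       # toy primes, far too small for real use
n = p * q
d = pow(17, -1, (p - 1) * (q - 1))      # private exponent for assumed e = 17
h = int.from_bytes(hashlib.sha256(b"message").digest(), "big") % n  # stand-in for H(M)

# Correct signature.
s1 = pow(h, d, p)
s2 = pow(h, d, q)
s = crt(s1, s2, p, q)

# Faulty signature: the half-signature modulo q is corrupted.
s2_faulty = (s2 + 1) % q
s_faulty = crt(s1, s2_faulty, p, q)

# s - s_faulty is a multiple of p but not of q, so the gcd reveals p.
recovered = gcd(s - s_faulty, n)
print(recovered == p)  # prints: True
```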
Therefore there remains a need to detect faults on a very low level, i.e., in the arithmetic layer where multiplications and additions may be performed over multi-precision integers or polynomials. The advantages of such a mechanism in such a low-level building block are that: 1) it may be applied to any cryptosystem which is based on that building block; 2) the computational overhead may be quite low; and 3) a fault may usually be detected earlier compared to mechanisms located on higher abstraction levels.
Let R be a commutative ring with 1 that serves as the building block of a cryptosystem. For RSA that would be R=Zn. For ElGamal over Zp* it would be R=Zp, where Zp* is the multiplicative group of R. In the case of Elliptic Curve cryptography, R=GF(q), where q=p^n and p is prime. The goal is to detect faults in ring elements and operations over ring elements. Hence, the operations + and · need to be protected accordingly.
The idea is to define an integrity protected algebraic structure where each operation contains an integrity check. Therefore, let R′ be a commutative ring with 1 (called the verification domain), such that one can define R″ over R and R′ with the help of the Chinese Remainder Theorem. Let ∘′∈{+′,·′} be the operations in R′ and ∘″∈{+″,·″} the operations in R″, analogous to + and · in R. For example, R=Zp, R′=Zq and R″=Zpq. We refer to R″ as the integrity-protected domain. The following functions are defined:
ToIPDomain: R×R′→R″ (5)
ToIPDomain may be used to transfer an element of R to the integrity-protected domain R″ with the help of the Chinese Remainder Theorem.
ToVDomain: R″→R′ (6)
ToVDomain may be used to map an element of R″ back to R′. Notice an element of R′ may be interpreted as some kind of checksum or cyclic redundancy check (CRC) over the element of R″.
FromIPDomain: R″→R (7)
FromIPDomain is used to convert elements of R″ back to the original domain R.
These functions satisfy the following properties:
ToVDomain(ToIPDomain(a,a′)∘″ToIPDomain(b,b′))=a′∘′b′ (8)
FromIPDomain(ToIPDomain(a,a′)∘″ToIPDomain(b,b′))=a∘b (9)
Now let ƒ(x1, . . . , xn)=r be a formula defined over R that needs to be evaluated with the capability to detect faults on any operation or element involved. The following steps may be used to calculate ƒ(x1, . . . , xn)=r and to detect faults:
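A single protected operation, following the sequence summarized earlier (choose a′ and b′, compute c′=a′∘′b′, lift a and b with the CRT, compute c″=a″∘″b″, map c″ into R′ and compare with c′), might be sketched as below for R=Zp, R′=Zq and R″=Zpq. This is a hedged illustration rather than the reference procedure: the moduli, the random choice of a′ and b′, the exception type, and the `inject_fault` switch used to simulate a bit flip are all assumptions.

```python
import operator
import secrets

P = 2**31 - 1   # modulus of R (toy value, prime)
Q = 65521       # modulus of the verification domain R' (toy value, prime)


class FaultDetected(Exception):
    """Raised when the integrity check of an operation fails."""


def to_ip_domain(a, a_v):
    """Lift (a, a') into R'' = Z_{PQ} with the CRT."""
    return (a * Q * pow(Q, -1, P) + a_v * P * pow(P, -1, Q)) % (P * Q)


def to_v_domain(x):
    return x % Q


def from_ip_domain(x):
    return x % P


def protected_op(op, a, b, inject_fault=False):
    a_v, b_v = secrets.randbelow(Q), secrets.randbelow(Q)   # choose a', b'
    c_v = op(a_v, b_v) % Q                                  # c' = a' o' b'
    a_ip, b_ip = to_ip_domain(a, a_v), to_ip_domain(b, b_v) # a'', b''
    c_ip = op(a_ip, b_ip) % (P * Q)                         # c'' = a'' o'' b''
    if inject_fault:
        c_ip ^= 1 << 5                                      # simulated bit flip
    if to_v_domain(c_ip) != c_v:                            # integrity check
        raise FaultDetected("result does not match verification value")
    return from_ip_domain(c_ip)


def fault_is_detected(op, a, b):
    try:
        protected_op(op, a, b, inject_fault=True)
    except FaultDetected:
        return True
    return False


print(protected_op(operator.mul, 123456, 654321) == (123456 * 654321) % P)
print(fault_is_detected(operator.add, 7, 8))
```

In this toy setting the simulated bit flip changes c″ by 32, which is not a multiple of Q, so the check always fails for it; an arbitrary fault would escape detection only if it happened to leave the residue modulo Q unchanged.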
If R is a Finite Field, one may also want to protect the multiplicative inversion. This is straightforward with method A if, for instance, Fermat's Little Theorem is used for inversion, i.e., an element a∈R is inverted by evaluating ƒ(x)=x^(ord(R)−1) at x=a.
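As a rough illustration of protected inversion, the sketch below evaluates x^(ord(R)−1) for R=Zp (so the exponent is p−2) with square-and-multiply, checking every multiplication in the integrity-protected domain. The moduli, the pair representation (value in R″, value in R′) and all function names are assumptions made for this example.

```python
import secrets

P = 2**31 - 1   # R = Z_P, a prime field (toy value)
Q = 65521       # R' = Z_Q (toy value)
N = P * Q       # R'' = Z_{PQ}


class FaultDetected(Exception):
    pass


def lift(a, a_v):
    """CRT lift of (a, a') into R''; returned together with a'."""
    ip = (a * Q * pow(Q, -1, P) + a_v * P * pow(P, -1, Q)) % N
    return (ip, a_v)


def protected_mul(x, y):
    """Multiply two (R'' value, R' value) pairs and check the result."""
    ip = (x[0] * y[0]) % N
    v = (x[1] * y[1]) % Q
    if ip % Q != v:
        raise FaultDetected("multiplication failed the integrity check")
    return (ip, v)


def protected_inverse(a):
    """Return a^{-1} in Z_P via Fermat: a^(P-2), with per-step checks."""
    base = lift(a, 1 + secrets.randbelow(Q - 1))
    result = (1, 1)              # the element 1 lifts to 1 with check value 1
    e = P - 2                    # ord(R) - 1 for the prime field Z_P
    while e:
        if e & 1:
            result = protected_mul(result, base)
        base = protected_mul(base, base)
        e >>= 1
    return result[0] % P


a = 123456789
print((a * protected_inverse(a)) % P)  # prints: 1
```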
A second embodiment for calculating the cryptographic function ƒ may set xi′=α′ for all xi, where α′ is some constant in R′. This may result in a more efficient method for calculating the cryptographic function ƒ.
In the second embodiment, the following steps may be used to calculate ƒ(x1, . . . , xn)=r and to detect faults:
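One possible reading of this variant, based only on the description that every xi′ equals the same constant α′, is sketched below: all inputs are lifted with α′, the formula is evaluated in R″, and the result is compared with the same formula evaluated on α′ in R′. The example formula ƒ(x1,x2,x3)=x1·x2+x3, the constant `ALPHA`, the moduli and the simulated fault are assumptions for illustration.

```python
P = 2**31 - 1    # R = Z_P (toy value)
Q = 65521        # R' = Z_Q (toy value)
ALPHA = 40503    # hypothetical constant alpha' in R'


class FaultDetected(Exception):
    pass


def lift(x):
    """CRT lift of (x, ALPHA) into R'' = Z_{PQ}."""
    return (x * Q * pow(Q, -1, P) + ALPHA * P * pow(P, -1, Q)) % (P * Q)


def f(x1, x2, x3, modulus):
    """Example formula f(x1, x2, x3) = x1*x2 + x3 over Z_modulus."""
    return (x1 * x2 + x3) % modulus


def protected_f(x1, x2, x3, inject_fault=False):
    r_ip = f(lift(x1), lift(x2), lift(x3), P * Q)   # evaluate in R''
    if inject_fault:
        r_ip ^= 1                                   # simulated bit flip
    expected = f(ALPHA, ALPHA, ALPHA, Q)            # depends on ALPHA only
    if r_ip % Q != expected:
        raise FaultDetected("result does not match f(alpha', ..., alpha')")
    return r_ip % P


def fault_is_detected(x1, x2, x3):
    try:
        protected_f(x1, x2, x3, inject_fault=True)
    except FaultDetected:
        return True
    return False


print(protected_f(1000, 2000, 3000) == f(1000, 2000, 3000, P))
print(fault_is_detected(1000, 2000, 3000))
```

Because the expected verification value depends only on the constant and on the formula, it can be precomputed, which matches the storage saving mentioned for this variant.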
For the case that R=Z2[X]/(P(X)), where P(X) is an n-degree polynomial, a very efficient implementation may be realized if the hardware provides a CRC co-processor. Let R′=Z2[X]/(Q(X)), such that Q(X) may be the generator polynomial implemented in hardware for the CRC co-processor. Thereby, let m be the bit-length of the polynomial Q(X). Then, we have R″=Z2[X]/(P(X)Q(X)) with the CRT. Denote the CRC-function executed by the hardware as CRC:{0,1}*→Z2[X]/(Q(X)). Then,
ToIPDomain(A(X),A′(X))=CRT_{P(X),Q(X)}(A(X),A′(X))=A(X)″ (10)
where A(X)″ is the unique element of R″ that is congruent to A(X) modulo P(X) and to A′(X) modulo Q(X),
ToVDomain(A(X)″)=CRC(A(X)″)=A(X)″ MOD Q(X)=A′(X) (11)
FromIPDomain(A(X)″)=A(X)″ MOD P(X)=A(X) (12)
Note that ToVDomain in this setting is very efficient and negligible in terms of computational costs compared to ToIPDomain and FromIPDomain.
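The binary-polynomial case can be sketched in software with polynomials encoded as Python integers (bit i holds the coefficient of X^i). Everything concrete here is an assumption for illustration: P(X)=X^8+X^4+X^3+X+1, Q(X)=X^4+X+1, and the function `crc`, which models the co-processor as a plain reduction modulo Q(X); real CRC units often add initial values, bit reflection or a final XOR, which this sketch ignores.

```python
P_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1, irreducible, defines R
Q_POLY = 0x13   # X^4 + X + 1, irreducible, stand-in for the CRC generator


def pmul(a, b):
    """Carry-less multiplication of two polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdivmod(a, m):
    """Quotient and remainder of a divided by m over GF(2); m must be nonzero."""
    quotient = 0
    while a.bit_length() >= m.bit_length():
        shift = a.bit_length() - m.bit_length()
        quotient ^= 1 << shift
        a ^= m << shift
    return quotient, a


def pmod(a, m):
    return pdivmod(a, m)[1]


def pinv(a, m):
    """Inverse of a modulo m (extended Euclid); assumes gcd(a, m) = 1."""
    r0, r1, t0, t1 = m, a, 0, 1
    while r1:
        quotient, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        t0, t1 = t1, t0 ^ pmul(quotient, t1)
    return pmod(t0, m)


N_POLY = pmul(P_POLY, Q_POLY)                       # modulus of R''
Q_INV_P = pinv(pmod(Q_POLY, P_POLY), P_POLY)        # Q^{-1} mod P
P_INV_Q = pinv(pmod(P_POLY, Q_POLY), Q_POLY)        # P^{-1} mod Q


def crc(x):
    """Software stand-in for the CRC co-processor: x mod Q(X)."""
    return pmod(x, Q_POLY)


def to_ip_domain(a, a_v):
    left = pmul(pmul(a, Q_POLY), Q_INV_P)
    right = pmul(pmul(a_v, P_POLY), P_INV_Q)
    return pmod(left ^ right, N_POLY)


def from_ip_domain(x):
    return pmod(x, P_POLY)


def protected_mul(a, a_v, b, b_v):
    c_ip = pmod(pmul(to_ip_domain(a, a_v), to_ip_domain(b, b_v)), N_POLY)
    if crc(c_ip) != pmod(pmul(a_v, b_v), Q_POLY):   # ToVDomain check
        raise ValueError("fault detected")
    return from_ip_domain(c_ip)


print(hex(protected_mul(0x57, 0x9, 0x83, 0x6)))
```

The only extra work per operation beyond the multiplication in R″ is one call to `crc`, which corresponds to the single co-processor invocation discussed in the text.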
Adding integrity protection may increase the amount of required resources in terms of time and space. However, when used for binary polynomial arithmetic, this overhead may be very small.
Let s be the number of words to store one element of R. Let t be the number of words to store one element of R′.
While the number of word-level multiplications grows with quadratic complexity for R, the overhead for integrity checking and hence the additional computations in R″ only grows with linear complexity. Additionally, one execution of the CRC co-processor per operation may be necessary. However, this may be negligible compared to the costs of the multiplication.
On a power-limited device with a word size of 16 bits, it might be sufficient to spend 1 word on integrity checking. For Elliptic Curves over binary fields, typical bit-lengths used nowadays range from 163 to 233 bits, giving s=11 and s=15, respectively. Doing multiplications in the integrity-protected domain R″ would in this case require around 15% more computation for s=11 and around 11% more for s=15. Of course, at the same time the detectability decreases as well. The corresponding trade-off depends on the application. Note that the aforementioned special case, in which all verification values equal one constant, gives no measurable advantage in terms of computational costs if a fast CRC co-processor is involved, but may save storage space, because the individual integrity values need not be stored when there is only one constant value.
The following examples illustrate how the invention may be applied to a cryptosystem. Without loss of generality the general case is presented.
The scalar multiplication Q=kP is done as follows:
The CRT-based approach discussed above may be adapted as follows:
let ToIPDomaint(x,y)=CRTp,t(x,y) and
let ToIPDomains(x,y)=CRTq,s(x,y) for two coprime values s and t.
Accordingly, FromIPDomaint(x)=x MOD p and FromIPDomains(x)=x MOD q. A message M∈{0,1}* may then be signed as follows:
A basic countermeasure may be to verify the input and output points of computations. Such verifications are referred to as Point Verification (PV) and may ensure that only points that lie on the curve are returned. Unfortunately, there are other fault attack models which are not covered by PV. For instance, one could successfully bypass the PV because it is only one check in a long chain of computations.
An additional or alternative solution is to include Concurrent Error Detection, which checks whether an invariant is still valid after a single operation. If the scalar multiplication used is the Montgomery powering ladder, one invariant is the fact that an intermediate point of the current step of the scalar multiplication loop differs from the previous one in the base-point's x-coordinate. The invariant may be checked at any point in time, but this is usually quite costly: the cost of a point operation may roughly increase by 100% in terms of the number of necessary Finite Field multiplications. As a more efficient approach, one could also perform a PV from time to time during the scalar multiplication. This may also be costly because it requires several further group operations for the check, i.e., it makes no sense to introduce such a check for each intermediate step but only at randomly chosen steps. Another approach may be to involve some kind of verification path in parallel to the main computation path, which may be checked during and/or at the end of the computation, or redundant coding with intermediate consistency checks (e.g., having integrity protection on the field arithmetic level).
Research into a countermeasure against fault attacks on CRT-based RSA has resulted in a running time comparable to that of the described embodiments. However, the disadvantage of this approach may be that the integrity check is only done once at the end. If this check is skipped by an attack, the scheme is broken. In contrast, the embodiments presented above have the advantage that if an intermediate check is skipped, the fault may be detected in checks of subsequent operations.
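The kind of invariant check described for the Montgomery ladder can be illustrated with its multiplicative-group analogue, where the two running values always differ by a factor of the base (for elliptic curves the corresponding relation is that the two intermediate points differ by the base point). The sketch below is an analogy under that assumption, not an elliptic-curve implementation; the function name and the exception type are hypothetical.

```python
class InvariantViolated(Exception):
    """Raised when the ladder invariant r1 == r0 * g no longer holds."""


def ladder_pow(g, k, n, check=True):
    """Montgomery ladder for g**k mod n with an optional invariant check."""
    r0, r1 = 1, g % n
    for bit in bin(k)[2:]:            # scan exponent bits from the top
        if bit == "1":
            r0, r1 = (r0 * r1) % n, (r1 * r1) % n
        else:
            r0, r1 = (r0 * r0) % n, (r0 * r1) % n
        # Concurrent check: one extra multiplication per loop iteration.
        if check and r1 != (r0 * g) % n:
            raise InvariantViolated("intermediate values are inconsistent")
    return r0


print(ladder_pow(5, 117, 1009) == pow(5, 117, 1009))  # prints: True
```

Checking in every iteration adds one multiplication to the two the ladder already performs per bit, which gives a feel for why per-step invariant checks are considered costly.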
The various exemplary embodiments may be employed in various devices and systems, for example, smart cards, security tokens, communication devices, wireless devices, computer nodes, network nodes, RFID chips, NFC chips, identity verification terminals, point of sale terminals, automatic teller machines, storage devices, etc. Further, the various exemplary embodiments may be implemented in computer software for secure communication across both public and private networks. Such computer software may include browsers or any other software that allows for communication across networks. Such embodiments may be used to calculate various cryptographic functions, for example, public key cryptography encryption, decryption, and key generation, hash functions, etc.
It should be apparent from the foregoing description that various exemplary embodiments may be implemented in hardware and/or firmware. Furthermore, various exemplary embodiments may be implemented as instructions stored on a non-transitory storage medium, which may be read and executed by at least one processor to perform the operations described in detail herein. A tangible machine-readable storage medium may include any mechanism for storing information in a form readable by a machine, such as a personal or laptop computer, tablet, a server, or other computing device. Further, special processors may be used to carry out the computations of the above described methods and may be included in these computing devices. Such processors may include signal processors, math coprocessors, specialized cryptographic processors, or other processors capable of implementing the cryptographic method described above. Such special cryptographic processors and other processors may be optimized to quickly and efficiently perform various calculations specific to the cryptographic methods by including specific functionality matched to the cryptographic functions to be calculated. Such special cryptographic processors may include a general processor that includes specific cryptographic processing capabilities that may be implemented in hardware, software, or a combination thereof. Thus, a tangible machine-readable storage medium may include read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, and similar storage media.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in machine readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
Although the various exemplary embodiments have been described in detail with particular reference to certain exemplary aspects thereof, it should be understood that the invention is capable of other embodiments and its details are capable of modifications in various obvious respects. As is readily apparent to those skilled in the art, variations and modifications can be effected while remaining within the spirit and scope of the invention. Accordingly, the foregoing disclosure, description, and figures are for illustrative purposes only and do not in any way limit the invention, which is defined only by the claims.