The present invention relates to a method for the multi-exponentiation π=1d gie
In asymmetric encryption methods or public key cryptosystems which are based on the insolvability of the discrete logarithm problem in Abelian groups, the exponentiation gn of a group element g or the multi-exponentiation gln
The possibility of precomputing powers of the group element g presents the problem that in this case the group element g which is used must be known beforehand. This is not the case for example in the case of signature verification in the D[igital]S[ignature]A[lgorithm] or in the E[lliptic]C[urve] D[igital]S[ignature]A[lgorithm] or in the Diffie-Hellman key exchange method. Added to this is the fact that, on smart cards for example, there is not enough storage space to store a sufficiently large number of precomputed elements.
Another possibility lies in recoding the exponent used; this possibility is independent of the choice of group element g and is therefore particularly attractive for accelerating the abovementioned signature and key exchange methods.
The techniques for recoding the exponent used in algorithms for (multi-)exponentiation are based on the fundamental idea that an integer is rewritten in a different form than the usual binary representation, namely with a lower density and with coefficients in a finite set of integers C which contains at least the elements 0 and 1.
If, in the specific group in which the computation is carried out, the inversion of an element is “gratis”, that is to say if the computational complexity of the inversion is very low compared with the other group operations, and if use is made of signed coefficients, then it can always be assumed that $c \in C$ also implies $-c \in C$. If the inversion is computationally expensive, all the elements of the set C are non-negative integers.
A so-called “square-and-multiply” exponentiation algorithm for the computation of ge, wherein g is a group element and e is an integer, then operates in a known manner as follows:
The number of group operations is then approximately equal to the number of non-vanishing coefficients $e_i$ in the representation $\sum_{i=0}^{n} e_i 2^i$ of the exponent e (these group operations are multiplications either by precomputed or given group elements or, if the inversion of group elements is fast, by the inverses thereof) plus
A good match between the size of C and the density of the representation is the path to optimal performance in the representation of the exponent.
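A left-to-right exponentiation over a recoded exponent can be sketched as follows. This is only a minimal illustration in Python, assuming the multiplicative group modulo a prime p as the group G; the function name `pow_from_digits` and the use of the built-in `pow` in place of a table of precomputed powers are assumptions made for this sketch and are not taken from the present text.

```python
def pow_from_digits(g, digits, p):
    """Compute g**e mod p, where e = sum(digits[i] * 2**i).

    digits[i] may be negative (signed recoding); p is assumed prime
    and g not divisible by p, so that g is invertible.
    """
    g_inv = pow(g, -1, p)          # modular inverse (Python 3.8+)
    x = 1
    for c in reversed(digits):     # most significant coefficient first
        x = x * x % p              # one squaring per coefficient
        if c > 0:
            # stands in for a lookup of the precomputed power g**c
            x = x * pow(g, c, p) % p
        elif c < 0:
            # signed coefficient: multiply by the inverse power
            x = x * pow(g_inv, -c, p) % p
    return x
```

Only the non-vanishing coefficients trigger a multiplication, which is why a sparse recoding reduces the number of group operations.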
Examples of exponent recoding include:
With regard to exponent recoding, however, it should be considered that this recoding may in many cases not take place “online”, that is to say during the exponentiation itself; for this reason, the recoded exponents must first be stored. However, this storage requirement is disadvantageous in particular in extremely restricted environments, such as in smart cards for example, since in such an extremely restricted environment each byte of the memory is “precious”.
Based on the abovementioned disadvantages and shortcomings, and with reference to the outlined prior art, it is an object of the present invention to further develop a method of the type mentioned above in such a manner that the requirement in terms of storage space for recoded exponents or scalars is reduced as much as possible even and especially in extremely restricted environments, such as in smart cards for example.
This object is achieved by a method having the features specified in claim 1. Advantageous embodiments and expedient developments of the present invention are characterized in the dependent claims.
The present invention is thereby based on the principle of almost-online recoding for single exponentiation or single scalar multiplication or for multi-exponentiation or multi-scalar multiplication in restricted environments; in this connection, “almost-online” recoding means that the exponent or scalar is split into sections which are individually recoded and the recoding of which takes place in layers between parts of the (multi-)exponentiation or the (multi-)scalar multiplication.
The technique of “almost-online” recoding may be used to reduce the storage requirement for the recoded exponents or for the recoded scalars. The effects of almost-online recoding on the total running time of the (multi-)exponentiation or the (multi-)scalar multiplication are usually minimal.
Based on the abovementioned exemplary recoding operations, in the method according to the present invention it is assumed that the recoding in the case of multi-exponentiation or multi-scalar multiplication is of the form $e_i = \sum_{j=0}^{n} b_{i,j} 2^j$; in the case of (single) exponentiation or (single) scalar multiplication, which is a special case of multi-exponentiation or multi-scalar multiplication, the assumed form is accordingly $e = \sum_{j=0}^{n} b_j 2^j$, wherein $n = \lceil \log_2 e \rceil$ is the bit length of e, that is to say the recoded representation is at most one position longer than the binary representation. In other words, n+1 is to be understood as the maximum length of the recoding of any exponent or scalar $e_i = \sum_{j=0}^{n} b_{i,j} 2^j$.
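The assumed form of the recoding can be illustrated with the non-adjacent form. The following Python sketch is an illustration only (the function name `naf` is chosen here and does not appear in the present text); it computes the signed coefficients from the least significant end and shows that the result is at most one position longer than the binary representation.

```python
def naf(e):
    """Return the non-adjacent form of e >= 0, least significant digit first.

    Every digit is -1, 0 or 1, and no two adjacent digits are both non-zero.
    """
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)    # 1 if e = 1 (mod 4), -1 if e = 3 (mod 4)
            e -= d             # e is now divisible by 4
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits
```

For example, e=7 has the binary representation 111 with three non-vanishing coefficients, whereas its non-adjacent form has the coefficients 1, 0, 0, −1 (most significant first), that is to say 8−1, with only two.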
It is furthermore assumed that the recoding algorithm depends, possibly not explicitly, on a parameter w which usually corresponds to the width of a window over which the bits of the exponents or scalars $e_i$ are read, or to the upper limit of such a width.
On this basis, according to the teaching of the present invention, the multi-exponentiation which can be expressed by symbols in the notation πi=1d gie
wherein gi is an element of the group G and
The special case of (single) exponentiation is obtained above for d=1, that is to say when there is a single element g and a single exponent e assigned to the element g, which amounts to omitting the index i; in this case, an element g is therefore exponentiated by an exponent e, in particular an integer exponent, having a maximum bit length n, to form a power $g^e$, wherein the element g once again derives from a multiplicatively notated Abelian group G.
In an analogous manner, according to the teaching of the present invention, the multi-scalar multiplication which can be expressed by symbols in the notation $\sum_{i=1}^{d} e_i g_i$, in the case of an additively notated group, in particular an Abelian group, G, takes place in the following steps:
all multiples c·gi,
wherein c is a permissible positive coefficient and
The special case of (single) scalar multiplication is obtained above for d=1, that is to say when there is a single element g and a single scalar e assigned to the element g, which amounts to omitting the index i; in this case, an element g is therefore multiplied by a scalar e, in particular an integer scalar, having a maximum bit length n, to give a product e·g, wherein the element g once again derives from an additively notated Abelian group G.
According to one preferred further embodiment of the present invention,
The present invention furthermore relates to a microprocessor which operates in accordance with a method of the type described above.
The present invention furthermore relates to a device, in particular a chip card and/or in particular a smart card, having at least one microprocessor of the type described above.
The present invention finally relates to the use
As already mentioned above, there are various possibilities for advantageously implementing and developing the teaching of the present invention. In this respect, on the one hand reference is made to the claims dependent on claim 1 and on the other hand further embodiments, features and advantages of the present invention will be described in more detail below on the basis of the exemplary implementation of five examples of embodiments, wherein
The five examples of embodiments shown below in respect of the present invention are used for a general technique in the form of so-called almost-online recoding, which can be used to considerably reduce the storage requirement of
The technique of almost-online recoding may be very useful in extremely restricted environments, such as in chip cards or in smart cards for example, wherein the saving in terms of storage space may depend considerably on the specific situation. A throughput loss may occur, although it is usually very low; only when the exponent or scalar is divided into too many small parts (=into too many small “chunks”) may the effect on performance become noticeable.
If G is an Abelian group with an order of $2^n$, and it is assumed that an element $g \in G$ and an integer e are given, the aim according to the invention is to compute $x = g^e$ as quickly as possible. The recoding according to the invention makes the exponentiation very quick, but this recoding cannot be used online, that is to say cannot take place during the exponentiation itself; this is the case for example in the w[indow]N[on]A[djacent]F[orm].
The technique used in almost-online recoding consists in dividing the exponent e into a number of “exponent chunks”, that is to say into a number of exponent sections or exponent parts, which are considerably longer than w bits but also much shorter than e. The chunks or parts are then recoded individually: each recoded chunk is used once, and the memory in which it was stored is then reused to recode the next chunk or the next part, so that the total storage space required for the recoded exponent can be significantly reduced.
The almost-online recoding shown below takes place under the assumption that the chunks or parts have a length of L bits. The reason that L is much greater than w is that the estimates for the number of non-vanishing coefficients in recoded exponents are usually given asymptotically, but the actual number of non-vanishing coefficients in recoded exponents is sometimes greater on account of a small additive constant, and this is shown below on the basis of a specific example.
Hereinbelow, within the context of the first example of embodiment of almost-online recoding, an algorithm is presented in which the following are entered:
It should be noted here that, at a chunk boundary (that is to say after L bits), the above algorithm may carry out two group multiplications in a row instead of only one group multiplication. This happens if one of the chunks $e_i$ (=one of the exponent parts $e_i$) represents an odd number and if the recoding of the following chunk $e_{i+1}$ (=of the following exponent part $e_{i+1}$) is one coefficient longer ($b_L$ not equal to zero).
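The chunk-wise procedure described in prose above can be sketched in Python as follows. This is a hedged illustration and not the listing of the present text: it assumes the multiplicative group modulo a prime p, the w[indow]N[on]A[djacent]F[orm] as the recoding, and the hypothetical names `wnaf` and `almost_online_pow`. Only one buffer of L+1 recoded coefficients exists at any time.

```python
def wnaf(e, w):
    """Width-w NAF of e >= 0, least significant digit first.

    Non-zero digits are odd with absolute value below 2**(w-1).
    """
    digits = []
    while e > 0:
        if e & 1:
            d = e % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def almost_online_pow(g, e, p, n=160, L=32, w=5):
    """Compute g**e mod p (p prime, e < 2**n), recoding one L-bit chunk at a time."""
    # Precompute the odd powers g, g**3, ..., using g**2 temporarily.
    g2 = g * g % p
    table = {1: g % p}
    for c in range(3, 1 << (w - 1), 2):
        table[c] = table[c - 2] * g2 % p
    inverse = {c: pow(v, -1, p) for c, v in table.items()}

    def apply(x, b):
        if b > 0:
            return x * table[b] % p
        if b < 0:
            return x * inverse[-b] % p
        return x

    r = -(-n // L)                       # number of chunks, rounded up
    mask = (1 << L) - 1
    x = 1
    for k in range(r - 1, -1, -1):       # most significant chunk first
        digits = wnaf((e >> (k * L)) & mask, w)
        digits += [0] * (L + 1 - len(digits))   # buffer of L+1 coefficients
        # Extra top coefficient b_L: this may directly follow the
        # multiplication for b_0 of the previous chunk.
        x = apply(x, digits[L])
        for j in range(L - 1, -1, -1):
            x = x * x % p
            x = apply(x, digits[j])
    return x
```

The modular inverses are used here only because the toy group is the multiplicative group modulo p; in a group with cheap inversion, such as the points of an elliptic curve, no table of inverses would be stored.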
Using a specific example in which the selected recoding is the w[indow]N[on]A[djacent]F[orm], it can now be shown that the loss in terms of speed is minimal and that the saving in terms of storage space may be quite great:
For n=160, the optimal value of w is equal to 5 (cf. H. Cohen, “Analysis of the flexible window powering algorithm”, advance copy obtainable at http://www.math.u-bordeaux.fr/˜cohen/); seven powers $g^3, g^5, g^7, g^9, g^{11}, g^{13}, g^{15}$ of the base element g thus have to be precomputed, and $g^2$ is also temporarily required. At least five bits per recoded coefficient are required, but the implementer will presumably use whole signed bytes.
Two recoded exponents require 320 bytes of R[andom]A[ccess]M[emory], but two recoded 32-bit chunks (=32-bit sections or 32-bit parts) require only 66 bytes of R[andom]A[ccess]M[emory]. The 254 bytes of R[andom]A[ccess]M[emory] which are saved may be used to store six points of an elliptic curve in affine coordinates.
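The storage figures above can be reproduced with a few lines of Python. The sketch assumes one signed byte per recoded coefficient, the count of 160 coefficients per full exponent used in the text, L+1 coefficients per recoded chunk, and an affine point consisting of two 160-bit coordinates; the variable names are chosen for this illustration only.

```python
n, L, d = 160, 32, 2                 # bit length, chunk width, number of exponents

full_bytes = d * n                   # fully recoded exponents, one byte per coefficient
chunk_bytes = d * (L + 1)            # one recoded chunk per exponent, L+1 coefficients each
saved_bytes = full_bytes - chunk_bytes

affine_point_bytes = 2 * n // 8      # x and y coordinate of 160 bits each
points_in_saved_ram = saved_bytes // affine_point_bytes

print(full_bytes, chunk_bytes, saved_bytes, points_in_saved_ram)
```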
Cohen has proven (cf. H. Cohen, “Analysis of the flexible window powering algorithm”, advance copy obtainable at http://www.math.u-bordeaux.fr/˜cohen/) that the average Hamming weight of the w[indow]N[on]A[djacent]F[orm] of an integer having n bits (which is the average number of multiplications in the corresponding exponentiation plus one) is equal to
$$\frac{n}{w+1} + 1 - \frac{(w-1)(w+2)}{2(w+1)^2} + O(p^{-n}),$$
wherein p=p(w) is a real number greater than one which is dependent only on w and not on n. In numerical terms,
$p = 2^{1/2} = 1.414\ldots$ for w=3,
p=1.2157 . . . for w=4 and
p=1.1296 . . . for w=5.
The above statement with regard to the average Hamming weight of the w[indow]N[on]A[djacent]F[orm] implies that, when an integer is split into r chunks or into r parts, the total Hamming weight of the r chunks or r parts is on average greater by
$$(r-1)\left(1 - \frac{(w-1)(w+2)}{2(w+1)^2}\right)$$
than the Hamming weight of the original integer, since the term $n/(w+1)$ is unchanged by the splitting whereas the constant term is incurred once per chunk.
In the case where n=160, there may be selected L=32 and consequently r=5. The “flexible window” method requires on average 22/9=2.44 fewer group operations than the almost-online method according to the present invention. This difference is approximately 1.26 percent of the overall running time of the exponentiation algorithm (over the 193 group operations, including the time for the precomputations); however, the storage requirement for the recoded exponents has been reduced by approximately eighty percent.
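The figures in this paragraph can be checked with exact rational arithmetic. The following Python sketch assumes w=5 as the window parameter and simply evaluates the penalty term for splitting into r chunks; the variable names are illustrative.

```python
from fractions import Fraction

n, L, w = 160, 32, 5
r = n // L                                        # number of chunks

# Constant term of the average Hamming weight, incurred once per chunk.
per_chunk = 1 - Fraction((w - 1) * (w + 2), 2 * (w + 1) ** 2)
extra_ops = (r - 1) * per_chunk                   # additional group operations

total_ops = 193                                   # figure stated in the text
percent = float(extra_ops) / total_ops * 100

print(extra_ops, round(float(extra_ops), 2), round(percent, 2))
```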
The above algorithm from the first example of embodiment (single exponentiation) can be transformed into a multi-exponentiation method.
If group elements g1, . . . , gd∈G and exponents e1, . . . , ed where d>1 are given and πi=1d gi e
Firstly, all the powers gic are computed and stored, wherein c is a permissible positive coefficient. A temporary variable x is then set to 1εG. For j=n, n−1, . . . , 0, x is first squared, and for i=1, . . . , d the squared x is multiplied by gie
This method is also referred to as fast exponentiation; as in the situation according to the first example of embodiment, it is once again desirable to retain the advantages of a good right-to-left recoding without having to use too much memory.
The following variant carries out recoding “almost-online”, that is to say almost during the fast multi-exponentiation or shortly after the fast multi-exponentiation, wherein the following are entered in the algorithm
The comments made in respect of the algorithm according to the first example of embodiment are also relevant here, that is to say in the case of elliptic curves over a finite field where n=160 and L=32, approximately 2.44d additional group operations are required, wherein d is the number of powers which are to be multiplied by one another. Although this is more than in the case of single fast exponentiation, 254d bytes of R[andom]A[ccess]M[emory] can be saved, that is to say storage for 6d precomputed points in affine coordinates.
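An almost-online multi-exponentiation along these lines can be sketched in Python as follows. The sketch is an illustration under stated assumptions and is not the listing of the present text: the group is the multiplicative group modulo a prime p, the recoding is the w[indow]N[on]A[djacent]F[orm], and the names `wnaf`, `odd_power_table` and `almost_online_multi_pow` are hypothetical.

```python
def wnaf(e, w):
    """Width-w NAF of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = e % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def odd_power_table(g, p, w):
    """Powers g**c mod p for the permissible positive coefficients c = 1, 3, 5, ..."""
    g2 = g * g % p
    table = {1: g % p}
    for c in range(3, 1 << (w - 1), 2):
        table[c] = table[c - 2] * g2 % p
    return table


def almost_online_multi_pow(gs, es, p, n=160, L=32, w=5):
    """Compute the product of gs[i]**es[i] mod p, recoding chunk by chunk."""
    tables = [odd_power_table(g, p, w) for g in gs]
    inverses = [{c: pow(v, -1, p) for c, v in t.items()} for t in tables]

    def apply(x, i, b):
        if b > 0:
            return x * tables[i][b] % p
        if b < 0:
            return x * inverses[i][-b] % p
        return x

    r = -(-n // L)                        # number of chunks per exponent
    mask = (1 << L) - 1
    x = 1
    for k in range(r - 1, -1, -1):        # most significant chunks first
        recs = []
        for e in es:                      # d buffers of L+1 coefficients each
            digits = wnaf((e >> (k * L)) & mask, w)
            recs.append(digits + [0] * (L + 1 - len(digits)))
        for i, rec in enumerate(recs):    # possible extra top coefficients
            x = apply(x, i, rec[L])
        for j in range(L - 1, -1, -1):
            x = x * x % p                 # one squaring shared by all exponents
            for i, rec in enumerate(recs):
                x = apply(x, i, rec[j])
    return x
```

The squarings are shared between all d exponents, while the per-chunk penalty is paid once for each exponent, which is consistent with the factor d in the estimate above.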
In the third example of embodiment, the use of almost-online recoding is described in a generalization (cf. R. Avanzi, “On the complexity of certain multi-exponentiation techniques in cryptography”, published in Journal of Cryptology) of an algorithm by Yen, Laih and Lenstra (cf. S.-M. Yen, C.-S. Laih and A. K. Lenstra, “Multi-exponentiation”, IEE Proc. Comput. Digit. Tech., Volume 141, No. 6, November 1994).
In this connection, this third example of embodiment described below serves predominantly to explain the basic principles of the described algorithm; the increase in efficiency which can be achieved must be deemed to be rather small. The algorithm is essentially a variant of the trick by Shamir using a sliding window and is shown below:
The following are entered in the algorithm:
In this respect, it should be noted that $f_i$ at the start of step 2.(c) is the integer represented by a chain of w successive bits of the exponent e. After the normalization step 2.(e), at least one of the $f_i$ is odd.
If in the group G the inversion of elements takes place quickly, the N[on]A[djacent]F[orm] is selected as the recoding. It can easily be seen that the number of signed integers having w bits in the N[on]A[djacent]F[orm] is $I_w = (2^{w+2} - (-1)^w)/3$. The set E contains all the elements of the form $\prod_{i=1}^{d} g_i^{k_i}$ such that
The parameters w=2=d are then fixed and the N[on]A[djacent]F[orm] is selected for recoding the exponents. The reason for this is the production of digital signatures with elliptic curves (cf. American National Standards Institute, “ANSI X9.62: Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA)”, 1999):
In this case, d=2, and for the relevant size of the exponents, namely from n=160 to n=240, the parameter w=2 is optimal (cf. R. Avanzi, “On the complexity of certain multi-exponentiation techniques in cryptography”, published in Journal of Cryptology). The above algorithm from the third example of embodiment is thus used for almost-online multi-exponentiation with d=2=w and the N[on]A[djacent]F[orm], wherein the following are entered in the algorithm
It should be noted here that in step 3 the two interleaved loops of the above algorithm from the first example of embodiment and the simultaneous sequential interrogation of the above first algorithm from the third example of embodiment can be seen.
In steps 3.(c)(ii), 3.(c)(iii), 3.(c)(iv), 3.(c)(v), 3.(c)(vi), windows of width 2 are formed via the coupled N[on]A[djacent]F[orm]s of two chunks or of two parts having L bits.
Two “carry-overs” $a_1$ and $a_2$ store the values of a non-vanishing column if the following column is also non-vanishing, so that the values can be doubled during the next iteration and added to the values in the next column; cf. step 3.(c)(iii). Steps 3.(c)(iv) and 3.(c)(vi) are carried out by a multiplication or by a division.
If two integers $b_1$ and $b_2$ are then written as $b_i = \sum_{j=0}^{m-1} b_{i,j} 2^j$, a column consists of a pair of coefficients $(b_{1,t}, b_{2,t})$ from the above representations. The ordered sequence of such columns is the joint representation of $b_1$ and $b_2$. The number of non-vanishing columns in a joint representation is referred to as the Hamming weight of the representation, and the density thereof is the quotient of the Hamming weight and the length m.
The average density of a joint representation of two N[on]A[djacent]F[orm]s is 5/9, since each coefficient of a N[on]A[djacent]F[orm] vanishes with a probability of 2/3 and a column vanishes only if both of its coefficients vanish. It is possible to demonstrate that the number of multiplications to be expected in the main loop of the above second algorithm from the third example of embodiment is $11n/27$ (cf. R. Avanzi, “On the complexity of certain multi-exponentiation techniques in cryptography”, published in Journal of Cryptology), wherein the additional group operations which may be caused by the almost-online technique are not counted.
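The value 5/9 can be checked empirically. The following Python sketch is an illustration only: it recodes pairs of pseudo-random 160-bit integers into the non-adjacent form and measures the fraction of non-vanishing columns. The function names, the seed and the sample size are arbitrary choices made for this sketch.

```python
import random


def naf(e):
    """Non-adjacent form of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def joint_density(e1, e2):
    """Fraction of non-vanishing columns in the joint NAF representation."""
    a, b = naf(e1), naf(e2)
    m = max(len(a), len(b))
    a += [0] * (m - len(a))
    b += [0] * (m - len(b))
    weight = sum(1 for s, t in zip(a, b) if s or t)
    return weight / m


rng = random.Random(1)                      # fixed seed, arbitrary
top = 1 << 159                              # force a length of exactly 160 bits
samples = [joint_density(rng.getrandbits(160) | top, rng.getrandbits(160) | top)
           for _ in range(500)]
average = sum(samples) / len(samples)
print(average, 5 / 9)
```

For finite lengths the measured value deviates slightly from 5/9 because of boundary effects at the most significant positions.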
The assumption that L is either the native word length of the C[entral]P[rocessing]U[nit] of the smart card or a small multiple thereof, for example L=32, also allows simpler implementation.
Using exponents having 160 bits and taking account of the fact that a N[on]A[djacent]F[orm] can efficiently be stored with only two bits per coefficient, approximately sixteen bytes of R[andom]A[ccess]M[emory] are required to store the two recoded 32-bit chunks (=the two recoded 32-bit sections or the two recoded 32-bit parts) instead of the eighty bytes for the full exponents. The saving in terms of storage space corresponds to the storage requirement of one point in projective coordinates on an elliptic curve over a finite field having 160 bits, and is thus not as considerable as in the two preceding examples of embodiments.
Based on a computer program which counts the number of windows formed by the above second algorithm from the third example of embodiment on pairs of numbers of given length, the average of the results from one hundred thousand run-throughs of the program can then be computed:
The average number of windows on pairs of numbers having 160 bits is 65.81153 (it should be noted that (11/27)·160=65.185), the average number of windows on pairs of numbers having 32 bits is 13.64216 (it should be noted that (11/27)·32=13.037). Consequently, it is to be expected, if n=160 and L=32, that the almost-online algorithm requires only 5·13.64216−65.81153=2.39927, that is to say about 2.4 more group operations than the above first algorithm from the third example of embodiment.
Since 235 is the total number of group operations of the above first algorithm from the third example of embodiment which is to be expected in the case where n=160, it may be estimated that the loss in terms of performance caused by the almost-online technique used according to the invention is approximately one percent.
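The estimate can be retraced from the measured averages quoted above. The Python lines below merely redo this arithmetic; the variable names are chosen for this illustration.

```python
n, L = 160, 32
r = n // L                                   # five chunks of 32 bits

windows_full = 65.81153                      # measured average for 160-bit pairs
windows_chunk = 13.64216                     # measured average for 32-bit pairs
asymptotic_full = 11 * n / 27                # estimate 11n/27 for comparison

extra_ops = r * windows_chunk - windows_full # additional group operations
total_ops = 235                              # expected total stated in the text
loss_percent = extra_ops / total_ops * 100

print(round(asymptotic_full, 3), round(extra_ops, 5), round(loss_percent, 2))
```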
There is an alternative representation to the N[on]A[djacent]F[orm] with the same Hamming weight, which can be computed by a simple algorithm that operates from left to right (cf. M. Joye and S.-M. Yen, “Optimal left-to-right binary signed-digit recoding”, IEEE Transactions on Computers 49 (7), 2000, pages 740 to 748). The question may be raised as to whether this representation could not be used instead of the almost-online recoding. The answer is negative because this alternative does not have the N[on]A[djacent]F[orm] property, that is to say the property that two successive coefficients are never both non-vanishing.
The consequences for the storage requirement are very unfavorable. In the present case where w=2=d, the set E would consist of the elements $g_1^a \cdot g_2^b$ with either $0 < a \le 3$ and $-3 \le b \le 3$, wherein a and/or b is odd, or a=0 and b=1 or b=3; accordingly, the set E would have the cardinality 20; this would make the storage requirement of the above first algorithm of the third example of embodiment too great.
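The cardinality stated above can be confirmed by enumerating the exponent pairs (a, b) directly. The following Python lines are a small illustrative check; the name `E` mirrors the set in the text, everything else is chosen for this sketch.

```python
# Exponent pairs (a, b) of the elements g1**a * g2**b in the set E for w = 2 = d.
E = {
    (a, b)
    for a in range(0, 4)
    for b in range(-3, 4)
    if (0 < a and (a % 2 == 1 or b % 2 == 1))   # a and/or b odd
    or (a == 0 and b in (1, 3))
}
print(len(E))
```

Eighteen of the pairs have a>0 (seven each for a=1 and a=3, four for a=2), and two have a=0.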
A similar consideration arises in respect of Solinas' “J[oint]S[parse]F[orm] —joint sparse representation” (cf. J. A. Solinas, “Low-Weight Binary Representations for Pairs of Integers”, Centre for Applied Cryptographic Research, University of Waterloo, Combinatorics and Optimization Research Report CORR 2001-41, 2001, obtainable at http://www.cacr.math.uwaterloo.ca/techreports/2001/corr2001-41.ps):
The joint sparse representation recodes the two exponents at the same time and in a manner dependent on one another. The average density of the J[oint]S[parse]F[orm] is ½ and the number of group operations in the main loop of the above first algorithm from the third example of embodiment with w=2=d is 3n/8 (as before, without including the precomputations and costs of almost-online recoding).
The number of precomputed points is twelve, and this is much greater than the number eight in the variant proposed above, without the throughput of the algorithm being considerably improved with inputs from 160 bits to 256 bits. For a more detailed discussion and for corresponding evidence, reference may be made to Sections 3.3 and 4.4 of H. Cohen, “Analysis of the flexible window powering algorithm”, advance copy obtainable at http://www.math.u-bordeaux.fr/˜cohen/.
Single scalar multiplication in an additively written Abelian group G is obtained, in comparison to the above first example of embodiment (single exponentiation), by obvious replacements (neutral element “0”, “doubling” and “sum” in scalar multiplication instead of neutral element “1”, “squaring” and “product” in exponentiation) and is shown below in the context of the fourth example of embodiment of almost-online recoding as an algorithm in which the following are entered
Analogously to the first example of embodiment, it should be noted here that, at a chunk boundary (that is to say after L bits), the above algorithm may carry out two group additions in a row instead of only one group addition. This happens if one of the chunks $e_i$ (=one of the scalar parts $e_i$) represents an odd number and if the recoding of the following chunk $e_{i+1}$ (=of the following scalar part $e_{i+1}$) is one coefficient longer ($b_L$ not equal to zero).
The above algorithm from the fourth example of embodiment (single scalar multiplication) can be transformed into a multi-scalar multiplication method. Here, the multi-scalar multiplication is obtained in an additively written Abelian group G, in comparison to the above second example of embodiment (multi-exponentiation), by obvious replacements (neutral element “0”, “doubling” and “sum” in multi-scalar multiplication instead of neutral element “1”, “squaring” and “product” in multi-exponentiation) and is shown below in the context of the fifth example of embodiment of almost-online recoding as an algorithm.
If group elements $g_1, \ldots, g_d \in G$ and scalars $e_1, \ldots, e_d$ where d>1 are given and $\sum_{i=1}^{d} e_i \cdot g_i$ is to be computed, firstly a decision is made to use a sparse recoding of the scalars $e_1, \ldots, e_d$; use is then made of a “double-and-add” loop, the additive counterpart of the “square-and-multiply” loop:
Firstly, all the multiples $c \cdot g_i$ are computed and stored, wherein c is a permissible positive coefficient. A temporary variable x is then set to $0 \in G$. For j=n, n−1, . . . , 0, x is first doubled, and for i=1, . . . , d the operand $e_{i,j} \cdot g_i$ is added to the doubled x, wherein $e_{i,j}$ is the coefficient of $2^j$ in the recoding of $e_i$. At the end, the temporary variable x contains the desired result.
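The loop described above can be sketched in Python as follows. This is a toy illustration under explicit assumptions: the additively written group is the integers modulo m, the recoding is the non-adjacent form (so that the only permissible positive coefficient is c=1 and no further multiples need to be precomputed), and the names `naf` and `multi_scalar_mul` are chosen for this sketch. In a cryptographic setting, the additions and doublings would be elliptic-curve point operations.

```python
def naf(e):
    """Non-adjacent form of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def multi_scalar_mul(gs, es, m):
    """Compute sum(es[i] * gs[i]) in the additive group of integers modulo m."""
    recs = [naf(e) for e in es]
    length = max(len(rec) for rec in recs)
    recs = [rec + [0] * (length - len(rec)) for rec in recs]

    x = 0                                   # neutral element of the group
    for j in range(length - 1, -1, -1):     # most significant coefficient first
        x = (x + x) % m                     # doubling
        for g, rec in zip(gs, recs):
            if rec[j] > 0:
                x = (x + g) % m             # add g_i
            elif rec[j] < 0:
                x = (x - g) % m             # add the inverse of g_i
    return x
```

One doubling per position is shared by all d scalars, and an addition is carried out only for the non-vanishing coefficients.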
This method is also referred to as fast multiplication; as in the situation according to the fourth example of embodiment, it is once again desirable to retain the advantages of a good right-to-left recoding without having to use too much memory.
The following variant carries out recoding “almost-online”, that is to say almost during the fast multi-scalar multiplication or shortly after the fast multi-scalar multiplication, wherein the following are entered in the algorithm
As a final part of the description, a list is given below of the numbers, elements, exponents, groups, indices, coefficients, sets, parameters, scalars, variables and digits mentioned in the present text:
bi,j coefficient
bi,L coefficient assigned to the highest power of two $2^L$
c permissible positive coefficient
C finite set of integers
d number of (basic or group) elements gi from the group G=number of exponents or scalars ei assigned to the (basic or group) elements gi
e exponent, in particular integer exponent, in the case of single exponentiation or scalar, in particular integer scalar, in the case of single scalar multiplication
ei exponent, in particular integer exponent, in the case of multi-exponentiation or scalar, in particular integer scalar, in the case of multi-scalar multiplication
ei,k−1 (exponent or scalar) chunk or (exponent or scalar) part following the (exponent or scalar) chunk or (exponent or scalar) part ei,k
ei,k (exponent or scalar) chunk or (exponent or scalar) part of the divided exponent or scalar ei
g (basic or group) element in the case of single exponentiation or in the case of single scalar multiplication
gi (basic or group) element in the case of multi-exponentiation or in the case of multi-scalar multiplication
G group, in particular Abelian group
i index
j index, in particular summation index
k variable, in particular indexed variable
L (exponent or scalar) chunk width or (exponent or scalar) part width, in particular bit rate of the (exponent or scalar) chunk width or of the (exponent or scalar) part width
n maximum bit rate or maximum bit length
r number of (exponent or scalar) chunks or (exponent or scalar) parts ei,k
w parameter
x temporary variable
| Number | Date | Country | Kind |
|---|---|---|---|
| 04100873.1 | Mar 2004 | EP | regional |
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/IB05/50614 | 2/18/2005 | WO | 00 | 8/16/2007 |