Details and additional information on how the two solutions proposed by Fisher and Seifert are implemented are given in D1.
As mentioned above, these two solutions make it possible to perform modular operations on numbers of at most 2×n bits using a processor which is intrinsically limited to numbers of n bits. However, these two solutions consume a lot of power since each MultModDiv or MultModDivInit function requires a certain number of elementary operations (additions, subtractions of register contents, shifting of bits in a register, etc.).
It should be noted that the implementation of a MultModDivInit function usually requires a greater number of elementary operations than the implementation of a MultModDiv function. (See in particular D1 for examples of implementation of MultModDiv and MultModDivInit functions). At best, with some processors, the implementation of the MultModDivInit operation is as costly as the MultModDiv operation (see, for example, Sedlak's algorithm).
The implementation of functions of the MultModDiv or MultModDivInit type requires the performance of elementary operations of type Q=└(X×Y)/Z┘ and R=(X×Y) mod Z, on numbers of n bits. Processors dedicated to cryptographic calculations usually have integrated hardware means for calculating modular operations of the type R=(X×Y) mod Z, but not always hardware means for performing Euclidean operations of the type Q=└(X×Y)/Z┘. In this case, use is usually made of emulating means (software) to perform the Euclidean operations on the basis of a set of modular elementary operations.
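By way of illustration only, the pair of elementary operations described above can be sketched in Python as follows. The function name mult_mod_div is an assumption made for this sketch, and arbitrary-precision integers stand in for the n-bit hardware registers.

```python
def mult_mod_div(x, y, z):
    """Return (Q, R) with Q = floor(x*y / z) and R = (x*y) mod z.

    Hypothetical software stand-in for the MultModDiv primitive; a real
    cryptographic processor would work on fixed-width n-bit words.
    """
    if z <= 0:
        raise ValueError("the modulus z must be positive")
    # divmod uses floor division, so x*y == Q*z + R always holds.
    return divmod(x * y, z)


# 13 * 11 = 143 = 20 * 7 + 3
q, r = mult_mod_div(13, 11, 7)
```

The pair (Q, R) always satisfies X×Y=Q×Z+R with 0≦R<Z, which is the only property the derivations in this description rely on.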
For example, D1 describes a method which makes it possible to calculate Q=└(X×Y)/Z┘ based on operations of the type R1=(X×Y) mod Z and R2=(X×Y) mod (Z+1). More specifically, in D1, Q is calculated by the following relation:
Q=((X×Y) mod Z)−((X×Y) mod (Z+1)) if this value is non-negative, or
Q=((X×Y) mod Z)−((X×Y) mod (Z+1))+(Z+1) otherwise.
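The relation above can be checked with the following minimal Python sketch. The function name is hypothetical, and the sketch assumes 0≦X, Y<Z, which is the condition under which the two quotients └(X×Y)/Z┘ and └(X×Y)/(Z+1)┘ differ by at most 1; it does not use the Montgomery algorithm and only illustrates the arithmetic.

```python
def quotient_from_two_residues(x, y, z):
    """Recover floor(x*y / z) from two modular products (D1-style emulation).

    Assumes 0 <= x, y < z (an assumption of this sketch).
    """
    if not (0 <= x < z and 0 <= y < z):
        raise ValueError("this sketch assumes 0 <= x, y < z")
    r1 = (x * y) % z          # R1 = (X*Y) mod Z
    r2 = (x * y) % (z + 1)    # R2 = (X*Y) mod (Z+1)
    diff = r1 - r2
    # A negative difference means the two quotients differ by one.
    return diff if diff >= 0 else diff + (z + 1)
```

One of the two moduli Z and Z+1 is always even, which is the drawback noted in this description.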
The operation R=(X×Y) mod Z is performed using the Montgomery algorithm, which is well known. However, although this algorithm is particularly effective (in terms of accuracy and calculation time) when the modulus Z is an odd integer, this is not the case when Z is an even integer. This means that the method proposed in D1 is not effective since one or the other of the numbers Z and Z+1 is necessarily even.
One object of the invention is to perform the same operations (A×B mod N, with A, B, N of 2×n bits) as the algorithms proposed by Fisher and Seifert, but by using a smaller number of elementary functions of the MultModDiv or MultModDivInit type, so as to provide a faster result while consuming less power.
Another object of the invention is a method of performing an operation of the type S=└(X×Y)/Z┘ on numbers of n bits which is more effective than the method proposed in D1. The method according to the invention can be used to perform a modular multiplication according to the invention.
The invention thus relates to a method of performing a modular multiplication of type A×B mod N, A, B, N being numbers of 2×n bits. Regardless of the mode of implementation of a method according to the invention, the numbers A, B, N are broken down into words of n bits. Then, during the method, operations are performed on the numbers A1, A0, B1, B0, N1 and N0 of n bits.
According to a first embodiment of the invention, the numbers A, B, N are broken down in base 2^n in the form: A=A1×2^n+A0, B=B1×2^n+B0 and N=N1×2^n+N0. In other words, the n high-order bits of A, B, N form the words A1, B1, N1, respectively, and the n low-order bits of A, B, N form the words A0, B0, N0, respectively. Six elementary functions of MultModDiv type are then performed, the sixth providing the result of the modular multiplication.
Specifically, the following method A1 is carried out:
Input: A, B, N, integers of 2×n bits, broken down in the form
A=A1×2^n+A0; B=B1×2^n+B0; N=N1×2^n+N0;
Output: A×B mod N
Perform:
(Q1, R1)=MultModDiv(A1, B1, N1)
(Q2, R2)=MultModDiv(Q1, N0, 2^n)
(Q3, R3)=MultModDiv(A1+A0, B1+B0, 2^n−1)
(Q4, R4)=MultModDiv(A0, B0, 2^n)
(Q5, R5)=MultModDiv(2^n−1, R1+Q3−Q2−Q4, N1)
(Q6, R6)=MultModDiv(Q5, N0, 2^n)
Return (R3+R5−Q6−R2−R4)×2^n+(R2+R4−R6)
The above algorithm A1 indeed performs the modular multiplication A×B mod N. According to Karatsuba's lemma (see in particular D2: “Multiplication Of Multidigit Numbers On Automata” Soviet Physics—Doklady, volume 7, pages 595-596, 1963), we have:
A×B=2^n(2^n−1)A1×B1+2^n(A1+A0)×(B1+B0)−(2^n−1)A0×B0
Since N=N1×2^n+N0, we have N1×2^n ≡N −N0, where ≡N is the equivalence relation modulo N.
From the definition of Q1 to Q6, R1 to R6 in algorithm A1 and from the definition of the MultModDiv function, it can be deduced that:
2^n(2^n−1)×A1×B1 ≡N 2^n(2^n−1)(Q1×N1+R1)
≡N −(2^n−1)×(Q1×N0)+2^n(2^n−1)×R1
≡N −(2^n−1)(Q2×2^n+R2)+2^n(2^n−1)R1
≡N 2^n(2^n−1)(R1−Q2)−(2^n−1)×R2
2^n(A1+A0)(B1+B0)=2^n((2^n−1)Q3+R3)=2^n(2^n−1)Q3+2^n×R3
(2^n−1)×A0×B0=(2^n−1)(2^n×Q4+R4)=2^n(2^n−1)Q4+(2^n−1)R4
The following is finally deduced therefrom:
A×B ≡N 2^n(2^n−1)(R1+Q3−Q2−Q4)+2^n×R3−(2^n−1)(R2+R4)
≡N 2^n(Q5×N1+R5)+2^n×R3−(2^n−1)(R2+R4)
≡N −Q5×N0+2^n(R3+R5)−(2^n−1)(R2+R4)
≡N −(Q6×2^n+R6)+2^n(R3+R5)−(2^n−1)(R2+R4)
≡N 2^n(R3+R5−Q6−R2−R4)+(R2+R4−R6)
and thus:
A×B mod N=2^n(R3+R5−Q6−R2−R4)+(R2+R4−R6), which is the result produced by the algorithm A1.
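By way of illustration only, the method A1 can be sketched in Python as follows. The names mult_mod_div and modmul_a1 and the toy 16-bit modulus are assumptions made for this sketch. MultModDiv is emulated with Python's divmod, whose floor division keeps the relation X×Y=Q×Z+R valid even when an operand such as R1+Q3−Q2−Q4 is negative. The value returned is congruent to A×B modulo N; the final reduction applied in the usage line is an extra normalisation step added here.

```python
def mult_mod_div(x, y, z):
    """Hypothetical stand-in for MultModDiv: (floor(x*y/z), x*y mod z)."""
    return divmod(x * y, z)


def modmul_a1(a, b, n_mod, n):
    """Sketch of method A1 for operands of 2*n bits (base 2^n split)."""
    base = 1 << n                      # 2^n
    a1, a0 = divmod(a, base)           # A = A1*2^n + A0
    b1, b0 = divmod(b, base)           # B = B1*2^n + B0
    n1, n0 = divmod(n_mod, base)       # N = N1*2^n + N0 (N1 must be non-zero)
    q1, r1 = mult_mod_div(a1, b1, n1)
    q2, r2 = mult_mod_div(q1, n0, base)
    q3, r3 = mult_mod_div(a1 + a0, b1 + b0, base - 1)
    q4, r4 = mult_mod_div(a0, b0, base)
    q5, r5 = mult_mod_div(base - 1, r1 + q3 - q2 - q4, n1)
    q6, r6 = mult_mod_div(q5, n0, base)
    return (r3 + r5 - q6 - r2 - r4) * base + (r2 + r4 - r6)


# Toy usage with n = 8; the modulus 0xC35B is an arbitrary illustrative value.
N = 0xC35B
ok = modmul_a1(0x1234, 0xBEEF, N, 8) % N == (0x1234 * 0xBEEF) % N
```

Six calls to mult_mod_div are made, in line with the count given for A1.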
It will be noted that the method A1 according to the invention uses one MultModDiv function less than the method FS1 known from the prior art. Thus, for the same result, there is a reduction in the total number of operations to be performed and consequently a reduction in total time and in the overall power consumed for executing the method. Specifically, since one MultModDiv operation performs 2 modular multiplications on numbers of n bits, the algorithm A1 in this case uses 12 modular multiplications on numbers of n bits instead of 14 in the algorithm FS1, hence a gain in calculation time of (14−12)/14=14%.
According to a second embodiment of the invention, the numbers A, B, N are likewise broken down in base 2^n in the form: A=A1×2^n+A0, B=B1×2^n+B0 and N=N1×2^n+N0. One function of MultModDivInit type and four elementary functions of MultModDiv type are then carried out, the fourth providing the result of the modular multiplication.
Specifically, the following method A2 is carried out:
Input: A, B, N, integers of 2×n bits, broken down in the form
A=A1×2^n+A0; B=B1×2^n+B0; N=N1×2^n+N0;
Output: A×B mod N
Perform:
(Q1, R1)=MultModDiv(A1, B1, N1)
(Q2, R2)=MultModDiv(A1+A0, B1+B0, 2^n−1)
(Q3, R3)=MultModDiv(A0, B0, 2^n)
(Q4, R4)=MultModDivInit(Q1, N0, Q3−R1−Q2, N1)
(Q5, R5)=MultModDiv(N0+N1, Q4, 2^n)
Return (R2+Q5−R3−R4)×2^n+(R3+R4+R5)
The algorithm A2 indeed performs the modular multiplication A×B mod N. From the definitions of Q1 to Q5, R1 to R5, we have:
2^n(2^n−1)×A1×B1 ≡N 2^n(2^n−1)(Q1×N1+R1)
≡N (2^n−1)(−Q1×N0+R1×2^n)
2^n(A1+A0)(B1+B0)=2^n((2^n−1)Q2+R2)=2^n(2^n−1)Q2+2^n×R2
(2^n−1)A0×B0=(2^n−1)(2^n×Q3+R3)=2^n(2^n−1)Q3+(2^n−1)×R3
From Karatsuba's lemma (D2), we have:
A×B=2^n(2^n−1)A1×B1+2^n(A1+A0)×(B1+B0)−(2^n−1)A0×B0
The following is thus deduced therefrom (since N1(2^n−1) ≡N −N0−N1):
A×B ≡N −(2^n−1)(Q1×N0+(Q3−R1−Q2)×2^n)+2^n×R2−(2^n−1)R3
≡N −(2^n−1)(Q4×N1+R4)+2^n×R2−(2^n−1)R3
≡N (N0+N1)Q4+2^n×R2−(2^n−1)(R3+R4)
≡N (2^n×Q5+R5)+2^n×R2−(2^n−1)(R3+R4)
≡N 2^n(R2+Q5−R3−R4)+(R3+R4+R5)
and finally:
A×B mod N=2^n(R2+Q5−R3−R4)+(R3+R4+R5)
which is the result produced by the algorithm A2.
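By way of illustration only, the method A2 can be sketched in Python as follows. The definition used for MultModDivInit, namely the quotient and remainder of (X×Y+T×2^n) by Z, is inferred from the derivation above rather than quoted from D1, and all function names and the toy modulus are assumptions of this sketch. The returned value is congruent to A×B modulo N; the final reduction in the usage line is added here.

```python
def mult_mod_div(x, y, z):
    """Hypothetical stand-in for MultModDiv: (floor(x*y/z), x*y mod z)."""
    return divmod(x * y, z)


def mult_mod_div_init(x, y, t, z, n):
    """Assumed MultModDivInit: quotient and remainder of (x*y + t*2^n) by z."""
    return divmod(x * y + (t << n), z)


def modmul_a2(a, b, n_mod, n):
    """Sketch of method A2 for operands of 2*n bits (base 2^n split)."""
    base = 1 << n
    a1, a0 = divmod(a, base)
    b1, b0 = divmod(b, base)
    n1, n0 = divmod(n_mod, base)       # N1 must be non-zero
    q1, r1 = mult_mod_div(a1, b1, n1)
    q2, r2 = mult_mod_div(a1 + a0, b1 + b0, base - 1)
    q3, r3 = mult_mod_div(a0, b0, base)
    q4, r4 = mult_mod_div_init(q1, n0, q3 - r1 - q2, n1, n)
    q5, r5 = mult_mod_div(n0 + n1, q4, base)
    return (r2 + q5 - r3 - r4) * base + (r3 + r4 + r5)


# Toy usage with n = 8; the modulus 0xC35B is an arbitrary illustrative value.
N = 0xC35B
ok = modmul_a2(0x1234, 0xBEEF, N, 8) % N == (0x1234 * 0xBEEF) % N
```

Five elementary calls are made in total, one of MultModDivInit type and four of MultModDiv type.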
It will be noted that the method A2 according to the invention uses one MultModDiv function less than the method FS2 known from the prior art. Thus, in this case too, for the same result, there is a reduction in the total number of operations to be performed and consequently a reduction in total time and in the overall power consumed for executing the method. Specifically, since one MultModDiv operation or one MultModDivInit operation performs 2 modular multiplications on numbers of n bits, the algorithm A2 in this case uses 10 modular multiplications on numbers of n bits instead of 12 in the algorithm FS2, hence a gain in calculation time of (12−10)/12, i.e. approximately 16.7%, compared to the equivalent method FS2 of D1.
According to a third embodiment of the invention, the numbers A, B, N are broken down in a base U, U being other than 2^n, such that: A=A1×U+A0, B=B1×U+B0 and N=N1×U+N0. A1, A0, B1, B0, N1 and N0 are words of n bits. Elementary operations of MultModDiv type are then carried out on the words A1, A0, B1, B0, N1 and N0.
In a first example, the following method A3 is carried out:
Input: A, B, N, integers of 2×n bits, broken down in the form
A=A1×U+A0; B=B1×U+B0; N=N1×U+N0;
Output: A×B mod N
Perform:
(Q1, R1)=MultModDiv(A0, B0, U)
(Q2, R2)=MultModDiv(A1+A0, B1+B0, U)
(Q3, R3)=MultModDiv(A1, B1, U)
(Q4, R4)=MultModDiv(α, Q3, U)
(Q5, R5)=MultModDiv(α, −Q1+Q2−Q3+Q4+R3, U)
Return (R5+R1)+(R4−R1+Q1+R2−R3+Q5)×U
where α=U^2 mod N.
In this first example, the algorithm A3 indeed performs the modular multiplication A×B mod N. Since α=U^2 mod N, we have U^2 ≡N α. Moreover, from the definitions of Q1 to Q5, R1 to R5, we have:
(U−1)×A0×B0 ≡N (U−1)(Q1×U+R1)
≡N Q1×α−R1+(R1−Q1)×U
U×(A1+A0)(B1+B0) ≡N U(Q2×U+R2)
≡N Q2×α+R2×U
U(U−1)×A1×B1 ≡N U^2×A1×B1−U×A1×B1
≡N U×(R3×U+Q3×α)−U×(Q3×U+R3)
≡N R3×α+Q3×α×U−Q3×α−R3×U
≡N (−Q3+R3)×α+(−R3+Q3×α)×U
From Karatsuba's lemma (D2), we obtain:
A×B=U(U−1)A1×B1+U×(A1+A0)×(B1+B0)−(U−1)A0×B0,
i.e.:
A×B ≡N (−Q3+R3)×α+(−R3+Q3×α)×U+Q2×α+R2×U−Q1×α+R1−(R1−Q1)×U
≡N α(−Q1+Q2−Q3+R3)+R1+U(−R1+Q1+R2−R3+Q3×α)
≡N α(−Q1+Q2−Q3+Q4+R3)+R1+U(R4−R1+Q1+R2−R3)
≡N (R5+R1)+U(R4−R1+Q1+R2−R3+Q5)
which is the result produced by algorithm A3.
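By way of illustration only, the method A3 can be sketched in Python as follows. The function names and the toy modulus are assumptions of this sketch, and U is taken as the rounded-up square root of N, computed here with math.isqrt. The returned value is congruent to A×B modulo N and is not necessarily smaller than N; the final reduction in the usage line is added here.

```python
import math


def mult_mod_div(x, y, z):
    """Hypothetical stand-in for MultModDiv: (floor(x*y/z), x*y mod z)."""
    return divmod(x * y, z)


def modmul_a3(a, b, n_mod, u):
    """Sketch of method A3 with operands split in base U."""
    alpha = (u * u) % n_mod            # alpha = U^2 mod N
    a1, a0 = divmod(a, u)              # A = A1*U + A0
    b1, b0 = divmod(b, u)              # B = B1*U + B0
    q1, r1 = mult_mod_div(a0, b0, u)
    q2, r2 = mult_mod_div(a1 + a0, b1 + b0, u)
    q3, r3 = mult_mod_div(a1, b1, u)
    q4, r4 = mult_mod_div(alpha, q3, u)
    q5, r5 = mult_mod_div(alpha, -q1 + q2 - q3 + q4 + r3, u)
    return (r5 + r1) + (r4 - r1 + q1 + r2 - r3 + q5) * u


# Toy usage; 50011 is an arbitrary illustrative modulus.
N = 50011
U = math.isqrt(N - 1) + 1              # smallest integer >= sqrt(N)
ok = modmul_a3(12345, 43210, N, U) % N == (12345 * 43210) % N
```

Five calls to mult_mod_div are made, in line with the count given for A3.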
It will be noted that, in this first example, the method A3 according to the invention uses an even smaller number of MultModDiv functions than the known methods or even than the first embodiment or the second embodiment of the invention. We thus again have, for the same result, an even greater reduction in the total number of operations to be performed and consequently a reduction in total time and in the overall power consumed for executing the method. Specifically, since one MultModDiv operation or one MultModDivInit operation performs 2 modular multiplications on numbers of n bits, the algorithm A3 in this case uses 10 modular multiplications on numbers of n bits instead of 14 in the algorithm FS1, hence a gain in calculation time of (14−10)/14, i.e. approximately 28.6%, compared to the equivalent method FS1 of D1.
In order to carry out this first example, it would be possible for example to select U=┌√N┐, where √N is the square root of N and ┌√N┐ is √N rounded up to an integer (in other words, U is the smallest integer greater than or equal to √N). It would also be possible to select U=┌√(k.N)┐, where k is an integer. Preferably, k is selected such that α=U^2 mod N is as small as possible.
In a second example, U is defined by the relation: U^2 ≡N α+δ×U. α and δ are integers which are preferably constant and as small as possible (less than 256 bits). α and δ are preferably selected such that α+δ^2 is also a constant and small integer.
Ideally, δ=1 and α=−1, 2 or 3. Such values α, δ can be obtained by selecting a suitable number N, that is to say by selecting a suitable key generation method (it will be recalled that N here is an element of a key for a cryptographic algorithm).
The method A3 can be simplified to give the following method A4:
Input: A, B, N, integers of 2×n bits, broken down in the form
A=A1×U+A0; B=B1×U+B0; N=N1×U+N0;
Output: A×B mod N
Perform:
(Q1, R1)=MultModDiv(A0, B0, U)
(Q2, R2)=MultModDiv(A1+A0, B1+B0, U)
(Q3, R3)=MultModDiv(A1, B1, U)
Return α×(−Q1+Q2−Q3+R3+δ×Q3)+R1+U×[−R1−R3+Q1+R2+Q3×(α+δ^2)+(−Q3+R3−Q1+Q2)×δ]
The algorithm A4 indeed performs the modular multiplication A×B mod N. By using U^2 ≡N α+δ×U and Karatsuba's lemma, we obtain:
(U−1)×A0×B0 ≡N (U−1)(Q1×U+R1)
≡N −Q1×U−R1+R1×U+Q1×(α+δ×U)
≡N Q1×α−R1+(R1−Q1+Q1×δ)×U
U×(A1+A0)(B1+B0) ≡N U(Q2×U+R2)
≡N R2×U+Q2×(α+δ×U)
≡N α×Q2+(R2+δ×Q2)×U
U×A1×B1 ≡N U(Q3×U+R3) ≡N α×Q3+(R3+δ×Q3)×U
U^2×A1×B1 ≡N U(α×Q3+(R3+δ×Q3)×U)
≡N α×U×Q3+(R3+δ×Q3)×(α+δ×U)
≡N α×(R3+δ×Q3)+(δ×(R3+δ×Q3)+α×Q3)×U
U(U−1)×A1×B1 ≡N α×(R3−Q3+δ×Q3)+((R3+δ×Q3)×(δ−1)+α×Q3)×U
≡N α×(R3−Q3+δ×Q3)+[−R3+δ×(−Q3+R3)+Q3×(δ^2+α)]×U
and thus:
A×B ≡N α×(−Q1+Q2−Q3+R3+δ×Q3)+R1+U×[−R1−R3+Q1+R2+Q3×(α+δ^2)+δ×(−Q3+R3−Q1+Q2)]
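By way of illustration only, the method A4 can be sketched in Python as follows. The function names are assumptions of this sketch. The toy modulus is constructed as N=U^2−δ×U−α so that the relation U^2 ≡N α+δ×U holds by construction; this is merely a convenient way to obtain test values and is not a key generation method. The returned value is congruent to A×B modulo N; the final reduction in the usage line is added here.

```python
def mult_mod_div(x, y, z):
    """Hypothetical stand-in for MultModDiv: (floor(x*y/z), x*y mod z)."""
    return divmod(x * y, z)


def modmul_a4(a, b, u, alpha, delta):
    """Sketch of method A4, valid when U^2 = alpha + delta*U (mod N)."""
    a1, a0 = divmod(a, u)
    b1, b0 = divmod(b, u)
    q1, r1 = mult_mod_div(a0, b0, u)
    q2, r2 = mult_mod_div(a1 + a0, b1 + b0, u)
    q3, r3 = mult_mod_div(a1, b1, u)
    low = alpha * (-q1 + q2 - q3 + r3 + delta * q3) + r1
    high = (-r1 - r3 + q1 + r2
            + q3 * (alpha + delta * delta)
            + (-q3 + r3 - q1 + q2) * delta)
    return low + u * high


# Toy parameters (illustrative only): delta = 1, alpha = 3, U = 251.
U, ALPHA, DELTA = 251, 3, 1
N = U * U - DELTA * U - ALPHA          # 62747, so U^2 = ALPHA + DELTA*U (mod N)
ok = modmul_a4(12345, 54321, U, ALPHA, DELTA) % N == (12345 * 54321) % N
```

Only three calls to mult_mod_div are made; the remaining products involve the small constants α and δ.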
It will be noted that the method A4 according to the invention uses an even smaller number of MultModDiv functions than the known methods or even than the first embodiment or the second embodiment of the invention. We thus again have, for the same result, an even greater reduction in the total number of operations to be performed and consequently a reduction in total time and in the overall power consumed for executing the method.
In a third example, the following method A5 is carried out:
Input: A, B, N, integers of 2×n bits, broken down in the form
A=A1×U+A0; B=B1×U+B0; N=N1×U+N0;
Output: A×B mod N
Perform:
(X1, Y1, Z1, R1)=Coefficients(A, B, U)
(Q2, R2)=MultModDiv(α, X1, U)
(Q3, R3)=MultModDiv(α, Y1+Q2, U)
Return R1+R3+(R2+Z1+Q3)×U
The Coefficients function calculates in a base U the coefficients of the polynomial produced from two integers A, B broken down into the base U (A=A1×X+A0 and B=B1×X+B0):
Coefficients(A, B, U)=(C3, C2, C1, C0) such that
C3=f; C2=d+3f; C1=e+2f; C0=R0; where:
R0=(A mod U)(B mod U) mod U
R1=(A mod (U+1))(B mod (U+1)) mod (U+1)
R2=(A mod (U+2))(B mod (U+2)) mod (U+2)
R3=(A mod (2U+3))(B mod (2U+3)) mod (2U+3)
and
a=(R0−R2+((R0−R2) mod 2)(U+2))/2 mod (U+2)
b=R0−R1 mod (U+1)
c=(2(R0−R3)+(2(R0−R3) mod 3)(2U+3))/3 mod (2U+3)
d=((b−a) mod (U+1))
e=a+2d
f=−6d+4e−4c mod (2U+3)
C3, C2, C1, C0 verify A×B=C3×U^3+C2×U^2+C1×U+C0
It should be noted that, since A=A1×U+A0 and B=B1×U+B0, R0, R1, R2, R3 can easily be calculated from the following relations:
R0=A0×B0 mod U
R1=(A0−A1)(B0−B1) mod (U+1)
R2=(A0−2A1)(B0−2B1) mod (U+2)
R3=(A0+(A1 mod 2)U−3(A1 div 2))×(B0+(B1 mod 2)U−3(B1 div 2)) mod (2U+3)
As above, the cost, in terms of calculation time, of the auxiliary operations such as additions, subtractions, etc. is negligible.
The algorithm A5 indeed performs the modular multiplication A×B mod N. By using the definition of the Coefficients function, we have:
A×B=X1×U^3+Y1×U^2+Z1×U+R1
≡N R1+α×Y1+(α×X1+Z1)×U
≡N R1+α×Y1+(Q2×U+R2+Z1)×U
≡N R1+α×(Y1+Q2)+(R2+Z1)×U
≡N R1+R3+(R2+Z1+Q3)×U
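By way of illustration only, the two reduction steps of the method A5 can be sketched in Python as follows. The Coefficients function is not reproduced: as a stand-in, and purely as an assumption of this sketch, the coefficients are obtained by writing the full product A×B in base U, which gives values satisfying A×B=X1×U^3+Y1×U^2+Z1×U+R1. All names and the toy modulus are hypothetical, with α=U^2 mod N as in the first example. The returned value is congruent to A×B modulo N; the final reduction in the usage line is added here.

```python
import math


def mult_mod_div(x, y, z):
    """Hypothetical stand-in for MultModDiv: (floor(x*y/z), x*y mod z)."""
    return divmod(x * y, z)


def coefficients_stand_in(a, b, u):
    """Base-U digits of a*b (stand-in for Coefficients, not the real one)."""
    p, c0 = divmod(a * b, u)
    p, c1 = divmod(p, u)
    c3, c2 = divmod(p, u)
    return c3, c2, c1, c0              # a*b == c3*u^3 + c2*u^2 + c1*u + c0


def modmul_a5(a, b, n_mod, u):
    """Sketch of the reduction steps of method A5."""
    alpha = (u * u) % n_mod            # alpha = U^2 mod N
    x1, y1, z1, r1 = coefficients_stand_in(a, b, u)
    q2, r2 = mult_mod_div(alpha, x1, u)
    q3, r3 = mult_mod_div(alpha, y1 + q2, u)
    return r1 + r3 + (r2 + z1 + q3) * u


# Toy usage; 50011 is an arbitrary illustrative modulus.
N = 50011
U = math.isqrt(N - 1) + 1              # smallest integer >= sqrt(N)
ok = modmul_a5(12345, 43210, N, U) % N == (12345 * 43210) % N
```

The stand-in computes the full product directly, so it does not reflect the cost of the Coefficients function itself.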
In this third example, it would be possible to select U=┌√N┐. It would also be possible to select U=┌√(k.N)┐ where k is an integer.
k is preferably selected to be as small as possible such that U is odd and cannot be divided by three. To this end, for example, a number of increasing values of k will be tested until a satisfactory value of U is obtained. It would also be possible to select k to be as small as possible such that U is odd and cannot be divided by three and such that α=U^2 mod N is as small as possible.
Finally, the invention relates to a method of performing an operation of type S=└(X×Y)/Z┘, where X, Y, Z are numbers of n bits. This method can be used to perform a modular multiplication as described above and more generally a MultModDiv function. During the method, the following steps are carried out:
E1: a variable Δβ is initialized to zero and two data items are calculated:
C=X×Y mod Z and Cβ=X×Y mod (Z+β), β being a predefined positive integer of n bits.
E2: the result S=[C−Cβ+Δβ(Z+β)]/β is calculated.
E3: if S is not an integer, the variable Δβ is incremented by 1 and step E2 and then step E3 are repeated.
To justify the method as described above, the following text will demonstrate firstly that an integer Δβ exists such that S=└XY/Z┘=[C−Cβ+Δβ(Z+β)]/β. It will then be shown that the above method makes it possible to find the correct value of Δβ and S.
Let β be a positive integer of n bits. We define:
C=X×Y mod Z
Cβ=X×Y mod (Z+β)
Δβ=└XY/Z┘−└XY/(Z+β)┘
Δβ verifies: 0≦Δβ≦β. In fact, since Z<Z+β, we have XY/(Z+β)≦XY/Z and we deduce therefrom that └XY/(Z+β)┘≦└XY/Z┘. Furthermore, taking X and Y to be at most Z−1:
XY/Z=[XY/(Z+β)]×(1+β/Z)≦XY/(Z+β)+β×[(Z−1)^2/((Z+β)Z)]<XY/(Z+β)+β
Since β is an integer, ┌β┐=β and hence:
└XY/Z┘≦└XY/(Z+β)┘+┌β┐=└XY/(Z+β)┘+β
These last two inequalities allow us to conclude that └XY/(Z+β)┘≦└XY/Z┘≦└XY/(Z+β)┘+β, i.e.:
0≦Δβ≦β
Then, by definition of C and Cβ:
XY=└XY/Z┘×Z+C=└XY/(Z+β)┘×(Z+β)+Cβ=(└XY/Z┘−Δβ)×(Z+β)+Cβ
i.e. C=└XY/Z┘×β−Δβ×(Z+β)+Cβ and hence:
└XY/Z┘=[C−Cβ+Δβ×(Z+β)]/β
(calculation of step E2 of the method)
From the last relation, it can be deduced that └XY/Z┘ is equal to (C−Cβ)/β plus a corrective term Δβ(Z+β)/β. Furthermore, by definition, └XY/Z┘ is an integer, as are β and Δβ. Thus, by calculating [C−Cβ+Δβ×(Z+β)]/β systematically using the various possible values of Δβ, and by verifying whether the result is an integer, the correct value of Δβ is found and thus the correct value of └XY/Z┘.
As shown above, 0≦Δβ≦β. Thus, by selecting a small parameter β, there are a limited number of possible values for Δβ and thus a limited number of attempts necessary to obtain the correct value of └XY/Z┘.
Experience shows that the best results (in terms of calculation speed in particular) are obtained for values of β selected to be equal to a power of 2, among which the value β=2 gives the best results. This is because, with β=2, modular reductions (C=X×Y mod Z and Cβ=X×Y mod (Z+β)) are carried out only with moduli (Z and Z+2) having the same parity as Z and thus any difficulty associated with calculating a modular multiplication using an even modulus is avoided. Furthermore, division by β=2 is not costly in terms of calculation time, since such an operation in practice comes down to shifting the content of a register one bit to the right. Finally, at most two values of Δβ have to be tested (0 and 1).
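By way of illustration only, steps E1 to E3 can be sketched in Python for β=2 as follows. The sketch uses the relation S=[C−Cβ+Δβ×(Z+β)]/β, which follows from XY=S×Z+C and XY=(S−Δβ)×(Z+β)+Cβ. The function name is hypothetical. Three points are assumptions of this sketch rather than statements of the method: Z is taken to be odd, X and Y are taken to be smaller than Z, and a candidate is accepted only if it is also non-negative; the loop scans the whole range 0≦Δβ≦β.

```python
BETA = 2  # value of beta reported above as giving the best results


def euclid_quotient_beta2(x, y, z):
    """Recover floor(x*y / z) from x*y mod z and x*y mod (z + 2).

    Assumptions of this sketch: z is odd and 0 <= x, y < z.
    """
    if z % 2 == 0 or not (0 <= x < z and 0 <= y < z):
        raise ValueError("this sketch assumes odd z and 0 <= x, y < z")
    c = (x * y) % z                    # E1: C = X*Y mod Z
    c_beta = (x * y) % (z + BETA)      # E1: C_beta = X*Y mod (Z + beta)
    for delta in range(BETA + 1):      # candidate values of Delta_beta
        numerator = c - c_beta + delta * (z + BETA)   # E2
        # E3: accept only an integer quotient; the sign check is an
        # extra guard added in this sketch.
        if numerator % BETA == 0 and numerator >= 0:
            return numerator >> 1      # division by 2 as a right shift
    raise ArithmeticError("no admissible Delta_beta found")
```

Both moduli Z and Z+2 are odd here, and the division by β reduces to a one-bit right shift.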
Number | Date | Country | Kind |
---|---|---|---|
0310060 | Aug 2003 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR04/50387 | 8/20/2004 | WO | 00 | 11/1/2007 |