This application claims the benefit of Great Britain Patent Application No. 0412084.6, filed on 29 May 2004, which is hereby incorporated by reference.
The present invention relates to a method and apparatus for calculating a modular inverse.
The background of the invention will now be described with reference to the accompanying tables in which:
Table 4 provides a pseudo-code listing of the steps involved in the implementation of the Savas and Koç method of calculating a classical modular inverse and a Montgomery modular inverse.
Recent years have seen rapid growth in the area of electronic communications and electronic commerce (e.g. email, online shopping and online banking). With this growth, there has been increased demand for mechanisms of ensuring the security of such communications. Public key encryption systems are useful in this context as they provide the features of confidentiality, authentication, data integrity and non-repudiation.
Accordingly, the problem facing the security industry is that of producing high-speed, low-cost and robust cryptographic products in order to satisfy customer demands for real-time encryption and repel cryptanalytic attacks.
Modular arithmetic is a key ingredient of many public key crypto-systems. It provides finite structures (called “rings”) which have all the usual arithmetic operations of integers and which can be easily implemented with existing computer hardware.
Given an integer a and a k-bit integer p ($2^{k-1} \le p < 2^k$), $a^{-1} \pmod{p}$ is the modular (multiplicative) inverse of a (mod p) and is classically defined as the integer ModInv(a) such that
$a \cdot \mathrm{ModInv}(a) \equiv 1 \pmod{p}$ (1)
The above expression only has a unique solution if a and p are relatively prime.
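As a minimal sketch of the definition in equation (1), the following Python function computes a classical modular inverse with the extended Euclidean algorithm and rejects inputs that are not relatively prime to p. The function name `mod_inv` and the sample values are illustrative assumptions, not part of the original disclosure.

```python
def mod_inv(a: int, p: int) -> int:
    """Return x with (a * x) % p == 1, using the extended Euclidean algorithm."""
    old_r, r = a % p, p
    # Invariant: old_s * a ≡ old_r (mod p) and s * a ≡ r (mod p).
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        # gcd(a, p) != 1, so equation (1) has no solution.
        raise ValueError("a and p are not relatively prime; no inverse exists")
    return old_s % p


# Illustrative values only: 3 * 4 = 12 ≡ 1 (mod 11).
inverse = mod_inv(3, 11)
assert (3 * inverse) % 11 == 1
```

The loop runs once per quotient step, which illustrates the iterative behaviour that becomes costly for large operands.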
Modular multiplicative inversion has a number of uses in cryptography. In particular, one of its main uses is in the generation of private keys from public keys in accordance with the well-known RSA algorithm. These private keys are used to decrypt a message encrypted (by the RSA algorithm) with the public key. Such private keys may also be used as digital signatures to enable the identification of the originator of a digital communication and to guarantee the integrity of the communication.
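By way of illustration only, the role of the modular inverse in RSA key generation can be sketched with toy parameters. The primes, exponent and message below are hypothetical textbook-sized values chosen for readability; they are far too small for real use, and Python's built-in `pow` (version 3.8 or later) stands in for the inversion step.

```python
from math import gcd

# Toy parameters (assumptions for illustration; real keys use very large primes).
prime_1, prime_2 = 61, 53
n = prime_1 * prime_2                  # public modulus
phi = (prime_1 - 1) * (prime_2 - 1)    # Euler totient of n
e = 17                                 # public exponent
assert gcd(e, phi) == 1                # the inverse exists only if e and phi are coprime

# The private exponent is the modular inverse of the public exponent.
d = pow(e, -1, phi)

message = 65
cipher = pow(message, e, n)            # encrypt with the public key
recovered = pow(cipher, d, n)          # decrypt with the private key
assert recovered == message
```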
Modular multiplicative inversion is also used in a wide variety of elliptic curve cryptosystems. For instance, the El Gamal algorithm is based on the multiplication of a secret integer with a point on an elliptic curve to generate a public key. A digital communication is then encrypted by further scalar multiplication with the public key. The above scalar point multiplications can be represented as a number of point addition and doubling operations which are based on the calculation of modular multiplicative inverses.
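The dependence of point addition and doubling on modular inversion can be sketched as follows. The curve y^2 = x^3 + 2x + 2 over the integers modulo 17 and the point (5, 1) are small illustrative assumptions, and the names `ec_add` and `scalar_mult` are hypothetical; each slope calculation requires one modular inverse.

```python
P_MOD = 17      # toy prime field (assumption for illustration)
A_COEF = 2      # coefficient a of y^2 = x^3 + a*x + b; b is not needed for addition


def ec_add(P, Q):
    """Add two affine points; None represents the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                      # P + (-P) is the point at infinity
    if P == Q:
        # Doubling: slope = (3*x1^2 + a) / (2*y1), division done by modular inverse.
        lam = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        # Addition: slope = (y2 - y1) / (x2 - x1), division done by modular inverse.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, P):
    """Compute k*P by double-and-add, scanning k from its least significant bit."""
    result, addend = None, P
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result


G = (5, 1)
double_G = ec_add(G, G)       # one doubling, one modular inverse
triple_G = scalar_mult(3, G)  # one doubling and one addition
```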
Modular inverses have been traditionally calculated using the extended Euclidean algorithm. However, this algorithm is iterative in nature and thus, may be slow to calculate the modular inverse of a large number. This feature is becoming increasingly problematic as ever larger keys are used to make it more difficult for unauthorised persons to crack encryption schemes. In view of the problems with the extended Euclidean algorithm and the demand for high-speed or real-time encryption, one of the main objectives of the present invention is to provide a mechanism for rapidly calculating modular inverses for use in RSA key generation and elliptic curve cryptography.
Modular inverses are also used for calculating modular exponents that are used in the RSA algorithm, Diffie Hellman key exchange scheme and El Gamal encryption scheme. One method of performing modular exponentiation is to break it up into a series of modular multiplication operations in an addition-subtraction chaining approach. Using this approach, given integers a, c and p where $a < p$, the modular exponent $a^c \pmod{p}$ can be calculated by multiplying intermediate values starting with a and $a^{-1} \pmod{p}$.
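One possible instance of an addition-subtraction chain is exponentiation driven by the non-adjacent form (NAF) of the exponent, in which each digit is -1, 0 or 1. The choice of NAF and the names `naf` and `mod_exp_signed` are assumptions made for this sketch; the source does not specify which chain is used.

```python
def naf(c: int) -> list:
    """Return the non-adjacent form digits of c, least significant digit first."""
    digits = []
    while c > 0:
        if c & 1:
            d = 2 - (c % 4)   # 1 when c ≡ 1 (mod 4), -1 when c ≡ 3 (mod 4)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def mod_exp_signed(a: int, c: int, p: int) -> int:
    """Compute a**c mod p using multiplications by a and by its modular inverse."""
    a_inv = pow(a, -1, p)     # the modular inverse supplies the "subtraction" steps
    result = 1
    for d in reversed(naf(c)):
        result = (result * result) % p
        if d == 1:
            result = (result * a) % p
        elif d == -1:
            result = (result * a_inv) % p
    return result


# 15 = 16 - 1, so a**15 costs four squarings and a single multiplication by a_inv.
assert mod_exp_signed(7, 15, 101) == pow(7, 15, 101)
```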
The Montgomery multiplication algorithm (P. L. Montgomery, Math. Computation (44) 519-521) is a technique that provides an efficient mechanism for implementing modular multiplication. In particular, given an integer $a < p$, where p is a k-bit integer ($2^{k-1} \le p < 2^k$), A is said to be its p-residue with respect to $r = 2^k$ if,
$A = a \cdot r \pmod{p}$ (2)
Likewise, given an integer $b < p$, B is said to be its p-residue with respect to r if,
$B = b \cdot r \pmod{p}$ (3)
The Montgomery product of the two residues A and B can then be defined as the scaled product,
$MP = A \cdot B \cdot r^{-1} \pmod{p}$ (4)
where $r^{-1}$ is the multiplicative inverse of r modulo p (i.e. $r \cdot r^{-1} \equiv 1 \pmod{p}$).
However, since $r = 2^k$, the Montgomery product can also be represented as
$MP = A \cdot B \cdot 2^{-k} \pmod{p}$ (5)
From this expression it can be seen that the Montgomery multiplication algorithm effectively replaces the step of division by p in an ordinary modular multiplication process with a division by a power of two (i.e. a shift operation).
Consequently, the Montgomery multiplication algorithm is particularly suited to the inherently binary nature of general-purpose computers and provides a simpler and faster method of performing modular multiplication than more traditional methods.
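The definitions of the p-residue and the Montgomery product can be checked numerically with a short sketch. It evaluates the product directly from its definition using a modular inverse of r, so it illustrates the arithmetic relationship only and not the shift-based reduction; the modulus 23 and the helper names are illustrative assumptions.

```python
p = 23                       # small odd modulus chosen for illustration
k = p.bit_length()           # k = 5, since 2**4 <= 23 < 2**5
r = 1 << k                   # r = 2**k
r_inv = pow(r, -1, p)        # inverse of r modulo p (Python 3.8+)


def to_residue(a: int) -> int:
    """Map a to its p-residue A = a * r (mod p)."""
    return (a * r) % p


def mont_product(A: int, B: int) -> int:
    """Montgomery product A * B * r^-1 (mod p), evaluated from the definition."""
    return (A * B * r_inv) % p


a, b = 7, 11
A, B = to_residue(a), to_residue(b)
MP = mont_product(A, B)

# The product of two residues is the residue of the ordinary product.
assert MP == to_residue((a * b) % p)
# Multiplying by 1 removes the scaling factor r and leaves the domain.
assert mont_product(MP, 1) == (a * b) % p
```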
The above-described Montgomery multiplication algorithm can also be used to calculate modular exponents in an addition-subtraction chaining approach. Using this approach, the modular exponent $a^c \pmod{p}$ may be calculated from intermediate values starting with $a \cdot 2^k \pmod{p}$ and $a^{-1} \cdot 2^k \pmod{p}$.
Using the representation scheme employed in the Montgomery multiplication algorithm the Montgomery modular inverse of an integer a (henceforth referred to as MonInv(a)) is defined as
$\mathrm{MonInv}(a) = a^{-1} \cdot 2^k \pmod{p}$ (6)
At present, there are two methods available for calculating a Montgomery modular inverse, namely the Kaliski method and the Savas and Koç method. Both of these methods will be discussed in more detail below.
(a) Kaliski Method of Calculating a Montgomery Modular Inverse
Kaliski (B. S. Kaliski, IEEE Trans. Computers 44(8), 1064-1065) developed a two stage algorithm for calculating the Montgomery modular inverse. In the first stage an “Almost Montgomery Inverse” is calculated, wherein the “Almost Montgomery Inverse” (Phase1 (a)) is defined as
$\mathrm{Phase1}(a) = a^{-1} \cdot 2^z \pmod{p}$ (7)
where z is an integer and $k \le z \le 2k$.
The second stage of Kaliski's algorithm completes the operation by using the “Almost Montgomery Inverse” (Phase1 (a)) and z to calculate MonInv(a).
A variant of the Kaliski algorithm can be used for calculating a classical modular inverse. Accordingly, there are two separate Kaliski algorithms, the first of which (Kaliski ModInv( )) provides a mechanism of calculating a classical modular inverse and the second of which (Kaliski MonInv( )) provides a mechanism of calculating a Montgomery modular inverse of an integer already in the Montgomery domain.
The input and output variables to the two Kaliski algorithms are outlined in Table 1. The steps involved in the implementation of the two Kaliski algorithms are shown in Table 2. Referring to Table 2 it can be seen that both Kaliski algorithms employ recurrence loops to achieve inversion.
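Since Tables 1 and 2 are not reproduced in this text, the following is a sketch based on the commonly published form of Kaliski's two-phase method rather than a transcription of those tables. It assumes p is odd, takes a in the range 1 to p-1, and follows equations (6) and (7) in treating the Montgomery variant as mapping a to a^-1 * 2^k (mod p); the function names are hypothetical.

```python
def almost_mont_inverse(a: int, p: int):
    """Phase 1: return (r, z) with r = a^-1 * 2^z (mod p) and k <= z <= 2k."""
    u, v, r, s, z = p, a, 0, 1, 0
    while v > 0:                     # recurrence loop, one shift per iteration
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        z += 1
    if u != 1:
        raise ValueError("a and p are not relatively prime")
    if r >= p:
        r -= p
    return p - r, z


def halve_mod(r: int, p: int, times: int) -> int:
    """Phase 2 loop: divide r by 2 modulo the odd modulus p, 'times' times."""
    for _ in range(times):
        r = r // 2 if r % 2 == 0 else (r + p) // 2
    return r


def kaliski_mon_inv(a: int, p: int) -> int:
    """Montgomery inverse a^-1 * 2^k (mod p): remove z - k factors of two."""
    r, z = almost_mont_inverse(a, p)
    return halve_mod(r, p, z - p.bit_length())


def kaliski_mod_inv(a: int, p: int) -> int:
    """Classical inverse a^-1 (mod p): remove all z factors of two."""
    r, z = almost_mont_inverse(a, p)
    return halve_mod(r, p, z)
```

Both variants end with a bit-serial correction loop, which is the iterative cost that multiplication-based approaches aim to avoid.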
(b) Savas and Koç Method of Calculating a Montgomery Modular Inverse
Savas and Koç (E. Savas and C. K. Koç: IEEE Trans. on Computers, 49(7), 763-766) suggested that Montgomery multiplication could be used to replace the iterative loops in the Kaliski algorithms. In particular, if m is defined to be an integer multiple of the word size (w) of the host computer system and $m \ge k$, the output z from Phase 1 of the Kaliski method is an integer satisfying $k \le z \le k + m$. The Savas and Koç algorithms further assume that $R^2 = 2^{2m} \pmod{p}$ and the inputs to the Montgomery product function (MP) are m-bit integers.
In a similar fashion to the Kaliski algorithms, a variant of the Savas and Koç algorithm can be used for calculating a classical modular inverse. Accordingly, there are two separate Savas and Koç algorithms, the first of which (Savas/Koç ModInv( )) provides a mechanism of calculating a classical modular inverse and the second of which (Savas/Koç MonInv( )) provides a mechanism of calculating a Montgomery Modular Inverse.
The input and output variables to the two Savas and Koç algorithms are outlined in Table 3. Table 4 outlines the steps involved in the implementation of the two Savas and Koç algorithms.
Referring to Table 4 it can be seen that the Savas and Koç ModInv( ) algorithm involves one or two Montgomery multiplication operations. Similarly, the Savas and Koç MonInv( ) algorithm involves two or three Montgomery multiplication operations.
Both the Kaliski and Savas and Koç algorithms were originally developed for software implementation. If these algorithms were to be implemented in hardware, then two separate circuit architectures would be required.
According to the invention there is provided a method of calculating a classical modular inverse or a Montgomery modular inverse of an integer a (mod p), where p is a k-bit integer, comprising the steps of:
Preferably, the first input variable is a and the second input variable is one when calculating a classical modular inverse; and the first input variable is $a \cdot 2^k \pmod{p}$ and the second input variable is $R^2$ when calculating the Montgomery modular inverse.
According to a second aspect of the invention there is provided an apparatus for calculating a classical modular inverse or a Montgomery modular inverse of an integer a (mod p), where p is a k-bit integer, comprising:
Preferably, the first input variable is a and the second input variable is one when calculating a classical modular inverse and the first input variable is $a \cdot 2^k \pmod{p}$ and the second input variable is $R^2$ when calculating a Montgomery modular inverse.
Preferably, the apparatus further comprises a means of transmitting the output of the fourth calculating means.
Preferably, the second calculating means is implemented in a control unit that further comprises a logic unit which compares z and k.
Desirably, the second, third and fourth calculating means comprise a multiplier unit, an addition unit and a subtraction unit.
Desirably, the addition unit and the subtraction unit employ fast carry chains and two's complement addition.
Desirably, the multiplier unit comprises a plurality of cascaded unsigned multiplier units.
Preferably, the multiplier unit comprises a means of adding the outputs from the unsigned multiplier units employing look-ahead carry chains.
Preferably, the apparatus is a field programmable gate array.
Optionally, the apparatus is an application specific integrated circuit.
Preferably, the apparatus operates on 256 bit data.
According to a third aspect of the invention there is provided a method of generating a private encryption key from a public encryption key by calculating the modular inverse of the public encryption key with the method of the first aspect.
Preferably, the private encryption key is employed in an RSA algorithm.
According to a fourth aspect of the invention there is provided an apparatus for generating a private encryption key from a public encryption key comprising a means for performing the method of the third aspect.
According to a fifth aspect of the invention there is provided a digital signature generated from the private key produced by the method of the third aspect.
According to a sixth aspect of the invention there is provided a method of encrypting data comprising the steps of:
According to a seventh aspect of the invention there is provided an apparatus for encrypting data comprising:
The present invention improves on the algorithms developed by Kaliski and Savas and Koç by providing a single unified algorithm that can compute both the classical modular inverse and the Montgomery modular inverse of an integer. Accordingly, the present invention provides a mechanism for substantially reducing the silicon usage of hardware implementations of traditional modular inversion algorithms.
Whilst the present invention, in common with the Kaliski and Savas and Koç algorithms, performs Montgomery modular multiplication, it achieves a 33% reduction in the number of such multiplication operations compared with the prior art algorithms.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Reference will now be made in detail to the preferred embodiment of the present invention, an example of which is illustrated in the accompanying drawings and the following tables:
Table 9 lists the results of the comparative analysis of the performance of the hardware implementations of the method according to the first aspect and the conventional Kaliski and Savas and Koç algorithms.
For the sake of brevity, the method of calculating a classical modular inverse and a Montgomery modular inverse of an integer in accordance with the invention will be known henceforth as the unified inversion algorithm. Accordingly, the following description will first describe the unified inversion algorithm and will provide evidence of its advantages by way of a hardware implementation.
A. Unified Inversion Algorithm
As previously mentioned, the unified inversion algorithm provides a single, efficient algorithm for computing both the classical modular inverse and the Montgomery modular inverse of an integer already in the Montgomery domain. The algorithm is a two-stage process in which the output z from the first stage (in common with the output from the first stage of the Kaliski algorithms) is an integer satisfying $k \le z \le 2k$. The unified inversion algorithm further assumes that $R^2 = 2^{2k} \pmod{p}$ and the inputs to the Montgomery modular multiplication function (MP( )) are k-bit integers.
For the sake of clarity, the following discussion will separately discuss the classical modular inverse calculation steps from those of the Montgomery modular inverse calculations. However, it will be realised that in actuality, these calculations are embraced within the same single algorithm and that the separation of these calculations in the following discussion is solely for the purpose of clarifying the description. Consequently, the following description should be in no way construed as meaning that there are two separate algorithms for calculating the modular inverses.
Referring to Table 5, it will be noted that in order to calculate the classical modular inverse (ModInv(a)) the pair of variables (a, 1) is input to the unified inversion algorithm. Similarly, in order to calculate the Montgomery modular inverse (MonInv(a)) of an integer already in the Montgomery domain, the pair ($a \cdot 2^k \pmod{p}$, $R^2$) is input to the unified inversion algorithm.
It will be recalled the Savas and Koç algorithm for calculating a Montgomery modular inverse required a maximum of three Montgomery multiplication operations. Referring to Table 6 it will be noted that the unified inversion algorithm is more efficient than the Savas and Koç algorithm for computing a Montgomery modular inverse, since the unified inversion algorithm requires at most only two Montgomery multiplication operations. Consequently, the unified inversion algorithm provides at least a 33% saving in the required number of Montgomery multiplication operations.
Furthermore, the unified inversion algorithm is particularly suited for hardware (and indeed software) implementations, since a single circuit architecture can be used to compute both types of modular inverse.
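Tables 5 and 6 are not reproduced in this text, so the following is a reconstruction that is merely consistent with the description: the input pairs (a, 1) and (a * 2^k mod p, R^2), a first stage returning z with k <= z <= 2k, a correction factor 2^(2k-z), and at most two Montgomery multiplications. The first stage is assumed to be Kaliski's Phase 1 for an odd modulus p, the Montgomery product is evaluated from its definition for brevity, and all function names are hypothetical.

```python
def phase1(x: int, p: int):
    """Assumed first stage: return (r, z) with r = x^-1 * 2^z (mod p), p odd."""
    u, v, r, s, z = p, x, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        z += 1
    if r >= p:
        r -= p
    return p - r, z


def mont_product(A: int, B: int, p: int, k: int) -> int:
    """Montgomery product A * B * 2^-k (mod p), from the definition (Python 3.8+)."""
    return (A * B * pow(2, -k, p)) % p


def unified_inverse(x: int, y: int, p: int) -> int:
    """Single routine for both inverses; the input pair (x, y) selects which one."""
    k = p.bit_length()
    r, z = phase1(x, p)
    if z != k:
        # First multiplication: rescale by 2^(2k - z), skipped when z == k.
        r = mont_product(r, 1 << (2 * k - z), p, k)
    # Final multiplication by the second input variable.
    return mont_product(r, y, p, k)


def mod_inv(a: int, p: int) -> int:
    """Classical inverse: input pair (a, 1)."""
    return unified_inverse(a, 1, p)


def mon_inv(a_bar: int, p: int) -> int:
    """Montgomery inverse of a_bar = a * 2^k mod p: input pair (a_bar, R^2)."""
    k = p.bit_length()
    return unified_inverse(a_bar, pow(2, 2 * k, p), p)
```

In this sketch the only difference between the two calculations is the pair of values supplied, which is what allows a single datapath to serve both.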
B. Field Programmable Gate Array (FPGA) Hardware Implementation of the Unified Inversion Algorithm
The following discussion will provide a broad overview of an example of a hardware implementation of the unified inversion algorithm. This will be followed with a more detailed description of a hardware implementation of a Montgomery multiplication component of the unified inversion algorithm. The description will finish with experimental results providing a comparative analysis of the performance of the hardware implementation of the unified inversion algorithm, with hardware implementations of the conventional Kaliski and Savas and Koç algorithms. For the sake of brevity, the hardware implementation of the unified inversion algorithm will be known henceforth as the unified inversion circuit.
The following discussion describes an example of a 256-bit hardware implementation of the unified inversion algorithm. It will be appreciated that the unified inversion algorithm is not limited to the specific details of the hardware implementation described below and that other hardware implementations of the algorithm are possible.
1. Overview
Referring to
Returning to the example depicted in
The resulting values Phase1(a) and z are then stored in a control unit 16. The value $2^{2k-z}$ is also calculated in the control unit 16. A comparison between z and k is also performed in the control unit 16 to determine the inputs to a 256-bit Montgomery multiplier 18. In particular, if $z = k$ then only one modular multiplication is required. However, if $z \ne k$, two multiplication operations are required and the output variable r1 from the Montgomery multiplier 18 is fed back into the control unit 16 to be reused as an input to the Montgomery multiplier 18. Once the necessary modular multiplications have been completed, the variable $a^{-1}$ is output 20 from the unified inversion circuit 5 at 32 bits per clock cycle over 8 cycles.
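The transfer of a 256-bit result as eight 32-bit words can be modelled in software as shown below. The least-significant-word-first ordering and the function names are assumptions made for this sketch; the source does not state the word order used by the circuit.

```python
WORD_BITS = 32
N_WORDS = 8          # 8 words of 32 bits make up one 256-bit operand


def to_words(value: int) -> list:
    """Split a 256-bit value into 32-bit words, least significant word first (assumed order)."""
    mask = (1 << WORD_BITS) - 1
    return [(value >> (WORD_BITS * i)) & mask for i in range(N_WORDS)]


def from_words(words: list) -> int:
    """Reassemble the value from its words, one word per modelled clock cycle."""
    value = 0
    for cycle, word in enumerate(words):
        value |= word << (WORD_BITS * cycle)
    return value


sample = (1 << 255) | 0xDEADBEEF
words = to_words(sample)
assert len(words) == N_WORDS
assert from_words(words) == sample
```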
2. Montgomery Multiplier (18)
2(a) Overview
The steps performed in the Montgomery multiplier 18 are shown in Table 8. It will be noted that the Montgomery multiplication algorithm assumes that n is the k-bit modulus of integers A and B, $r = 2^k$, $r r^{-1} - n n' = 1$ and $r^{-1} r \equiv 1 \pmod{n}$. The main calculations performed in the Montgomery multiplication algorithm include three full-word multiplications, one full-word addition, and a conditional full-word subtraction. In practice, the full-word addition and subtraction operations are performed using fast carry chains and two's-complement addition.
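Table 8 is not reproduced in this text; the sketch below follows the standard Montgomery reduction, which matches the operation count given above (three multiplications, one addition and a conditional subtraction) and the stated relation r*r^-1 - n*n' = 1. The intermediate names t, m and u and the helper names are assumptions; the "mod r" and "div r" steps reduce to a mask and a shift because r is a power of two.

```python
def mont_setup(n: int):
    """Derive k, r^-1 and n' for an odd modulus n so that r*r_inv - n*n_prime == 1."""
    k = n.bit_length()
    r = 1 << k
    r_inv = pow(r, -1, n)                 # Python 3.8+
    n_prime = (r * r_inv - 1) // n        # exact division by construction
    return k, r_inv, n_prime


def mont_multiply(A: int, B: int, n: int, k: int, n_prime: int) -> int:
    """Return A * B * 2^-k (mod n) for inputs A, B < n."""
    mask = (1 << k) - 1
    t = A * B                             # multiplication 1
    m = ((t & mask) * n_prime) & mask     # multiplication 2, with "mod r" as a mask
    u = (t + m * n) >> k                  # multiplication 3, addition, "div r" as a shift
    if u >= n:                            # conditional subtraction
        u -= n
    return u


n = 23
k, r_inv, n_prime = mont_setup(n)
A, B = (7 << k) % n, (11 << k) % n        # residues of 7 and 11
u = mont_multiply(A, B, n, k, n_prime)
assert u == (A * B * r_inv) % n
```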
2(b) Hardware Implementation
Referring to
Referring to
Returning to
The t-REG/UPDATE REG/CONTROL component 32 stores the product t=A*B and the results of the other multiplication and addition operations, which are then fed back into the control unit 26 to be re-used as inputs to the 256×256-bit multiplier unit 28 or Addition/Subtraction component 30. The t-REG/UPDATE REG/CONTROL component 32 also performs the trivial mod and div operations. Once the conditional subtraction has been performed, the variable u is output from the output register 34 at a rate of 32-bits per cycle over a period of 8 clock cycles.
3. Comparative Performance Analysis
The performance of the unified inversion algorithm compared with the Kaliski and Savas and Koç algorithms was investigated by capturing the algorithms in VHDL and implementing the algorithms on a Xilinx Virtex2 Pro XC2VP125 FPGA (using a 256-bit operand length). The Montgomery multiplication calculations were performed using the algorithm described by Koç et al. (C. K. Koç, T. Acar and B. S. Kaliski, IEEE Micro, 16(3), 26-33).
Table 7 shows the results of experiments (obtained using Xilinx Foundation software v6.1.03i) comparing the performance of the above algorithms. Referring to Table 7, it can be seen that using the unified inversion algorithm instead of Kaliski's algorithms to calculate the classical inverse and the Montgomery modular inverse of an integer already in the Montgomery domain, results in a speed up of 17.8% and 27.1% respectively.
Furthermore, because a modular inverter circuit can be implemented using a single unified inversion circuit in place of the two separate circuits required to implement both of Kaliski's algorithms, an overall reduction of 18.8% in the number of slices used is achievable. This percentage is a relative measure calculated by computing the difference between the number of slices required to implement Kaliski's algorithms and the number of slices required by the unified inversion circuit; and then dividing the difference by the number of slices required to implement Kaliski's algorithms.
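The relative measure described above can be written compactly as follows. The symbols $S_{\text{Kaliski}}$ (slices needed for both Kaliski circuits) and $S_{\text{unified}}$ (slices needed for the unified inversion circuit) are labels introduced here for illustration and do not appear in the source.

```latex
\text{reduction} \;=\; \frac{S_{\text{Kaliski}} - S_{\text{unified}}}{S_{\text{Kaliski}}} \times 100\%
```

On this measure, a reduction of 18.8% corresponds to $S_{\text{unified}} \approx 0.812 \, S_{\text{Kaliski}}$.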
Whilst the unified inversion algorithm does not provide a significant speed-up if used instead of the Savas and Koç algorithms, the unified inversion circuit nonetheless provides a 49.9% reduction in the silicon area usage compared with the Savas and Koç algorithms (using the same relative measurement as used when comparing against the Kaliski algorithms).
Similar speed-ups and savings in source code/silicon area are attainable if the unified inversion, Kaliski and Savas and Koç algorithms are implemented in software or alternative hardware media, e.g. modern application specific integrated circuit (ASIC) devices, since the unified inversion algorithm has an inherently less complicated structure than the other algorithms.
Modifications and alterations may be made to the above without departing from the scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
0412084.6 | May 2004 | GB | national |