A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This invention relates to increasing the efficiency of performing modular exponentiation operations which, for example, are integral to cryptographic key operations.
With the prevalence of public computer networks used to transmit confidential data for personal, business, and governmental purposes, many computer users need cryptographic systems to control access to their data.
Cryptographic systems are commonly used to restrict unauthorized access to messages communicated over otherwise insecure channels. In general, cryptographic systems use a unique key, such as a series of numbers, to control an algorithm used to encrypt a message before it is transmitted over an insecure communication channel to a receiver. With a private key cryptographic system, both the sender and receiver must have access to the same key in order to encode and decode encrypted messages. The key can be exchanged in advance over a secure channel. However, secure communication of the key is hampered by the unavailability and expense of secure communication channels. Moreover, the need to communicate the key in advance impedes the spontaneity of business communications.
Overcoming the difficulty and inconvenience of communicating the key over a secure channel, a public key cryptographic system permits a key to be communicated over an insecure channel without jeopardizing security. This system utilizes a pair of keys in which one is publicly communicated, i.e., a public key, and the other is kept secret by a receiver, i.e., a private key. While the private key is mathematically related to the public key, it is extraordinarily difficult to derive the private key from the public key alone. Using this system, a sender uses the public key to encrypt a message, and a receiver uses the private key to decrypt the message. This procedure has the added benefit of permitting the publication and dissemination of the public key, allowing any number of senders to communicate in a secure manner with the holder of the private key.
Such cryptographic systems require computation of modular exponentiations of the form:
C = M^e mod n and
M = C^d mod n
in which exponent e and modulus n are large numbers, e.g., having a length of 1024, 2048, or 4096 binary digits or bits.
However, modular exponentiation calculations of this magnitude are a daunting task even to an authorized receiver using a high speed computer. The difficulty of modular exponentiation calculations drains computer resources and degrades data throughput rates, and thus represents a major impediment to the widespread adoption of commercial cryptographic systems.
Techniques have been developed to reduce this task to a more manageable, although still computationally intensive, undertaking. For example, modular exponentiation is often implemented in hardware. One hardware technique, of interest in this patent application, is termed multiplication by shifting or binary multiplication.
The algorithm may be stated as follows:
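As a minimal sketch of multiplication by shifting, assuming non-negative integer operands, the product can be accumulated by examining the multiplier one bit at a time. The function name `multiply_by_shifting` is hypothetical and the sketch omits the modular reduction discussed later in this description.

```python
def multiply_by_shifting(multiplicand: int, multiplier: int) -> int:
    """Return multiplicand * multiplier using only shifts and additions."""
    if multiplier < 0:
        raise ValueError("multiplier must be non-negative")
    product = 0
    while multiplier:
        if multiplier & 1:          # least significant bit set: add the multiplicand
            product += multiplicand
        multiplicand <<= 1          # double the multiplicand for the next bit
        multiplier >>= 1            # move on to the next multiplier bit
    return product


print(multiply_by_shifting(13, 11))  # 143
```

Each pass through the loop costs at most one addition and two shifts, so the number of passes equals the bit length of the multiplier.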
Yet even with the method of binary multiplication, solving a modular exponentiation problem is still computationally intensive. Accordingly, a critical need exists for a high speed modular exponentiation method and apparatus to provide a sufficient level of communication security while minimizing the demand for computer system resources, including data throughput, CPU size, and electric power. This application focuses on increasing the efficiency of binary multiplication. Where speed is paramount, up to requiring the employment of all available resources, this invention is compatible with and complementary to other schemes for more rapidly executing public key cryptographic system calculations.
To calculate the equation y = b^e mod n, which is integral to solving cryptographic problems, much computing power is required despite elegant algorithms that greatly reduce the number of calculations involved. Operations needed to compute this equation include shifting bits, comparing values, subtracting, and adding. This invention provides an improvement over prior calculation methods by pinpointing places where the number of required computing cycles can be reduced.
One embodiment of this invention involves reversing the order of accessing “rows” and “columns” of memory registers or locations. Instead of fetching one row at a time of a named set of registers (e.g., a row of temporary registers) in sequence, a row of dissimilar registers (e.g., a row containing one temporary register, a multiplier register, and a multiplicand register) is fetched.
The details of the present invention, both as to its structure and operation, and many of the attendant advantages of this invention, can best be understood in reference to the following detailed description, when taken in conjunction with the accompanying drawings, in which like reference numerals refer to like parts throughout the various views unless otherwise specified, and in which:
FIGS. 8a, 8b, 8c, 8d, 8e, 8f, and 8g are detailed flowcharts showing the inventive method described herein.
In the following examples, “n” refers to the product of two, or more, distinct prime numbers. The value “e” is a public key exponent and “d” is a private key exponent. “M” is a message sent from a sender to a receiver and “C” is computed ciphertext.
During the encryption stage, the controller 310 receives data input 315 of clear message M, data input 320 of exponent e, and data input 325 of modulus n and performs the following equation (A) to generate data output 355 of encrypted message C:
C ≡ M^e mod n. (A)
During the decryption stage, the controller 310 receives data input 315 of encrypted message C, data input 320 of exponent d, and data input 325 of modulus n, and performs the following equation (B) to generate decrypted output data 355 of clear message M:
M ≡ C^d mod n. (B)
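By way of illustration only, equations (A) and (B) can be exercised with a toy key pair. The values below are assumptions chosen for readability and are far smaller than the 1024-bit or longer operands contemplated herein. Python's built-in three-argument `pow` stands in for the controller.

```python
# Toy parameters for illustration only; practical moduli are 1024 bits or longer.
p, q = 61, 53
n = p * q                  # modulus, 3233
e = 17                     # public key exponent
d = 2753                   # private key exponent: (e * d) % ((p - 1) * (q - 1)) == 1

M = 65                     # clear message
C = pow(M, e, n)           # equation (A): encryption
recovered = pow(C, d, n)   # equation (B): decryption

print(C, recovered)        # 2790 65
```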
In one embodiment, the exponentiator state machine 335 controls operations of the modulus multiplier 350 to perform modulus exponentiation functions efficiently. Depending on the inputs received from the CPU 305, the exponentiator state machine 335 commands the modulus multiplier 350 to perform encryption, decryption, or authentication using memory registers or other types of memory (such as RAM or Flash memory). In another embodiment, a general purpose CPU performs the functions of an exponentiator state machine and modulus multiplier using memory registers or other types of memory.
A major task associated with public key calculations is resolving the equations (A) and (B) in an efficient manner in terms of resources and time required. In one embodiment, memory 340 on the controller 310 is configured to reduce the number of cycles required to perform the equations (A) and (B). Alternately, the functions of the controller may be executed by a CPU with a portion of general purpose memory or register memory likewise configured. In either case, the structure of the memory used during performance of the calculation of equations (A) and (B) plays an integral role in terms of the speed and resources required.
The techniques of “exponentiation by squaring” and “binary multiplication,” when used in conjunction, convert the task of exponentiation into simpler register shift and addition routines. To complete the modulus multiplication procedure required for public key calculations, comparison and subtraction routines are also employed.
To illustrate the concept, an example exponent (multiplicand) is 1024 bits long.
Operations such as addition, subtraction and comparison are performed at a sub-block level. For example, to add the value of the multiplication register, represented by B 406, to the value of the temporary register, represented by E 412, the exponentiator state machine 335, or computer, fetches the value B1 and fetches the value E1, using two different fetch cycles, one for row B and one for row E, and then performs an addition operation. The result is written to temporary register 412, and the resultant carry value is carried into the addition of B2 and E2, which requires two additional fetch cycles to fetch B2 and E2. The process is repeated along the row to the last values B8 and E8.
In total, the addition of B to E requires at least 16 cycles (one each for B1 to B8 and one each for E1 to E8) just to fetch data from B and E. In traditional systems, when operations such as add, subtract, and compare are performed, each sub-block is addressed separately, increasing the number of cycles required and thus adding latency to the process.
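The fetch count described above can be modeled with a short sketch. It assumes 128-bit sub-blocks (a 1024-bit value divided into eight sub-blocks), and the helper names `split` and `add_row_wise` are hypothetical. Only fetch cycles are counted; write cycles are not modeled.

```python
SUB_BLOCK_BITS = 128                      # assumed: 1024 bits / 8 sub-blocks
MASK = (1 << SUB_BLOCK_BITS) - 1


def split(value, blocks=8):
    """Split value into sub-blocks, least significant sub-block first."""
    return [(value >> (SUB_BLOCK_BITS * i)) & MASK for i in range(blocks)]


def add_row_wise(row_b, row_e):
    """Add two rows sub-block by sub-block.

    Returns (sum sub-blocks, carry out, number of fetch cycles).
    """
    fetches = 0
    carry = 0
    result = []
    for i in range(len(row_b)):
        b_i = row_b[i]
        fetches += 1                      # one fetch cycle for row B
        e_i = row_e[i]
        fetches += 1                      # a second fetch cycle for row E
        total = b_i + e_i + carry
        result.append(total & MASK)       # low bits stay in this sub-block
        carry = total >> SUB_BLOCK_BITS   # overflow moves to the next sub-block
    return result, carry, fetches


_, _, fetch_cycles = add_row_wise(split(3), split(4))
print(fetch_cycles)  # 16
```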
Designing memory to reduce resources as well as time required to perform calculations associated with computing equations (A) and (B) improves the efficiency of public key calculations. Shown in
A memory block 340b, configured in accordance with the present invention and shown in
The mcreg 514 is a modular multiplier register which stores the initial multiplicand input (denoted as A in
Addressing a row sub-block in
After performing an addition operation, the resultant value of carry can be added to the corresponding values of B2 and E2. Thus, the addition of B and E using the
Including addressing 502, adder/subtractor circuitry 504, and comparator circuitry 503 also increases the speed of calculation.
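One possible reading of this memory arrangement is sketched below. It assumes that a single address returns the corresponding sub-block of every register, so that both operands of an addition arrive in one fetch cycle. The 128-bit sub-block width and the helper names `build_memory`, `add_column_wise`, and `read_register` are assumptions for illustration, not details taken from the figures.

```python
SUB_BLOCK_BITS = 128                      # assumed sub-block width
MASK = (1 << SUB_BLOCK_BITS) - 1


def build_memory(registers, blocks=8):
    """memory[i] holds sub-block i of every register at one address."""
    return [
        {name: (value >> (SUB_BLOCK_BITS * i)) & MASK
         for name, value in registers.items()}
        for i in range(blocks)
    ]


def add_column_wise(memory, src_a, src_b, dest):
    """Add register src_a to src_b into dest; return (carry out, fetch cycles)."""
    fetches = 0
    carry = 0
    for row in memory:
        fetches += 1                      # one fetch returns both operand sub-blocks
        total = row[src_a] + row[src_b] + carry
        row[dest] = total & MASK
        carry = total >> SUB_BLOCK_BITS
    return carry, fetches


def read_register(memory, name):
    """Reassemble a full register value from its sub-blocks."""
    return sum(row[name] << (SUB_BLOCK_BITS * i) for i, row in enumerate(memory))


memory = build_memory({"prodreg": 5, "mcreg": 7, "tempreg": 0})
_, fetch_cycles = add_column_wise(memory, "prodreg", "mcreg", "prodreg")
print(fetch_cycles)  # 8
```

Under these assumptions the same eight-sub-block addition needs 8 fetch cycles rather than the 16 required when each register row is addressed separately.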
Equations (A) and (B) are solved by performing the following three arithmetic operations:
In the arithmetic operations 1 and 3 involving subtraction, it is efficient to perform the comparison and subtraction in parallel. In
FIGS. 8a, 8b, 8c, 8d, 8e, 8f, and 8g illustrate the operation of the controller 310 (or a computer system) to compute the equations (A) and (B). On receiving power, the controller 310 can be programmed to operate in the idle state (step 802). Exponentiator state machine 335 checks at predetermined time intervals whether the data inputs 315, 320 and 325 have been received from the CPU 305. If not all of the inputs have been received, the controller 310 returns to the idle state (step 804). On the other hand, if all the inputs from the CPU 305 are received, multiplicand A, multiplier B, and modulus n are loaded into appropriate registers (step 806). The data, exponent, and modulus are divided into j blocks, each k bits long, and i is initialized to zero (step 808).
Exponentiator state machine 335 commands the modulus multiplier 350 to fetch k bits of data (i.e., multiplicand, multiplier, and modulus) and initialize the square operation (step 810). The square operation is performed after receiving the inputs (step 812). The method of performing the square and multiply operations (square and multiply operations are performed using the same circuitry, as both involve multiplying two values) is explained in detail in
If either the square or the multiply process is initiated, the modulus multiplier 350 determines if the value of the multiplier (mpreg 518) is zero (step 828). If the value of the multiplier (mpreg 518) is zero, the modulus multiplier 350 proceeds to step 814. If the value of the multiplier (mpreg 518) is not equal to zero, the modulus multiplier divides the data into p segments each x bits long and initializes q to zero (step 832). Modulus multiplier 350 fetches x bits of data and performs arithmetic operation 1 (step 834). The modulus multiplier 350 performs both comparison and subtraction operations of the values stored in mcreg 514 and modreg 516 in parallel (steps 836 and 840). If the value of the modulus is greater than the multiplicand, then the subtraction is skipped (step 844) and the multiplicand value is not updated (step 846). If the value of the modulus is not greater than the value of the multiplicand, the subtraction is completed and the value is saved in tempreg 512 (step 838) and the multiplicand value (mcreg 514) is updated to the value stored in tempreg 512 (step 842).
Once the multiplicand value is updated, the LSB of the multiplier is checked (step 848). If the LSB of the multiplier is not equal to ‘1’, then the multiplier is right shifted (step 850) and the value of q is incremented by 1 (step 868). If the LSB of the multiplier is equal to ‘1’, then the multiplier is right shifted (step 852), the value of the multiplicand is added to the value of the product register 510, and the value of the product register 510 is updated with the resulting sum (step 854).
Modulus multiplier 350, after performing arithmetic operation 2 in step 854, performs both comparison and subtraction operations of the values of product register 510 and modulus register 516 in parallel (steps 856 and 860). If the value of the modulus is greater than the product, then the subtraction is skipped (step 864) and the product value (prodreg 510) is not updated (step 866). If the value of the modulus is not greater than the value of the product, the subtraction is completed and the value is saved in the tempreg 512 (step 858) and the product value (prodreg 510) is updated to the value stored in the tempreg 512 (step 862).
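The compare-and-subtract steps described above (steps 836 to 846 and steps 856 to 866) can be sketched at the sub-block level as follows. In this sketch, which runs sequentially in software rather than in parallel circuitry, the final borrow of the subtraction serves as the comparison result, and the difference held in a temporary list is kept only when no borrow remains. The 128-bit sub-block width and the names `to_blocks`, `from_blocks`, and `compare_and_subtract` are assumptions.

```python
SUB_BLOCK_BITS = 128                      # assumed sub-block width
MASK = (1 << SUB_BLOCK_BITS) - 1


def to_blocks(value, blocks=8):
    """Split value into sub-blocks, least significant sub-block first."""
    return [(value >> (SUB_BLOCK_BITS * i)) & MASK for i in range(blocks)]


def from_blocks(blocks):
    """Reassemble an integer from its sub-blocks."""
    return sum(blk << (SUB_BLOCK_BITS * i) for i, blk in enumerate(blocks))


def compare_and_subtract(value_blocks, modulus_blocks):
    """Return (sub-blocks, updated) where updated is False if the modulus was greater."""
    tempreg = []
    borrow = 0
    for v, m in zip(value_blocks, modulus_blocks):
        diff = v - m - borrow
        borrow = 1 if diff < 0 else 0     # borrow propagates to the next sub-block
        tempreg.append(diff & MASK)       # wrapped difference for this sub-block
    if borrow:                            # modulus is greater: skip the update
        return list(value_blocks), False
    return tempreg, True                  # otherwise commit the temporary value


blocks, updated = compare_and_subtract(to_blocks(12), to_blocks(7))
print(from_blocks(blocks), updated)  # 5 True
```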
After the new value of the product is determined, the value of q is incremented by 1 (step 868). The value of q is compared with the value of p and, if they are not equal, the modulus multiplier 350 returns to step 834 (step 870). Otherwise, the value of i is incremented by 1 (step 872). The value of i is compared with the value of j and, if they are equal, the modulus multiplier 350 proceeds to step 802; if they are not equal, the modulus multiplier 350 returns to step 810 (step 874).
The method of this invention is further illuminated by reference to the following pseudocode:
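A minimal sketch of the flow described in connection with steps 828 to 868 follows, written in Python rather than as the original listing. The variable names mirror the registers named above (prodreg, tempreg, mcreg, modreg, mpreg). Several details are assumptions: the multiplicand is reduced below the modulus on loading, the multiplicand is doubled at the end of each pass, the exponent is scanned from its most significant bit, and whole integers stand in for the segmented registers. The function names `modular_multiply` and `modular_exponent` are hypothetical.

```python
def modular_multiply(a: int, b: int, n: int) -> int:
    """Return (a * b) mod n by shift-and-add with conditional subtraction."""
    if n <= 0 or b < 0:
        raise ValueError("modulus must be positive and multiplier non-negative")
    modreg = n
    mcreg = a % n                       # multiplicand, assumed reduced on loading
    mpreg = b                           # multiplier
    prodreg = 0                         # running product
    while mpreg != 0:                   # step 828: finish when the multiplier is zero
        tempreg = mcreg - modreg        # operation 1: compare and subtract
        if tempreg >= 0:                # modulus not greater: keep the difference
            mcreg = tempreg
        lsb = mpreg & 1                 # step 848: check the multiplier LSB
        mpreg >>= 1                     # steps 850 and 852: right shift the multiplier
        if lsb:
            prodreg += mcreg            # operation 2 (step 854)
            tempreg = prodreg - modreg  # operation 3: compare and subtract
            if tempreg >= 0:
                prodreg = tempreg
        mcreg <<= 1                     # assumed doubling of the multiplicand
    return prodreg


def modular_exponent(base: int, exponent: int, n: int) -> int:
    """Return (base ** exponent) mod n using square and multiply operations."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % n
    for bit in bin(exponent)[2:]:       # assumed order: most significant bit first
        result = modular_multiply(result, result, n)    # square
        if bit == "1":
            result = modular_multiply(result, base, n)  # multiply
    return result


print(modular_exponent(65, 17, 3233))  # 2790
```

Because the multiplicand and the product each stay below twice the modulus, a single conditional subtraction per pass is sufficient in this sketch.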
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and that the breadth and scope of the invention should not be limited by any of the above-described exemplary embodiments, but should instead be defined only in accordance with the following claims and their equivalents. While the particular SYSTEM AND METHOD FOR MOD-EXPONENTIATOR as herein shown and described in detail is fully capable of attaining the above-described objects of the invention, it is to be understood that it is the presently preferred embodiment of the present invention and is thus representative of the subject matter which is broadly contemplated by the present invention, that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular means “at least one”. All structural and functional equivalents to the elements of the above-described preferred embodiment that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present invention for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims.
This application claims priority to and the benefit of provisional patent application U.S. Ser. No. 61/102,107, filed Oct. 2, 2008, hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61102107 | Oct 2008 | US