Various exemplary embodiments disclosed herein relate to efficient modular multiplication modulo 2^23 − 2^13 + 1.
The new digital signature scheme Dilithium works with objects from the polynomial ring (ℤ/qℤ)[X]/(X^256 + 1), where arithmetic with the coefficients is done modulo q = 2^23 − 2^13 + 1.
A summary of various exemplary embodiments is presented below.
Various embodiments relate to a data processing system including instructions embodied in a non-transitory computer readable medium, the instructions for performing a modular multiplication of a first operand a and a second operand b in a Dilithium digital signature algorithm in a processor, the instructions including: calculate S = c1·2^13 − c1 + c0 (mod q), wherein a·b1 = c1·2^23 + c0, 0 ≤ a·b1 < 2^33, modulus q = 2^23 − 2^13 + 1, and b = b1·2^13 + b0; calculate T = d1 − d0·2^10 + d0 (mod q), wherein d = a·b0 = d1·2^13 + d0, 0 ≤ d < 2^36, 0 ≤ d1 < 2^23, and 0 ≤ d0 < 2^13; calculate c = S + T ≡ a·b·2^(−13) (mod q); and calculate a digital signature of a message using the calculated a·b.
Various embodiments are described, wherein the instructions further include: calculate c=c−q once or twice, when c is greater than or equal to q and until c is less than q; and calculate c=c+q, when c is less than 0.
Various embodiments are described, wherein the calculations of S and T are performed in parallel.
Further various embodiments relate to a data processing system including instructions embodied in a non-transitory computer readable medium, the instructions for performing a modular multiplication of a first operand a and a second operand b in a Dilithium digital signature algorithm in a processor, the instructions including: split the second operand b into b0 and b1, wherein b1 is the upper 10 bits of the second operand b and b0 is the lower 13 bits of the second operand b; calculate C = a·b1; split C into c0 and c1, wherein c1 is the upper 10 bits of C and c0 is the lower 23 bits of C; calculate d = a·b0; split d into d0 and d1, wherein d1 is the upper 23 bits of d and d0 is the lower 13 bits of d; calculate a·b·2^(−13) ≡ ((c1 << 13) | d0) − ((d0 << 10) | c1) + c0 + d1 (mod q), wherein << indicates a left bit shift by a specified number of bits and | is the bitwise-or operation; and calculate a digital signature of a message using the calculated a·b.
Various embodiments are described, wherein the instructions further include: calculate c=c−q once or twice, when c is greater than or equal to q and until c is less than q; and calculate c=c+q, when c is less than 0.
Various embodiments are described, wherein the calculations of C and d are performed in parallel.
Further various embodiments relate to a method for performing a modular multiplication of a first operand a and a second operand b in a Dilithium digital signature algorithm, including: calculating S = c1·2^13 − c1 + c0 (mod q), wherein a·b1 = c1·2^23 + c0, 0 ≤ a·b1 < 2^33, modulus q = 2^23 − 2^13 + 1, and b = b1·2^13 + b0; calculating T = d1 − d0·2^10 + d0 (mod q), wherein d = a·b0 = d1·2^13 + d0, 0 ≤ d < 2^36, 0 ≤ d1 < 2^23, and 0 ≤ d0 < 2^13; calculating c = S + T ≡ a·b·2^(−13) (mod q); and calculating a digital signature of a message using the calculated a·b.
Various embodiments are described, wherein the method further includes: calculating c = c − q once or twice, when c is greater than or equal to q and until c is less than q; and calculating c = c + q, when c is less than 0.
Various embodiments are described, wherein the calculations of S and T are performed in parallel.
Further various embodiments relate to a method for performing a modular multiplication of a first operand a and a second operand b in a Dilithium digital signature algorithm, including: splitting the second operand b into b0 and b1, wherein b1 is the upper 10 bits of the second operand b and b0 is the lower 13 bits of the second operand b; calculating C = a·b1; splitting C into c0 and c1, wherein c1 is the upper 10 bits of C and c0 is the lower 23 bits of C; calculating d = a·b0; splitting d into d0 and d1, wherein d1 is the upper 23 bits of d and d0 is the lower 13 bits of d; calculating a·b·2^(−13) ≡ ((c1 << 13) | d0) − ((d0 << 10) | c1) + c0 + d1 (mod q), wherein << indicates a left bit shift by a specified number of bits and | is the bitwise-or operation; and calculating a digital signature of a message using the calculated a·b.
Various embodiments are described, wherein the method further includes: calculating c = c − q once or twice, when c is greater than or equal to q and until c is less than q; and calculating c = c + q, when c is less than 0.
Various embodiments are described, wherein the calculations of C and d are performed in parallel.
The foregoing has outlined rather broadly the features and technical advantages of examples according to the disclosure in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter. The conception and specific examples disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Such equivalent constructions do not depart from the scope of the appended claims. Characteristics of the concepts disclosed herein, both their organization and method of operation, together with associated advantages will be better understood from the following description when considered in connection with the accompanying figures. Each of the figures is provided for the purposes of illustration and description, and not as a definition of the limits of the claims.
So that the above-recited features of the present disclosure can be understood in detail, a more particular description, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects. The same reference numbers in different drawings may identify the same or similar elements.
Various aspects of the disclosure are described more fully hereinafter with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to any specific structure or function presented throughout this disclosure. Rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Based on the teachings herein one skilled in the art should appreciate that the scope of the disclosure is intended to cover any aspect of the disclosure disclosed herein, whether implemented independently of or combined with any other aspect of the disclosure. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method which is practiced using other structure, functionality, or structure and functionality in addition to or other than the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.
Several aspects of lattice-based cryptographic systems will now be presented with reference to various apparatuses and techniques. These apparatuses and techniques will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, modules, components, circuits, steps, processes, algorithms, and/or the like (collectively referred to as “elements”). These elements may be implemented using hardware, software, or combinations thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.
It should be noted that while aspects may be described herein using terminology commonly associated with lattice-based cryptographic technologies, aspects of the present disclosure can be applied in other cryptographic systems that use modular multiplication.
The National Institute of Standards and Technology (NIST) post-quantum cryptographic standardization process selected Dilithium as a new digital signature standard. This digital signature scheme works with objects from the polynomial ring (ℤ/qℤ)[X]/(X^256 + 1), where arithmetic with the coefficients is done modulo q = 2^23 − 2^13 + 1. It will be shown how the bipartite modular multiplication method disclosed in Marcelo E. Kaihara and Naofumi Takagi, Bipartite modular multiplication, CHES 2005 (Josyula R. Rao and Berk Sunar, eds.), LNCS, vol. 3659, Springer, Heidelberg, August/September 2005, pp. 201-210, can be adapted to this modulus.
It will be described how to achieve multiplication modulo the prime 2^23 − 2^13 + 1, which is a core operation in Dilithium. This approach is particularly useful for hardware implementations because its two parts may be run in parallel and no multiplications are required in the reduction.
Montgomery multiplication is a method for performing fast modular multiplication. It relies on a special representation of numbers called the Montgomery form, and it uses Montgomery reduction. The main idea behind Montgomery reduction is to change the representatives of the residue classes, and to change the modular multiplication accordingly, so that the expensive division needed in the reduction is replaced by a cheaper multiplication. More precisely, let q be an odd w-bit integer. Instead of computing the modular multiplication a·b mod q, Montgomery multiplication computes MontMul(a, b) = a·b·2^(−w) mod q. To use this method, the representation of the inputs must be changed: given a w-bit modulus q, define the Montgomery form of an integer a to be ã = a·2^w mod q. This change of residue class ensures that the multiplication of two inputs in Montgomery form yields the desired result in Montgomery form, because:
MontMul(ã, b̃) ≡ (a·2^w)·(b·2^w)·2^(−w) ≡ (a·b)·2^w (mod q), which is the Montgomery form of a·b.
This change of representation is worthwhile because Montgomery multiplication can be computed efficiently on modern computer architectures, where multiplications and exact divisions by powers of two correspond to shifting the number to the left or right, respectively.
Montgomery multiplication uses the pre-computed value
μ = −q^(−1) mod 2^w.   (1)
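Then, the Montgomery reduction of an integer c such that 0 ≤ c < q^2 can be computed using
MontRed(c) = (c + (c·μ mod 2^w)·q)/2^w.   (2)
The numerator is divisible by 2^w because c + c·μ·q ≡ c − c ≡ 0 (mod 2^w). After this division by 2^w, the result is less than 2q, since (c + (c·μ mod 2^w)·q)/2^w < (q^2 + 2^w·q)/2^w < 2q. This means a completely reduced result in [0, q) can be computed with one additional conditional subtraction.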
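As a minimal illustrative sketch (Python, chosen only for readability; the names mont_mu, mont_red, mont_mul and to_mont are hypothetical, not part of the disclosure), generic Montgomery multiplication with the constant of equation (1) and the reduction of equation (2) could look as follows. It assumes an odd modulus q < 2^w, inputs 0 ≤ a, b < q, and Python 3.8 or later for the modular inverse via pow.

```python
def mont_mu(q: int, w: int) -> int:
    """Precomputed constant mu = -q^(-1) mod 2^w, as in equation (1)."""
    assert q % 2 == 1 and q < (1 << w), "q must be odd and fit in w bits"
    return (-pow(q, -1, 1 << w)) % (1 << w)


def mont_red(c: int, q: int, w: int, mu: int) -> int:
    """Montgomery reduction, as in equation (2): c * 2^(-w) mod q for 0 <= c < q^2."""
    mask = (1 << w) - 1
    m = (c * mu) & mask              # m = c*mu mod 2^w, a mask instead of a division
    t = (c + m * q) >> w             # exact: c + m*q is divisible by 2^w
    return t - q if t >= q else t    # t < 2q, so one conditional subtraction suffices


def mont_mul(a: int, b: int, q: int, w: int, mu: int) -> int:
    """MontMul(a, b) = a*b*2^(-w) mod q."""
    return mont_red(a * b, q, w, mu)


def to_mont(a: int, q: int, w: int) -> int:
    """Montgomery form a~ = a*2^w mod q."""
    return (a << w) % q


# Example with the Dilithium modulus q = 2^23 - 2^13 + 1, which fits in w = 23 bits.
q, w = (1 << 23) - (1 << 13) + 1, 23
mu = mont_mu(q, w)
a, b = 1234567, 7654321
# Multiplying Montgomery forms yields the Montgomery form of the product.
assert mont_mul(to_mont(a, q, w), to_mont(b, q, w), q, w, mu) == to_mont(a * b % q, q, w)
```

Only masks, shifts, additions and two multiplications appear in mont_red; the expensive division by q is avoided.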
The bipartite modular multiplication method of Kaihara and Takagi splits the multiplier into two parts: one part is processed using regular modular reduction and the other using Montgomery reduction. The two parts can be computed in parallel.
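Let q be an odd w-bit integer. Given two integers a, b ∈ ℤ/qℤ where 0 ≤ a, b < q, the idea is to split one of the operands, b, such that b = b1·2^(w/2) + b0 where 0 ≤ b0, b1 < 2^(w/2). Now compute
S = a·b1 mod q
using a regular modular reduction algorithm. Concurrently, one can compute
T = a·b0·2^(−w/2) mod q
using a Montgomery reduction approach. Finally, combining the two gives
S + T ≡ a·b1 + a·b0·2^(−w/2) ≡ a·(b1·2^(w/2) + b0)·2^(−w/2) ≡ a·b·2^(−w/2) (mod q),
which is a Montgomery-style product where, instead of multiplying with 2^(−w), one uses 2^(−w/2).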
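The generic bipartite scheme can be sketched as follows under a few stated assumptions: the split point h is passed explicitly (the text uses h = w/2, which must be rounded when w is odd), Python's built-in % operator merely stands in for "a regular modular reduction algorithm", and the function name bipartite_mul is hypothetical. In hardware the two parts would run concurrently; here they are simply evaluated one after the other.

```python
def bipartite_mul(a: int, b: int, q: int, h: int) -> int:
    """Return a*b*2^(-h) mod q for an odd modulus q and 0 <= a, b < q.

    The operand b is split as b = b1*2^h + b0 with 0 <= b0 < 2^h.
    """
    mask = (1 << h) - 1
    b1, b0 = b >> h, b & mask

    # Part 1 (regular reduction): S = a*b1 mod q.
    S = (a * b1) % q

    # Part 2 (Montgomery reduction with R = 2^h): T = a*b0*2^(-h) mod q.
    mu = (-pow(q, -1, 1 << h)) & mask      # mu = -q^(-1) mod 2^h
    d = a * b0                             # d < q*2^h
    m = (d * mu) & mask
    T = (d + m * q) >> h                   # exact division, and T < 2q
    if T >= q:
        T -= q

    # Combine: S + T ≡ a*b1 + a*b0*2^(-h) ≡ a*b*2^(-h) (mod q), with S + T < 2q.
    c = S + T
    return c - q if c >= q else c
```

The Dilithium-specific variant described next fixes the split at h = 13, so that b1 < 2^10 and b0 < 2^13.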
It will now be shown how to efficiently compute a modular multiplication with the prime used in the Dilithium post-quantum secure digital signature scheme. The prime proposed for use in Dilithium is q = 2^23 − 2^13 + 1, and hence 2^23 ≡ 2^13 − 1 (mod q). While this form does not help much when a number up to q^2 needs to be reduced to size q, it is useful in the setting of bipartite modular multiplication.
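The congruences this specialization relies on can be checked numerically. The short Python snippet below is a sanity check, not part of the method: it confirms 2^23 ≡ 2^13 − 1 (mod q), together with two consequences used for the Montgomery part, namely q ≡ 1 (mod 2^13), so that q^(−1) ≡ 1 (mod 2^13), and 2^(−13) ≡ 1 − 2^10 (mod q), which follows from 2^13·(1 − 2^10) = 2^13 − 2^23 ≡ 1 (mod q).

```python
q = (1 << 23) - (1 << 13) + 1
assert q == 8380417                                  # the Dilithium modulus

# 2^23 ≡ 2^13 - 1 (mod q): the upper part of a product folds back using shifts.
assert pow(2, 23, q) == (1 << 13) - 1

# q ≡ 1 (mod 2^13), hence q^(-1) ≡ 1 (mod 2^13): the Montgomery constant is 1.
assert q % (1 << 13) == 1
assert pow(q, -1, 1 << 13) == 1

# 2^(-13) ≡ 1 - 2^10 (mod q): dividing by 2^13 needs only a shift and a subtraction.
assert pow(2, -13, q) == (1 - (1 << 10)) % q
```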
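Let a, b ∈ ℤ/qℤ where 0 ≤ a, b < q. Write b = b1·2^13 + b0, where 0 ≤ b0 < 2^13 and 0 ≤ b1 < 2^10. It will be shown how to efficiently compute S = a·b1 mod q. First, compute the multiplication part a·b1 = c1·2^23 + c0 such that 0 ≤ a·b1 < 2^33 and 0 ≤ c0 < 2^23; then:
S = a·b1 = c1·2^23 + c0 ≡ c1·(2^13 − 1) + c0 = c1·2^13 − c1 + c0 (mod q).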
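When 0 ≤ a, b < q, then a ≤ q − 1 = 2^23 − 2^13 and b1 ≤ 2^10 − 1, so
0 ≤ a·b1 ≤ (2^23 − 2^13)·(2^10 − 1) = 2^33 − 2^24 + 2^13, and therefore 0 ≤ c1 = ⌊a·b1/2^23⌋ ≤ 2^10 − 2.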
Because 0 ≤ c0 < 2^23 and 0 ≤ c1 ≤ 2^10 − 2 < 2^10, the following is obtained:
0 ≤ S ≤ (2^10 − 2)·(2^13 − 1) + 2^23 − 1 = 2^24 − 2^14 − 2^10 + 1 < 2q.
Hence, at most one further subtraction of q is needed to reduce S to the range between 0 and q.
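A minimal sketch of this first part, assuming Python integers and hypothetical names (compute_S, reduce_S, MASK23), shows that apart from the small product a·b1 only shifts, a mask, and additions are needed. The returned value lies in [0, 2q) and is congruent to a·b1 modulo q.

```python
Q = (1 << 23) - (1 << 13) + 1        # Dilithium modulus, 8380417
MASK23 = (1 << 23) - 1


def compute_S(a: int, b1: int) -> int:
    """Return S ≡ a*b1 (mod Q) with 0 <= S < 2Q, for 0 <= a < Q and 0 <= b1 < 2^10."""
    prod = a * b1                     # 0 <= prod < 2^33
    c1 = prod >> 23                   # upper bits, c1 <= 2^10 - 2
    c0 = prod & MASK23                # lower 23 bits
    return (c1 << 13) - c1 + c0       # uses 2^23 ≡ 2^13 - 1 (mod Q)


def reduce_S(S: int) -> int:
    """At most one conditional subtraction brings S into [0, Q)."""
    return S - Q if S >= Q else S
```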
Similarly, it will be shown how to efficiently compute the other part required by the bipartite multiplication, using Montgomery reduction. As before, let a, b ∈ ℤ/qℤ where 0 ≤ a, b < q, and write b = b1·2^13 + b0, where 0 ≤ b0 < 2^13 and 0 ≤ b1 < 2^10.
Next, use Montgomery reduction to compute T ≡ d·2^(−13) ≡ a·b0·2^(−13) (mod q). First, compute the multiplication part d = a·b0 = d1·2^13 + d0 such that 0 ≤ d < 2^36, where the bounds for the parts are 0 ≤ d1 < 2^23 and 0 ≤ d0 < 2^13. Because q ≡ 1 (mod 2^13), the Montgomery pre-computed constant is μ ≡ q^(−1) ≡ 1 (mod 2^13); note the sign difference compared to equation (1). This means that the Montgomery reduction is adjusted to MontRed(d) = (d − (d·μ mod 2^13)·q)/2^13 (compare with equation (2)), and a conditional addition is needed instead of a conditional subtraction. Since d·μ mod 2^13 = d0, this may be done using:
T = (d − d0·q)/2^13 = (d1·2^13 + d0 − d0·(2^23 − 2^13 + 1))/2^13 = d1 − d0·2^10 + d0.
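When 0 ≤ a, b < q, then 0 ≤ d1 < q and 0 ≤ d0 < 2^13. Since T = d1 − d0·(2^10 − 1), with 0 ≤ d0·(2^10 − 1) < 2^13·(2^10 − 1) = q − 1 and T ≤ d1 < q, this means that −2^13·(2^10 − 1) < T < q, and at most one addition of q is needed to bring T into the range between 0 and q.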
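The Montgomery half can be sketched the same way. In this minimal Python sketch (hypothetical names compute_T, reduce_T, MASK13), the constant μ = 1 means that d·μ mod 2^13 is simply the low 13 bits d0, so no multiplication by μ or by q is carried out.

```python
Q = (1 << 23) - (1 << 13) + 1        # Dilithium modulus, 8380417
MASK13 = (1 << 13) - 1


def compute_T(a: int, b0: int) -> int:
    """Return T ≡ a*b0*2^(-13) (mod Q) with -Q < T < Q, for 0 <= a < Q and 0 <= b0 < 2^13."""
    d = a * b0                        # 0 <= d < 2^36
    d1 = d >> 13                      # upper part, d1 < Q
    d0 = d & MASK13                   # lower 13 bits, equal to d*mu mod 2^13 since mu = 1
    return d1 - (d0 << 10) + d0       # equals (d - d0*Q) / 2^13 exactly


def reduce_T(T: int) -> int:
    """At most one conditional addition brings T into [0, Q)."""
    return T + Q if T < 0 else T
```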
The only remaining step is to add S = c1·2^13 − c1 + c0 and T = d1 − d0·2^10 + d0 modulo q, that is, to compute c = S + T mod q. Because d0 < 2^13 and c1 < 2^10, this can be optimized further by computing:
c = ((c1 << 13) | d0) − ((d0 << 10) | c1) + c0 + d1,   (8)
where “<<” denotes a left bit shift and “|” denotes the bitwise-or operation, a·b1 = c1·2^23 + c0, and a·b0 = d1·2^13 + d0. The bitwise-or operations are exact additions because the shifted operands have no overlapping nonzero bits. Given the previous size estimates (0 ≤ S < 2q and −q < T < q), it follows that −q < c < 3q, and hence one conditional addition and two conditional subtractions are needed when the result must lie between 0 and q: when c < 0, calculate c = c + q; when q ≤ c, calculate c = c − q once or twice until c < q. Hence, the modular reduction can be done completely without multiplications, as described in equation (8).
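Putting the pieces together, a minimal Python sketch of the complete multiplier following equation (8) might look as below. The function name dilithium_mul is hypothetical, and the branches and while-loop are written for clarity only: they are not constant-time, so a side-channel-hardened software or hardware implementation would presumably replace them with branch-free conditional additions and subtractions.

```python
Q = (1 << 23) - (1 << 13) + 1          # Dilithium modulus, 8380417
MASK13 = (1 << 13) - 1
MASK23 = (1 << 23) - 1


def dilithium_mul(a: int, b: int) -> int:
    """Return a*b*2^(-13) mod Q in [0, Q) for 0 <= a, b < Q, with no multiplications in the reduction."""
    b1, b0 = b >> 13, b & MASK13       # b = b1*2^13 + b0, b1 < 2^10, b0 < 2^13
    C = a * b1                         # the two partial products are independent,
    d = a * b0                         # so hardware could compute them in parallel
    c1, c0 = C >> 23, C & MASK23       # C = c1*2^23 + c0, c1 < 2^10
    d1, d0 = d >> 13, d & MASK13       # d = d1*2^13 + d0
    # Equation (8): the ORs act as additions because the bit fields do not overlap.
    c = ((c1 << 13) | d0) - ((d0 << 10) | c1) + c0 + d1
    assert -Q < c < 3 * Q              # bound from 0 <= S < 2Q and -Q < T < Q
    if c < 0:
        c += Q                         # one conditional addition
    while c >= Q:
        c -= Q                         # at most two conditional subtractions
    return c
```

For example, dilithium_mul(1, 1) returns 2^(−13) mod q = 8379394. Because the output carries the factor 2^(−13), one option (an assumption here, not something the description prescribes) is to pre-scale one operand: with b' = b·2^13 mod q, dilithium_mul(a, b') returns a·b mod q.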
The processor 120 may be any hardware device capable of executing instructions stored in memory 130 or storage 160 or otherwise processing data. As such, the processor may include a microprocessor, microcontroller, graphics processing unit (GPU), neural network processor, field programmable gate array (FPGA), application-specific integrated circuit (ASIC), or other similar devices.
The memory 130 may include various memories such as, for example L1, L2, or L3 cache or system memory. As such, the memory 130 may include static random-access memory (SRAM), dynamic RAM (DRAM), flash memory, read only memory (ROM), or other similar memory devices.
The user interface 140 may include one or more devices for enabling communication with a user such as an administrator. For example, the user interface 140 may include a display, a touch interface, a mouse, and/or a keyboard for receiving user commands. In some embodiments, the user interface 140 may include a command line interface or graphical user interface that may be presented to a remote terminal via the network interface 150.
The network interface 150 may include one or more devices for enabling communication with other hardware devices. For example, the network interface 150 may include a network interface card (NIC) configured to communicate according to the Ethernet protocol or other communications protocols, including wireless protocols. Additionally, the network interface 150 may implement a TCP/IP stack for communication according to the TCP/IP protocols. Various alternative or additional hardware or configurations for the network interface 150 will be apparent.
The storage 160 may include one or more machine-readable storage media such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, or similar storage media. In various embodiments, the storage 160 may store instructions for execution by the processor 120 or data upon which the processor 120 may operate. For example, the storage 160 may store a base operating system 161 for controlling various basic operations of the hardware 100. The storage 160 may store instructions 162 for carrying out the efficient modular multiplication method described above.
It will be apparent that various information described as stored in the storage 160 may be additionally or alternatively stored in the memory 130. In this respect, the memory 130 may also be considered to constitute a “storage device” and the storage 160 may be considered a “memory.” Various other arrangements will be apparent. Further, the memory 130 and storage 160 may both be considered to be “non-transitory machine-readable media.” As used herein, the term “non-transitory” will be understood to exclude transitory signals but to include all forms of storage, including both volatile and non-volatile memories.
The system bus 110 allows communication between the processor 120, memory 130, user interface 140, storage 160, and network interface 150.
While the host device 100 is shown as including one of each described component, the various components may be duplicated in various embodiments. For example, the processor 120 may include multiple microprocessors that are configured to independently execute the methods described herein or are configured to perform steps or subroutines of the methods described herein such that the multiple processors cooperate to achieve the functionality described herein. Further, where the device 100 is implemented in a cloud computing system, the various hardware components may belong to separate physical systems. For example, the processor 120 may include a first processor in a first server and a second processor in a second server.
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the aspects to the precise form disclosed. Modifications and variations may be made in light of the above disclosure or may be acquired from practice of the aspects.
As used herein, the term “component” is intended to be broadly construed as hardware, firmware, and/or a combination of hardware and software. As used herein, a processor is implemented in hardware, firmware, and/or a combination of hardware and software.
As used herein, satisfying a threshold may, depending on the context, refer to a value being greater than the threshold, greater than or equal to the threshold, less than the threshold, less than or equal to the threshold, equal to the threshold, not equal to the threshold, and/or the like. It will be apparent that systems and/or methods described herein may be implemented in different forms of hardware, firmware, and/or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the aspects. Thus, the operation and behavior of the systems and/or methods were described herein without reference to specific software code—it being understood that software and hardware can be designed to implement the systems and/or methods based, at least in part, on the description herein.
As used herein, the term “non-transitory machine-readable storage medium” will be understood to exclude a transitory propagation signal but to include all forms of volatile and non-volatile memory. When software is implemented on a processor, the combination of software and processor becomes a specific dedicated machine.
Because the data processing implementing the embodiments described herein is, for the most part, composed of electronic components and circuits known to those skilled in the art, circuit details will not be explained to any greater extent than that considered necessary as illustrated above, for the understanding and appreciation of the underlying concepts of the aspects described herein and in order not to obfuscate or distract from the teachings of the aspects described herein.
Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative hardware embodying the principles of the aspects.
While each of the embodiments is described above in terms of its structural arrangement, it should be appreciated that the aspects also cover the associated methods of using the embodiments described above.
Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of various aspects. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of various aspects includes each dependent claim in combination with every other claim in the claim set. A phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Furthermore, as used herein, the terms “set” and “group” are intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and/or the like), and may be used interchangeably with “one or more.” Where only one item is intended, the phrase “only one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” and/or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.