The present disclosure relates to electronic circuit systems, and more particularly, to high performance systems and methods for modular multiplication.
Decentralized blockchains have become common across many application systems. One method for making decentralized blockchains more resistant to hacking involves using verifiable delay functions (VDFs). VDFs are functions that take a large number of operations to compute, but that use a relatively small number of operations to verify. The operations computed by VDFs typically cannot be parallelized.
One example of a verifiable delay function (VDF) is a modular exponentiation of an input that uses repeated modulo squaring or multiplication as a core function.
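As a minimal sketch of the repeated modulo squaring mentioned above, the following Python function computes x^(2^t) mod n with t sequential squarings. The function name `vdf_eval` and the parameters `x`, `t`, and `n` are hypothetical names chosen for illustration and do not appear in this disclosure.

```python
def vdf_eval(x: int, t: int, n: int) -> int:
    """Return x**(2**t) mod n using t sequential modular squarings."""
    y = x % n
    for _ in range(t):
        # Each squaring needs the previous result, so the loop is sequential.
        y = (y * y) % n
    return y
```

Each iteration consumes the output of the previous one, which illustrates why the latency of a single modular squaring determines the total evaluation time.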
Modular exponentiation, especially for very large integers of hundreds or thousands of bits, is a commonly used function in cryptography. If very large word sizes are used, the algorithm for modular exponentiation is complex and may require many operations or a large number of logic circuits. In most applications, the calculations for performing the algorithm for modular exponentiation require multiple clock cycles to complete. However, blockchain algorithms have recently required very low-latency implementations of modular multiplications.
The definition of a generic modular multiplication is provided in equation (1) below, where A, B, and N are input numbers, and the output Q equals the result of a modulo (mod) operation (i.e., the remainder of A times B divided by N). Equation (2) below shows the modular exponentiation, which is composed of multiple modular multiplications, and where R equals the result of a modulo (mod) operation (i.e., the remainder of C^E divided by N).
Q=AB mod N (1)
R=C^E mod N (2)
The modular exponentiation right-to-left algorithm used for computing the expression in equation (2) iterates over the bits of the exponent E from right (least significant bit) to left (most significant bit). When the scanned bit for E is ‘1’, two modular multiplications are performed, as opposed to a single modular multiplication when the scanned bit for E is ‘0’. The base C in equation (2) is squared on the first iteration, and the output of each modular multiplication is then squared on each successive modular multiplication.
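The right-to-left scan described above can be sketched in software as follows. This is a behavioral illustration only, not a description of the disclosed circuits; the function name `modexp_right_to_left` and its variable names are hypothetical.

```python
def modexp_right_to_left(c: int, e: int, n: int) -> int:
    """Compute c**e mod n by scanning the bits of e from LSB to MSB."""
    result = 1 % n
    base = c % n
    while e > 0:
        if e & 1:
            # A '1' bit costs an extra modular multiplication.
            result = (result * base) % n
        # The running base is squared on every iteration.
        base = (base * base) % n
        e >>= 1
    return result
```

A ‘1’ bit triggers both the multiplication into `result` and the squaring of `base`, while a ‘0’ bit triggers only the squaring, matching the two-versus-one multiplication count stated above.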
Multiplier circuits are expensive, in both circuit area and performance time. Field programmable gate array (FPGA) integrated circuits have many embedded multiplier circuits. Many VDF functions require multiplying very large numbers, e.g., having 1000 bits. The multiplier circuits in an FPGA typically have far fewer than 1000 inputs. Therefore, one multiplier circuit in an FPGA cannot by itself multiply a multiplicand or multiplier having 1000 bits. A large multiplier circuit can be constructed by assembling many smaller multipliers. For example, a 1000 bit multiplier may use well over 1000 digital signal processing (DSP) blocks in an FPGA. Many previously known multiplication algorithms require considerable amounts of pre-processing and post-processing in the form of additions and subtractions that add many layers of calculations to the multiplication, which greatly increases latency.
Modular multiplication can also be performed by applying modular reduction directly to a single multiplication. According to this technique, the multiplication is never completed, because the modular reduction operates on sums of groups of many smaller multipliers that comprise a larger multiplier. In this technique, the direct assembly of a large multiplier from smaller multiplier components is implemented by a polynomial multiplication method. The input numbers A and B in equation (1) above are (d+1)·w-bit wide unsigned integers, and are initially viewed as (d+1) radix R=2^w digits. A and B are defined below in equations (3) and (4), respectively, according to the polynomial multiplication method.
From the radix-R digit notation, the polynomial notation (x=2^w) can be expressed for A(x) and B(x) as shown in equations (5) and (6), respectively, below. In equations (5) and (6), a_i and b_i are the coefficients of the polynomials that correspond to the radix-R digits from the original representation shown in equations (3) and (4). Equations (7) and (8) are degree-4 (i.e., d=4) polynomials for A and B, respectively.
The product P of two degree-d polynomials A and B is a degree-2d polynomial. The partial products a_i b_j are 2w-bit wide values that can be expressed in terms of two w-bit values as shown in equation (9) below.
a_i b_j = P_ij = (P_ijH × 2^w) + P_ijL (9)
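The split in equation (9) can be illustrated with a short Python sketch. The helper names `to_digits` and `split_partial_product` are hypothetical, and the sketch assumes unsigned w-bit digits as described above.

```python
from typing import List, Tuple


def to_digits(value: int, w: int, count: int) -> List[int]:
    """Split value into `count` radix-2**w digits, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(count)]


def split_partial_product(a_i: int, b_j: int, w: int) -> Tuple[int, int]:
    """Return (high, low) w-bit halves of the 2w-bit product a_i * b_j."""
    p = a_i * b_j
    return p >> w, p & ((1 << w) - 1)
```

Recombining the two halves as `(high << w) + low` returns the original 2w-bit partial product.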
Figure (FIG.) 1 is a diagram that illustrates examples of calculations that can be performed according to equations (3)-(9) to generate partial products for a degree 4 (i.e., d=4) polynomial multiplication. The degree-4 (i.e., d=4) polynomials for A and B from equations (7)-(8) are shown in the upper portion of
Given that x=2^w, the partial product alignments are such that the high partial product P_i,jH overlaps with the low partial product P_k,lL, where k+l=i+j+1, and the low partial product P_i,jL overlaps with P_m,nH, where m+n=i+j−1.
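The alignment rule above can be checked with a small behavioral sketch: the low half of each partial product is accumulated in column i+j and the high half in column i+j+1. The names `to_digits`, `column_sums`, and `evaluate` are hypothetical, the sketch assumes A and B have the same number of digits, and whether these column sums match the coefficients of the equations not reproduced here is an assumption.

```python
from typing import List


def to_digits(value: int, w: int, count: int) -> List[int]:
    """Split value into `count` radix-2**w digits, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(count)]


def column_sums(a_digits: List[int], b_digits: List[int], w: int) -> List[int]:
    """Accumulate low/high halves of each partial product by column."""
    d = len(a_digits) - 1
    cols = [0] * (2 * d + 2)  # columns x**0 through x**(2d+1)
    mask = (1 << w) - 1
    for i, a_i in enumerate(a_digits):
        for j, b_j in enumerate(b_digits):
            p = a_i * b_j
            cols[i + j] += p & mask    # low half aligns with x**(i+j)
            cols[i + j + 1] += p >> w  # high half aligns with x**(i+j+1)
    return cols


def evaluate(cols: List[int], w: int) -> int:
    """Evaluate the column polynomial at x = 2**w."""
    return sum(c << (w * k) for k, c in enumerate(cols))
```

Evaluating the column sums at x=2^w reproduces the full integer product, even though no carries were propagated between columns.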
The second part of the modular multiplication involves reducing P mod N according to equation (12) below, where P is the multiplication output in polynomial form as shown in equation (11).
M=P mod N (12)
Equations (13) and (14) below show two identities used for computing M from the individual C coefficients from equation (11) for a given modulus value N.
(α+β) mod N≡(α mod N)+(β mod N) (13)
≡((α mod N)+β) mod N (14)
The polynomial P can be split into two parts as shown in equation (15) below. This split is then used to apply the identity in equation (14).
The first part of equation (15) corresponding to the α term in equation (14) is composed of (d+1) radix-2^(w+1) digits. For each of these digits from the first part of equation (15), the reduced value mod N can be computed by tabulation, as shown in equation (16) below.
M_i = C_i x^i mod N, i∈[d+1, 2d+1] (16)
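A minimal software sketch of reduction by tabulation is shown below. It makes simplifying assumptions that are not part of this disclosure: the product P is given as an ordinary integer, its upper digits are fully carry-propagated w-bit values rather than (w+1)-bit coefficients, and w is small enough that the tables fit in memory. The names `build_tables` and `reduce_with_tables` are hypothetical.

```python
from typing import Dict, List


def build_tables(n: int, w: int, d: int) -> Dict[int, List[int]]:
    """tables[i][c] = (c * 2**(w*i)) mod n for upper positions d+1..2d+1."""
    return {
        i: [(c << (w * i)) % n for c in range(1 << w)]
        for i in range(d + 1, 2 * d + 2)
    }


def reduce_with_tables(p: int, w: int, d: int,
                       tables: Dict[int, List[int]]) -> int:
    """Return a value congruent to p modulo n (not necessarily below n)."""
    mask = (1 << w) - 1
    digits = [(p >> (w * k)) & mask for k in range(2 * d + 2)]
    # The lower d+1 digits pass through unchanged (the beta term).
    low = sum(digits[k] << (w * k) for k in range(d + 1))
    # Each upper digit is replaced by its tabulated residue (the alpha term).
    return low + sum(tables[i][digits[i]] for i in range(d + 1, 2 * d + 2))
```

The returned sum is congruent to P modulo N, per identity (14), but it can exceed N, so a further reduction step may still be needed.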
Additionally, each M_i can be viewed as a degree-d polynomial, with coefficients M_i,j that are radix-2^w digits. Therefore, equation (12) can be rewritten using equations (15) and (16) as shown in equation (17) below.
If w is chosen to match the sizes of the multiplier in some circuit architectures, then obtaining Mi by simple table lookup would involve addressing tables with the w+1 bits of Ci. In many cases, the resulting tables would be too large to practically implement. An alternative to further decomposing each column of Ci using lookup tables is disclosed herein with respect to
Equation (16) can be rewritten as shown below in equation (18).
M_i = C_i (x^i mod N), i∈[d+1, 2d+1] (18)
Thus, Mi can be calculated as a multiplication of two numbers. Number Ci in equation (18) is an output of the system of
The reduction of the upper coefficients C5-C8 shown in
The DSP rows 201-204 can include logic circuits (i.e., multiplier circuits) that implement the multipliers. Each of the 4 DSP rows 201-204 can include multiple multiplier circuits that each generate a partial product of the system of
According to an exemplary implementation of the system of
The bit positions of the sets of bits 211-214, 221-224, 231-234, and 241-244 for each product are based on the corresponding bit positions in the respective 4 sets of bits representing x^i mod N. For example, the most significant bits (MSBs) 211, 221, 231, and 241 of each product correspond to the MSBs of the corresponding value of x^i mod N, and the least significant bits (LSBs) 214, 224, 234, and 244 of each product correspond to the LSBs of the corresponding value of x^i mod N. Each of the 16 sets of bits 211-214, 221-224, 231-234, and 241-244 is vertically aligned with the coefficients C0, C1, C2, C3, and C4 as shown by the vertical dashed lines in
The system of
Algorithm 1, provided below, describes the computation of the constants (x^i mod N) used by the DSP-based multipliers in the system of
Input: Modulus N
Output: Precomputed DSP coefficients Coef[L][dN]
for i from dN+2 to dC do
    T(x) = inttopoly(2^(w·i) mod N)
    for j from 0 to dN do
        Coef[i−(dN+2)][j] = T_j
    end for
end for
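Algorithm 1 can be sketched in Python as follows. The sketch follows the loop bounds as written in the listing (both bounds inclusive) and assumes that N fits in dN+1 digits of w bits each; the names `int_to_poly` and `precompute_coefficients` and the parameter spellings `d_n` and `d_c` are hypothetical stand-ins for inttopoly, dN, and dC.

```python
from typing import List


def int_to_poly(value: int, w: int, num_coeffs: int) -> List[int]:
    """Return the radix-2**w digits of value, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * j)) & mask for j in range(num_coeffs)]


def precompute_coefficients(n: int, w: int, d_n: int, d_c: int) -> List[List[int]]:
    """Row i-(d_n+2) holds the digits of 2**(w*i) mod n, for i = d_n+2..d_c."""
    coef = []
    for i in range(d_n + 2, d_c + 1):
        # Three-argument pow computes 2**(w*i) mod n without a huge intermediate.
        t = int_to_poly(pow(2, w * i, n), w, d_n + 1)
        coef.append(t)
    return coef
```

Each row of the result holds one constant of the form x^i mod N, split into digits sized for the multipliers that consume them.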
Algorithm 2 provided below describes the modular reduction operation mechanism of the system of
As with the system of
As shown in
The multiplier circuits 351-354 in DSP rows 301-304 then perform the multiplications described above with respect to the system of
A modular exponentiation system can be constructed using the modular multiplication disclosed herein with respect to
In the system of
The values of C3, C4, and C5 are provided to inputs of lookup tables 411, 412, and 413, respectively. Lookup tables (LUTs) 411, 412, and 413 output the values of C_i×x^i mod N as segments of bits 421-423, 424-426, and 427-429 in response to the values of C3, C4, and C5, respectively. For example, LUT 411 outputs the value of C3×x^3 mod N as 3 segments of bits 421, 422, and 423. LUT 412 outputs the value of C4×x^4 mod N as 3 segments of bits 424, 425, and 426. LUT 413 outputs the value of C5×x^5 mod N as 3 segments of bits 427, 428, and 429.
The system of
The upper half 450 then performs a multiplicative expansion of (A0, A1, A2)^2 to produce 9 partial products 401-409, as shown in
In response to the start signal being de-asserted, the input multiplexers 431, 432, and 433 provide the values of B0, B1, and B2 to modular reduction portion 460 as C0, C1, and C2, respectively. The values of B3, B4, and B5 are provided to the modular reduction portion 460 as C3, C4, and C5, respectively. Subsequently, lookup tables 411, 412, and 413 provide the values of C_i×x^i mod N as segments of bits 421-423, 424-426, and 427-429 in response to the values of C3, C4, and C5, respectively, as described above. The values in the 3 columns are then summed together by adders to generate the values of D0, D1, and D2, as described above. The values of D0, D1, and D2 are output as O0, O1, and O2, respectively. The output values O0, O1, and O2 are successively squared by upper half 450 and portion 460 until the modular exponentiation is computed.
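The feedback loop described above can be modeled behaviorally with the sketch below. It is a deliberate simplification under stated assumptions: values are held as plain integers rather than as segments of bits, the lookup tables are replaced by direct computation of C_i×x^i mod N, the top digit absorbs any overflow bits, and the first iteration simply passes the input through. The names `lut_reduce` and `repeated_square` are hypothetical.

```python
def lut_reduce(b: int, w: int, n: int, digits: int = 3) -> int:
    """Reduce b to a value congruent to b mod n (not necessarily below n)."""
    mask = (1 << w) - 1
    total = b & ((1 << (w * digits)) - 1)  # lower digits pass through
    for i in range(digits, 2 * digits):
        c_i = b >> (w * i)
        if i < 2 * digits - 1:
            c_i &= mask  # only the top digit keeps any overflow bits
        # Stand-in for a lookup table output: (C_i * x**i) mod n.
        total += (c_i << (w * i)) % n
    return total


def repeated_square(value: int, steps: int, w: int, n: int) -> int:
    """Square `steps` times, reducing after each squaring."""
    out = value  # first iteration: the input passes through unchanged
    for _ in range(steps):
        b = out * out              # multiplicative expansion
        out = lut_reduce(b, w, n)  # modular reduction
    return out
```

In this model the loop output stays bounded and congruent to the true result modulo N, so a single conventional reduction at the end recovers the canonical residue.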
According to another example of modular multiplication, the multiplicative expansion is optimized to take advantage of the squaring operation in the modular exponentiation to reduce the number of adder tree addends.
In the system of
The segments of bits that equal the partial products of (x0, x1, x2, x3, and x4)^2 are arranged in 11 columns that are delineated in
Some of the partial products generated by the squaring operation (x0, x1, x2, x3, and x4)^2 have the same values. For example, x0×x1=x1×x0. Therefore, the value represented by the three segments of bits p01L, p01M, and p01H equals the value represented by the three segments of bits p10L, p10M, and p10H, respectively. As another example, x1×x2=x2×x1. Therefore, the value represented by the three segments of bits p12L, p12M, and p12H equals the value represented by the three segments of bits p21L, p21M, and p21H, respectively.
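The symmetry noted above can be exploited as in the following sketch, which computes each off-diagonal product once and doubles it with a one-bit left shift. For brevity the sketch keeps each partial product whole instead of splitting it into low, middle, and high segments, so it uses fewer columns than the arrangement described in the text. The names `to_digits` and `square_digits` are hypothetical.

```python
from typing import List, Tuple


def to_digits(value: int, w: int, count: int) -> List[int]:
    """Split value into `count` radix-2**w digits, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(count)]


def square_digits(x_digits: List[int]) -> Tuple[List[int], int]:
    """Return column sums of the square and the number of multiplications."""
    k = len(x_digits)
    cols = [0] * (2 * k - 1)
    mults = 0
    for i in range(k):
        cols[2 * i] += x_digits[i] * x_digits[i]  # diagonal term, used once
        mults += 1
        for j in range(i + 1, k):
            # x_i*x_j equals x_j*x_i, so compute it once and double by shifting.
            cols[i + j] += (x_digits[i] * x_digits[j]) << 1
            mults += 1
    return cols, mults
```

For five digits this uses 15 multiplications instead of 25, because each of the 10 off-diagonal pairs is multiplied only once.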
Instead of calculating the partial products that have the same values twice, the system of
For example, the system of
The system of
In the example of
As with the system of
The DSP blocks 701-709 include multipliers (e.g., multiplier circuits) that perform polynomial multiplication of the A0, A1, and A2 segments of bits to generate partial products of the squaring operation (A0, A1, and A2)^2, as described above with respect to
Although the rank order of DSP blocks 701-709 is shown in a column format in
According to other examples, the delay of the blocks used to implement modular multiplication can be reduced by subdividing the w-wide digits into k bytes (where w=w_1+ . . . +w_k) and composing k adder trees per digit (i.e., one adder tree per byte). In these examples, w_i must be large enough so that w_i≥log d, where d is the maximum adder tree depth. In an exemplary implementation that is not intended to be limiting, k=3, and w_i=9. According to one exemplary architecture, sliced adder trees are used in the upper half of the multiplicative expansion (e.g., upper half 450 or 700). According to another exemplary architecture, sliced adder trees are used in both the upper half of the multiplicative expansion and in the modular reduction portion of the modular multiplication (e.g., portion 460 or the portions shown in
According to further examples, the DSP row based modular reduction structure shown in
In addition, the programmable integrated circuit 800 may have input/output elements (IOEs) 802 for driving signals off of programmable integrated circuit 800 and for receiving signals from other devices. Input/output elements 802 may include parallel input/output circuitry, serial data transceiver circuitry, differential receiver and transmitter circuitry, or other circuitry used to connect one integrated circuit to another integrated circuit. As shown, input/output elements 802 may be located around the periphery of the IC. If desired, the programmable integrated circuit 800 may have input/output elements 802 arranged in different ways. For example, input/output elements 802 may form one or more columns of input/output elements that may be located anywhere on the programmable integrated circuit 800 (e.g., distributed evenly across the width of the programmable integrated circuit). If desired, input/output elements 802 may form one or more rows of input/output elements (e.g., distributed across the height of the programmable integrated circuit). Alternatively, input/output elements 802 may form islands of input/output elements that may be distributed over the surface of the programmable integrated circuit 800 or clustered in selected areas.
The programmable integrated circuit 800 may also include programmable interconnect circuitry in the form of vertical routing channels 840 (i.e., interconnects formed along a vertical axis of programmable integrated circuit 800) and horizontal routing channels 850 (i.e., interconnects formed along a horizontal axis of programmable integrated circuit 800), each routing channel including at least one track to route at least one wire.
Note that other routing topologies, besides the topology of the interconnect circuitry depicted in
Furthermore, it should be understood that examples disclosed herein may be implemented in any type of integrated circuit. If desired, the functional blocks of such an integrated circuit may be arranged in more levels or layers in which multiple functional blocks are interconnected to form still larger blocks. Other device arrangements may use functional blocks that are not arranged in rows and columns.
Programmable integrated circuit 800 may contain programmable memory elements. Memory elements may be loaded with configuration data (also called programming data) using input/output elements (IOEs) 802. Once loaded, the memory elements each provide a corresponding static control signal that controls the operation of an associated functional block (e.g., LABs 810, DSP 820, RAM 830, or input/output elements 802).
In a typical scenario, the outputs of the loaded memory elements are applied to the gates of field-effect transistors in a functional block to turn certain transistors on or off and thereby configure the logic in the functional block including the routing paths. Programmable logic circuit elements that may be controlled in this way include parts of multiplexers (e.g., multiplexers used for forming routing paths in interconnect circuits), look-up tables, logic arrays, AND, OR, NAND, and NOR logic gates, pass gates, etc.
The memory elements may use any suitable volatile and/or non-volatile memory structures such as random-access-memory (RAM) cells, fuses, antifuses, programmable read-only-memory memory cells, mask-programmed and laser-programmed structures, combinations of these structures, etc. Because the memory elements are loaded with configuration data during programming, the memory elements are sometimes referred to as configuration memory or programmable memory elements.
The programmable memory elements may be organized in a configuration memory array consisting of rows and columns. A data register that spans across all columns and an address register that spans across all rows may receive configuration data. The configuration data may be shifted onto the data register. When the appropriate address register is asserted, the data register writes the configuration data to the configuration memory elements of the row that was designated by the address register.
Programmable integrated circuit 800 may include configuration memory that is organized in sectors, whereby a sector may include the configuration RAM bits that specify the function and/or interconnections of the subcomponents and wires in or crossing that sector. Each sector may include separate data and address registers.
In general, software and data for performing any of the functions disclosed herein may be stored in non-transitory computer readable storage media. Non-transitory computer readable storage media is tangible computer readable storage media that stores data for later access, as opposed to media that only transmits propagating electrical signals (e.g., wires). The software code may sometimes be referred to as software, data, program instructions, instructions, or code. The non-transitory computer readable storage media may, for example, include computer memory chips, non-volatile memory such as non-volatile random-access memory (NVRAM), one or more hard drives (e.g., magnetic drives or solid state drives), one or more removable flash drives or other removable media, compact discs (CDs), digital versatile discs (DVDs), Blu-ray discs (BDs), other optical media, and floppy diskettes, tapes, or any other suitable memory or storage device(s).
One or more specific examples are described herein. In an effort to provide a concise description of these examples, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
Additional examples are now described. Example 1 is a circuit system for performing modular reduction of a modular multiplication, the circuit system comprising: multiplier circuits that receive a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication, wherein the multiplier circuits multiply the coefficients in the first subset by constants that equal remainders of divisions to generate products; and first adder circuits that add a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients to generate sums.
In Example 2, the circuit system of Example 1 further comprises: second adder circuits, wherein each of the second adder circuits adds together two sets of bits that are each generated by summing portions of at least two of the partial products of the multiplication operation to generate one of the coefficients in the first subset.
In Example 3, the circuit system of any one of Examples 1-2 may optionally include, wherein the multiplier circuits are in digital signal processing blocks in an integrated circuit.
In Example 4, the circuit system of any one of Examples 1-3 may optionally include, wherein the multiplier circuits are arranged in subsets, and wherein the multiplier circuits in each of the subsets multiply one of the coefficients in the first subset by one of the constants that equals a remainder of one of the divisions to generate one of the products.
In Example 5, the circuit system of any one of Examples 1-4 may optionally include, wherein each of the multiplier circuits multiplies one of the coefficients in the first subset by a subset of bits that represent one of the constants to generate one of the segments of bits representing one of the products.
In Example 6, the circuit system of Example 5 may optionally include, wherein the first adder circuits add each of the coefficients in the second subset to portions of the segments of bits representing each one of the products to generate one of the sums.
In Example 7, the circuit system of any one of Examples 1-6 may optionally include, wherein the first adder circuits generate each of the sums by adding together one of the coefficients in the second subset and a subset of the segments of bits representing each of the products generated by the multiplier circuits.
Example 8 is a circuit system comprising: first logic circuitry for performing multiplicative expansion for modular multiplication to generate sums of partial products; second logic circuitry for performing modular reduction of the modular multiplication to generate output values; and multiplexers for providing input values to the second logic circuitry during a first iteration of the modular reduction, wherein the output values of the modular reduction are provided to the first logic circuitry for performing the multiplicative expansion to generate the sums of the partial products, and wherein the multiplexers provide at least a subset of the sums of the partial products to the second logic circuitry during a second iteration of the modular reduction.
In Example 9, the circuit system of Example 8 may optionally include, wherein the second logic circuitry provides the input values as the output values during the first iteration of the modular reduction.
In Example 10, the circuit system of any one of Examples 8-9 may optionally include, wherein the second logic circuitry comprises lookup tables that generate constant values in response to receiving a first subset of the sums of the partial products during the second iteration, and wherein the second logic circuitry adds the constant values provided from the lookup tables to a second subset of the sums of the partial products received through the multiplexers to generate the output values during the second iteration.
In Example 11, the circuit system of any one of Examples 8-10 may optionally include, wherein the first logic circuitry squares a number represented by the output values to generate the sums of the partial products during the second iteration.
In Example 12, the circuit system of any one of Examples 8-11 may optionally include, wherein the multiplexers select between the input values and at least the subset of the sums of the partial products in response to a start signal.
In Example 13, the circuit system of Example 10 may optionally include, wherein each of the lookup tables outputs one of the constant values as multiple segments of bits, and wherein the second logic circuitry adds one of the segments of bits for each of the constant values to one of the sums of the partial products in the second subset to generate each of the output values.
Example 14 is a circuit system comprising: first logic circuitry for performing multiplicative expansion for modular exponentiation of an input number represented as first segments of bits to generate partial products represented as second segments of bits, wherein the first logic circuitry generates the partial products by multiplying together the first segments of bits, wherein the first logic circuitry causes at least one of the partial products to equal twice a product of a first one of the first segments of bits multiplied by a second one of the first segments of bits; and second logic circuitry for adding together groups of the second segments of bits representing the partial products to generate sums.
In Example 15, the circuit system of Example 14 may optionally include, wherein the first logic circuitry generates each of the partial products as at least two of the second segments of bits, and wherein the second logic circuitry adds the at least two of the second segments of bits for each of the partial products in different ones of the groups to generate the sums.
In Example 16, the circuit system of any one of Examples 14-15 further comprises: third logic circuitry for bit shifting each of the partial products in a subset of the partial products to generate a doubled partial product that equals twice one of the partial products in the subset.
In Example 17, the circuit system of any one of Examples 14-16 further comprises: third logic circuitry for multiplying a first subset of the sums by constants that equal remainders of divisions to generate products and to add a second subset of the sums and third segments of bits representing the products that are aligned with respective ones of the second subset of the sums.
Example 18 is a circuit system comprising: multiplier circuits for performing a squaring operation for modular exponentiation of an input number represented as first segments of bits to generate partial products represented as second segments of bits, wherein each of the multiplier circuits generates one of the second segments of bits by multiplying at least one of the first segments of bits; first storage circuits for storing subsets of the first segments of bits provided as inputs to a first subset of the multiplier circuits that are outside critical paths in the modular exponentiation; and second storage circuits for storing subsets of the second segments of bits generated by a second subset of the multiplier circuits that are in the critical paths of the modular exponentiation.
In Example 19, the circuit system of Example 18 further comprises: adder circuits for adding together groups of the second segments of bits generated by the multiplier circuits to generate sums, wherein each of the groups of the second segments of bits are added together based on an alignment determined by which of the first segments of bits are multiplied to generate the second segments of bits in each of the groups.
In Example 20, the circuit system of any one of Examples 18-19 may optionally include, wherein the multiplier circuits are in digital signal processing blocks in a programmable logic integrated circuit.
In Example 21, the circuit system of any one of Examples 18-20 may optionally include, wherein each of the first storage circuits and each of the second storage circuits is a sequential circuit responsive to a clock signal.
In Example 22, the circuit system of any one of Examples 18-21 may optionally include, wherein an embedded function has either the first storage circuits or the second storage circuits enabled, depending on where a logical depth of the embedded function is located in the circuit system.
Example 23 is a method for performing modular reduction of a modular multiplication, the method comprising: receiving a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication; multiplying the coefficients in the first subset by constants that equal remainders of divisions using multiplier circuits to generate products; and adding a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients using first adder circuits to generate sums.
In Example 24, the method of Example 23 further comprises: adding together sets of bits that are each generated by summing portions of a subset of the partial products of the multiplication operation using second adder circuits to generate the coefficients in the first subset.
In Example 25, the method of any one of Examples 23-24 may optionally include, wherein multiplying the coefficients in the first subset by the constants further comprises multiplying each of the coefficients in the first subset by one of the constants that equals a remainder of one of the divisions to generate one of the products.
In Example 26, the method of any one of Examples 23-25 may optionally include, wherein multiplying the coefficients in the first subset by the constants further comprises multiplying one of the coefficients in the first subset by a subset of bits that represent one of the constants to generate one of the segments of bits representing a portion of one of the products.
In Example 27, the method of Example 26 may optionally include, wherein adding the second subset of the coefficients and the segments of bits of the products further comprises adding each of the coefficients in the second subset to unique subsets of the segments of bits representing each one of the products to generate one of the sums.
Example 28 is a method for modular multiplication comprising: providing input values to first logic circuitry during a first iteration using multiplexer circuits; providing output values of the first logic circuitry to second logic circuitry; performing multiplicative expansion for the modular multiplication to generate sums of partial products using the second logic circuitry; providing at least a subset of the sums of the partial products to the first logic circuitry using the multiplexer circuits during a second iteration; and performing modular reduction of the modular multiplication using the sums of the partial products to generate the output values using the first logic circuitry.
In Example 29, the method of Example 28 further comprises: providing the input values as the output values using the first logic circuitry during the first iteration.
In Example 30, the method of any one of Examples 28-29 may optionally include, wherein performing the modular reduction to generate the output values further comprises generating constant values from lookup tables in the first logic circuitry in response to receiving a first subset of the sums of the partial products during the second iteration; and adding the constant values provided from the lookup tables to a second subset of the sums of the partial products received through the multiplexer circuits to generate the output values during the second iteration using adder circuits in the first logic circuitry.
In Example 31, the method of Example 30 may optionally include, wherein generating the constant values from the lookup tables comprises generating each of the constant values as multiple segments of bits; and adding one of the segments of bits for each of the constant values to one of the sums of the partial products in the second subset to generate one of the output values.
In Example 32, the method of any one of Examples 28-31 may optionally include, wherein performing the multiplicative expansion for the modular multiplication further comprises squaring a number represented by the output values to generate the sums of the partial products during the second iteration.
In Example 33, the method of any one of Examples 28-32 may optionally include, wherein providing the input values to the first logic circuitry during the first iteration comprises selecting between the input values and at least the subset of the sums of the partial products in response to a start signal using the multiplexer circuits.
Example 34 is a method comprising: performing multiplicative expansion for modular exponentiation of an input number represented as first segments of bits by multiplying together the first segments of bits to generate partial products represented as second segments of bits using first logic circuitry; causing at least one of the partial products to equal twice a product of a first one of the first segments of bits multiplied by a second one of the first segments of bits using the first logic circuitry; and adding together groups of the second segments of bits representing the partial products to generate sums using second logic circuitry.
In Example 35, the method of Example 34 may optionally include, wherein performing the multiplicative expansion for the modular exponentiation comprises generating each of the partial products as at least two of the second segments of bits, and wherein each of the at least two of the second segments of bits is grouped into separate ones of the groups.
In Example 36, the method of Example 35 may optionally include, wherein adding together the groups of the second segments of bits comprises adding together the second segments of bits in each of the groups to generate one of the sums.
In Example 37, the method of any one of Examples 34-36 may optionally include, wherein causing at least one of the partial products to equal twice the product of the first one of the first segments of bits multiplied by the second one of the first segments of bits comprises bit shifting the at least one of the partial products to generate a doubled partial product that equals twice the product of the first one of the first segments of bits multiplied by the second one of the first segments of bits.
In Example 38, the method of any one of Examples 34-36 may optionally include, wherein causing at least one of the partial products to equal twice the product of the first one of the first segments of bits multiplied by the second one of the first segments of bits comprises bit shifting each of the partial products in a subset of the partial products to generate a doubled partial product that equals twice one of the partial products in the subset.
Example 39 is a method comprising: performing a squaring operation for modular exponentiation of an input number represented as first segments of bits to generate partial products represented as second segments of bits using multiplier circuits; storing subsets of the first segments of bits provided as inputs to a first subset of the multiplier circuits that are outside critical paths in the modular exponentiation in first storage circuits; and storing subsets of the second segments of bits generated by a second subset of the multiplier circuits that are in the critical paths of the modular exponentiation in second storage circuits.
In Example 40, the method of Example 39 further comprises: adding groups of the second segments of bits using adder circuits to generate sums by adding each of the groups of the second segments of bits based on an alignment determined by which of the first segments of bits are multiplied to generate the second segments of bits in each of the groups.
In Example 41, the method of any one of Examples 39-40 may optionally include, wherein the multiplier circuits are in digital signal processing blocks in a programmable logic integrated circuit.
In Example 42, the method of any one of Examples 39-41 may optionally include, wherein each of the first storage circuits and each of the second storage circuits is a sequential circuit responsive to a clock signal.
The foregoing description of the examples has been presented for the purpose of illustration. The foregoing description is not intended to be exhaustive or to be limiting to the examples disclosed herein. In some instances, features of the examples can be employed without a corresponding use of other features as set forth. Many modifications, substitutions, and variations are possible in light of the above teachings.
This patent application claims the benefit of U.S. provisional patent application No. 63/287,896, filed Dec. 9, 2021, which is incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
63/287,896 | Dec. 2021 | US