MASKED KRONECKER SUBSTITUTION FOR POLYNOMIAL MULTIPLICATION

Description

FIELD OF THE DISCLOSURE

Various exemplary embodiments disclosed herein relate to masked Kronecker substitution for sparse polynomial multiplication.

BACKGROUND

Recent significant advances in quantum computing have accelerated the research into post-quantum cryptography schemes: cryptographic algorithms which run on classical computers but are believed to remain secure even against an adversary with access to a quantum computer. This demand is driven by interest from standardization bodies, such as the call for proposals for new public-key cryptography standards by the National Institute of Standards and Technology (NIST). The selection procedure for this new cryptographic standard has started and has further accelerated the research of post-quantum cryptography schemes.

In July 2022 NIST announced its first selection of winners: CRYSTALS-Kyber in the category of key encapsulation mechanisms (KEMs), and CRYSTALS-Dilithium as the primary winner in the digital signature (DS) category. Besides those two algorithms, Falcon and SPHINCS+ have been selected as alternative digital signature schemes, while various KEMs (mostly code-based proposals) have been maintained in the competition as part of a 4th round from which further winners might be selected.

SUMMARY

A summary of various exemplary embodiments is presented below.

Various embodiments relate to a data processing system including instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation using polynomials for lattice-based cryptography in a processor, the instructions, including: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; arithmetically adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.

Various embodiments are described, wherein the instructions further include: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.

Various embodiments are described, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:

$\hat{s}_1^{A, 2^{k'}} = \mathrm{SecB2A}_{k'}^{d}(\hat{s}_1^{B, \lceil \log_2(2\eta + 1) \rceil}),$

where $\hat{s}_1^{A, 2^{k'}}$ are arithmetic shares of the first polynomial, d is a number of shares, $\mathrm{SecB2A}_{k'}^{d}$ is a secure Boolean to arithmetic shares conversion function, $\hat{s}_1^{B, \lceil \log_2(2\eta + 1) \rceil}$ are Boolean shares of the first polynomial, η defines a range [−η, η] of coefficients of the first polynomial ŝ₁, and k′ defines the arithmetic modulus 2^k′.

Various embodiments are described, wherein the Kronecker substitution is a Kronecker plus substitution.

Various embodiments are described, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating

$R^{A, 2^{k'}} = C \cdot S_1^{A, 2^{k'}},$

where $R^{A, 2^{k'}}$ are the arithmetic shares of a first output, C is a Kronecker representation of the second polynomial, and $S_1^{A, 2^{k'}}$ are the arithmetic shares of a Kronecker representation of the first polynomial.

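Various embodiments are described, wherein adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_1)} = \mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}(\hat{y}^{B, \log_2(2\gamma_1)}, \hat{R}^{B, \log_2(2\gamma_1)})$, where $\hat{R}^{B, \log_2(4\gamma_1)}$ are Boolean shares of the second output, $\mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}$ is a secure add function, $\hat{y}^{B, \log_2(2\gamma_1)}$ are the Boolean shares of a third polynomial, $\hat{R}^{B, \log_2(2\gamma_1)}$ are the Boolean shares of the polynomial representation, and γ₁ defines a range [−γ₁−1, γ₁] of the coefficients of the third polynomial ŷ.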

Further various embodiments relate to a data processing system including instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation using polynomials for lattice-based cryptography in a processor, the instructions, including: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; subtracting the Boolean shares of the polynomial representation from Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.

Various embodiments are described, wherein the instructions further include: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.

Various embodiments are described, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:

$\hat{s}_2^{A, 2^{k'}} = \mathrm{SecB2A}_{k'}^{d}(\hat{s}_2^{B, \lceil \log_2(2\eta + 1) \rceil}),$

where $\hat{s}_2^{A, 2^{k'}}$ are arithmetic shares of the first polynomial, d is a number of shares, $\mathrm{SecB2A}_{k'}^{d}$ is a secure Boolean to arithmetic shares conversion function, $\hat{s}_2^{B, \lceil \log_2(2\eta + 1) \rceil}$ are Boolean shares of the first polynomial, η defines a range [−η, η] of coefficients of the first polynomial ŝ₂, and k′ defines the arithmetic modulus 2^k′.

Various embodiments are described, wherein the Kronecker substitution is a Kronecker plus substitution.

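Various embodiments are described, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating

$R^{A, 2^{k'}} = C \cdot S_2^{A, 2^{k'}},$

where $R^{A, 2^{k'}}$ are the arithmetic shares of a first output, C is a Kronecker representation of the second polynomial, and $S_2^{A, 2^{k'}}$ are the arithmetic shares of a Kronecker representation of the first polynomial.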

Various embodiments are described, wherein subtracting the Boolean shares of the polynomial representation from Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_2)} \leftarrow \mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}(\hat{w}_0^{B, \log_2(2\gamma_2)}, \hat{R}^{B, \log_2(2\gamma_2)})$, where $\hat{R}^{B, \log_2(4\gamma_2)}$ are the Boolean shares of the second output, $\mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}$ is a secure subtraction function, $\hat{w}_0^{B, \log_2(2\gamma_2)}$ are the Boolean shares of a third polynomial, $\hat{R}^{B, \log_2(2\gamma_2)}$ are the Boolean shares of the polynomial representation, and γ₂ defines a range (−γ₂, γ₂) of the coefficients of the third polynomial ŵ₀.

Further various embodiments relate to a method for a cryptographic operation using polynomials for lattice-based cryptography, including: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; arithmetically adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.

Various embodiments are described, wherein the instructions further include: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.

Various embodiments are described, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:

$\hat{s}_1^{A, 2^{k'}} = \mathrm{SecB2A}_{k'}^{d}(\hat{s}_1^{B, \lceil \log_2(2\eta + 1) \rceil}),$

where $\hat{s}_1^{A, 2^{k'}}$ are arithmetic shares of the first polynomial, d is a number of shares, $\mathrm{SecB2A}_{k'}^{d}$ is a secure Boolean to arithmetic shares conversion function, $\hat{s}_1^{B, \lceil \log_2(2\eta + 1) \rceil}$ are Boolean shares of the first polynomial, η defines a range [−η, η] of coefficients of the first polynomial ŝ₁, and k′ defines the arithmetic modulus 2^k′.

Various embodiments are described, wherein the Kronecker substitution is a Kronecker plus substitution.

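Various embodiments are described, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating

$R^{A, 2^{k'}} = C \cdot S_1^{A, 2^{k'}},$

where $R^{A, 2^{k'}}$ are the arithmetic shares of a first output, C is a Kronecker representation of the second polynomial, and $S_1^{A, 2^{k'}}$ are the arithmetic shares of a Kronecker representation of the first polynomial.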

Various embodiments are described, wherein adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_1)} = \mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}(\hat{y}^{B, \log_2(2\gamma_1)}, \hat{R}^{B, \log_2(2\gamma_1)})$, where $\hat{R}^{B, \log_2(4\gamma_1)}$ are Boolean shares of the second output, $\mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}$ is a secure add function, $\hat{y}^{B, \log_2(2\gamma_1)}$ are the Boolean shares of a third polynomial, $\hat{R}^{B, \log_2(2\gamma_1)}$ are the Boolean shares of the polynomial representation, and γ₁ defines a range [−γ₁−1, γ₁] of the coefficients of the third polynomial ŷ.

Further various embodiments relate to a method for a cryptographic operation using polynomials for lattice-based cryptography in a processor, including: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; subtracting the Boolean shares of the polynomial representation from Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.

Various embodiments are described, wherein the instructions further include: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.

Various embodiments are described, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:

$\hat{s}_2^{A, 2^{k'}} = \mathrm{SecB2A}_{k'}^{d}(\hat{s}_2^{B, \lceil \log_2(2\eta + 1) \rceil}),$

where $\hat{s}_2^{A, 2^{k'}}$ are arithmetic shares of the first polynomial, d is a number of shares, $\mathrm{SecB2A}_{k'}^{d}$ is a secure Boolean to arithmetic shares conversion function, $\hat{s}_2^{B, \lceil \log_2(2\eta + 1) \rceil}$ are Boolean shares of the first polynomial, η defines a range [−η, η] of coefficients of the first polynomial ŝ₂, and k′ defines the arithmetic modulus 2^k′.

Various embodiments are described, wherein the Kronecker substitution is a Kronecker plus substitution.

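Various embodiments are described, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating

$R^{A, 2^{k'}} = C \cdot S_2^{A, 2^{k'}},$

where $R^{A, 2^{k'}}$ are the arithmetic shares of a first output, C is a Kronecker representation of the second polynomial, and $S_2^{A, 2^{k'}}$ are the arithmetic shares of a Kronecker representation of the first polynomial.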

Various embodiments are described, wherein subtracting the Boolean shares of the polynomial representation from Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_2)} \leftarrow \mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}(\hat{w}_0^{B, \log_2(2\gamma_2)}, \hat{R}^{B, \log_2(2\gamma_2)})$, where $\hat{R}^{B, \log_2(4\gamma_2)}$ are the Boolean shares of the second output, $\mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}$ is a secure subtraction function, $\hat{w}_0^{B, \log_2(2\gamma_2)}$ are the Boolean shares of a third polynomial, $\hat{R}^{B, \log_2(2\gamma_2)}$ are the Boolean shares of the polynomial representation, and γ₂ defines a range (−γ₂, γ₂) of the coefficients of the third polynomial ŵ₀.

The foregoing has outlined rather broadly the features and technical advantages of examples according to the disclosure in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter. The conception and specific examples disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Such equivalent constructions do not depart from the scope of the appended claims. Characteristics of the concepts disclosed herein, both their organization and method of operation, together with associated advantages will be better understood from the following description when considered in connection with the accompanying figures. Each of the figures is provided for the purposes of illustration and description, and not as a definition of the limits of the claims.

BRIEF DESCRIPTION OF DRAWINGS

So that the above-recited features of the present disclosure can be understood in detail, a more particular description, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects. The same reference numbers in different drawings may identify the same or similar elements.

FIG. 1 illustrates a flow diagram of a method to calculate y+cs₁.

FIG. 2 illustrates an exemplary hardware diagram 200 for implementing a cryptographic operation such as y+cs₁ or w₀−cs₂.

DETAILED DESCRIPTION

Various aspects of the disclosure are described more fully hereinafter with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to any specific structure or function presented throughout this disclosure. Rather, these aspects are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art. Based on the teachings herein one skilled in the art should appreciate that the scope of the disclosure is intended to cover any aspect of the disclosure disclosed herein, whether implemented independently of or combined with any other aspect of the disclosure. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method which is practiced using other structure, functionality, or structure and functionality in addition to or other than the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.

Several aspects of cryptographic methods and systems will now be presented with reference to various apparatuses and techniques. These apparatuses and techniques will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, modules, components, circuits, steps, processes, algorithms, and/or the like (collectively referred to as “elements”). These elements may be implemented using hardware, software, or combinations thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.

Efficient implementation of the future NIST standards for asymmetric post-quantum cryptography is an important challenge that has to be addressed in order to enable large scale deployment of the standardized algorithms. In this context, the implementations must be simultaneously fast, memory efficient and protected against physical (i.e., side-channel, fault) attacks. In this disclosure, a method is described to compute y+cs₁ or w₀−cs₂, a crucial part of the Dilithium signature generation algorithm, such that they are hardened against side-channel attacks yet remain efficient. In particular, platforms are targeted where arithmetic co-processors for large integer multiplication are available. The appropriate masking domain is selected for each of the operations, which reduces the overhead for masking conversions and gets the most use out of the existing hardware. It will be shown how this can be done in a way that minimizes the memory requirements on the implementation, which is often a bottleneck for (hardened) embedded implementations.

The lattice-based mathematical problems that underlie Dilithium (and Kyber) are significantly different from the classical (elliptic-curve) discrete logarithm problem. However, the construction of a signature scheme from a hard problem follows a similar approach: starting from an interactive sigma protocol, the Fiat-Shamir paradigm can be applied to obtain a non-interactive authentication protocol that can be used for digital signatures. More concretely, the Dilithium protocol starts by generating an ephemeral secret y, which is used to generate a commitment w₁=HighBits(w,2γ₂) from w=Ay by “rounding away” the remainder modulo 2γ₂, where A is the lattice corresponding to the signer's key pair and γ₂ is a parameter that is chosen as part of the Dilithium security level. The commitment is hashed together with the message digest μ to obtain the challenge c=H(μ∥w₁). Finally, a response z=y+cs₁ is generated where s₁ is part of the secret key. It is crucial that no information is leaked about the ephemeral secret y or the long-term secret s₁. This is very similar to the Schnorr sigma protocol and digital signature schemes based on elliptic curves (e.g., EdDSA), where this is a modular multiplication and addition. However, in the case of Dilithium y, c and s₁ are all polynomials in ℤ_q[X]/(X²⁵⁶+1). Similarly, the value w₀−cs₂ is computed where no information is to be leaked about the ephemeral secret value w₀ and the long-term secret value s₂.

This disclosure describes a method to compute y+cs₁ or w₀−cs₂ such that they are hardened against side-channel attacks yet remain efficient. In particular, platforms are targeted where arithmetic co-processors for large integer multiplication are available. Such co-processors are typically used for RSA and/or ECC, so are present in many systems today. Moreover, the memory footprint of s₁ is minimized as well as the number of operations on the secrets in unmasked form. This is done by selecting the appropriate masking domain for each of the operations, reducing the overhead for masking conversions, and getting the most use out of the existing hardware.

FIG. 1 illustrates a flow diagram of a method to calculate y+cs₁ at a high level. A completely analogous sequence of steps can be described for w₀−cs₂. Typical choices for k′ and L=2^l′ could be k′=8 or k′=9 and l′=16. To begin, the method 100 loads the value s₁ into memory at step 102. Then the method 100 Boolean masks s₁ at step 104. Then the Boolean masked s₁ is converted to arithmetic masking at step 106. The method 100 then applies the Kronecker substitution on s₁ at step 108. The method 100 also loads c into memory at step 110. Then the method 100 applies the Kronecker substitution on c at step 112. Next, the method 100 performs the integer multiplication of the Kronecker representations of s₁ and c at step 114. Then the method 100 retrieves c·s₁ from the result of the integer multiplication at step 116. The value c·s₁ is then converted to Boolean masking at step 118. The method 100 loads y into memory at step 120. Then the method 100 computes y+c·s₁ in the Boolean domain at step 122.

Masking allows for the protection of an intermediate variable x against side-channel attacks by forcing an implementation to replace manipulations on x by manipulations on d shares. There, each share is uniformly distributed such that any combination of d−1 shares is independent of x. The embodiments described herein make use of two ways to split the sensitive variable, namely arithmetic masking and Boolean masking.

With arithmetic masking, a variable x∈ℤ_p is protected for an arbitrary modulus p. The ensemble of d shares of x is denoted as the arithmetic sharing x^{A,p}∈ℤ_p^d. The i-th share is denoted as x_i^{A,p}∈ℤ_p for all 0≤i<d. The relation between the shares and x is given such that the sum of all the shares over ℤ_p is x. Specifically,

$x = \sum_{i = 0}^{d - 1} x_{i}^{A, p} \mod p .$

Finally, it is noted that z=x+y mod p, with a public constant y∈ℤ_p, an input sharing x^{A,p} and an output sharing z^{A,p}, can be computed in a protected manner very simply. Indeed, the addition with y needs to be applied only to a single share in x^{A,p} because

$z = \sum_{i = 0}^{d - 1} z_{i}^{A, p} = x_{0}^{A, p} + y + \sum_{i = 1}^{d - 1} x_{i}^{A, p} = y + \sum_{i = 0}^{d - 1} x_{i}^{A, p} = y + x .$
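As an illustration of the arithmetic sharing and of the addition of a public constant described above, the following is a minimal Python sketch. The function names (arith_share, arith_unshare, add_public) and the example values are hypothetical and chosen only for this illustration; Python integers offer no side-channel protection, so the sketch shows the arithmetic only.

```python
import secrets

def arith_share(x, p, d):
    """Split x into d arithmetic shares modulo p (hypothetical helper)."""
    shares = [secrets.randbelow(p) for _ in range(d - 1)]
    # The first share is chosen so that all shares sum to x modulo p.
    shares.insert(0, (x - sum(shares)) % p)
    return shares

def arith_unshare(shares, p):
    """Recombine shares; used here only to check the result."""
    return sum(shares) % p

def add_public(shares, y, p):
    """Add a public constant y by touching a single share."""
    out = list(shares)
    out[0] = (out[0] + y) % p
    return out

p, d = 8380417, 3            # the Dilithium modulus q is used as an example p
x, y = 1234567, 7654321
x_shares = arith_share(x, p, d)
z_shares = add_public(x_shares, y, p)
```

Only share 0 changes in add_public; the remaining d−1 shares are passed through untouched, which mirrors the derivation above.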

Similar to arithmetic masking, Boolean masking protects a k-bit variable x. The ensemble of the d shares of x is denoted as the Boolean sharing x^{B,k}, and the i-th share is denoted as x_i^{B,k}. The sharing of the j-th bit of x is denoted as x^{B,k}[j]. The relation between x and its shares is given as:

$x = \oplus_{i = 0}^{d - 1} x_{i}^{B, k},$

where ⊕ denotes a bitwise exclusive OR.
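The Boolean sharing can be sketched in the same way. The helper names below (bool_share, bool_unshare, bit_sharing) are hypothetical; the sketch only demonstrates the XOR relation between x and its shares and the sharing of a single bit x^{B,k}[j].

```python
import secrets
from functools import reduce
from operator import xor

def bool_share(x, k, d):
    """Split a k-bit value x into d Boolean shares (hypothetical helper)."""
    shares = [secrets.randbits(k) for _ in range(d - 1)]
    # The first share is x XORed with all random shares.
    shares.insert(0, reduce(xor, shares, x))
    return shares

def bool_unshare(shares):
    """Recombine shares; used here only to check the result."""
    return reduce(xor, shares, 0)

def bit_sharing(shares, j):
    """Sharing of the j-th bit: the j-th bit of every share."""
    return [(s >> j) & 1 for s in shares]

k, d = 18, 3
x = 0x2ABCD                  # an example 18-bit value
x_shares = bool_share(x, k, d)
bit5 = bit_sharing(x_shares, 5)
```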

In this disclosure, masked polynomials are used for which all the coefficients are either masked with Boolean masking or arithmetic masking. Polynomials are denoted with a hat, such as ĉ. As a result, a polynomial masked with arithmetic masking is denoted as ĉ^A,p, and similarly for Boolean masking it is ĉ^B,k. Polynomial multiplication is denoted with ∘. Unless mentioned otherwise, when an algorithm takes as input a polynomial, it is applied coefficient-wise.

In the embodiments described herein, both types of masking are leveraged. Hence, masking conversion algorithms are required. The first one enables the conversion from arithmetic masking modulo p to Boolean masking and is denoted as SecA2BModp_p^d. The second conversion algorithm enables the conversion from a Boolean sharing to an arithmetic sharing. This algorithm is denoted as SecB2AModp_p^d. When p=2^k, these algorithms are denoted as SecA2B_k^d and SecB2A_k^d respectively. These power-of-two variants generally offer better performance than the variants for arbitrary p.

The embodiments described herein require performing additions between variables for which each bit is protected with Boolean masking. The embodiments described herein are independent of the specific implementation of these modules. The main building block is the secure full adder SecFullAdder. It takes as input three bits and returns two bits representing their addition. An addition on k bits, denoted as SecAdd_k^d, may be built by chaining such SecFullAdder's. For a concrete instantiation of these algorithms see Olivier Bronchain and Gaëtan Cassiers, Bitslicing arithmetic/boolean masking conversions for fun and profit with application to lattice-based kems, IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022 (2022), no. 4, 553-588, which is hereby incorporated for all purposes as if included herein. Similarly, an arithmetic subtraction can be performed by chaining together SecFullAdder's combined with a negation in two's complement. The details for this can be found in U.S. patent application Ser. No. 18/320,028 filed May 18, 2023, entitled “MASKED INFINITY NORM CHECK FOR CRYSTALS-DILITHIUM SIGNATURE GENERATION,” which is hereby incorporated for all purposes as if included herein.
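A functional sketch of a masked addition and subtraction on Boolean shares is given below. It is not the bitsliced construction of the cited references: as an assumption for this illustration, the secure AND is instantiated with the well-known Ishai-Sahai-Wagner (ISW) gadget, and the full-adder chain is expressed as a word-level carry recurrence. All function names are hypothetical, and Python code of this kind demonstrates correctness only, not side-channel resistance.

```python
import secrets
from functools import reduce
from operator import xor

def bool_share(x, k, d):
    shares = [secrets.randbits(k) for _ in range(d - 1)]
    return [reduce(xor, shares, x)] + shares

def sec_and(x, y, k):
    """ISW-style AND of two Boolean sharings (assumed gadget)."""
    d = len(x)
    z = [x[i] & y[i] for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(k)          # fresh randomness per pair
            z[i] ^= r
            z[j] ^= (r ^ (x[i] & y[j])) ^ (x[j] & y[i])
    return z

def sec_add(x, y, k, carry_in=0):
    """Masked x + y mod 2^k via the full-adder carry recurrence."""
    mask = (1 << k) - 1
    p = [a ^ b for a, b in zip(x, y)]        # propagate bits (share-wise XOR)
    g = sec_and(x, y, k)                     # generate bits
    c = [carry_in] + [0] * (len(x) - 1)      # public carry into bit 0
    for _ in range(k - 1):
        t = sec_and(p, c, k)
        # carry into bit j+1 is g_j XOR (p_j AND c_j)
        c = [((gi ^ ti) << 1) & mask for gi, ti in zip(g, t)]
        c[0] ^= carry_in
    return [pi ^ ci for pi, ci in zip(p, c)]

def sec_sub(x, y, k):
    """Masked x - y mod 2^k using two's complement: x + NOT(y) + 1."""
    not_y = list(y)
    not_y[0] ^= (1 << k) - 1                 # negation touches one share only
    return sec_add(x, not_y, k, carry_in=1)

k, d = 20, 3
a, b = 123456, 654321
a_sh, b_sh = bool_share(a, k, d), bool_share(b, k, d)
sum_value = reduce(xor, sec_add(a_sh, b_sh, k), 0)
diff_value = reduce(xor, sec_sub(a_sh, b_sh, k), 0)
```

The recombination at the end exists only to check the result; a masked implementation would keep the output in shared form.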

In 1882, Kronecker introduced a method to reduce computational problems related to multivariate polynomials to univariate polynomials (see L. Kronecker, Grundzüge einer arithmetischen Theorie der algebraischen Grössen, Journal für die reine und angewandte Mathematik 92 (1882), 1-122). A hundred years later, a similar technique was introduced by Schönhage to reduce polynomial multiplications in ℤ[X] to integer multiplication (multiplications in ℤ) (see Arnold Schönhage, Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients, Computer Algebra (Jacques Calmet, ed.), Springer Berlin Heidelberg, 1982, pp. 3-15). This approach is known as the Kronecker substitution method.

Given two polynomials f,g∈ℤ[X] of degree (up to) N−1∈ℕ, the goal is to compute the polynomial multiplication h=f·g. The idea is to evaluate the polynomials at a sufficiently high power of two (e.g., f(2^l′) and g(2^l′)) and use the resulting integers as input for a regular integer multiplication by computing h(2^l′)=f(2^l′)·g(2^l′). The polynomial evaluation at 2^l′ is denoted by Kron(f,2^l′), as shown in Algorithm 1. Finally, the resulting integer h(2^l′) is converted back to its polynomial representation h. The result is correct if the coefficients of the resulting polynomial did not “mix” with each other, i.e., if the parameter l′∈ℕ is sufficiently large. More precisely, if the coefficients of f and g are positive then l′ should be chosen so that 2^l′ is larger than the largest coefficient of f·g. If the coefficients of f and g are signed, then 2^l′ should be larger than 2·∥f·g∥_∞+1.

Algorithm 1 - Kron(ŝ, L)

Input: Polynomial $\hat{s} = \sum_{i=0}^{n-1} s_i \cdot x^i$ and public integers L, n.

Output: The integer polynomial evaluation S = ŝ(L).

1: return $\sum_{i=0}^{n-1} s_i \cdot L^i$

Share-wise polynomial evaluation

The main advantage of this approach, computing a polynomial multiplication with an integer multiplication, is that well-studied and fast implementations of asymptotic integer multiplication methods can be used. Fast integer arithmetic is typically available in existing hardware designed for ECC and/or RSA.
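The Kronecker substitution method for signed coefficients can be sketched as follows. The names kron, unkron_signed and kron_mul are hypothetical, the evaluation point is L=2^ell, and Python's built-in big integers stand in for the integer multiplier. The example polynomials are arbitrary.

```python
def kron(coeffs, ell):
    """Evaluate the polynomial with the given coefficients at 2^ell."""
    return sum(c << (ell * i) for i, c in enumerate(coeffs))

def unkron_signed(value, ell, count):
    """Read back `count` signed coefficients, each of magnitude below 2^(ell-1)."""
    L = 1 << ell
    out = []
    for _ in range(count):
        digit = value % L
        if digit >= L // 2:          # map to the centered range
            digit -= L
        out.append(digit)
        value = (value - digit) >> ell
    return out

def kron_mul(f, g, ell):
    """Polynomial product in Z[X] through one integer multiplication."""
    h_eval = kron(f, ell) * kron(g, ell)
    return unkron_signed(h_eval, ell, len(f) + len(g) - 1)

f = [1, -2, 3]                       # 1 - 2x + 3x^2
g = [0, 1, -1]                       # x - x^2
# The largest product coefficient has magnitude 5, so 2^ell must exceed
# 2*5 + 1 = 11; ell = 4 is sufficient.
h = kron_mul(f, g, 4)                # [0, 1, -3, 5, -3]
```

If ell is chosen too small, neighboring coefficients overlap in the integer product and the read-back no longer matches the polynomial product.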

Polynomial evaluation is a linear operation, so with an arithmetic sharing modulo 2^k′ with d shares on an input polynomial, a masked version of Kron(f,2^l′) can be applied straightforwardly. This routine, called SecKron_k′^d(x,2^l′), is summarized in Algorithm 2.

Algorithm 2 - $\mathrm{SecKron}_{k'}^{d}(\hat{s}^{A, 2^{k'}}, L)$

Input: Arithmetically shared secret key polynomial $\hat{s}^{A, 2^{k'}} = (\hat{s}_0^{A, 2^{k'}}, \dots, \hat{s}_{d-1}^{A, 2^{k'}})$ according to the Dilithium specification, and public integers $K = 2^{k'}$, L, d.

Output: The evaluation of all shares of $\hat{s}^{A, 2^{k'}}$ at the Kronecker evaluation point L.

1: return $(\mathrm{Kron}(\hat{s}_0^{A, 2^{k'}}, L), \dots, \mathrm{Kron}(\hat{s}_{d-1}^{A, 2^{k'}}, L))$

Share-wise polynomial evaluation
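The share-wise evaluation of Algorithm 2 can be sketched as shown below. The helper names and the toy polynomial are hypothetical. The check at the end relies on the linearity of the evaluation: the sum of the evaluated shares equals the evaluation of the coefficient-wise integer sum of the shares, which in turn is congruent to the secret modulo K=2^k′.

```python
import secrets

def kron(coeffs, ell):
    """Evaluate the polynomial with the given coefficients at 2^ell."""
    return sum(c << (ell * i) for i, c in enumerate(coeffs))

def arith_share_poly(poly, k_prime, d):
    """Coefficient-wise arithmetic sharing modulo 2^k_prime (hypothetical helper)."""
    K = 1 << k_prime
    rand = [[secrets.randbelow(K) for _ in poly] for _ in range(d - 1)]
    first = [(poly[i] - sum(s[i] for s in rand)) % K for i in range(len(poly))]
    return [first] + rand

def sec_kron(shares, ell):
    """Apply Kron to every share independently; shares are never combined."""
    return [kron(share, ell) for share in shares]

k_prime, d, ell = 8, 3, 16
s = [2, -1, 0, -2, 1, 0, 2, -2]      # toy secret with coefficients in [-2, 2]
s_shares = arith_share_poly(s, k_prime, d)
S_shares = sec_kron(s_shares, ell)

# Linearity check (for illustration only; it recombines the shares).
col_sums = [sum(col) for col in zip(*s_shares)]
```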

The Kronecker substitution method may be generalized to a more efficient algorithm called KroneckerPlus (or Kronecker+). In U.S. patent application Ser. No. 16/884,136 filed May 27, 2020, entitled “METHOD FOR MULTIPLYING POLYNOMIALS FOR A CRYPTOGRAPHIC OPERATION,” which is hereby incorporated for all purposes as if included herein, the observation is made that ζ=X^{2n/t} is a principal t-th root of unity in the ring ℤ[X]/(Xⁿ+1). Hence, the ℓn-bit multiplication may be reduced through Kronecker to t multiplications of ℓn/t bits each. Writing M(b) for the cost of multiplying two b-bit integers, the cost goes from M(ℓ·n)+O(ℓ·n) to t·M(ℓ·n/t)+O(ℓ·n). This is done by evaluating f and g at ζⁱ·2^{ℓ/t} for i=0, 1, . . . , t−1 as opposed to only 2^ℓ and multiplying the respective factors modulo X^{n/t}+1. More concretely, the integers

$h (ζ^{i} 2^{ℓ / t}) = f (ζ^{i} 2^{ℓ / t}) \cdot g (ζ^{i} 2^{ℓ / t}) \mod (2^{ℓ n / t} + 1), 0 \leq i \leq t - 1,$

are computed, and it is noted that

$h^{(i)} (2^{ℓ}) \equiv \frac{\sum_{j = 0}^{t - 1} ζ^{i (t - j)} h (ζ^{j} \cdot 2^{ℓ_{t}})}{2^{i ℓ_{t}} \cdot t} \mod (2^{ℓ_{t} n} + 1), where h^{(i)} (2^{ℓ}) = \sum_{j = 0}^{n / t - 1} h_{2 tj + i} 2^{j ℓ} .$

To recover h, the appropriate ℓ-bit limbs can be read off from the h⁽ⁱ⁾.

In what follows, regular Kronecker substitution is used in the descriptions. However, at any point the Kronecker substitution may be replaced by KroneckerPlus to possibly obtain more efficient instantiations.

The embodiments described herein disclose methods for computing ŷ+ĉŝ₁ or ŵ₀−ĉŝ₂ in a secure fashion, where:

- ŷ is a polynomial with coefficients in [−γ₁−1,γ₁] (of exactly 2γ₁ bits);
- ŵ₀ is a polynomial with coefficients in (−γ₂,γ₂);
- ĉ is a polynomial with exactly τ non-zero coefficients, which are either 1 or −1; and
- ŝ₁, ŝ₂ are polynomials with coefficients in [−η,η].

Here γ₁, γ₂, τ and η are constants defined by the parameter set of Dilithium, which are summarized in Table 1.

TABLE 1

NIST Security Level    2          3          5
q                      8380417    8380417    8380417
τ                      39         49         60
γ₁                     2¹⁷        2¹⁹        2¹⁹
γ₂                     95232      261888     261888
η                      2          4          2
β                      78         196        120

Because c is a public challenge, no protection on it is required. All the other polynomials ŷ, ŵ₀, ŝ₁and ŝ₂are sensitive and require masking. The ephemeral secret value ŷ is generated as the output of the extendable Output Function (XOF) SHAKE-256, hence is generated in Boolean masked form. Similarly, ŵ₀is the output of a decomposition where it will typically be masked with Boolean shares. Hence, in this disclosure it will be assumed that the inputs are ŷ^B,log²^(2γ¹⁾and custom-character ^B,┌log²^(2γ²^)┐.

The long-term secrets ŝ₁ and ŝ₂ also require masking but might not be stored in masked form. This is because memory for storage of keys is often limited, while masking at least doubles the required memory (in the case of 2 shares) and can be unnecessary if encryption is applied. Even worse, for arithmetic masking the size of the representation is more than doubled, because the bit length of each share is greater than the bit length of the unmasked value (up to 3 times greater). Moreover, because the representation defined by the Dilithium specification tightly packs the bits of the long-term secrets, several operations are required before an arithmetic mask can be generated. For example, secret keys for Dilithium2 include polynomial coefficients in [−2,2] that are serialized into 3-bit-wide sequences, altogether using 3·256=768 bits per polynomial. To apply arithmetic masking modulo 2⁸, each 3-bit field has to be extracted using a combination of Boolean operations (e.g., shifts, bitwise ANDs/ORs) and padding needs to be added. This is a significant amount of computation on unmasked data and could therefore leak information about the secrets. To avoid this, a Boolean mask is first generated for ŝ₁ and ŝ₂ after loading them from long-term storage, and the Boolean shares (or parts thereof) are then converted to arithmetic masks modulo K=2^k′. Because only the polynomials needed for the computation are converted, one at a time, a lot of memory may be saved by keeping most polynomials stored with the smaller Boolean shares. Here K should be chosen such that no reductions occur during the computation of ĉ·ŝ₁, because the (unreduced) integer values of the product are needed. This means that K≥2·β+1, where β is as described in Table 1. More concretely, k′≥8 for Dilithium2 and Dilithium5, and k′≥9 for Dilithium3.

It is noted that the more obvious choice would be to mask ŝ_i modulo q, because all arithmetic is performed modulo q anyway. However, the 23-bit q is much larger than the 8- or 9-bit K (requiring almost 3× as much memory to store the secrets), while mask conversions modulo primes are also more expensive than the analogous operations modulo a power of 2. Therefore, the choice K=2^k′ is computationally advantageous and much more memory efficient.
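As a non-limiting illustration of why the Boolean mask can be applied before any bit extraction, the following Python sketch masks a packed polynomial first and only then unpacks it, one share at a time. It assumes a Dilithium2-style layout in which each 3-bit field holds η − s for a coefficient s in [−2,2]; the function names are hypothetical and the code is a functional model only, not a side-channel-protected implementation. It relies on the fact that shifts and bitwise ANDs are linear over XOR, so the field extraction never touches an unmasked value.

```python
import secrets

ETA = 2          # Dilithium2 secret range [-2, 2] (assumed parameter set)
FIELD_BITS = 3   # bits per packed coefficient, as in the text
N = 256          # coefficients per polynomial

def pack(coeffs):
    """Pack coefficients as 3-bit fields holding ETA - c (assumed layout)."""
    word = 0
    for i, c in enumerate(coeffs):
        word |= (ETA - c) << (FIELD_BITS * i)
    return word

def boolean_mask(value, bits, d=2):
    """Split value into d XOR shares of the given bit length."""
    shares = [secrets.randbits(bits) for _ in range(d - 1)]
    last = value
    for s in shares:
        last ^= s
    return shares + [last]

def unpack_share(share):
    """Extract the 3-bit fields of ONE share using only shifts and ANDs."""
    return [(share >> (FIELD_BITS * i)) & 0b111 for i in range(N)]

def unmask_fields(field_shares):
    """Test helper only: recombine the shares and undo the ETA - c offset.

    A protected implementation would never recombine; it would pass the
    Boolean shares on to a Boolean-to-arithmetic conversion instead.
    """
    out = []
    for fields in zip(*field_shares):
        v = 0
        for f in fields:
            v ^= f
        out.append(ETA - v)
    return out
```

In this model the 768-bit packed polynomial is masked once with `boolean_mask(pack(coeffs), FIELD_BITS * N)`, and `unpack_share` is then applied to each share independently.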

Because ĉ is unmasked and the ŝ_i are masked, the multiplication $\hat{c} \cdot \hat{s}_i^{A, 2^{k'}}$ is performed share-wise. Particularly interesting is the structure of ĉ, which is a very sparse polynomial. This makes it less suitable for multiplications using NTTs, which cannot take advantage of this structure. On the other hand, Kronecker substitution may be instantiated very efficiently because the coefficients of ĉ·ŝ_i are small. This is especially true in the presence of hardware for integer multiplications. Each share of $\hat{s}_i^{A, 2^{k'}}$ is a polynomial with (non-negative) coefficients in [0,K−1], and hence the coefficients of the product lie in [−τ·(K−1), τ·(K−1)]. Therefore, the Kronecker evaluation point is selected as L=2^l′≥2·τ·(K−1)+1. More concretely, l′≥15 for Dilithium2 and Dilithium5 and l′≥16 for Dilithium3.
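As a non-limiting illustration, the share-wise multiplication by Kronecker substitution may be modelled in Python as follows. The sketch assumes Dilithium2 values (τ=39, η=2, k′=8, l′=15), computes the full integer product and then folds it with X²⁵⁶=−1, and uses hypothetical function names; the arithmetic masking helper is a plain functional model without any side-channel protection. A sparse schoolbook product serves as the reference.

```python
import secrets

N = 256            # ring is Z[X]/(X^N + 1)
TAU, ETA = 39, 2   # Dilithium2 values (assumed parameter set)
K = 1 << 8         # arithmetic masking modulus 2^k'
L_BITS = 15        # L = 2^l' >= 2*TAU*(K-1) + 1

def kron(poly, l_bits=L_BITS):
    """Kronecker substitution: evaluate the polynomial at X = 2^l_bits."""
    return sum(c << (l_bits * i) for i, c in enumerate(poly))

def unkron(value, l_bits=L_BITS, n=N):
    """Recover 2n-1 signed coefficients, then fold them with X^n = -1."""
    base, half = 1 << l_bits, 1 << (l_bits - 1)
    coeffs = []
    for _ in range(2 * n - 1):
        r = value % base
        if r >= half:          # digits in the upper half encode negatives
            r -= base
        coeffs.append(r)
        value = (value - r) >> l_bits
    high = coeffs[n:] + [0]
    return [lo - hi for lo, hi in zip(coeffs[:n], high)]

def sec_kron_mul(c, s_shares):
    """Multiply the public c with every share; reduce each result mod K."""
    big_c = kron(c)
    return [[x % K for x in unkron(big_c * kron(sh))] for sh in s_shares]

def arithmetic_mask(poly, d=2):
    """Functional model: split each coefficient into d shares modulo K."""
    shares = [[secrets.randbelow(K) for _ in poly] for _ in range(d - 1)]
    last = [(x - sum(sh[i] for sh in shares)) % K for i, x in enumerate(poly)]
    return shares + [last]

def sample_challenge():
    """Sparse polynomial with exactly TAU coefficients equal to +1 or -1."""
    c, positions = [0] * N, set()
    while len(positions) < TAU:
        positions.add(secrets.randbelow(N))
    for p in positions:
        c[p] = 1 if secrets.randbits(1) else -1
    return c

def schoolbook(c, s):
    """Reference negacyclic product that skips the zero coefficients of c."""
    out = [0] * N
    for i, ci in enumerate(c):
        if ci == 0:
            continue
        for j, sj in enumerate(s):
            if i + j < N:
                out[i + j] += ci * sj
            else:
                out[i + j - N] -= ci * sj
    return out

def recombine_centered(r_shares):
    """Test helper only: add the shares mod K and centre around zero."""
    out = []
    for col in zip(*r_shares):
        v = sum(col) % K
        out.append(v - K if v > K // 2 else v)
    return out
```

Because every coefficient of ĉ·ŝ₁ has absolute value at most β=τ·η and K≥2·β+1, the centred recombination in this model equals the unreduced integer product.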

Alternatively, one may avoid working with the signed representation of ĉ by also reducing it modulo K to have its non-zero coefficients be either 1 or K−1. In that case the coefficients of the product will be in [−τ·(K−1)²,τ·(K−1)²], so L≥2·τ·(K−1)²+1 may be chosen. More concretely, l′≥23 for Dilithium2 and Dilithium5 and l′≥25 for Dilithium3.
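The bounds on k′ and l′ stated above can be reproduced with a few lines of Python. The sketch below is illustrative only; it takes τ and η from the round-3 Dilithium parameter sets, assumes β=τ·η as the content of Table 1, and uses hypothetical function names.

```python
# (tau, eta) per parameter set; beta = tau * eta is assumed to match Table 1
PARAMS = {
    "Dilithium2": (39, 2),
    "Dilithium3": (49, 4),
    "Dilithium5": (60, 2),
}

def min_log2(bound):
    """Smallest exponent e such that 2**e >= bound (bound >= 1)."""
    return (bound - 1).bit_length()

def mask_parameters(tau, eta):
    """Return (k', l' for signed c, l' for c reduced modulo K)."""
    beta = tau * eta
    k = min_log2(2 * beta + 1)                     # K = 2^k' >= 2*beta + 1
    big_k = 1 << k
    l_signed = min_log2(2 * tau * (big_k - 1) + 1)
    l_reduced = min_log2(2 * tau * (big_k - 1) ** 2 + 1)
    return k, l_signed, l_reduced

if __name__ == "__main__":
    for name, (tau, eta) in PARAMS.items():
        print(name, mask_parameters(tau, eta))
```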

After finalizing the share-wise multiplication using Kronecker substitution, the result $\hat{R}^{A, 2^{k'}} = \hat{c} \cdot \hat{s}_i^{A, 2^{k'}}$ is arithmetically masked modulo K=2^k′, while ŷ and ŵ₀ are Boolean masked. To perform the final addition or subtraction, the most obvious choice would be to convert ŷ and ŵ₀ to arithmetic shares, so that the arithmetic addition/subtraction can be performed most easily. However, this would still require a mask conversion on $\hat{R}^{A, 2^{k'}}$ because the mask length of k′ bits is not sufficient for ŷ and ŵ₀. Moreover, the subsequent operation performed on ŷ+ĉŝ₁ and ŵ₀−ĉŝ₂ will be to check their infinity norms, which is most easily done in Boolean masked form. Therefore, $\hat{R}^{A, 2^{k'}}$ may be converted to a Boolean sharing instead, and the arithmetic addition/subtraction is performed using a SecAdd. Because the bit-length of $\hat{R}^{A, 2^{k'}}$ is shorter than that of ŷ and ŵ₀, this requires padding. The final length is at most 1 bit larger than the size of ŷ or ŵ₀, respectively.
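As a non-limiting illustration of the padding and of the addition in the Boolean domain, the following Python sketch may be considered. The text above does not fix how SecAdd is constructed or how the padding is performed; the sketch assumes two's complement sign extension applied share-wise and a ripple-carry adder built on an ISW-style masked AND. All function names are hypothetical, and Python code of this kind models functionality only and offers no actual side-channel protection.

```python
import secrets

def bool_mask(value, bits, d=2):
    """Split value into d XOR shares of `bits` bits each."""
    shares = [secrets.randbits(bits) for _ in range(d - 1)]
    last = value
    for s in shares:
        last ^= s
    return shares + [last]

def unmask(shares):
    """Test helper only: recombine XOR shares."""
    value = 0
    for s in shares:
        value ^= s
    return value

def sign_extend(shares, from_bits, to_bits):
    """Pad each share by copying its top bit; bit copies are linear over XOR."""
    ext = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return [s | (-((s >> (from_bits - 1)) & 1) & ext) for s in shares]

def sec_and(x, y, bits):
    """ISW-style masked AND, applied bitwise to whole words."""
    d = len(x)
    z = [x[i] & y[i] for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(bits)
            z[i] ^= r
            z[j] ^= (r ^ (x[i] & y[j])) ^ (x[j] & y[i])
    return z

def sec_add(x, y, bits):
    """Masked addition modulo 2^bits with a ripple-carry recurrence."""
    mask = (1 << bits) - 1
    p = [a ^ b for a, b in zip(x, y)]      # propagate bits (linear)
    g = sec_and(x, y, bits)                # generate bits (non-linear)
    c = [0] * len(x)                       # carry word, starts at zero
    for _ in range(bits - 1):
        t = sec_and(p, c, bits)
        c = [((gi ^ ti) << 1) & mask for gi, ti in zip(g, t)]
    return [pi ^ ci for pi, ci in zip(p, c)]
```

For example, with k′=8 and a 19-bit target width, the shares of the product are first passed through `sign_extend(shares, 8, 19)` and then added to the 19-bit shares of the other operand with `sec_add`.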

The final algorithm for ŷ+ĉŝ₁is denoted SecChallengeMADD and summarized in Algorithm 3. The final algorithm for ŵ₀−ĉŝ₂is denoted SecChallengeMSUB and summarized in Algorithm 4. The application of SecKron may be replaced by an analogous instantiation of KroneckerPlus for selected integer t. This increases the complexity of the algorithm but may lead to faster implementations.

Algorithm 3 - $\mathrm{SecChallengeMADD}_{k'}^{d}(\hat{y}^{B, \log_2(2\gamma_1)}, \hat{c}, \hat{s}_1, L)$

Input: Secret key polynomial ŝ₁, (public) challenge polynomial ĉ, Boolean masked ephemeral secret polynomial $\hat{y}^{B, \log_2(2\gamma_1)}$ according to the Dilithium specification, and public integers K = 2^k′, L, d such that K ≥ 2·β+1 and L ≥ 2·τ·(K−1)+1.

Output: Boolean masked ŷ + ĉ·ŝ₁ in log₂(4γ₁) bits.

1: $\hat{s}_1^{B, \lceil \log_2(2\eta+1) \rceil} = \mathrm{Mask}_{\lceil \log_2(2\eta+1) \rceil}^{d}(\hat{s}_1)$ ▷ Apply Boolean mask
2: $\hat{s}_1^{A, 2^{k'}} = \mathrm{SecB2A}_{k'}^{d}(\hat{s}_1^{B, \lceil \log_2(2\eta+1) \rceil})$ ▷ Convert ŝ₁ to arithmetic mask
3: $S_1^{A, 2^{k'}} = \mathrm{SecKron}_{k'}^{d}(\hat{s}_1^{A, 2^{k'}}, L)$ ▷ Apply share-wise Kronecker substitution
4: $C = \mathrm{Kron}(\hat{c}, L)$ ▷ Apply Kronecker substitution
5: $R^{A, 2^{k'}} = C \cdot S_1^{A, 2^{k'}}$ ▷ Apply share-wise integer multiplication
6: $\hat{R}^{A, 2^{k'}} = R^{A, 2^{k'}}$ ▷ Convert to polynomial representation
7: $\hat{R}^{B, \log_2(2\gamma_1)} = \mathrm{SecA2B}_{k'}^{d}(\hat{R}^{A, 2^{k'}})$ ▷ Convert to Boolean mask (pad to same size as ŷ)
8: $\hat{R}^{B, \log_2(4\gamma_1)} = \mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}(\hat{y}^{B, \log_2(2\gamma_1)}, \hat{R}^{B, \log_2(2\gamma_1)})$ ▷ Add ŷ share-wise to ĉ·ŝ₁
9: return $\hat{R}^{B, \log_2(4\gamma_1)}$

Algorithm 4 - $\mathrm{SecChallengeMSUB}_{k'}^{d}(\hat{w}_0^{B, \log_2(2\gamma_2)}, \hat{c}, \hat{s}_2, L)$

Input: Secret key polynomial ŝ₂, (public) challenge polynomial ĉ, Boolean masked ephemeral secret polynomial $\hat{w}_0^{B, \log_2(2\gamma_2)}$ according to the Dilithium specification, and public integers K = 2^k′, L, d such that K ≥ 2·β+1 and L ≥ 2·τ·(K−1)+1.

Output: Boolean masked ŵ₀ − ĉ·ŝ₂ in log₂(4γ₂) bits.

1: $\hat{s}_2^{B, \lceil \log_2(2\eta+1) \rceil} = \mathrm{Mask}_{\lceil \log_2(2\eta+1) \rceil}^{d}(\hat{s}_2)$ ▷ Apply Boolean mask
2: $\hat{s}_2^{A, 2^{k'}} = \mathrm{SecB2A}_{k'}^{d}(\hat{s}_2^{B, \lceil \log_2(2\eta+1) \rceil})$ ▷ Convert ŝ₂ to arithmetic mask
3: $S_2^{A, 2^{k'}} = \mathrm{SecKron}_{k'}^{d}(\hat{s}_2^{A, 2^{k'}}, L)$ ▷ Apply share-wise Kronecker substitution
4: $C = \mathrm{Kron}(\hat{c}, L)$ ▷ Apply Kronecker substitution
5: $R^{A, 2^{k'}} = C \cdot S_2^{A, 2^{k'}}$ ▷ Apply share-wise integer multiplication
6: $\hat{R}^{A, 2^{k'}} = R^{A, 2^{k'}}$ ▷ Convert to polynomial representation
7: $\hat{R}^{B, \log_2(2\gamma_2)} = \mathrm{SecA2B}_{k'}^{d}(\hat{R}^{A, 2^{k'}})$ ▷ Convert to Boolean mask (pad to same size as ŵ₀)
8: $\hat{R}^{B, \log_2(4\gamma_2)} = \mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}(\hat{w}_0^{B, \log_2(2\gamma_2)}, \hat{R}^{B, \log_2(2\gamma_2)})$ ▷ Subtract ĉ·ŝ₂ share-wise from ŵ₀
9: return $\hat{R}^{B, \log_2(4\gamma_2)}$

The advantages of the above-described methods over other methods will now be described. The established method of masking Dilithium is to rely on masks modulo the prime q, because all arithmetic operations are performed in ℤ_q[X]/(X²⁵⁶+1) using NTTs. In this way, representing a single polynomial of ŝ₁ or ŝ₂ requires 736 bytes if exactly 23 bits per coefficient are used, and 1024 bytes if a more implementation-friendly 32 bits per coefficient are used. If all of ŝ₁ or ŝ₂ were loaded into volatile memory at once, it could require up to 1024·k=8192 bytes just to store one of the long-term secrets without masking. Using first-order masking with d=2, 2·8192=16384 bytes would be needed just to store one of ŝ₁ or ŝ₂. On the other hand, if ŝ₁ or ŝ₂ were Boolean masked, they would require only 3 bits per coefficient for Dilithium2 and Dilithium5, and 4 bits per coefficient for Dilithium3. With this method, storing all of ŝ₁ or ŝ₂ requires at most 4·256·6/8=768 bytes across all parameter sets. This shows the huge benefit of an initial Boolean mask on the serialized long-term secrets, followed by a mask conversion on-the-fly whenever an element of the vector is needed.
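The byte counts in the preceding paragraph can be checked with the short Python sketch below. It is illustrative only: the vector dimension k and the packed coefficient widths are taken from the round-3 Dilithium parameter sets, the larger of the two vector dimensions is used as an upper bound, and the function names are hypothetical.

```python
N = 256  # coefficients per polynomial

# name: (k, bits per packed secret coefficient), round-3 Dilithium sets
SETS = {
    "Dilithium2": (4, 3),
    "Dilithium3": (6, 4),
    "Dilithium5": (8, 3),
}

def poly_bytes(bits_per_coeff):
    """Bytes needed for one polynomial at the given coefficient width."""
    return N * bits_per_coeff // 8

def vector_bytes(num_polys, bits_per_coeff, shares=1):
    """Bytes needed for a vector of polynomials held in `shares` shares."""
    return shares * num_polys * poly_bytes(bits_per_coeff)

if __name__ == "__main__":
    print("23-bit polynomial:", poly_bytes(23))
    print("32-bit polynomial:", poly_bytes(32))
    print("k = 8 vector, 32-bit:", vector_bytes(8, 32))
    print("same with d = 2:", vector_bytes(8, 32, shares=2))
    for name, (k, bits) in SETS.items():
        print(name, "packed vector:", vector_bytes(k, bits))
```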

Moreover, the packed representation of ŝ₁ and ŝ₂ represents an element in [−2,2] or [−4,4] (depending on the parameter set) in 3 or 4 bits, respectively. Converting this to a representation in 23 bits modulo q requires operations on the packed but unprotected bits of the long-term secret. This would result in significant leakage that is avoided by immediately applying a Boolean mask and unpacking afterwards.

Besides, it is noted that the 23 bits from q are more than necessary for masking. The only requirement is that there are no reductions in the operations ĉ·ŝ₁ and ĉ·ŝ₂, and therefore it suffices to select a masking modulus that is at least 2·β+1. Because masked operations work most efficiently with powers of two, the smallest power of two 2^k′ that is at least 2·β+1 is simply selected. More concretely, the SecA2B and SecB2A operations may simply be used instead of the slower SecA2BModp and SecB2AModp for the prime q.

One of the downsides of avoiding q is that NTTs are no longer available for the polynomial multiplication. However, with the availability of co-processors for integer multiplication, the Kronecker substitution method (or KroneckerPlus) leads to extremely fast multiplication routines. If integer multiplication hardware is not available, this could be replaced by other routines, such as schoolbook multiplication.

Because an arithmetic mask modulo a power of 2 is used, the mask conversions to Boolean shares are fairly cheap. Therefore, rather than converting ŷ and ŵ₀ to arithmetic masks, the product polynomials are converted to Boolean masks. This allows the addition/subtraction to be performed in the Boolean domain using a SecAdd or SecSub. Although not as efficient as when done in the arithmetic domain, this may still be performed very efficiently by bitslicing the coefficients and performing all the additions in parallel. More generally, all operations except the arithmetic polynomial multiplication may be bitsliced and therefore be computed very efficiently. This has the added benefit that single-bit leakage is much harder to exploit.
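As a non-limiting illustration of bitslicing, the following Python sketch transposes a list of coefficients into bit-planes and adds two such lists with a ripple-carry adder, so that each Boolean operation acts on all coefficients at once. For readability the sketch is unmasked; in a masked implementation each AND would be replaced by a masked AND gadget operating on shares. The function names are hypothetical.

```python
def to_bitsliced(coeffs, width):
    """Return `width` planes; bit j of planes[i] is bit i of coeffs[j]."""
    planes = [0] * width
    for j, c in enumerate(coeffs):
        for i in range(width):
            planes[i] |= ((c >> i) & 1) << j
    return planes

def from_bitsliced(planes, count):
    """Inverse transposition: rebuild `count` coefficients from the planes."""
    return [
        sum(((p >> j) & 1) << i for i, p in enumerate(planes))
        for j in range(count)
    ]

def bitsliced_add(a_planes, b_planes):
    """Add all coefficients in parallel, modulo 2^width."""
    carry, out = 0, []
    for a, b in zip(a_planes, b_planes):
        p = a ^ b                      # propagate
        out.append(p ^ carry)          # sum bit for every coefficient
        carry = (a & b) | (carry & p)  # carry into the next plane
    return out
```

With 256 coefficients of 19 bits each, the adder performs 19 rounds of word-wide Boolean operations instead of 256 separate additions.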

Finally, the representations of the chosen masks fit most conveniently with the operations before (generating ŷ with SHAKE-256 and decomposition for ŵ₀) and afterwards (infinity norm checks). All in all, this algorithm leads to a more memory-friendly and efficient approach compared to masking modulo q, as is done in existing literature.

FIG. 2 illustrates an exemplary hardware diagram 200 for implementing a cryptographic operation such as y+cs₁ or w₀−cs₂. As shown, the device 200 includes a processor 220, memory 230, user interface 240, network interface 250, and storage 260 interconnected via one or more system buses 210. It will be understood that FIG. 2 constitutes, in some respects, an abstraction and that the actual organization of the components of the device 200 may be more complex than illustrated.

The processor 220 may be any hardware device capable of executing instructions stored in memory 230 or storage 260 or otherwise processing data. As such, the processor may include a microprocessor, microcontroller, graphics processing unit (GPU), neural network processor, field programmable gate array (FPGA), application-specific integrated circuit (ASIC), or other similar devices. The processor may be a secure processor or include a secure processing portion or core that resists tampering.

The memory 230 may include various memories such as, for example L1, L2, or L3 cache or system memory. As such, the memory 230 may include static random-access memory (SRAM), dynamic RAM (DRAM), flash memory, read only memory (ROM), or other similar memory devices. Further, some portion or all of the memory may be secure memory with limited authorized access and that is tamper resistant.

The user interface 240 may include one or more devices for enabling communication with a user such as an administrator. For example, the user interface 240 may include a display, a touch interface, a mouse, and/or a keyboard for receiving user commands. In some embodiments, the user interface 240 may include a command line interface or graphical user interface that may be presented to a remote terminal via the network interface 250.

The network interface 250 may include one or more devices for enabling communication with other hardware devices. For example, the network interface 250 may include a network interface card (NIC) configured to communicate according to the Ethernet protocol or other communications protocols, including wireless protocols. Additionally, the network interface 250 may implement a TCP/IP stack for communication according to the TCP/IP protocols. Various alternative or additional hardware or configurations for the network interface 250 will be apparent.

The storage 260 may include one or more machine-readable storage media such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, or similar storage media. In various embodiments, the storage 260 may store instructions for execution by the processor 220 or data upon which the processor 220 may operate. For example, the storage 260 may store a base operating system 261 for controlling various basic operations of the hardware 200. The storage 260 may also include instructions 262 for carrying out cryptographic operations such as, for example, y+cs₁ or w₀−cs₂.

It will be apparent that various information described as stored in the storage 260 may be additionally or alternatively stored in the memory 230. In this respect, the memory 230 may also be considered to constitute a “storage device” and the storage 260 may be considered a “memory.” Various other arrangements will be apparent. Further, the memory 230 and storage 260 may both be considered to be “non-transitory machine-readable media.” As used herein, the term “non-transitory” will be understood to exclude transitory signals but to include all forms of storage, including both volatile and non-volatile memories.

The system bus 210 allows communication between the processor 220, memory 230, user interface 240, storage 260, and network interface 250.

While the host device 200 is shown as including one of each described component, the various components may be duplicated in various embodiments. For example, the processor 220 may include multiple microprocessors that are configured to independently execute the methods described herein or are configured to perform steps or subroutines of the methods described herein such that the multiple processors cooperate to achieve the functionality described herein. Further, where the device 200 is implemented in a cloud computing system, the various hardware components may belong to separate physical systems. For example, the processor 220 may include a first processor in a first server and a second processor in a second server.

The foregoing disclosure provides illustration and description but is not intended to be exhaustive or to limit the aspects to the precise form disclosed. Modifications and variations may be made in light of the above disclosure or may be acquired from practice of the aspects.

As used herein, the term “component” is intended to be broadly construed as hardware, firmware, and/or a combination of hardware and software. As used herein, a processor is implemented in hardware, firmware, and/or a combination of hardware and software.

As used herein, satisfying a threshold may, depending on the context, refer to a value being greater than the threshold, greater than or equal to the threshold, less than the threshold, less than or equal to the threshold, equal to the threshold, not equal to the threshold, and/or the like. It will be apparent that systems and/or methods described herein may be implemented in different forms of hardware, firmware, and/or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the aspects. Thus, the operation and behavior of the systems and/or methods were described herein without reference to specific software code—it being understood that software and hardware can be designed to implement the systems and/or methods based, at least in part, on the description herein.

As used herein, the term “non-transitory machine-readable storage medium” will be understood to exclude a transitory propagation signal but to include all forms of volatile and non-volatile memory. When software is implemented on a processor, the combination of software and processor becomes a specific dedicated machine.

Because the data processing implementing the embodiments described herein is, for the most part, composed of electronic components and circuits known to those skilled in the art, circuit details will not be explained in any greater extent than that considered necessary as illustrated above, for the understanding and appreciation of the underlying concepts of the aspects described herein and in order not to obfuscate or distract from the teachings of the aspects described herein.

Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative hardware embodying the principles of the aspects.

While each of the embodiments is described above in terms of its structural arrangements, it should be appreciated that the aspects also cover the associated methods of using the embodiments described above.

Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of various aspects. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of various aspects includes each dependent claim in combination with every other claim in the claim set. A phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).

No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items and may be used interchangeably with “one or more.” Furthermore, as used herein, the terms “set” and “group” are intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and/or the like), and may be used interchangeably with “one or more.” Where only one item is intended, the phrase “only one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having” and/or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.

Claims

1. A data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation using polynomials for lattice-based cryptography in a processor, the instructions, comprising: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; arithmetically adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.
2. The data processing system of claim 1, wherein the instructions further comprise: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.
3. The data processing system of claim 2, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:
4. The data processing system of claim 1, wherein the Kronecker substitution is a Kronecker plus substitution.
5. The data processing system of claim 1, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating
6. The data processing system of claim 1, wherein adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_1)} = \mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}(\hat{y}^{B, \log_2(2\gamma_1)}, \hat{R}^{B, \log_2(2\gamma_1)})$,
7. A data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation using polynomials for lattice-based cryptography in a processor, the instructions, comprising: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; subtracting the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.
8. The data processing system of claim 7, wherein the instructions further comprise: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.
9. The data processing system of claim 8, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:
10. The data processing system of claim 7, wherein the Kronecker substitution is a Kronecker plus substitution.
11. The data processing system of claim 7, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating
12. The data processing system of claim 7, wherein subtracting the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_2)} \leftarrow \mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}(\hat{w}_0^{B, \log_2(2\gamma_2)}, \hat{R}^{B, \log_2(2\gamma_2)})$,
13. A method for a cryptographic operation using polynomials for lattice-based cryptography, comprising: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; arithmetically adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.
14. The method of claim 13, wherein the instructions further comprise: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.
15. The method of claim 14, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:
16. The method of claim 13, wherein the Kronecker substitution is a Kronecker plus substitution.
17. The method of claim 13, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating
18. The method of claim 13, wherein adding the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_1)} = \mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}(\hat{y}^{B, \log_2(2\gamma_1)}, \hat{R}^{B, \log_2(2\gamma_1)})$, where $\hat{R}^{B, \log_2(4\gamma_1)}$ are Boolean shares of the second output, $\mathrm{SecAdd}_{\log_2(4\gamma_1)}^{d}$ is a secure add function, $\hat{y}^{B, \log_2(2\gamma_1)}$ are the Boolean shares of a third polynomial, $\hat{R}^{B, \log_2(2\gamma_1)}$ are the Boolean shares of the polynomial representation, and γ₁ defines a range [−γ₁−1,γ₁] of the coefficients of the third polynomial ŷ.
19. A method for a cryptographic operation using polynomials for lattice-based cryptography in a processor, the instructions, comprising: applying a share-wise Kronecker substitution to arithmetic shares of a first polynomial; applying a Kronecker substitution to a second polynomial; multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial to produce arithmetic shares of a first output; converting the shares of the first output to arithmetic shares of a polynomial representation; converting the arithmetic shares of the polynomial representation to Boolean shares of the polynomial representation; subtracting the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output; and carrying out a cryptographic operation using the Boolean shares of the second output.
20. The method of claim 19, wherein the instructions further comprise: converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial.
21. The method of claim 20, wherein converting Boolean shares of the first polynomial into the arithmetic shares of the first polynomial includes calculating:
22. The method of claim 19, wherein the Kronecker substitution is a Kronecker plus substitution.
23. The method of claim 19, wherein multiplying share-wise the Kronecker substitution of the second polynomial and the arithmetic shares of the Kronecker substitution of the shares of the first polynomial includes calculating
24. The method of claim 19, wherein subtracting the Boolean shares of the polynomial representation to Boolean shares of a third polynomial to produce Boolean shares of a second output includes calculating $\hat{R}^{B, \log_2(4\gamma_2)} \leftarrow \mathrm{SecSub}_{\log_2(4\gamma_2)}^{d}(\hat{w}_0^{B, \log_2(2\gamma_2)}, \hat{R}^{B, \log_2(2\gamma_2)})$,
