Aspects of the present disclosure are directed to cryptographic computing applications, more specifically to protection of lattice-based post-quantum cryptographic applications from side-channel attacks.
The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various implementations of the disclosure.
In public-key cryptography systems, a processing device may have various components/modules used for cryptographic operations on input messages, which are typically represented via large integers. Cryptographic algorithms often involve modular arithmetic operations with modulus q, in which the set of all integers Z is wrapped around a circle of length q (the set Zq), so that any two numbers that differ by q (or any other integer multiple of q) are congruent to (and treated as) the same number within Zq. Pre-quantum cryptographic applications, such as the Rivest-Shamir-Adleman (RSA) algorithm, digital signature algorithms (DSA), Diffie-Hellman key exchange (DH), Elliptic Curve Cryptography (ECC), and the like, exploit the fact that solving an integer factorization problem, a discrete logarithm problem, an elliptic curve discrete logarithm problem, and/or the like, involves prohibitively difficult operations (for large moduli q) on a classical computer.
Progress in quantum computing technology has placed conventional public-key encryption schemes in jeopardy. In response, in 2016, the National Institute of Standards and Technology (NIST) initiated a Post-Quantum Cryptography (PQC) standardization process to promote development of public-key cryptographic algorithms that are resistant to attacks using quantum computers. In July 2022, after rigorous analysis and evaluation, NIST selected the following algorithms for standardization: CRYSTALS-KYBER (referred to as Kyber herein) key encapsulation mechanism (KEM), CRYSTALS-DILITHIUM digital signature algorithm (referred to as Dilithium herein), FALCON digital signature algorithm, and SPHINCS+ hash-based signature algorithm. In particular, NIST recommended Dilithium as the primary signature algorithm. Additional KEM algorithms are currently considered, including BIKE, Classic McEliece, and HQC. Further NIST competitions have been initiated for signature algorithms that are based on different mathematical foundations.
As an example, the Kyber algorithm is based on the Module-Learning-With-Errors (MLWE) problem on structured lattices, with the underlying operations involving matrix-vector (and vector-vector) multiplications where the elements of the matrices/vectors are polynomials defined on a ring Rq=Zq[x]/(x^n+1), namely polynomials with coefficients in Zq and polynomial operations defined modulo the modulus polynomial x^n+1. Although confidential data encrypted using Kyber and/or other similar polynomial-based cryptographic techniques may be well protected from unauthorized accesses while in the ciphertext form, a weak security link occurs on a sender's or a recipient's side, where a private key may be exposed. In particular, decryption typically includes computing polynomial products of a series of ciphertexts c1(x), c2(x), c3(x), . . . and some secret polynomials s(x) (e.g., the private key or other secret data derived from the private key). As the same secret data is multiplied over and over by varying and known (to the attacker) ciphertexts, the secret data may become vulnerable to side-channel attacks. During a side-channel attack, an attacker observes a large number of multiplications s(x)·c1(x), s(x)·c2(x), s(x)·c3(x) . . . and monitors signals produced by electronic circuits of the targeted computer. Monitored signals may be acoustic, electrical, magnetic, optical, thermal, and so on. By recording such signals, a hardware trojan and/or malicious software may correlate specific processor (and/or memory) activity with computations carried out by the targeted computer. A simple power analysis (SPA) side-channel attack involves examination of the electrical power used by the device as a function of time.
Because noise may hide the signal of the processor/memory, a more sophisticated differential power analysis (DPA) attack may include statistical analysis of power measurements performed over multiple cryptographic operations (or multiple iterations of a single cryptographic operation). An attacker employing DPA may filter out the noise component of the power signal (using the fact that the noise components may be uncorrelated between different operations or iterations) to extract the component of the signal that is representative of the actual processor activity, and to infer the value of the secret polynomial s(x) from this signal, thus gaining access to the private key.
Although the above illustration uses the Kyber algorithm as an example, similar side-channel attacks can be used to compromise the integrity of digital signature algorithms, including the Dilithium algorithm, FALCON algorithm, and/or other digital signature algorithms.
Protection against side-channel attacks includes various masking techniques. Masking involves generating a random (or pseudorandom) masking polynomial m(x) and combining (e.g., adding, multiplying, etc.) the secret polynomial s(x) with the masking polynomial to reduce correlations of side-channel measurements with the secret polynomial. Masking, however, comes at the cost of additional computations, since it is typically necessary to perform computations on both the masked data, e.g., s(x)+m(x), and the masking data m(x) separately before using the results of these two (or more) computations to unmask a final output. This increases latency, reduces computational throughput, and consumes valuable processing and memory resources.
Aspects and implementations of the present disclosure address these and other challenges of the existing technology by enabling systems and techniques to implement masking that does not require performing separate computations on both the masked data and the masking data and enables efficient unmasking (combining) operations at the end of the algorithm. More specifically, products of a large secret polynomial and a large public polynomial are known to be efficiently computed using Number Theoretic Transforms (NTTs, described in more detail below) that are analogous to Discrete Fourier Transforms and can be performed using only O(n log n) operations (rather than O(n^2) operations for a conventional schoolbook multiplication). For example, a set {s_i} of coefficients of a polynomial s(x)=Σ_{i=0}^{n-1} s_i x^i may be transformed to the NTT domain, {s_i}→{ŝ_i} (and similarly for other polynomials, e.g., {c_i}→{ĉ_i}), where a product of two polynomials, p(x)=s(x)·c(x), is represented by a set of coefficients {p̂_i}={ŝ_i ĉ_i} that are elementwise products of the coefficients of the two sets. An inverse NTT then transforms the obtained set, {p̂_i}→{p_i}, to the set of coefficients of the product polynomial p(x)=Σ_{i=0}^{n-1} p_i x^i (e.g., an output of the cryptographic operation). The NTT/inverse NTT operations as well as NTT domain products may be modulo-q operations, where q is set by a particular algorithm specification, e.g., q=13×2^8+1=3329 for Kyber, q=2^23−2^13+1=8380417 for Dilithium, and the like.
In some implementations of the present disclosure, to mask various polynomial operations, a larger circle ZQ (auxiliary domain) that embeds Zq may be selected for masking operations, such that q divides Q with Q/q being an integer much greater than one, e.g., 2^8, 2^12, 2^16, or some other number. The polynomials can be mapped from s_i, c_i∈Zq to S_i, C_i∈ZQ, which may be accompanied by adding arbitrary masking polynomials M_S(x), M_C(x) multiplied by modulus q: S(x)=s(x)+qM_S(x), C(x)=c(x)+qM_C(x). Such addition, while efficiently masking the polynomials in the ZQ domain, nonetheless retains information about the original polynomials s(x) and c(x). Correspondingly, when the final product P(x) computed in ZQ is transformed back to the working domain Zq, P(x)→p(x), the result p(x) automatically amounts to the correct product s(x)·c(x) without any need for additional unmasking or separate computations performed on the masks.
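The embedding idea above can be checked numerically. The following is a minimal sketch, not the disclosed implementation: it uses a schoolbook product in place of an NTT, and the helper names (`polymul_negacyclic`, `embed_and_mask`, `masked_product`), the toy coefficient lists, and the choice Q/q=2^12 are assumptions made for illustration. Because q·M taken modulo Q depends only on M modulo Q/q, the mask coefficients are drawn from the range [0, Q/q).

```python
import secrets


def polymul_negacyclic(a, b, modulus):
    """Schoolbook product of a and b in Z_modulus[x]/(x^n + 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] = (out[k] + a[i] * b[j]) % modulus
            else:
                # x^n = -1 in this ring, so wrapped terms enter with a minus sign
                out[k - n] = (out[k - n] - a[i] * b[j]) % modulus
    return out


def embed_and_mask(poly, q, m):
    """Map coefficients from Z_q to Z_Q (Q = m*q) and add a random multiple of q."""
    big_q = m * q
    return [(coef + q * secrets.randbelow(m)) % big_q for coef in poly]


def masked_product(s, c, q, m):
    """Multiply masked copies of s and c in Z_Q, then reduce the result mod q."""
    big_q = m * q
    masked_s = embed_and_mask(s, q, m)
    masked_c = embed_and_mask(c, q, m)
    product = polymul_negacyclic(masked_s, masked_c, big_q)
    return [coef % q for coef in product]  # no separate unmasking step


q, m = 3329, 2**12                        # Kyber modulus, assumed scaling factor
s = [1, 3328, 0, 2, 5, 0, 3327, 1]        # illustrative "secret" coefficients
c = [17, 2044, 999, 3, 0, 1, 2500, 42]    # illustrative "public" coefficients
assert masked_product(s, c, q, m) == polymul_negacyclic(s, c, q)
```

The final assertion holds for every random draw of the masks, since reduction modulo q maps every multiple of q (and of Q) to zero.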
This and various other disclosed techniques are not limited to lattice-based algorithms and may be used in a variety of cryptographic devices and applications. Numerous additional implementations are disclosed herein. The advantages of the disclosed implementations include, but are not limited to, secure execution of cryptographic applications using masking techniques that do not require separate unmasking operations and/or separate parallel computations on the masking polynomials. The disclosed implementations may be used in public key cryptography, symmetric key cryptography, digital signature algorithms, homomorphic encryption, and/or various other cryptographic applications.
As disclosed herein, an embedded domain masking 110 may receive ciphertexts 126 and private key 106 and may implement masking to protect the private key 106 (or any other secret data derived from private key 106) from a side-channel attack during decryption of the received ciphertext 126. Although, for illustration, ciphertext(s) and plaintext(s) are generated/processed by different devices in the illustration of
Embedded domain masking 110 may select an auxiliary domain ZQ that embeds the working domain Zq, e.g., by using a random process (or a constrained random process), as disclosed below. Embedded domain masking 110 may further select masking polynomials for the secret and public data. Decryption without unmasking module 112 may perform the decryption process, e.g., by first transforming the two polynomials to the NTT domain in ZQ, multiplying polynomial coefficients, performing the inverse NTT, and then transforming the product embedded into ZQ into the working domain Zq in such a way that a correct plaintext 114 is obtained without any special unmasking or separate computations handling the masks.
Sending device 120 may then send message signature 115 together with message 114 to receiving device 102 over public communication channel 130. Receiving device 102 may perform message verification 116 of the message signature, e.g., using public key 108 (and, optionally, the message hash). In some implementations, the digital signature scheme may be one of the post-quantum digital signature schemes, including but not limited to Dilithium, FALCON, and/or the like.
Although, in the illustration of
Example computing system 200 may include an input/output (I/O) interface 204 to facilitate connection of computing device 202 with peripheral hardware devices 206 such as card readers, terminals, printers, scanners, internet-of-things devices, and the like. Example computing system 200 may further include a network interface 208 to facilitate connection to a variety of networks (Internet, wireless local area networks (WLAN), personal area networks (PAN), public networks, private networks, etc.), and may include a radio front end module and other devices (amplifiers, digital-to-analog and analog-to-digital converters, dedicated logic units, etc.) to implement data transfer to/from the computing device 202. For example, network interface 208 may be used to support a connection to sending device 120 of
Example computing system 200 may support one or more cryptographic applications 210-n, such as one or more external cryptographic applications 210-1 and/or one or more embedded cryptographic applications 210-2. Cryptographic applications 210-n may be secure authentication applications, public key signature applications, key encapsulation applications, key decapsulation applications, encryption applications, decryption applications, fully homomorphic encryption/decryption applications, secure storage applications, and so on. External cryptographic application 210-1 may be instantiated on the same computing device 202, e.g., by an operating system executed by the processor 220 and residing in a memory device 230. Alternatively, external cryptographic application 210-1 may be instantiated by a guest operating system supported by a virtual machine monitor (hypervisor) executed by the processor 220. In some implementations, external cryptographic application 210-1 may reside on a remote access client device or a remote server (not shown), with the computing device 202 providing cryptographic support for the client device and/or the remote server.
Processor 220 may include one or more processor cores 222 having access to cache 224 (e.g., a single-level or multi-level cache) and one or more hardware registers 226. In some implementations, each processor core 222 may execute instructions to run a number of hardware threads, also known as logical processors. Various logical processors (or processor cores) may be assigned to one or more cryptographic applications 210-n, although more than one processor may be assigned to a single cryptographic application for parallel processing. Memory device 230 may refer to a volatile or non-volatile memory and may include a read-only memory (ROM) 232, a random-access memory (RAM) 234, as well as (not shown) electrically erasable programmable read-only memory (EEPROM), flash memory, flip-flop memory, or any other device capable of storing data. RAM 234 may be a dynamic random access memory (DRAM), synchronous DRAM (SDRAM), a static memory, such as static random access memory (SRAM), and the like.
Memory device 230 may include one or more registers, such as one or more input registers 236 to store cryptographic keys, input polynomials, and other data for cryptographic applications 210-n. Memory device 230 may further include one or more output registers 238 to store outputs of cryptographic applications 210-n, and one or more working registers 240 to store various intermediate values generated in the course of performing cryptographic computations, including masking operations. Memory device 230 may also include one or more control registers 242 for storing information about modes of operation, selecting a cryptographic algorithm, initializing cryptographic computations, selecting a masking mode, selecting auxiliary domain ZQ, sampling masking polynomials, modifying secret and ciphertext data with sampled polynomials, and/or the like. Control registers 242 may communicate with one or more processor cores 222 and a clock 228, which may keep track of a processing operation (e.g., iteration of the NTT/inverse NTT) being performed. In some implementations, registers 236-242 may be implemented as part of RAM 234. In some implementations, some or all of the registers 236-242 may be implemented separately from RAM 234. Some or all of the registers 236-242 may be implemented as part of processor 220 (e.g., as part of the hardware registers 226). In some implementations, processor 220 and memory device 230 may be implemented as a single field-programmable gate array (FPGA).
Computing device 202 may include a cryptographic engine 250 to support cryptographic operations of processor 220. Cryptographic engine 250 may be configured to perform side-channel attack-resistant cryptographic operations, in accordance with implementations of the present disclosure. As depicted in
While the underlying plaintext message may be confidential, the input data 302 itself may be public. Furthermore, during a side-channel attack, an attacker may generate many instances of input data 302 and observe processor and/or memory activity during decryption of the generated ciphertexts. (Alternatively, an attacker may observe processor/memory activity during processing of ciphertexts 302 generated by other entities.) Decryption or encryption of input data 302 may involve using secret data 304, which may be any cryptographic key permanently stored on receiving device 102, an ephemeral key or session key generated for a particular cryptographic episode, a key generated to decrypt a particular message or a portion of a message, and/or any data string generated using a secret key or other confidential information.
In one example implementation, input data 302 includes a set of numbers {c_i}=c_0 . . . c_{n-1}, which may be represented as the polynomial c(x)=Σ_{i=0}^{n-1} c_i x^i and transformed, in various arithmetic operations, as corresponding polynomial coefficients are transformed (for example, coefficients of a sum of two polynomials are given by sums, modulo a suitable modulus, of corresponding same-degree coefficients of the two polynomials). Similarly, secret data 304 may include a set of numbers {s_i}=s_0 . . . s_{n-1}, which may be represented as the polynomial s(x)=Σ_{i=0}^{n-1} s_i x^i. The polynomials c(x), s(x) are defined modulo a suitable modulus polynomial, which for the purpose of illustration and not limitation may be x^n+1. This choice of the modulus polynomial amounts to replacing powers x^α with α≥n that arise in various polynomial products according to x^α→−x^{α−n}. Correspondingly, the product p(x)=s(x)·c(x) has a set of coefficients {p_i} that are determined according to p_i=Σ_{j=0}^{n-1} s_{i−j} c_j (mod q), namely a discrete convolution of sets {s_i} and {c_i}, in which coefficients s_k with negative k are to be understood in the sense s_{k<0}→−s_{k+n}. The modulus q may be any suitable number, e.g., q=13×2^8+1=3329 for Kyber and q=2^23−2^13+1=8380417 for Dilithium. Direct computation of the coefficients {p_i} using the convolution formula involves O(n^2) multiplication products s_{i−j} c_j. This number is reduced significantly (for n≫1) by first transforming the polynomials into an NTT domain. More specifically, the n-point NTT transforms the polynomial s(x) (and, similarly, c(x)) into a polynomial ŝ(x) (and, similarly, ĉ(x)) with the following coefficients,
ŝ_i = Σ_{j=0}^{n-1} s_j W_n^{ij} (mod q),
where W_n is an nth principal root of unity modulo q. The inverse NTT determines the coefficients {s_i} in terms of {ŝ_i},
s_i = n^{-1} Σ_{j=0}^{n-1} ŝ_j W_n^{−ij} (mod q).
As follows from the inverse NTT applied to the convolution formula, multiplication of the polynomials s(x) and c(x) (or any other polynomials) is simplest in the NTT domain, as the coefficients of the NTT of the product are the elementwise (Hadamard) products of the coefficients of the NTTs of the polynomials: p̂_i=ŝ_i ĉ_i, or symbolically p̂(x)=ŝ(x)ĉ(x). Accordingly, a fast NTT-based multiplication of polynomials may be performed by (1) transforming the polynomials to the NTT domain, s(x)→ŝ(x), c(x)→ĉ(x), (2) computing the elementwise product, p̂(x)=ŝ(x)ĉ(x), and (3) performing the inverse NTT transform, p̂(x)→p(x).
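The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the transform is the naive O(n^2) sum (a fast variant would replace `ntt`), the toy parameters q=17 and n=4 are chosen so that a primitive 2n-th root of unity ψ=2 exists, and the wrap-around of the modulus polynomial x^n+1 is handled by one common technique, pre-multiplying coefficient j by ψ^j and post-multiplying by ψ^{−j}, with W_n=ψ^2. The function names are hypothetical.

```python
def ntt(a, w, q):
    """Naive n-point transform: out[i] = sum_j a[j] * w^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]


def intt(a_hat, w, q):
    """Inverse transform: same sum with w^-1, scaled by n^-1 mod q."""
    n_inv = pow(len(a_hat), -1, q)
    return [n_inv * x % q for x in ntt(a_hat, pow(w, -1, q), q)]


def ntt_multiply(s, c, psi, q):
    """Product in Z_q[x]/(x^n + 1); psi is a primitive 2n-th root of unity mod q."""
    n = len(s)
    w = psi * psi % q                      # n-th root of unity used by the transform
    # (1) transform the (twisted) inputs to the NTT domain
    s_hat = ntt([s[j] * pow(psi, j, q) % q for j in range(n)], w, q)
    c_hat = ntt([c[j] * pow(psi, j, q) % q for j in range(n)], w, q)
    # (2) elementwise (Hadamard) product
    p_hat = [x * y % q for x, y in zip(s_hat, c_hat)]
    # (3) inverse transform, then undo the twist
    p_twisted = intt(p_hat, w, q)
    psi_inv = pow(psi, -1, q)
    return [p_twisted[j] * pow(psi_inv, j, q) % q for j in range(n)]


# q = 17, n = 4, psi = 2 (2^4 = -1 mod 17); agrees with the schoolbook product
assert ntt_multiply([1, 2, 3, 4], [5, 6, 7, 8], 2, 17) == [12, 15, 2, 9]
```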
Fast NTT (performed similarly to the Fast Fourier Transform) computes n/2 2-point butterfly transforms in each of log_2 n iterations. Essentially, a fast NTT amounts to computing n/2 2-point transforms in the first iteration followed by computing n/4 4-point transforms in the second iteration, and so on, until the last iteration produces the ultimate n-point NTT. Different groupings of input elements into each iteration may be used. Grouping even elements with adjacent odd elements gives rise to Cooley-Tukey butterfly operations, where two input elements into a particular iteration, A and B, are transformed into the output elements according to: A, B→A′=A+B·W_n^i, B′=A−B·W_n^i. Grouping elements from a first half of elements with corresponding elements from a second half gives rise to Gentleman-Sande butterfly operations, where input elements are transformed into the output elements according to: A, B→A′=A+B, B′=(A−B)·W_n^i. Often, the Cooley-Tukey butterfly operations are used for the forward NTTs and the Gentleman-Sande butterfly operations are used for the inverse NTTs. In some algorithms (e.g., Kyber), two or more n/m-point NTTs may be computed, as described in more detail below.
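The two butterfly operations, and a fast transform assembled from Cooley-Tukey butterflies, may be sketched as below. The sketch is written recursively for brevity, which performs the same butterflies as the log_2 n iterative passes; the function names and the toy parameters (q=17, n=8, W_n=2) are assumptions for illustration only.

```python
def ct_butterfly(a, b, w, q):
    """Cooley-Tukey: (A, B) -> (A + B*W, A - B*W) mod q."""
    t = b * w % q
    return (a + t) % q, (a - t) % q


def gs_butterfly(a, b, w, q):
    """Gentleman-Sande: (A, B) -> (A + B, (A - B)*W) mod q."""
    return (a + b) % q, (a - b) * w % q


def ntt_naive(a, w, q):
    """Reference O(n^2) transform: out[i] = sum_j a[j] * w^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]


def ntt_fast(a, w, q):
    """Radix-2 transform; len(a) is a power of two, w a primitive len(a)-th root."""
    n = len(a)
    if n == 1:
        return list(a)
    w2 = w * w % q
    even = ntt_fast(a[0::2], w2, q)   # transform of even-indexed coefficients
    odd = ntt_fast(a[1::2], w2, q)    # transform of odd-indexed coefficients
    out = [0] * n
    tw = 1                            # running twiddle factor w^i
    for i in range(n // 2):
        out[i], out[i + n // 2] = ct_butterfly(even[i], odd[i], tw, q)
        tw = tw * w % q
    return out


a = [3, 1, 4, 1, 5, 9, 2, 6]
assert ntt_fast(a, 2, 17) == ntt_naive(a, 2, 17)   # 2 is a primitive 8th root mod 17
```

A Gentleman-Sande butterfly with the inverse twiddle factor undoes a Cooley-Tukey butterfly up to a factor of two, which is one reason the pair is commonly split between forward and inverse transforms.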
To protect secret polynomials from exposure to side-channel attacks during the operations of the NTT transform and the NTT domain multiplications, polynomials s(x) and c(x) in working domain 306, in which the polynomial coefficients are defined in the domain (ring) Zq, may be mapped to an auxiliary domain ZQ. In the auxiliary domain, coefficients of polynomials s(x) and c(x) are defined modulo Q that is a multiple of the size of the working domain, Q=mq, where m is an integer number, referred to as the domain scaling factor herein. In some implementations, the domain scaling factor may be a number that is much greater than one, e.g., 2^6 or more. Initially, the (unmasked) polynomials s(x) and c(x) may have the same coefficients in the auxiliary domain ZQ as in the working domain Zq, while further computations (e.g., masking operations, additions, multiplications, etc.) with the polynomials may occur in the auxiliary domain that embeds the working domain. Multiplication of polynomials may be performed on an auxiliary ring RQ=ZQ[x]/(x^n+1), using the same modulus polynomial x^n+1 as in the working domain Zq but with the coefficients now defined in the auxiliary domain ZQ. In some implementations, mapping of polynomial coefficients from the working domain to the auxiliary domain may be performed explicitly, e.g., by copying the coefficients from log_2 q-bit registers (or memory addresses) to log_2 Q-bit registers (or memory addresses) in which the log_2(Q/q) most significant bits are assigned zero values. In some implementations, mapping of polynomial coefficients may be performed implicitly, e.g., by associating the coefficients with modulo-Q operations.
The expansion of the domain in which coefficients of various polynomials are defined, Zq→ZQ, facilitates efficient masking of the polynomials. More specifically, auxiliary domain masking 308 may mask polynomials s(x) and c(x),
S_M(x) = s(x) + q·M_S(x),   C_M(x) = c(x) + q·M_C(x),
by adding arbitrary masking polynomials M_S(x) and M_C(x) (defined on RQ) multiplied by the working domain modulus q. In some implementations, masking polynomials M_S(x) and M_C(x) may be polynomials of the same degree n−1 (as polynomials s(x) and c(x)) with coefficients that are randomly sampled from the auxiliary domain ZQ, e.g., using any suitable random (or pseudorandom) number generator 310. In some implementations, different coefficients of the same masking polynomial may be sampled independently from each other.
Auxiliary domain NTT 314 computations may then be performed in the auxiliary domain, on the masked polynomials S_M(x) and C_M(x), e.g., substantially as described above. An output of auxiliary domain NTT 314 may include the NTT polynomials Ŝ(x), Ĉ(x) with coefficients in ZQ. NTT multiplication 316 may then be performed using elementwise multiplication of the polynomial coefficients, P̂_i=Ŝ_i Ĉ_i. The obtained polynomial coefficients P̂_i may define a polynomial P̂(x) associated with plaintext data. Inverse NTT 318 may transform the NTT polynomial P̂(x) back from the NTT domain: P̂(x)→P(x).
Since operations of blocks 314, 316, and 318 are performed starting from the masked polynomials S_M(x) and C_M(x) defined on the auxiliary polynomial ring RQ=ZQ[x]/(x^n+1) with coefficients in the auxiliary domain ZQ, no secret data is revealed in the course of computations involved in operations of blocks 310, 314, 316, and 318. The output P(x) may now be transformed from the auxiliary ring RQ to Rq by performing a reduction to the working domain, e.g.,
p_i = P_i mod q,
resulting in output data 322 (e.g., plaintext in decryption operations, signature in digital signature applications, and/or the like) represented by the polynomial p(x)=Σ_{i=0}^{n-1} p_i x^i defined modulo the working domain modulus q.
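The masked flow from auxiliary domain masking through the working domain reduction may be illustrated end to end with the sketch below. It is a toy model under several assumptions not taken from the text: naive O(n^2) transforms stand in for fast NTTs, the twiddle factors are the working-domain roots reused unchanged in ZQ, the x^n+1 wrap-around is handled by twisting with a primitive 2n-th root ψ, and the parameters (q=17, n=4, ψ=2, Q/q=2^8) and function names are hypothetical.

```python
import secrets


def transform(a, root, modulus):
    """Naive n-point transform: out[i] = sum_j a[j] * root^(i*j) mod modulus."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, modulus) for j in range(n)) % modulus
            for i in range(n)]


def scale_by_powers(a, root, modulus):
    """Multiply a[j] by root^j (twist used for the x^n + 1 wrap-around)."""
    return [a[j] * pow(root, j, modulus) % modulus for j in range(len(a))]


def mask(poly, q, m):
    """Add a random multiple of q to every coefficient, modulo Q = m*q."""
    return [(x + q * secrets.randbelow(m)) % (m * q) for x in poly]


def masked_ntt_product(s, c, psi, q, m):
    big_q = m * q
    w = psi * psi % q
    # inverses are taken in the working domain and reused as integers in Z_Q
    psi_inv, w_inv, n_inv = pow(psi, -1, q), pow(w, -1, q), pow(len(s), -1, q)
    # forward transforms of the masked polynomials, all arithmetic modulo Q
    s_hat = transform(scale_by_powers(mask(s, q, m), psi, big_q), w, big_q)
    c_hat = transform(scale_by_powers(mask(c, q, m), psi, big_q), w, big_q)
    # elementwise product and inverse transform, still modulo Q
    p_hat = [x * y % big_q for x, y in zip(s_hat, c_hat)]
    p_big = [n_inv * x % big_q for x in transform(p_hat, w_inv, big_q)]
    p_big = scale_by_powers(p_big, psi_inv, big_q)
    # working domain reduction; no separate unmasking is performed
    return [x % q for x in p_big]


assert masked_ntt_product([1, 2, 3, 4], [5, 6, 7, 8], 2, 17, 2**8) == [12, 15, 2, 9]
```

The result matches the unmasked product modulo q for every choice of masks because reduction modulo q commutes with additions and multiplications performed modulo Q when q divides Q.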
In some implementations, additional masking may be performed, e.g., twiddle factor masking 312. For example, for the Cooley-Tukey butterfly operation (where A and B may be any pair of coefficients of the masked polynomials S_M(x) and C_M(x) or any pair of coefficients derived from S_M(x) and C_M(x) via one or more NTT/NTT domain/inverse NTT operations),
A, B → A′ = A + B·W_n^i,   B′ = A − B·W_n^i,
masking may be performed for one or both inputs A, B, and a twiddle factor W_n^i. The masking may be performed modulo the auxiliary modulus Q. This does not affect the outputs in the working domain (modulo q). For example, masking A→A+M_A, B→B+M_B, and W_n^i→W_n^i+M_W results in the following output (presenting only A′, for conciseness),
A′ = (A + M_A) + (B + M_B)·(W_n^i + M_W) − aQ = A + B·W_n^i + (M_A + B·M_W + M_B·W_n^i + M_B·M_W − aQ),
where a is some integer that is used to bring A′ inside the domain ZQ. It follows that all the terms in the parentheses disappear upon a mod q reduction of the right-hand side of the last identity, since each of the terms in the parentheses is divisible by q (as all masks M_α and Q are so divisible, by construction). Similarly, any number of intervening operations, e.g., additions, subtractions, multiplications, divisions, exponentiations, and/or the like, do not affect the final output of decryption process 300, e.g., output data 322. Accordingly, no additional unmasking is required when the disclosed techniques are deployed, as working domain reduction 320 automatically recovers the correct plaintext values. Stated equivalently, to achieve masking without unmasking, the masking polynomials M_S(x) and M_C(x) may be selected from the kernel of homomorphism RQ[x]→Rq[x], namely as the set of polynomials in RQ[x] that map to a zero element of Rq[x].
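The cancellation described above can be checked on a single butterfly. In this minimal sketch, the function names, the sample values of A, B, and the twiddle factor, and the choice Q/q=2^8 are assumptions for illustration; each mask is a random multiple of q taken modulo Q.

```python
import secrets


def ct_butterfly(a, b, w, modulus):
    """Cooley-Tukey butterfly: (A, B) -> (A + B*W, A - B*W) mod modulus."""
    t = b * w % modulus
    return (a + t) % modulus, (a - t) % modulus


def masked_ct_butterfly(a, b, w, q, m):
    """Mask A, B and the twiddle factor with multiples of q, then work modulo Q."""
    big_q = m * q
    a_m = (a + q * secrets.randbelow(m)) % big_q   # A + M_A
    b_m = (b + q * secrets.randbelow(m)) % big_q   # B + M_B
    w_m = (w + q * secrets.randbelow(m)) % big_q   # W + M_W
    return ct_butterfly(a_m, b_m, w_m, big_q)


q, m = 3329, 2**8
a, b, w = 1234, 567, 17
out = masked_ct_butterfly(a, b, w, q, m)
# reducing the masked outputs modulo q recovers the unmasked butterfly outputs
assert tuple(x % q for x in out) == ct_butterfly(a, b, w, q)
```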
Numerous implementations of secret data protection by domain embedding are within the scope of the present disclosure. It should be understood that although, in some implementations, input data 302 may be ciphertext 126 (see
In some applications, a full n-point NTT may not exist. For example, Kyber uses the modulus polynomial x^256+1, which does not factorize into a product of n=256 linear polynomial terms but factorizes into n/2=128 quadratic polynomials. In such instances, roots W_n^1, W_n^3, W_n^5, . . . W_n^{n-1} may be used to implement two n/2-point NTTs, separately for even-numbered and odd-numbered coefficients of s(x) (and, similarly, c(x) and/or any other polynomial). For example (where mod q operations are implied but not explicitly stated, for brevity),
ŝ_{2i} = Σ_{j=0}^{n/2-1} s_{2j} W_n^{(2i+1)j},   ŝ_{2i+1} = Σ_{j=0}^{n/2-1} s_{2j+1} W_n^{(2i+1)j}.
Correspondingly, the convolution formula for the product of two polynomials results in the elementwise-pairwise multiplication of the respective even-numbered NTT and odd-numbered NTT,
p̂_{2i} = ŝ_{2i} ĉ_{2i} + ŝ_{2i+1} ĉ_{2i+1} W_n^{2i+1},   p̂_{2i+1} = ŝ_{2i} ĉ_{2i+1} + ŝ_{2i+1} ĉ_{2i},
or, equivalently, as the product of degree-one polynomials,
p̂_{2i} + p̂_{2i+1} x = (ŝ_{2i} + ŝ_{2i+1} x)·(ĉ_{2i} + ĉ_{2i+1} x) mod (x^2 − W_n^{2i+1}).
Similarly, when the highest-degree root has a degree n/m, where m=3, 4, etc., m sets of partial NTTs may be defined with the product of two polynomials defined in the NTT domain as products of degree m−1 polynomials. In each instance, secret data may be protected by masking various coefficients of those polynomials and/or twiddle factors W_{n/m}^{ik} by adding arbitrary masking coefficients q·M defined in the auxiliary domain ZQ.
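The case of two n/2-point NTTs with a pairwise product of degree-one polynomials may be sketched on a toy ring. The parameters here are assumptions chosen so that the same situation arises at small scale: q=13 and n=4, where a primitive 4th root of unity (W_n=5) exists but no 8th root does, so x^4+1 splits only into the quadratic factors x^2−5 and x^2−8. The function names and the (even, odd) pair layout are hypothetical and are not the Kyber reference ordering.

```python
def partial_ntt(a, w, q):
    """Evaluate even- and odd-indexed coefficient halves at odd powers of w."""
    h = len(a) // 2
    out = []
    for i in range(h):
        z = pow(w, 2 * i + 1, q)
        even = sum(a[2 * j] * pow(z, j, q) for j in range(h)) % q
        odd = sum(a[2 * j + 1] * pow(z, j, q) for j in range(h)) % q
        out.append((even, odd))
    return out


def base_multiply(s_hat, c_hat, w, q):
    """Multiply degree-one polynomials modulo x^2 - w^(2i+1), pair by pair."""
    out = []
    for i, ((s0, s1), (c0, c1)) in enumerate(zip(s_hat, c_hat)):
        z = pow(w, 2 * i + 1, q)
        out.append(((s0 * c0 + s1 * c1 * z) % q, (s0 * c1 + s1 * c0) % q))
    return out


def partial_intt(p_hat, w, q):
    """Invert partial_ntt: interpolate the even and odd halves separately."""
    h = len(p_hat)
    h_inv, w_inv = pow(h, -1, q), pow(w, -1, q)
    out = [0] * (2 * h)
    for j in range(h):
        even = sum(p_hat[i][0] * pow(w_inv, (2 * i + 1) * j, q) for i in range(h))
        odd = sum(p_hat[i][1] * pow(w_inv, (2 * i + 1) * j, q) for i in range(h))
        out[2 * j] = even * h_inv % q
        out[2 * j + 1] = odd * h_inv % q
    return out


q, w = 13, 5
s_hat = partial_ntt([1, 2, 3, 4], w, q)
c_hat = partial_ntt([5, 6, 7, 8], w, q)
# equals (1+2x+3x^2+4x^3)(5+6x+7x^2+8x^3) mod (x^4+1, 13)
assert partial_intt(base_multiply(s_hat, c_hat, w, q), w, q) == [9, 3, 2, 8]
```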
In some implementations, masking may be performed on initial polynomials s(x) and c(x) and/or twiddle factors W. In some implementations, re-masking of coefficients and/or twiddle factors may be performed (in the auxiliary domain ZQ) after completion of some portion of the computations, e.g., after completion of any number of NTT and/or inverse NTT iterations, before and/or after NTT domain multiplications, and/or any combination thereof. Various polynomial coefficients as well as different twiddle factors may be masked independently from each other, e.g., some or all in the set of twiddle factors W_n^0, W_n^1, . . . W_n^{n-1} may be masked with masks that are different from masks used for other twiddle factors.
In some implementations, twiddle factors W_n^i for performing the NTT on the auxiliary ring RQ[x] may be the same as the original twiddle factors defined in the working domain Zq and redefined in the auxiliary domain ZQ. For example, each original twiddle factor may be stored using a minimum of ⌈log_2 q⌉ bits. The twiddle factors in the auxiliary domain may use ⌈log_2 Q⌉ bits, with the additional ⌈log_2 Q⌉−⌈log_2 q⌉ bits having zero values.
In some implementations, the twiddle factors for the NTT on RQ[x] may be based on a principal root of unity W̃_n^1 that is different from W_n^1. In one example, the Hensel lifting algorithm may be used to obtain (based on the principal root W_n^1 that obeys the equation (W_n^1)^n=1 (mod q)) a root γ that obeys the equation γ^n=1 (mod q^2). The obtained root may then be used, γ→W̃_n^1, as the principal root W̃_n^1, from which other twiddle factors W̃_n^i for the NTT on RQ[x] may be computed by raising W̃_n^1 to the appropriate power i.
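One way to carry out such a lift is a single Newton (Hensel) step on f(x)=x^n−1, sketched below. The function name is hypothetical, the sketch assumes that n is coprime with q (so that the derivative n·W^{n−1} is invertible modulo q), and it relies on Python 3.8+ for modular inverses via `pow(x, -1, q)`.

```python
def hensel_lift_root(w, n, q):
    """Lift w with w^n = 1 (mod q) to gamma with gamma^n = 1 (mod q^2)."""
    q2 = q * q
    f = (pow(w, n, q2) - 1) % q2                      # f(w) = w^n - 1, a multiple of q
    df_inv = pow(n * pow(w, n - 1, q) % q, -1, q)     # 1 / f'(w), needed only mod q
    return (w - f * df_inv) % q2                      # one Newton step


# toy check: 4 is a 4th root of unity mod 17; its lift is a 4th root mod 17^2
gamma = hensel_lift_root(4, 4, 17)
assert gamma % 17 == 4
assert pow(gamma, 4, 17 * 17) == 1
```

The lifted root reduces to the original root modulo q, so twiddle factors derived from it reduce to the original twiddle factors in the working domain.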
In some implementations, the twiddle factors for the NTT on RQ[x] may be based on the principal root of unity γ that is obtained as follows. First, Bezout coefficients a=q^{-1} mod m and b=m^{-1} mod q are defined, so that aq+bm=1 (mod Q), where m=Q/q is coprime with q. This may be accomplished, e.g., using the Euclidean algorithm (or the extended Euclidean algorithm). An additional root of unity β modulo m may be precomputed, β^n mod m=1. A principal root of unity modulo Q=mq may then be computed as γ=aqβ+bmW_n^1 since
γ mod q = W_n^1 and γ mod m = β, so that γ^n mod q = 1 and γ^n mod m = 1, and therefore γ^n mod Q = 1.
The obtained root γ may be used, γ→W̃_n^1, as the principal root for the NTT on RQ[x], which results in correct polynomial multiplication products modulo q. The new principal root of unity, γ, may additionally be masked with masks that are integer multiples of q, e.g., as disclosed above.
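The construction of γ from the two roots can be sketched as follows. The function name and the toy parameters (q=17, m=16, n=4, W=4, β=3) are assumptions for illustration; `pow(x, -1, y)` raises `ValueError` when q and m are not coprime, in which case this particular construction does not apply.

```python
def combine_roots(w, beta, q, m):
    """Return gamma mod Q = q*m with gamma = w (mod q) and gamma = beta (mod m)."""
    a = pow(q, -1, m)   # a = q^-1 mod m
    b = pow(m, -1, q)   # b = m^-1 mod q
    return (a * q * beta + b * m * w) % (q * m)


q, m, n = 17, 16, 4
w, beta = 4, 3                      # 4^4 = 1 (mod 17), 3^4 = 81 = 1 (mod 16)
gamma = combine_roots(w, beta, q, m)
assert gamma % q == w and gamma % m == beta
assert pow(gamma, n, q * m) == 1    # gamma is an n-th root of unity modulo Q
```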
In some implementations, masks for the principal roots/twiddle factors may be randomly selected subject to suitable fitness criteria. For example, the fitness criteria may include verifying that the principal roots/twiddle factors are non-zero (modulo m=Q/q). In another example, the fitness criteria may include verifying that the NTT in the auxiliary domain is invertible (e.g., represented by a matrix having a non-zero determinant) in ZQ.
Method 400 may include, at block 410, identifying, using a processing device, a plurality of input polynomials, such as a first polynomial (e.g., s(x)) associated with a first data and a second polynomial (e.g., c(x)) associated with a second data. In some implementations, the first data may be a secret data (e.g., a private key or any secret information obtained using the private key) and the second data may be a non-secret or public data (e.g., a ciphertext in decryption algorithms, a document hash in digital signature algorithms, and/or the like). The plurality of polynomials may be defined on a working domain, e.g., with coefficients defined modulo a first modulus (e.g., q) and with polynomial operations defined modulo any suitable irreducible polynomial (e.g., a degree n polynomial). The combination of the first modulus and the degree of the irreducible polynomial represents a first dimension of the working domain, e.g., a number of different ways in which coefficients of a given polynomial may be selected (e.g., q×n, where each of n coefficients of a degree n−1 polynomial may be selected from q different values).
At block 420, method 400 may continue with the processing device mapping the plurality of input polynomials to an auxiliary domain. The auxiliary domain may have coefficients defined modulo a second modulus (e.g., Q) and may have a second dimension (e.g., Q×n) that is different from the first dimension. In some implementations, the second dimension is greater than the first dimension. In some implementations, the second modulus Q may be randomly sampled from a target range of values. In some implementations, the target range of values may be determined by at least one of: (1) a bit size of one or more registers storing coefficients of the plurality of masked polynomials, or (2) an operand size of a processing unit that supports computation of the one or more NTTs. For example, the second modulus Q may be selected using a restricted random sampling from such values that (1) are divisible by q, and (2) do not exceed 2^N−1, where N is the bit size of registers/processing unit operands. In some implementations, mapping of the polynomials may maintain coefficients of the respective polynomials while defining the polynomials in the new (auxiliary) domain. In some implementations, mapping of the polynomials to the auxiliary domain may also include modification of the coefficients (e.g., using any reversible transformation).
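A restricted random sampling of the second modulus may look like the sketch below. The function name, the `min_factor` lower bound on Q/q, and the use of Python's `secrets` module as the random source are assumptions added for illustration; the text only requires that Q be divisible by q and not exceed 2^N−1.

```python
import secrets


def sample_auxiliary_modulus(q, bits, min_factor=2):
    """Pick Q = m*q uniformly among multiples of q with m >= min_factor and Q <= 2^bits - 1."""
    max_factor = (2**bits - 1) // q
    if max_factor < min_factor:
        raise ValueError("register width too small for the requested scaling factor")
    m = min_factor + secrets.randbelow(max_factor - min_factor + 1)
    return m * q


big_q = sample_auxiliary_modulus(3329, 32)      # Kyber modulus, 32-bit operands
assert big_q % 3329 == 0 and big_q <= 2**32 - 1
```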
Operations of block 420 may further include generating a plurality of masking polynomials, e.g., a first masking polynomial and a second masking polynomial. In some implementations, the first masking polynomial and the second masking polynomial are associated with a kernel of a homomorphism transformation from the auxiliary domain to the working domain. For example, coefficients of the masking polynomials (e.g., polynomials qMS(x) and qMC(x)) may be divisible by the first modulus q and may be defined modulo the second modulus Q.
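As a minimal sketch of the kernel property, assuming that Q is a multiple of q and that polynomials are stored as lists of integer coefficients, masking polynomials may be sampled and applied as shown below. The helper names sample_masking_polynomial, mask, and to_working_domain are hypothetical. Because every coefficient of the masking polynomial is a multiple of q, reducing a masked polynomial modulo q recovers the original coefficients.

```python
import secrets


def sample_masking_polynomial(n, q, Q):
    """Return n coefficients, each a random multiple of q taken modulo Q."""
    if Q % q != 0:
        raise ValueError("this sketch assumes Q is a multiple of q")
    k = Q // q
    return [(q * secrets.randbelow(k)) % Q for _ in range(n)]


def mask(poly, masking_poly, Q):
    """Add the masking polynomial coefficient-wise in the auxiliary domain."""
    return [(a + m) % Q for a, m in zip(poly, masking_poly)]


def to_working_domain(poly, q):
    """Map from the auxiliary domain to the working domain (reduce modulo q)."""
    return [a % q for a in poly]
```

In this sketch, to_working_domain(mask(s, m, Q), q) returns the coefficients of s for any masking polynomial m produced by sample_masking_polynomial, which reflects that m lies in the kernel of the reduction from the auxiliary domain to the working domain.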
At block 430, method 400 may include masking the first mapped polynomial with a first masking polynomial to obtain a first masked polynomial and, similarly, masking the second mapped polynomial with a second masking polynomial to obtain a second masked polynomial. In some implementations, as indicated with the callout block 432 in
At block 440, method 400 may include performing, using the first masked polynomial and the second masked polynomial, one or more computations in the auxiliary domain, e.g., one or more Number Theoretic Transforms (NTTs) performed modulo the second modulus. More specifically, as indicated with callout block 442, the one or more computations may include computing a first NTT of the first masked polynomial and computing a second NTT of the second masked polynomial. At callout block 444, the one or more computations may continue with computing an elementwise multiplication product of the first NTT and the second NTT. At callout block 446, the one or more computations may include computing an inverse NTT of the elementwise multiplication product of the first NTT and the second NTT.
In some implementations, the first NTT and the second NTT may be computed using a plurality of butterfly operations that deploy a plurality of twiddle factors, where one or more twiddle factors are masked using random numbers divisible by the first modulus. In some implementations, the first NTT of the first masked polynomial, the second NTT of the second masked polynomial, and/or the inverse NTT of the elementwise multiplication product may be based on a root of unity modulo the second modulus, wherein the root of unity modulo the second modulus (e.g., γ, as described above) is computed using a root of unity modulo the first modulus (e.g., Wn1).
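The effect of masking twiddle factors with random numbers divisible by the first modulus may be illustrated with the following sketch. It is not a butterfly-based NTT: for brevity it evaluates the transform directly, term by term, and all function names are hypothetical. It assumes that Q is a multiple of q. Each twiddle factor is a power of a root of unity modulo q to which a fresh random multiple of q is added, so the masked twiddle factor is congruent to the unmasked one modulo q but is not necessarily a root of unity modulo Q.

```python
import secrets


def masked_twiddle(omega, exponent, q, Q):
    """Return omega**exponent mod q plus a random multiple of q, modulo Q."""
    base = pow(omega, exponent, q)
    return (base + q * secrets.randbelow(Q // q)) % Q


def transform_with_masked_twiddles(poly, omega, q, Q):
    """Direct O(n^2) transform modulo Q using freshly masked twiddle factors."""
    n = len(poly)
    return [
        sum(a * masked_twiddle(omega, j * k, q, Q) for j, a in enumerate(poly)) % Q
        for k in range(n)
    ]


def reference_transform(poly, omega, q):
    """Unmasked direct transform modulo q, used only for comparison."""
    n = len(poly)
    return [
        sum(a * pow(omega, j * k, q) for j, a in enumerate(poly)) % q
        for k in range(n)
    ]
```

Since reduction modulo q preserves addition and multiplication, the output of transform_with_masked_twiddles, reduced modulo q, agrees with reference_transform applied to the coefficients reduced modulo q. For instance, with q = 17, the value ω = 4 has multiplicative order 4 and may serve as the root of unity for n = 4.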
At block 450, method 400 may include obtaining an output of the cryptographic operation (e.g., a plaintext in decryption algorithms, a document hash in digital signature algorithms, and/or the like) by transforming an output of the one or more computations (e.g., the output of the inverse NTT) from the auxiliary domain to the working domain. For example, the polynomial multiplication product output by the inverse NTT may be reduced modulo the first modulus.
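The overall flow of blocks 420 through 450 may be sketched end to end as follows. This is an illustrative sketch under stated assumptions rather than a definitive implementation: a schoolbook multiplication stands in for the NTT, elementwise product, and inverse NTT sequence; the reduction polynomial is assumed to be x^n + 1; Q is assumed to be a multiple of q; and all function names are hypothetical.

```python
import secrets


def negacyclic_multiply(a, b, modulus):
    """Schoolbook product of a and b modulo x^n + 1 and the given modulus."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % modulus
            else:
                # x^n is congruent to -1, so the term wraps around with a sign flip
                out[k - n] = (out[k - n] - ai * bj) % modulus
    return out


def random_mask(n, q, Q):
    """Return n random multiples of q, each smaller than Q."""
    return [q * secrets.randbelow(Q // q) for _ in range(n)]


def masked_multiply(s, c, q, Q):
    """Mask s and c, multiply modulo Q, then reduce the product modulo q."""
    n = len(s)
    s_masked = [(a + m) % Q for a, m in zip(s, random_mask(n, q, Q))]
    c_masked = [(a + m) % Q for a, m in zip(c, random_mask(n, q, Q))]
    product_aux = negacyclic_multiply(s_masked, c_masked, Q)
    return [x % q for x in product_aux]
```

In this sketch, masked_multiply(s, c, q, Q) returns the same coefficients as negacyclic_multiply(s, c, q), because every cross term introduced by the masks is a multiple of q and vanishes under the final reduction modulo the first modulus.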
Example computer system 500 may include a processing device 502 (also referred to as a processor or CPU), which may include processing logic 526, a main memory 504 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 506 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory (e.g., a data storage device 518), which may communicate with each other via a bus 530.
Processing device 502 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, processing device 502 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 502 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In accordance with one or more aspects of the present disclosure, processing device 502 may be configured to execute instructions implementing example method 400 of protecting secret data against side-channel attacks by domain embedding of polynomial computations.
Example computer system 500 may further comprise a network interface device 508, which may be communicatively coupled to a network 520. Example computer system 500 may further comprise a video display 510 (e.g., a liquid crystal display (LCD), a touch screen, or a cathode ray tube (CRT)), an alphanumeric input device 512 (e.g., a keyboard), a cursor control device 514 (e.g., a mouse), and an acoustic signal generation device 516 (e.g., a speaker).
Data storage device 518 may include a computer-readable storage medium (or, more specifically, a non-transitory computer-readable storage medium) 528 on which is stored one or more sets of executable instructions 522. In accordance with one or more aspects of the present disclosure, executable instructions 522 may comprise executable instructions implementing example method 400 of protecting secret data against side-channel attacks by domain embedding of polynomial computations.
Executable instructions 522 may also reside, completely or at least partially, within main memory 504 and/or within processing device 502 during execution thereof by example computer system 500, main memory 504 and processing device 502 also constituting computer-readable storage media. Executable instructions 522 may further be transmitted or received over a network via network interface device 508.
While the computer-readable storage medium 528 is shown in
Some portions of the detailed descriptions above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “identifying,” “determining,” “storing,” “adjusting,” “causing,” “returning,” “comparing,” “creating,” “stopping,” “loading,” “copying,” “throwing,” “replacing,” “performing,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Examples of the present disclosure also relate to an apparatus for performing the methods described herein. This apparatus may be specially constructed for the required purposes, or it may be a general purpose computer system selectively programmed by a computer program stored in the computer system. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic disk storage media, optical storage media, flash memory devices, other types of machine-accessible storage media, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The methods and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear as set forth in the description below. In addition, the scope of the present disclosure is not limited to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementation examples will be apparent to those of skill in the art upon reading and understanding the above description. Although the present disclosure describes specific examples, it will be recognized that the systems and methods of the present disclosure are not limited to the examples described herein, but may be practiced with modifications within the scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense. The scope of the present disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
This application claims the benefit of U.S. Provisional Patent Application No. 63/529,243, filed Jul. 27, 2023, entitled “MASKING WITH EFFICIENT UNMASKING VIA DOMAIN EMBEDDING IN CRYPTOGRAPHIC DEVICES AND APPLICATION,” the entire contents of which are incorporated by reference herein.