Current trends in cloud computing and data analysis have led to an increase in Machine Learning (ML) as a Service (MLaaS). Since ML can require a significant amount of computation, costly hardware, and specially trained neural networks, clients using MLaaS can provide their data to a service provider, which analyzes the data using neural networks and returns the result of the analysis to the client. In some cases, the data being analyzed can include sensitive data, such as medical data, financial data, or other personal data. Although the service provider (e.g., an MLaaS provider) may be trusted by the client, the service provider, its network, or the network between the client and the service provider may become compromised, thereby exposing the data sent by the client to the service provider and/or the result of the analysis provided to the client from the service provider.
Various types of homomorphic encryption, such as a Cheon Kim Kim Song (CKKS) encryption scheme, have been developed to allow encrypted data to be sent to a service provider and remain encrypted during arithmetic operations performed by the service provider, without the service provider decrypting the encrypted data in performing the operations. The service provider then sends an encrypted result back to the client, which decrypts the encrypted result using a secret key. Such homomorphic encryption can safeguard the data provided by the client and the result of the analysis if the service provider or the communication network between the client and the service provider becomes vulnerable.
However, such homomorphic encryption schemes have not been used with deep or wide Convolutional Neural Networks (CNNs) since operations performed on homomorphically encrypted data significantly increase the computational complexity. Performing the convolutions of a CNN on encrypted data would create a significant processing bottleneck in terms of latency, which has prevented the practical use of homomorphic encryption with CNNs.
The features and advantages of the embodiments of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings. The drawings and the associated descriptions are provided to illustrate embodiments of the disclosure and not to limit the scope of what is claimed.
In the following detailed description, numerous specific details are set forth to provide a full understanding of the present disclosure. It will be apparent, however, to one of ordinary skill in the art that the various embodiments disclosed may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail to avoid unnecessarily obscuring the various embodiments.
Network 110 can include, for example, a Storage Area Network (SAN), a Local Area Network (LAN), and/or a Wide Area Network (WAN), such as the Internet. In this regard, one or more of client device 102 and server 112 may not be physically co-located. Client device 102 and server 112 may communicate using one or more standards such as, for example, Ethernet or Fibre Channel.
Client device 102 includes one or more processors 104, interface 108, and memory 106. Processor(s) 104 can execute instructions, such as instructions from one or more applications loaded from memory 106, and can include circuitry such as, for example, a Central Processing Unit (CPU) (e.g., one or more Reduced Instruction Set Computer (RISC)-V cores), a Graphics Processing Unit (GPU), a microcontroller, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), hard-wired logic, analog circuitry and/or a combination thereof. In some implementations, processor(s) 104 can include a System on a Chip (SoC), which may be combined with memory 106.
Memory 106 can include, for example, a volatile Random Access Memory (RAM) such as Static RAM (SRAM), Dynamic RAM (DRAM), or a non-volatile RAM, or other solid-state memory that is used by processor(s) 104. Data stored in memory 106 can include, for example, data to be encrypted before being sent to server 112 and encrypted results received from server 112 that are decrypted to derive a final result, in addition to instructions loaded from one or more applications for execution by processor(s) 104, and/or data used in executing such applications, such as keys 14.
While the description herein refers to solid-state memory generally, it is understood that solid-state memory may comprise one or more of various types of memory devices such as flash integrated circuits, NAND memory (e.g., Single-Level Cell (SLC) memory, Multi-Level Cell (MLC) memory (i.e., two or more levels), or any combination thereof), NOR memory, EEPROM, other discrete Non-Volatile Memory (NVM) chips, or any combination thereof. In other implementations, memory 106 may include a Storage Class Memory (SCM), such as, Chalcogenide RAM (C-RAM), Phase Change Memory (PCM), Programmable Metallization Cell RAM (PMC-RAM or PMCm), Ovonic Unified Memory (OUM), Resistive RAM (RRAM), Ferroelectric Memory (FeRAM), Magnetoresistive RAM (MRAM), 3D-XPoint memory, and/or other types of solid-state memory, for example.
As shown in the example of
Server 112 in the example of
Memory 116 can include, for example, a volatile RAM such as SRAM, DRAM, or a non-volatile RAM, or other solid-state memory that is used by processor(s) 114. Data stored in memory 116 can include, for example, a ciphertext to be used as an input for performing a convolution or an encrypted output (i.e., an output ciphertext) resulting from the convolution. In addition, memory 116 can store instructions loaded from one or more applications for execution by processor(s) 114, such as computing module 18, and/or data used in executing such applications, such as one or more keys 20. As discussed in more detail below, key(s) 20 can include one or more evaluation keys used as part of a Fully Homomorphic Encryption (FHE) scheme that enables convolutions to be performed on encrypted data and to return an encrypted result that can be decrypted using a secret key stored in memory 106 of client device 102.
As shown in
For its part, interface 118 may communicate with client device 102 via network 110 using, for example, Ethernet or Fibre Channel. Interface 118 may include, for example, a Network Interface Card (NIC) or other network adapter.
Those of ordinary skill in the art will appreciate with reference to the present disclosure that system 100 in
As yet another example variation, the particular modules and arrangements of memory may differ in other implementations, such as with a consolidation of coding module 10 and FHE module 12 into a single security module and/or the consolidation of computing module 18 and activation module 24 into a single convolutional layer evaluation module in some implementations. In other variations, key(s) 20 or kernels 22 may not be stored in server 112 and may be accessed from another server or from client device 102. In some implementations, server 112 may receive already encoded and/or encrypted kernels from another server or device. Server 112 may not need coding module 16 in such implementations since the kernels may already be encoded into kernel polynomials when received by server 112.
In the example of
As will be appreciated by those of ordinary skill in the art with reference to the present disclosure, a Cheon Kim Kim Song (CKKS) scheme can be used for FHE that allows approximate arithmetic with encryption and decryption algorithms (Enc/Dec) using the ring of polynomials ℛ = ℤ[X]/(X^N + 1) as a plaintext space, where ℤ[X] denotes polynomials with integer coefficients. Such a CKKS scheme is described in the paper by J. H. Cheon, A. Kim, M. Kim, and Y. Song, "Homomorphic Encryption for Arithmetic of Approximate Numbers", in International Conference on the Theory and Application of Cryptology and Information Security, Springer, November 2017, pgs. 409-437, which is hereby incorporated by reference in its entirety. For input messages m1, m2 ∈ ℛ, the addition and multiplication of the input messages can be expressed as:
Dec(Enc(m1) ⊕ Enc(m2)) ≈ m1 + m2
Dec(Enc(m1) ⊗ Enc(m2)) ≈ m1 · m2   Eqs. 1
where ⊕ and ⊗ denote ciphertext addition and multiplication, respectively. As shown by Equations 1 above, the decrypted result of the ciphertext addition or ciphertext multiplication of the encrypted messages is approximately equal to the addition or multiplication, respectively, of the messages within a bounded error.
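For concreteness, a small sketch of the approximate add/multiply property of Equations 1 is shown below, assuming the open-source TenSEAL library and its CKKS API; TenSEAL and these parameter values are not part of the present disclosure and are used only for illustration.

```python
import tenseal as ts

# CKKS context with illustrative parameters; poly_modulus_degree and the
# coefficient modulus bit sizes are typical example values, not values
# specified in this disclosure.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

enc1 = ts.ckks_vector(ctx, [1.5, 2.0, 3.25])
enc2 = ts.ckks_vector(ctx, [0.5, 4.0, 2.0])

# Decrypting the ciphertext sum/product gives the plaintext sum/product
# within a small bounded error, as in Equations 1.
print((enc1 + enc2).decrypt())   # approximately [2.0, 6.0, 5.25]
print((enc1 * enc2).decrypt())   # approximately [0.75, 8.0, 6.5]
```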
The CKKS scheme uses the plaintext space ℛ for messages of complex numbers with an encoding algorithm (EcdΔ: ℂ^(N/2) → ℛ) and a decoding algorithm (DcdΔ: ℛ → ℂ^(N/2)) parameterized by a scaling factor Δ, which controls the precision of the arithmetic operations in Equations 2 below, such that for z1, z2 ∈ ℂ^(N/2):
DcdΔ(EcdΔ(z1)+EcdΔ(z2))≈z1⊕z2
DcdΔ(EcdΔ(z1)·EcdΔ(z2))≈z1⊙z2 Eqs. 2
where ⊕ and ⊙ denote the component-wise addition and Hadamard product of vectors, respectively, and ℂ denotes the complex numbers. With these encoding and decoding algorithms, the CKKS scheme encrypts a vector of complex numbers into a ciphertext and can perform addition or multiplication on the vectors with the foregoing ciphertext operations. The CKKS scheme also uses a rotation operation denoted by Rot. For an input ciphertext ct, the rotation operation outputs a ciphertext ctrot = Rot(ct, j) having a message vector rotated j steps to the left relative to the input vector, which can be represented as:
Dcd(Dec(ctrot)) ≈ (zj, zj+1, . . . , zN/2−1, z0, . . . , zj−1)   Eq. 3
where (z0, z1, . . . , zN/2−1)=Dcd(Dec(ct)). By using the rotation operations, the CKKS scheme can represent any arithmetic function with vectorized operations.
The computational cost of FHE for evaluating a given function primarily depends on the number of ciphertext multiplications and rotations required to represent the function. In addition, each ciphertext in the CKKS scheme has a level that can be denoted by ℓ ∈ {0, 1, . . . , L}, and a certain number of least significant digits of the messages can be truncated by decreasing the level of the ciphertexts. Notably, ciphertexts of a higher level are larger and have a greater computational cost. In general, performing computations of multiplicative depth L requires a ciphertext of level L as an input, where the multiplicative depth refers to the number of sequential encrypted or homomorphic multiplications that can be performed on the ciphertext and still be able to accurately decrypt the result.
To continue operations on a ciphertext of level 0, a bootstrapping can be performed as discussed in the paper by J. H. Cheon, A. Kim, K. Kim, and Y. Song, "Bootstrapping for Approximate Homomorphic Encryption", in Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2018, pgs. 360-384, which is hereby incorporated by reference in its entirety. The bootstrapping outputs a new ciphertext of level L having similar messages. However, such bootstrapping typically requires a significant number of computations and has largely been avoided in prior works on Privacy Preserving Machine Learning (PPML), which instead limit themselves to inferences of low multiplicative depth. The evaluation of convolutions on encrypted data and the modified bootstrapping disclosed herein enable the evaluation of convolutional layers of CNNs, including deep CNNs.
A two-dimensional evaluation of a convolution without encrypted data can be represented as:
Conv(I, K)i,j = Σ_{0≤i′,j′<k} Ii+i′,j+j′ · Ki′,j′   Eq. 4
with I ∈ ℝ^(w×w) being an input matrix representing, for example, image data, K ∈ ℝ^(k×k) being a kernel, and ℝ denoting the real numbers. The subscript i, j denotes the i-th row and j-th column of a matrix with 0 ≤ i, j < d, where d = w − k + 1.
A batched convolution with B·B′ kernels, which can be represented as K(B,B′) = (K0,0, . . . , Ki,i′ ∈ ℝ^(k×k), . . . , KB−1,B′−1), on B-batched inputs I(B) = (I0, . . . , Ii ∈ ℝ^(w×w), . . . , IB−1) outputs B′ batch outputs for 0 ≤ b′ < B′ as follows:
Conv(I(B), K(B,B′))b′ = Σ_{0≤i<B} Conv(Ii, Ki,b′)   Eq. 5
where the superscripts b′ and i denote the b′-th and i-th batches, respectively.
Evaluating a convolution using FHE can be performed at a reduced computational cost compared with previous approaches, such as those based on conventional CKKS encoding, by using a different encoding and by multiplying an input polynomial by a kernel polynomial representing an encoded kernel matrix K in ℛ = ℤ[X]/(X^N + 1). The kernel or a batch portion of the kernel can be encoded into the kernel polynomial with kernel values from the kernel forming coefficients of terms in the kernel polynomial.
Instead of using the conventional CKKS encoding and decoding discussed above, which uses the plaintext space for messages of complex numbers with EcdΔ: ℂ^(N/2) → ℛ and DcdΔ: ℛ → ℂ^(N/2), the encoding and decoding of the present disclosure encodes and decodes the message vector directly into or from coefficients of a plaintext polynomial in a way that is more efficient for evaluating convolutions. This new convolutional FHE encoding and decoding is represented with Cf-EcdΔ and Cf-DcdΔ in Relations 6 below.
Cf-EcdΔ: (r0, r1, . . . , rN−1) ∈ ℝ^N → └Δ·(r0 + r1X + . . . + rN−1X^(N−1))┐ ∈ ℛ
Cf-DcdΔ: (m0 + m1X + . . . + mN−1X^(N−1)) ∈ ℛ → (m0/Δ, m1/Δ, . . . , mN−1/Δ) ∈ ℝ^N   Rels. 6
where └⋅┐ means that each coefficient is rounded to the nearest integer. Notably, the foregoing encoding and decoding (i.e., Cf-EcdΔ and Cf-DcdΔ) facilitates homomorphically computing polynomial addition and multiplication with N real numbers as coefficients, while conventional CKKS encoding discussed above performs vector addition and multiplication with N/2 complex numbers, resulting in only N/2 real numbers. Although two real numbers could be encoded into one complex number, such an encoding would not preserve component-wise multiplication.
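As a plaintext-only illustration of Relations 6, a minimal Python sketch of the coefficient encoding and decoding is shown below; the helper names and the use of NumPy are illustrative assumptions, and encryption of the resulting coefficients is a separate step not shown here.

```python
import numpy as np

def cf_encode(values, delta, N):
    """Cf-Ecd: scale real inputs by delta and round to integer polynomial coefficients."""
    coeffs = np.zeros(N, dtype=np.int64)
    scaled = np.rint(np.asarray(values, dtype=np.float64) * delta).astype(np.int64)
    coeffs[:len(scaled)] = scaled          # coefficient i multiplies X**i
    return coeffs

def cf_decode(coeffs, delta):
    """Cf-Dcd: divide polynomial coefficients by delta to recover real values."""
    return np.asarray(coeffs, dtype=np.float64) / delta

# Round-trip example with scaling factor delta = 2**10 and N = 8
msg = [1.25, -0.5, 3.0]
poly = cf_encode(msg, delta=2**10, N=8)
print(cf_decode(poly, delta=2**10)[:3])   # approximately [1.25, -0.5, 3.0]
```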
Using the above convolutional encoding and decoding of Relations 6 (i.e., Cf-EcdΔ and Cf-DcdΔ), a convolution or batched convolution, Conv(I,K), can be represented by a product of two plaintext polynomials in ℛ. For example, one or more input datasets of real numbers can be represented by a matrix I and a kernel or batch portion of a kernel used for performing a convolution by server 112 can be represented by a kernel matrix K. The one or more input datasets can be encoded as a plaintext polynomial expressed in terms of the variable X as:
with Ii,j being an input value in the input matrix I at a row index of i and a column index of j, k being a total number of rows or columns in the kernel matrix K, w being a total number of rows or columns in the input matrix I, and s equal to 1 if the input matrix I represents a single input dataset or s being equal to or greater than a total number of input datasets if the input matrix I represents the batched input datasets. The plaintext polynomial I(X) is in the ring of polynomials ℛ = ℤ[X]/(X^N + 1) with N being a power of two and ℤ[X] denoting polynomials with integer coefficients.
The kernel polynomial can be expressed in terms of the variable X as:
with Ki,j being a kernel value in the kernel matrix K at a row index of i and a column index of j. As with the plaintext polynomial I(X), the kernel polynomial K(X) is in the ring of polynomials ℛ = ℤ[X]/(X^N + 1) with N being a power of two and ℤ[X] denoting polynomials with integer coefficients. In simplifying the product of the plaintext polynomial I(X), or its encrypted counterpart Î(X), and the kernel polynomial K(X), X^t = −X^(N+t) if t < 0, with N ≥ max(sw^2, sk^2).
In this way, the sizes of the input matrix I and the kernel matrix K do not exceed the degree bound (i.e., the highest degree of the variable X) of the product of the input polynomial and the kernel polynomial. The product can therefore be simplified into a polynomial in which at least a plurality of the coefficients forms the output of the convolution between the input matrix I and the kernel matrix K. As discussed in more detail below, the foregoing encoding of the one or more input datasets and of the kernel can enable a single multiplication to provide a convolution result on the one or more input datasets using the kernel or batch portions thereof. In contrast, convolutions using other forms of encoding typically involve many more arithmetic operations for a given input size and kernel size.
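To see why a single polynomial multiplication can yield convolution outputs as coefficients, the following plaintext sketch compares a one-dimensional discrete convolution with the coefficients of the corresponding polynomial product; it uses NumPy and a simple 1-D example rather than the exact two-dimensional packing of Equations 7 and 8.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])      # plays the role of the input values
kernel = np.array([0.5, -1.0, 2.0])          # plays the role of the kernel values

# Polynomial multiplication: coefficient c_t of the product is the sum over
# i + j == t of signal[i] * kernel[j].
product_coeffs = np.polynomial.polynomial.polymul(signal, kernel)

# The same values appear as the full discrete convolution of the two sequences.
conv = np.convolve(signal, kernel, mode="full")

print(np.allclose(product_coeffs, conv))     # True: polynomial product == convolution
```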
As shown in the example dataflow of
Coding module 10 of client device 102 encodes the input dataset into a plaintext polynomial expressed in terms of the variable X as I(X) = m1 + m2X^2 + m3X^4 + m4X^6. As shown in
With reference to the example of
The foregoing encoding and convolution performed with polynomial multiplication provides significant improvements in evaluating convolutions on data that has been encrypted using FHE in terms of computational complexity and processing time. These improvements are especially apparent as the size of the kernel (e.g., depth and width) and the size of the input increase since the number of multiplications can be significantly reduced as compared to performing convolutions with FHE using previous encoding algorithms and ciphertext arithmetic operations.
Returning to the example of
In more detail, the modified bootstrapping in some implementations includes converting the coefficients from the output ciphertext into a plurality of slots of an input vector according to the FHE scheme and performing a modular reduction approximated by a scaled sine function on each slot of the plurality of slots to generate reduced slots. An activation function (e.g., a ReLU function) is evaluated for the reduced slots to generate an output vector and values are extracted from the output vector based on a stride for the convolutional layer. The extracted values are then used to determine coefficients of an encrypted result polynomial R̂(X), which remains encrypted according to the FHE scheme. In this regard, the convolutional encoding and FHE encryption disclosed herein do not require decryption after performing the convolution to evaluate the activation function, which enables server 112 to evaluate a convolutional layer without additional communication with client device 102.
As shown by the dashed line returning to computing module 18 in
The client device 102 then decrypts the result polynomial using FHE module 12 and secret key Ks according to the FHE scheme. The decrypted result polynomial R(X) is then decoded by coding module 10 to determine one or more results R for a CNN using coefficients from the decrypted polynomial, such as by following the Cf-DcdΔ portion of Relations 6 above. In some cases, the results may be for a single input dataset. In other cases, such as where the input polynomial represents a batched input of multiple input datasets, the coefficients of the decrypted polynomial can be decoded to provide corresponding CNN results for the different input datasets.
Those of ordinary skill in the art will appreciate with reference to the present disclosure that other implementations of the dataflow of
The input datasets are packed into the plaintext polynomial I(X) represented in
Since batched convolution is a summation of single convolutions as shown by Equation 5 above, computing module 18 can use sparsely-packed input and kernel polynomials, which can be denoted as Isp(X) and Ksp(X), so that convolutions between each batch are computed separately and then summed into one, which is represented in
The sparsely-packed polynomials can be formed as Isp(X) = I(X^s) and Ksp(X) = K(X^s) using Equations 7 and 8 above for I(X) and K(X), respectively, for an input I ∈ ℝ^(w×w), a kernel K ∈ ℝ^(k×k), and a positive integer s, with N = max(sw^2, sk^2). It then follows that the s(iw+j)-th coefficient of I(X)·K(X), when the product is written in increasing degree of X, equals Conv(I,K)i,j. In addition, the L-th coefficient of I(X)·K(X) is nonzero only if L is divisible by s (i.e., L = 0 mod s). As used herein, such polynomials are referred to as "sparsely-packed" polynomials.
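A small sketch of what sparse packing means at the coefficient level is shown below; it operates on plaintext coefficient arrays only, and the helper name and parameter values are illustrative.

```python
import numpy as np

def sparsify(coeffs, s, N):
    """Form the coefficients of P(X**s) from the coefficients of P(X), i.e.,
    place the original coefficients at every s-th position (assuming the
    sparsely-packed polynomial still fits within degree N)."""
    out = np.zeros(N, dtype=np.asarray(coeffs).dtype)
    out[::s][:len(coeffs)] = coeffs
    return out

# Example: a degree-3 polynomial packed sparsely with s = 4 and N = 16
p = np.array([1, 2, 3, 4])
print(sparsify(p, s=4, N=16))
# [1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0]
```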
For a batched input I(B) and kernels K(B,B′) with N ≥ max(w^2B, k^2B), and for batches b ∈ {0, 1, . . . , B−1}, the encrypted output at row i and column j of the convolution between I(B) and the kernels K(B,B′) for a given batch b is the B(iw+j)-th coefficient of I(X)·K(X) when the product is written in increasing degree of X (i.e., Conv(I(B), K(B,B′))i,j for batch b). In other implementations, w^2B or k^2B may initially be larger than N (i.e., N < w^2B or N < k^2B). In such implementations, the number of batches B can be decreased by selecting a subset of the input datasets to be encoded into a plaintext polynomial, effectively lowering B so that N ≥ max(w^2B, k^2B) for the new, smaller value of B. The other input datasets that were not encoded into the first plaintext polynomial could then be encoded into one or more additional plaintext polynomials for encryption and performance of the convolution on the remaining input datasets.
Algorithm 1 provided below (as pseudocode) can be used in some implementations by the server to perform a batched convolution.
Note that the batched convolution of Algorithm 1 above outputs one output among the B batches constituting the result Conv(I(B), K(B,B)). The multiplication can be performed B times by computing module 18 in
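As a plaintext stand-in for the B homomorphic multiplications described above, the following sketch multiplies one input coefficient array by each of B kernel coefficient arrays in the ring modulo X^N + 1; the reduction rule mirrors X^t = −X^(N+t), and all names and values are illustrative.

```python
import numpy as np

def negacyclic_mul(a, b, N):
    """Multiply two coefficient arrays modulo X**N + 1 (so X**N wraps to -1)."""
    full = np.polynomial.polynomial.polymul(a, b)
    out = np.zeros(N, dtype=np.float64)
    for t, c in enumerate(full):
        if t < N:
            out[t] += c
        else:
            out[t - N] -= c        # wrap with a sign flip since X**N = -1
    return out

# One plaintext "input polynomial" multiplied by each of B = 2 kernel polynomials,
# standing in for the B homomorphic multiplications of the batched convolution.
N = 16
input_poly = np.zeros(N)
input_poly[:4] = [1.0, 2.0, 3.0, 4.0]
kernel_polys = [np.zeros(N) for _ in range(2)]
kernel_polys[0][:2] = [1.0, -1.0]
kernel_polys[1][:2] = [0.5, 0.5]
convolved = [negacyclic_mul(input_poly, k, N) for k in kernel_polys]
print(convolved[0][:5], convolved[1][:5])
```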
In more detail, the selection of coefficients from the B convolved polynomials (i.e., the B output ciphertexts) can be performed based on a scaled geometric sequence with a common ratio of two between each successive term in the geometric sequence. This is similar to a modified "PackLWE" algorithm used in the paper by Chen, H., Dai, W., Kim, M., and Song, Y., "Efficient Homomorphic Conversion Between (Ring) LWE Ciphertexts", in International Conference on Applied Cryptography and Network Security, 2021, pgs. 460-479, which is hereby incorporated by reference in its entirety. Algorithm 2 below for packing the new polynomial of the final output ciphertext differs from the PackLWE algorithm in the above-referenced paper by adding an initial log step s that corresponds to the log base two of the number of output batches (i.e., log2 B) instead of setting s = log2 N as in the PackLWE algorithm. In addition, the EvalAuto(⋅,k) function in the PackLWE algorithm is replaced by a Rot(⋅, ρ5(k)) function evaluated using an evaluation key Keval according to the FHE scheme.
Algorithm 2 provided below (as pseudocode) can be used in some implementations by the server to homomorphically pack the B convolved polynomials (i.e., Learning With Error (LWE) ciphertexts) into one final output ciphertext representing the result of the batched convolution.
Assuming that each ciphertext cti (0 ≤ i < n) can be represented as a plaintext polynomial in the form
Ci(X) = ci,0 + ci,1X + . . . + ci,N−1X^(N−1) ∈ ℛ,
in which the valid values occupy the coefficients ci,j·2^s, and letting n denote the number of ciphertexts being packed, the output ciphertext ct of Algorithm 2 above can then be represented as a plaintext polynomial in the form:
μ(X) = n(μ0 + μ1X + . . . + μN−1X^(N−1)) ∈ ℛ.
Algorithm 2 above collects the j·2^s-th coefficients from each convolved polynomial Ci(X), which can be described as a geometric sequence scaled by j with a common ratio of two between each successive term in the geometric sequence.
The extraction and packing of the new convolved polynomial can be expressed by letting {ctb}0≤b<B be the ciphertexts that each derive from the coefficients of the plaintext polynomial resulting from I(X)·K(X). The new encoded or packed convolved polynomial can be represented as C(X) having coefficients of X such that the (iB+b)-th coefficient of C(X) equals the i-th coefficient of I(X)·K(X) for a particular batch b. In other words, the coefficient of the (B(iw+j)+b)-th term of C(X) equals Conv(I(B), K(B,B))i,j for batch b. This results from setting up the batched convolutions as described above so that each output is located in a B-strided position among the coefficients, so that the extracted coefficients from each resulting convolved polynomial can be packed or encoded efficiently into the new convolved polynomial C(X).
The batched convolutions cost B multiplications and one multiplicative depth to calculate {rb(X)}0≤b<B (i.e., one multiplication for each rb(X)). Packing the B ciphertexts into one convolved polynomial then costs 2(B−1) multiplications and B−1 rotations without consuming multiplicative depth. In more detail, Rotkeval(ct, ρ5(N/+1)) preserves the i-th coefficient of the underlying plaintext polynomial if i is divisible by and changes only its sign if i is divisible by but not by . This can be interpreted more naturally with automorphisms on whose detail can be found in the paper by Chen et al. incorporated by reference above. Notably, the multiplication of does not consume multiplicative depth since it does not increase the scaling factor of the ciphertext.
The leading term n can be removed without additional cost by regarding the scaling factor Δ of the input or output ciphertexts, as described above with reference to Relations 6, to be nΔ so that the messages are scaled by 1/n (i.e., multiplied by n^(−1)). In addition, Algorithm 2 above can be further adapted for the batched convolutions disclosed herein by setting s such that 2^s ≥ B. When the input ciphertexts are sparsely-packed as discussed above, the initial log step can be set as s + log2 B so as to sparsely-pack the resulting output ciphertext.
The overall computational cost of homomorphic convolution using the encoding and decoding disclosed herein is provided in Table 1 below as "ConvFHE" in comparison to previously proposed methods that use "Vector Enc." described in the paper by Juvekar, C., Vaikuntanathan, V., and Chandrakasan, A., "Gazelle: A Low Latency Framework for Secure Neural Network Inference", in 27th USENIX Security Symposium (USENIX Security 18), 2018, pgs. 1651-1669, and "Spectral Enc." described in the paper by Lou, Q., Lu, W., Hong, C., and Jiang, L., "Falcon: Fast Spectral Inference on Encrypted Data", in Advances in Neural Information Processing Systems, 2020, Vol. 33, pgs. 2364-2374. The number of available message slots and the multiplicative depths are also shown in Table 1 below. The computational cost can be represented by counting the number of multiplications and rotations required for a given number of batches B, the kernel size k, and the input size w.
As shown in Table 1 above, the overall computational cost of using the ConvFHE encoding disclosed herein is less than the previously proposed Vector Encoding and Spectral Encoding when considering both the required multiplications and the rotations. In addition, there is no need for additional Discrete Fourier Transform (DFT) and Inverse DFT (IDFT) evaluation or decrypting the result after each convolution as with Spectral Encoding since the operations for ConvFHE are not transformed to a frequency domain and remain in a space domain (i.e., X). In addition, the multiplicative depth remains at a lower level of 1. The disclosed ConvFHE encoding also provides twice as many slots as Vector Encoding, which allows for twice as many elements to be packed into the input ciphertext. In the disclosed ConvFHE encoding, rotations are applied on the ciphertext at level 0 after multiplications are performed on an input ciphertext of level 1, so the computational cost is more affected by the multiplications. In contrast, rotations and multiplications for the Vector Encoding scheme are both on an input ciphertext of level 1. As a result, the cost of performing convolutions using the disclosed ConvFHE encoding scheme is approximately less by a factor of
The convolved polynomial C(X) has the same packing structure as the input polynomial I(X). As discussed in more detail below with reference to the modified bootstrapping of
In addition, the foregoing ConvFHE encoding can evaluate padded convolution by encoding the input appropriately with zeros. For example, for k-width kernels, rows and columns of zeros (e.g., (k−1)/2 on each side for an odd k) can be added to the input for "SAME" padding (i.e., so that the input size is the same as the output size when the stride is equal to one). For strided convolution, the desired output can be extracted after evaluating the convolution.
When the total size Bw^2 of the batched input is relatively small and is equal to N/2^s for a positive integer s, the input can be packed even more sparsely using I(X^(2^s)).
Unlike the vector encoding typically used with conventional bootstrapping, the polynomial coefficient encoding (i.e., ConvFHE) discussed above does not provide a format that is compatible with evaluating an activation function following a convolution. This problem is solved by using an intermediate state within bootstrapping, when the output from the convolution is in a vector form, to evaluate the activation function for a convolutional layer. In addition, any extraction needed for a stride of the convolutional layer can also be performed during this intermediate state of the output during bootstrapping. Specifically, the activation function and the extraction step in the modified bootstrapping disclosed herein are performed before the slot to coefficient step of bootstrapping, which also improves the efficiency of the bootstrapping as compared to conventional bootstrapping.
As shown in
Unlike the conventional bootstrapping, an activation function is evaluated during an intermediate stage after the EvalSine step and before the CtoS stage. As noted above, the activation function can include, for example, a ReLU function, a sigmoid function, a tanh function, or a softmax function. This allows the modified bootstrapping to homomorphically evaluate an approximate polynomial of the activation function on the ciphertext since the inputs are encoded as a vector similar to CKKS encoding where the evaluation will be done component-wise. In the case of a ReLU function, a higher degree polynomial can also be used to approximate the ReLU function instead of a square function (i.e., x2) typically used for approximating a ReLU function. This enables a more precise output for a deep CNN with higher accuracy.
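For illustration, one common way to obtain a higher-degree polynomial approximation of the ReLU function is a least-squares fit over the expected input range; the degree and interval below are illustrative assumptions, as the disclosure only states that a higher-degree polynomial (rather than x^2) can be used.

```python
import numpy as np

# Fit a degree-7 polynomial to ReLU on [-1, 1] by least squares.
xs = np.linspace(-1.0, 1.0, 2001)
relu = np.maximum(xs, 0.0)
coeffs = np.polynomial.polynomial.polyfit(xs, relu, deg=7)

approx = np.polynomial.polynomial.polyval(xs, coeffs)
print(np.max(np.abs(approx - relu)))   # worst-case approximation error on the interval
```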
As another difference from usual bootstrapping, valid values can be extracted in the Ext step from the output vector of the activation function to represent strided or non-strided convolutions before the slot to coefficient step (i.e., the StoC step in
The modified bootstrapping disclosed herein is more efficient than conventional bootstrapping typically used with Vector Encoding because the final StoC step, which is more computationally costly than the EvalSine and activation function steps, is applied to a ciphertext with a lower level L. In addition, and as noted above with reference to Table 1, the ciphertext using ConvFHE can pack twice as many real number messages as Vector Encoding. This improves the efficiency of the modified bootstrapping by enabling a larger number of input datasets to be used in a sparsely-packed polynomial where the size of the input is much less than the number of full message slots. The use of such sparsely-packed polynomials results in substantially less computational cost for the CtoS and StoC steps in bootstrapping. Processing times for different bootstrapping steps are discussed in more detail below with reference to
In some implementations, the extraction can be performed by multiplying by a plaintext vector having 1 at valid positions and 0 at all other positions in the vector. The extraction step is more complicated for strided convolutions, though (i.e., convolutions with a stride greater than 1), since the packing structure of the output polynomial O(X) also needs modification. However, this extraction can still be performed more efficiently than with usual vector encoding, where similar extraction is performed after the slot to coefficient step of bootstrapping.
A strided convolution can be represented by evaluating a convolution and then extracting appropriate entries from the strided output Ost(X) according to the stride. The extracted output entries correspond to the circled input entry locations in
During the extraction, the valid output values are located in the slots of a plaintext polynomial. However, the order of the slots is bit-reversed from those of the coefficients due to the conversion of the coefficients to slots in the CtoS step. With reference to
In performing the extraction step, the valid entries are extracted from the slots taking into consideration the bit-reversed order. With reference to
For the extraction in the modified bootstrapping, it then suffices to extract and move only the slot entries at (brev, 0, j′rev, 0, i′rev) to (brev, 0, 0, j′rev, i′rev), which can be performed by a multiply one-then-rotate operation according to the FHE scheme. In more detail, if w is the width of each input, then for each j′rev∈, each element or entry at (brev, 0, j′rev, 0, i′rev) can be moved to the left by (j′rev, 0∈) positions. This can be performed by (i) multiplying a plaintext vector having one at desired positions and 0 at other positions, as noted above, (ii) rotating the multiplied output corresponding to the required moves, and (iii) summing the outputs from each rotation. Notably, the total number of rotations and total number of multiplications for this extraction step is only w/2−1 and w/2, respectively, since the moves depend only on j′rev∈.
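A plaintext illustration of the multiply-then-rotate-then-sum pattern described above is sketched below; the masks, rotation amounts, and vector length are made up for the example, and homomorphically these would be plaintext multiplications and Rot operations on the ciphertext.

```python
import numpy as np

def rotate_left(v, steps):
    """Stand-in for the homomorphic Rot operation on a message vector."""
    return np.roll(v, -steps)

def extract_and_move(slots, moves):
    """For each (mask, shift) pair: keep the masked entries, rotate them left
    by 'shift' positions, and accumulate the results into one vector."""
    out = np.zeros_like(slots)
    for mask, shift in moves:
        out += rotate_left(slots * mask, shift)
    return out

slots = np.arange(8, dtype=np.float64)
moves = [
    (np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=np.float64), 0),  # entries already in place
    (np.array([0, 0, 1, 0, 0, 0, 1, 0], dtype=np.float64), 1),  # entries moved one slot left
]
print(extract_and_move(slots, moves))
```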
In comparison, the extraction step for the vector encoding discussed above with reference to
As the number of batches increases along the x-axis of
In
In block 902, the client device encodes one or more input datasets of real numbers into a plaintext polynomial with integral coefficients that do not include an imaginary component. The input dataset or input datasets can include, for example, financial data, health data, or other private data that a client may want to have analyzed by a CNN. In encoding the one or more input datasets, the client device may follow the Cf-EcdΔ portion of Relations 6 above so as to encode an input dataset using Equation 7 or using a sparsely-packed plaintext polynomial Isp(X) = I(X^s) or Isp(X) = I(X^(2^s)), as discussed above.
In addition, the client device may determine in block 902 if the total size of a plurality of input datasets to be convolved is larger than a highest degree, N, of the input polynomial (e.g., Bw^2 > N). If so, the client device can set a subset of the input datasets to be convolved as the one or more input datasets that are encoded into the plaintext polynomial and then use the remaining input datasets from the plurality of input datasets for one or more additional plaintext polynomials that are then used for additional convolutions. This selection of a subset of input datasets for encoding can effectively lower the value of B so that Bw^2 ≤ N.
With reference to Relations 6 above, the input datasets can be represented by the real numbers r0, r1, . . . , rN−1. In addition, the input datasets can be organized to correspond to multiple dimensions of input data so as to represent matrices (e.g., 2D or 3D matrices), as is often the case with image data, for example. In such cases, a total row size or column size, such as w in Equation 7 above, may indicate a new row at every w+1-th value in the input dataset. An example implementation of the encoding in block 902 is discussed in more detail below with reference to the client device subprocess of
In block 904, the client device generates an input ciphertext by encrypting the plaintext polynomial using a first key (e.g., a public key) according to an FHE scheme. In some implementations, this can be similar to the CKKS encryption discussed above, but with a different plaintext polynomial due to the differences in encoding the plaintext polynomial with the format of the Cf-EcdΔ portion of Relations 6 above, as compared to encoding messages of complex numbers with EcdΔ: ℂ^(N/2) → ℛ and DcdΔ: ℛ → ℂ^(N/2) for the CKKS encoding discussed above. The resulting input ciphertext can include an encrypted input polynomial with encrypted coefficients. In some implementations, the input ciphertext may instead include only the encrypted coefficients with an indication of the degree of the term for the encrypted coefficient in the input polynomial, such as by occupying a particular position in the input ciphertext.
In block 906, the client device sends the input ciphertext generated in block 904 to a server to perform at least one convolution on the input polynomial formed by the encryption and encoding. In some implementations, one or more modules operating at the client device may communicate with one or more modules operating at the server to exchange information that may be encrypted about the operations to be performed. Such information can include, for example, a row size or a column size of a kernel used by the server (i.e., k) for performing a convolution on the input ciphertext, a total number of input datasets encoded into the input ciphertext that may be used by the server to encode a kernel, and/or evaluation keys or public keys that may be shared with the server by the client device as part of the FHE scheme. In this regard, the modules of the client device and of the server may form part of a distributed application for performing convolutions with encrypted data in some implementations.
In block 908, the client device receives an encrypted result polynomial from the server. The result polynomial can represent the result of the server's evaluation of one or more convolutional layers of a CNN based on using the input ciphertext received from the client device as an input to a first convolutional layer of the CNN. Notably, the input ciphertext and the data used during the evaluation of the one or more convolutional layers by the server remain encrypted with the FHE scheme. As indicated by the dashed line, there may be a break in the processing by the client device as the convolutional layer or layers are evaluated by the server. In some implementations, the client device may use this break to perform other tasks, such as encoding additional input datasets to be sent to the server as input ciphertexts to determine corresponding results from the CNN.
In block 910, the client device decrypts the result polynomial according to the FHE scheme using a secret key to derive a decrypted result polynomial. The decryption in some implementations can follow the CKKS decryption of an encrypted polynomial. The secret key, unlike the public key or the evaluation key or keys, is generally not shared with other devices so that only the client device can perform the final decryption to obtain the result or results from the evaluation of the convolutional layers.
In block 912, the client device determines one or more CNN results by decoding the decrypted polynomial. The client device can use one or more decrypted coefficients of the decrypted result polynomial as inputs into the Cf-DcdΔ portion of Relations 6 above with m0, m1, . . . , mN−1 being the decrypted coefficients. In cases where the input ciphertext may have represented a batched input with multiple input datasets encoded into the plaintext polynomial, the result polynomial may indicate multiple corresponding CNN results for the different input datasets.
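Tying blocks 902 through 912 together, the following plaintext simulation sketches the client-side bookkeeping; encryption, the network, and the server's homomorphic evaluation are all stubbed out (the server step is an ordinary polynomial product here), and DELTA, N, and the helper names are illustrative assumptions.

```python
import numpy as np

DELTA = 2 ** 10
N = 16

def client_encode(values):
    # Blocks 902/1002-1006: scale by DELTA, round, use as polynomial coefficients.
    coeffs = np.zeros(N, dtype=np.float64)
    coeffs[:len(values)] = np.rint(np.asarray(values) * DELTA)
    return coeffs

def server_stub_evaluate(input_coeffs, kernel_coeffs):
    # Stand-in for the server's homomorphic convolution: a plain polynomial
    # product (degrees here are small enough that no wrap modulo X**N + 1 occurs).
    return np.polynomial.polynomial.polymul(input_coeffs, kernel_coeffs)

def client_decode(result_coeffs):
    # Blocks 910/912: after one multiplication, two factors of DELTA are present.
    return result_coeffs / (DELTA * DELTA)

ct = client_encode([1.0, 2.0, 3.0])                          # block 902/904 (encryption omitted)
result = server_stub_evaluate(ct, client_encode([0.5, 0.5])) # server-side stand-in
print(client_decode(result)[:4])                             # approximately [0.5, 1.5, 2.5, 1.5]
```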
Those of ordinary skill in the art will appreciate with reference to the present disclosure that other implementations of
In block 1002, real numbers from one or more input datasets are separately multiplied by a scaling factor. With reference to the Cf-EcdΔ portion of Relations 6 above, the scaling factor is represented by Δ, which can control the precision of the arithmetic operations to be performed on the input ciphertext resulting from a later encryption of the plaintext polynomial. In some implementations, block 1002 may be omitted, such as where there is not a need for a scaling factor (e.g., Δ=1).
With reference to Relations 6 above, the input datasets can be represented by the real numbers r0, r1, . . . , rN−1. In addition, the input datasets can be organized by the client device to reflect datasets of multiple dimensions (e.g., 2D or 3D data) where the order of the real numbers in a dataset can indicate different rows of values. For example, with reference to Equation 7 above, a w value of four can indicate that every fifth value in the input dataset is the beginning of a new row in a matrix represented by the input dataset.
In block 1004, the client device determines the coefficients of the plaintext polynomial by rounding, to the nearest integer, each of the corresponding products of the real numbers of the one or more input datasets and the scaling factor. This is indicated by the operators └⋅┐ in Relations 6 above.
In block 1006, the client device uses the determined coefficients from block 1004 as coefficients in the plaintext polynomial. As discussed above, the plaintext polynomial is in the ring of polynomials defined by ℛ = ℤ[X]/(X^N + 1) with N being a power of two and ℤ[X] indicating that the coefficients of the plaintext polynomial are integers. In addition, X^N = −1 for the polynomials in the ring, and as noted above when simplifying the product of the input polynomial and the kernel polynomial, X^t = −X^(N+t) if t < 0.
The plaintext polynomial can follow the format of I(X) in Equation 7 for encoding a single input dataset or can follow a sparsely-packed format for the plaintext polynomial as discussed above with Isp(X) = I(X^s) or Isp(X) = I(X^(2^s)).
Those of ordinary skill in the art will appreciate with reference to the present disclosure that other implementations of the encoding process of
In block 1102, the server receives an input ciphertext from the client device including at least the encrypted coefficients of an input polynomial encrypted according to an FHE scheme. The input ciphertext can include the variables (e.g., X) of each term of the input polynomial with the degrees of the variable (e.g., X^6). In some implementations, the input ciphertext may include only the encrypted coefficients with the degree of the variable being indicated by its position in the ciphertext or by another indicator for the term immediately following the encrypted coefficient.
As discussed above with reference to
In block 1104, the server encodes a kernel into one or more kernel polynomials using kernel values from the kernel as kernel coefficients. This may be performed using a coding module of the server (e.g., coding module 16 in
In the example of
In block 1106, the server performs a convolution using the kernel by at least in part separately multiplying the input polynomial by the one or more kernel polynomials encoded in block 1104. The multiplication or multiplications result in one or more corresponding convolved polynomials that are used by the server to form an output ciphertext. As discussed in more detail below with reference to the modified bootstrapping process of
In the case of multiple input datasets being encoded into the plaintext polynomial and the separate multiplications of the input polynomial with respective kernel polynomials, the resulting convolved polynomials are distinct ciphertexts due to the FHE scheme that the server then needs to combine or pack into a single output ciphertext. As discussed above, Algorithm 2 can be used by the server to pack selected coefficients from the different convolved polynomials into a single convolved polynomial that serves as the output ciphertext representing the result of the convolution on the input ciphertext using the kernel. As noted above, the results of the homomorphic operations disclosed herein are approximations of the arithmetic operations performed on their unencrypted counterparts (e.g., a convolution performed on the input dataset using the kernel). An example implementation of the packing of coefficients from the different convolved polynomials into an output ciphertext is discussed in more detail below with reference to the subprocess of
In block 1108, the server performs a modified bootstrapping on the output ciphertext that includes evaluating an activation function to derive an encrypted result polynomial. As discussed above, a ciphertext of level L typically allows L sequential homomorphic multiplications to be performed on the ciphertext while still being able to accurately decrypt the result. A bootstrapping operation refreshes the ciphertext with a new ciphertext of a similar form that has a reset level of L to allow for more homomorphic multiplications.
Unlike conventional bootstrapping, the modified bootstrapping disclosed herein that can be performed by the activation module of the server includes the evaluation of an activation function and the possible extraction of values from an interim output vector based on a stride of the convolutional layer before converting values or slots from the output vector back into coefficients. Conventional bootstrapping does not include the evaluation of an activation function and such extraction based on the stride as part of the bootstrapping. Instead, these operations are typically performed after the completion of the bootstrapping. Modifying the bootstrapping this way not only facilitates the encoding disclosed herein, but is also more efficient as discussed above since the activation function is performed on a ciphertext with a lower level, which decreases the computational complexity of evaluating the activation function. More detail on the modified bootstrapping is provided below with reference to the subprocess of
In block 1110 in
On the other hand, if there are not more convolutional layers to evaluate for the CNN in block 1110, the process of
Those of ordinary skill in the art will appreciate with reference to the present disclosure that other implementations can include a different order of blocks or different blocks than shown in
In block 1202, the server selects a subset of coefficients from each convolved polynomial based on a scaled geometric sequence with a common ratio of two between each successive term in the geometric sequence. In other words, the selection can be the j·2^s-th coefficients from each convolved polynomial, where s is an integer initial log step that is greater than or equal to one and corresponds to the log base two of the number of convolved polynomials (e.g., s = └log2 B┐). In some implementations, this can include performing Algorithm 2 above, which uses n−1 rotations and 2(n−1) plaintext multiplications without consuming any multiplicative depth in terms of the level of the ciphertexts, L, where n is the total number of convolved polynomials. In performing the rotations, the server can use an evaluation key as part of the FHE scheme that may be stored at the server as part of keys 20 in
In some implementations, the initial log step can be set to s + log2 B when the input polynomial has been packed even more sparsely with I(X^(2^s)), as discussed above.
In block 1204, the selected subset of coefficients from the convolved polynomials are used by the server to form the output ciphertext representing the output of the convolution of an input ciphertext and a kernel. The output ciphertext can have the form of μ(X) = n(μ0 + μ1X + . . . + μN−1X^(N−1)) ∈ ℛ, where ℛ denotes the ring of polynomials discussed above for the input and kernel polynomials. The output ciphertext may then be used as an input to a modified bootstrapping process that includes evaluating an activation function for a convolutional layer.
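A plaintext sketch of the selection and packing of blocks 1202 and 1204 is shown below: from the i-th convolved coefficient array, the coefficients at positions j·2^s are taken and placed at positions j·2^s + i of the packed array. Homomorphically, Algorithm 2 performs this with rotations and plaintext multiplications, and the scaling by n is ignored here; the helper name and example values are illustrative.

```python
import numpy as np

def select_and_pack(convolved, s):
    """Interleave the j*2**s-th coefficients of each convolved array into one
    packed array (plaintext sketch; assumes len(convolved) <= 2**s)."""
    N = len(convolved[0])
    step = 2 ** s
    packed = np.zeros(N, dtype=np.asarray(convolved[0]).dtype)
    for i, poly in enumerate(convolved):
        packed[i::step] = poly[::step]
    return packed

# Four convolved polynomials (B = 4, so s = 2) of length N = 8
polys = [np.arange(8) * 10 + i for i in range(4)]
print(select_and_pack(polys, s=2))
# [ 0  1  2  3 40 41 42 43]
```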
In this regard,
In block 1302, the server converts coefficients of an output ciphertext into a plurality of slots of an input vector according to the FHE scheme. This corresponds to the CtoS step of the modified bootstrapping of
In block 1304, the server performs a modular reduction that is approximated by applying a scaled sine function on each slot of the input vector to generate reduced slots. Block 1304 and block 1302 may be similar to conventional bootstrapping where an encoding algorithm is homomorphically applied to enable a parallel or slot-wise evaluation. However, unlike conventional bootstrapping, the subprocess of
In block 1306, the server evaluates an activation function for the reduced slots in the input vector to generate an output vector. As discussed above, the activation function can include, for example, a ReLU function, a sigmoid function, a tanh function, or a softmax function. As appreciated by those of ordinary skill in the art, the activation function can be performed on the output of a convolution to transform a weighted sum resulting from the convolution into a result or output for a convolutional layer of a CNN. By performing the activation function on the reduced slots of the input vector, the operations of the activation function are typically simplified as compared to performing the activation function on the coefficients of a polynomial. In the case of a ReLU function, a higher degree polynomial can also be used to approximate the ReLU function than the square function (i.e., x2) typically used for approximating a ReLU function. This enables a more precise output for a deep CNN with higher accuracy.
In block 1308, the server extracts values from the output vector resulting from the evaluation of the activation function for the slots of the input vector in block 1306. As discussed above with reference to the modified bootstrapping of
In block 1310, the server converts the extracted valid values from block 1308 into encrypted output coefficients of an encrypted result polynomial according to the FHE scheme. As compared to conventional bootstrapping, performing the activation function and the extraction steps before the conversion of the slots to coefficients (i.e., the StoC step) improves the overall efficiency of the bootstrapping process by performing the more computationally complex operations of the StoC step at a lower level of the ciphertext. The StoC step, which can be similar to a CKKS decoding, results in an encrypted result polynomial according to the FHE scheme. The encrypted result polynomial may then be used as an input polynomial for a next convolutional layer or may be sent to the client device as an encrypted result of the CNN if the activation function was performed for a final convolutional layer of the CNN.
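For orientation, a plaintext-only walk-through of the step ordering in blocks 1302 through 1310 is sketched below; everything here is ordinary NumPy arithmetic standing in for homomorphic operations, the scaled-sine step is reduced to a bare sine, and the final StoC conversion is represented by simply returning the vector.

```python
import numpy as np

def modified_bootstrap_plain(coeffs, activation_coeffs, valid_mask):
    """Plaintext stand-in for the modified bootstrapping order; a real
    implementation performs each step homomorphically on ciphertexts."""
    slots = np.asarray(coeffs, dtype=np.float64)      # block 1302: coefficients viewed as slots
    reduced = np.sin(slots)                           # block 1304: stand-in for the scaled sine
    activated = np.polynomial.polynomial.polyval(     # block 1306: slot-wise polynomial activation
        reduced, activation_coeffs)
    extracted = activated * valid_mask                # block 1308: keep only valid/strided entries
    return extracted                                  # block 1310: StoC would re-encode coefficients

coeffs = [0.2, -0.7, 1.1, 0.4]
act = [0.0, 0.5, 0.0, 0.5]              # toy odd polynomial 0.5x + 0.5x**3, purely illustrative
mask = np.array([1.0, 0.0, 1.0, 0.0])   # e.g., a stride-2 style extraction mask
print(modified_bootstrap_plain(coeffs, act, mask))
```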
Those of ordinary skill in the art will appreciate with reference to the present disclosure that other implementations of the modified bootstrapping subprocess of
The foregoing systems and methods for evaluating convolutional layers using encrypted data can significantly improve the processing time of evaluating convolutional layers with FHE, as demonstrated by
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks, modules, and processes described in connection with the examples disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Furthermore, the foregoing processes can be embodied on a computer readable medium which causes processor or controller circuitry to perform or execute certain functions.
To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, and modules have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Those of ordinary skill in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, units, modules, processor circuitry, and controller circuitry described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a GPU, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. Processor or controller circuitry may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, an SoC, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The activities of a method or process described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by processor or controller circuitry, or in a combination of the two. The steps of the method or algorithm may also be performed in an alternate order from those provided in the examples. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable media, an optical media, or any other form of storage medium known in the art. An exemplary storage medium is coupled to processor or controller circuitry such that the processor or controller circuitry can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to processor or controller circuitry. The processor or controller circuitry and the storage medium may reside in an ASIC or an SoC.
The foregoing description of the disclosed example embodiments is provided to enable any person of ordinary skill in the art to make or use the embodiments in the present disclosure. Various modifications to these examples will be readily apparent to those of ordinary skill in the art, and the principles disclosed herein may be applied to other examples without departing from the spirit or scope of the present disclosure. The described embodiments are to be considered in all respects only as illustrative and not restrictive. In addition, the use of language in the form of “at least one of A and B” in the following claims should be understood to mean “only A, only B, or both A and B.”
This application claims the benefit of U.S. Provisional Application No. 63/423,952 titled “EVALUATING CONVOLUTIONS USING ENCRYPTED DATA” (Atty. Docket No. WDA-6488P-US), filed on Nov. 9, 2022, which is hereby incorporated by reference in its entirety.