The present invention relates to the field of cyber security. More particularly, the invention relates to a system and method for performing Secure Multi-Party Computation (SMPC) with no communication between the parties, using Deep Neural Networks (DNNs), and to distributed computing, with applications in Blockchain systems, including private executable service contracts, NFTs, and DNN Coins.
Deep Neural Networks (DNNs) are the state-of-the-art form of Machine Learning (ML) techniques. DNNs are used for speech recognition, image recognition, computer vision, natural language processing, machine translation, and many other applications. Similar to other ML methods, a DNN is based on finding patterns in the data and, hence, embeds information about the data into a concise and generalized model. Consequently, sharing the DNN model also reveals private and valuable information about the data.
The structure and weights of Deep Neural Networks (DNNs) typically encode and contain very valuable information about the dataset that was used to train the neural network (NN). One way to protect this information when a DNN is published is to perform inference of the network using Secure Multi-Party Computation (SMPC).
There are situations where the data owner collects the data, trains the model, and shares the model to be used by clients. The model is very valuable for the data owner, since the training process is resource-intensive and frequently performed over private and valuable data. Therefore, the data owner wishes to retain control of the model as much as possible after it has been shared. The data owner will likely be willing to delegate the query service to Machine Learning Data model Store (MLDStore [7]) clouds, in the way that cloud providers do for computing platforms. In that case, the cloud providers should not be able to simply copy the model and reuse it. In addition, the data owner should have the ability to limit the number of queries executed on the model, such that a single cloud provider, or a small team of colluding cloud providers (or servers), cannot execute an unlimited number of queries on the data model.
A DNN can be represented by a (nested) polynomial; therefore, enough queries (points on the polynomial) can reveal the neural network, and the ownership of the information (succinct model) is at risk. In practice, as data is constantly updated, new data emerges and old data becomes obsolete. Therefore, frequent updates of the neural network suffice to define a new polynomial, for which the past queries are not relevant.
A previous study [6] shows an algorithm to distributively share DNN-based models while retaining control and ownership over the shared model. The activation functions of neural network units were approximated with polynomials and an efficient, additive secret sharing based, MPC protocol for information-secure calculations of polynomials was shown.
Calculating the neural network activation with secure multi-party computation algorithms requires the translation of activation functions from floating-point arithmetic to fixed-point. The regular activation functions of neural network units use floating-point arithmetic for the accuracy of the calculations, as the penalty of such calculations on modern CPUs is small. However, distributed MPC protocols are built for fixed-point arithmetic, and often even for limited-range values.
Approximation of neural network units' activation function with fixed-point arithmetic was described in [9, 10], where polynomial functions were suggested for approximation. In both references, the network was not distributed but rather was translated into fixed-point calculations to run over encrypted data. However, the methods of approximation are similar to the MPC case.
CryptoDL [10] showed an implementation of Convolutional Neural Networks (CNN) over encrypted data using homomorphic encryption (HE). Since fully homomorphic encryption is limited to addition and multiplication operations, CryptoDL [10] has shown approximation of CNN activation functions by low-degree polynomials due to the high-performance overhead of higher degree polynomials.
The calculation of neural networks with secure multi-party computations was described in [12]. Sigmoid and softmax activation functions were replaced with functions that can be calculated with MPC. However, the polynomial approximation of the sigmoid function requires at least a 10-degree polynomial, which causes slow performance with a garbled circuit protocol. Therefore, they propose to replace the sigmoid and softmax activation functions with a combination of ReLu activation functions, multiplication, and addition. However, this last reference was limited to two participating parties, and the algorithm was shown to be limited in terms of performance and the practical size of the network.
CrypTFlow [11] describes a system that automatically converts TensorFlow (TF) code into a secure multi-party computation protocol. The system comprises a compiler from TF code into two- and three-party secure computations, an optimized three-party computation protocol for secure inference, and a hardware-based solution for computation integrity. The most salient characteristic of CrypTFlow is the ability to automatically translate the code into an MPC protocol, where the specific protocol can be easily changed or added. The optimized three-party computation protocol is specifically targeted at NN computation and speeds up the computation. This approach is similar to the holistic approach of [1].
SecureNN [15] proposed arguably the first practical three-party secure computations, both for training and for activation of DNNs and CNNs. The improvement over the state-of-the-art results is achieved by replacing garbled circuits and oblivious transfer protocols with secret sharing protocols, which allows information security rather than computational security. This reference provides a hierarchy of protocols allowing the calculation of activation functions of neural networks. However, these protocols are specialized for three-party computations and their adaptation to more computational parties is complex. Also, the proposed protocols require ten communication rounds for the ReLu calculation of a single unit, excluding share distribution rounds.
Another approach to speed up performance is described in [5], which concentrated on two-party protocols and showed a mixed protocol framework based on Arithmetic sharing, Boolean sharing, and Yao's garbled circuit (ABY). Each protocol was used for its specific ability, and the protocols are mixed to provide a complete framework for neural network activation functions.
It is therefore an object of the present invention to provide a system and method for performing secure multi-party computation, with no communication between the parties, using Deep Neural Networks (DNNs).
It is another object of the present invention to provide a system and method for performing secure multi-party computation, which supports any number of participating servers.
It is a further object of the present invention to provide a system and method for performing secure multi-party computation, which optimizes communication-less MPC calculations of a shared DNN.
It is still another object of the present invention to provide a system and method for performing secure multi-party computation, which is capable of nesting a multi-layer polynomial, to reduce the redundant calculations of the intermediate layers.
Other objects and advantages of the invention will become apparent as the description proceeds.
A method for performing effective secure multi-party computation by participating parties being one or more computerized devices for executing the tasks, with no communication between the parties, using at least one trained Deep Neural Network (DNN), comprising:
The trained DNN may be approximated by a single polynomial or by a single nested polynomial.
The polynomial approximation of each layer may be nested within the approximation of the next layer, such that a single polynomial approximates several layers of the DNN, or the entire DNN.
In one aspect, a perfectly information-theoretically secure, secret-sharing MPC calculation of the polynomial representing the DNN may be performed.
The activation functions may be selected from the group of:
The polynomial may represent multiplications and additions performed by a convolution layer of the DNN.
Calculation of the polynomial may be performed using the Add.split procedure implementing a secret sharing scheme.
Approximation of multiple layers of the DNN may be performed by combining multiple layers into a single polynomial activation function, according to the connectivity of the layers.
Non-dense layers may be approximated using corresponding inputs from the previous layer.
The degree of the polynomial may be decreased by nesting.
The polynomial degree at layer l may be limited by:
a) keeping the polynomials from layer l−1 as a sum of lower-degree polynomials; b) calculating the polynomial Pij only once, during the calculation of each multi-layer polynomial.
The network architecture may be concealed by:
The input may be revealed to all participating parties and the secrets are the weights of the trained DNN.
The input may be distributed by secret sharing.
The polynomial may be blindly computed, with some of its coefficients being secret shares of zero, thereby allowing blind execution of the DNN.
The machine learning may be delegated to a third party without revealing information about the collected data, the inputs/queries, and the outputs, by using FHE and a nested polynomial.
A method for performing Statistical Information-Theoretical Secure (SITS) Distributed Communication-Less Secure Multiparty Computation (DCLSMPC) of a Distributed Unknown Finite State Machine (DUFSM), comprising:
The computed transition function may be kept private by using secret shares for all coefficients of the polynomial, while revealing only a bound on the maximal degree k of the polynomial.
Independent additions and multiplications of the respective components of two (or more) numbers over a finite ring may be enabled using CRT representation, where each participant performs calculations over a finite ring defined by its corresponding prime number.
The transition function of the state machine may be represented by a bi-variate polynomial from the current state (x) and the input (y) to the next state (z).
The transition function of the state machine may be represented by a univariate polynomial defined by using the most significant digits of (x+y) to encode the state (x) and the least significant digits, to encode the input (y).
More parties may be added to the computation whenever the result of a calculation overflows the ring bounds.
Larger primes may be chosen for the computation whenever the result of a calculation overflows the ring bounds.
A distributed calculation may be carried out using a dealer-worker scheme, where a single party, being the dealer, is responsible for the assignment of tasks and the collection of the results, while the other parties, being workers, are responsible for the calculation itself. The dealer may generate the appropriate primes and distribute them to the workers. Throughout the computation, the dealer manages a queue that is shared with the workers, in such a manner that every time an input arrives, the input is pushed to the queue and popped in turn by the workers, and the dealer is then allowed to recover the result.
Computations may be executed with respect to a unique modulus, to prevent overflow, or exceeding the finite ring.
The workers may perform a blind modulo reduction to the result, keeping it inside the field.
The dealer may initialize an FHE for encrypting both the initial value and the incoming input, decrypt the encrypted results and reassemble the results by the CRT into a single solution.
A method for managing a trained data model of a neural network, comprising allowing a blockchain registered data owner to sell the rights on the data model services and/or at least a part of the ownership to other parties by representing the owned data as an executable cryptocoin.
The cryptocoin may be selected to provide a beneficial reaction to requests, including examples from the group of:
A system for performing effective secure multi-party computation by participating parties being one or more computerized devices for executing the multi-party computation, with no communication between the parties, using at least one trained Deep Neural Network
(DNN), comprising one or more computerized devices that contain one or more processors being adapted to:
The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:
The present invention relates to a system and method for performing effective secure multi-party computation, with no communication between the parties, using Deep Neural Networks (DNNs), which can be approximated with polynomial functions representing a single, or multiple layers.
In a first embodiment, a trained neural network has been approximated with a single (possibly nested) polynomial, to speed up the calculation of the polynomial on a single node. Accordingly, the polynomial approximation of each layer was nested within the approximation of the next layer, such that a single polynomial (or arithmetic circuit) will approximate not only a single network unit, but several layers, or even the entire network. This embodiment provides an efficient, perfectly information-theoretically secure, secret-sharing MPC calculation of the polynomial representing the DNN.
The present invention provides a translation of deep neural networks into polynomials (which are easier to calculate efficiently with MPC techniques), including a way to translate complete networks into a single polynomial and a way to calculate the polynomial with an efficient and information-secure MPC algorithm. The calculation is done without intermediate communication between the participating parties, which reduces the attack surface. The participating parties may be one or more computerized devices (such as remote computers, remote servers, or hardware devices that contain one or more processors).
The goal is to approximate the activation functions that are a typical part of DNNs, by polynomials, while focusing on the most commonly used functions in neural networks.
The weighted sum is a multiplication of the inputs X_1, ..., X_n by the corresponding weights: S = Σ_(i=1..n) w_i X_i − b.
The sum is approximated with a polynomial, as it is a vector multiplication of the weights with the n-dimensional input, i.e., a polynomial of degree 1.
Approximation of DNN activation functions focuses on several common functions:
These functions can be approximated with a polynomial using various methods, for example as described in [16, 10, 12, 1, 13]. The optimization method proposed by the present invention is agnostic to the specific approximation method.
The communication-less approach proposed by the present invention allows the use of higher-degree polynomials, where a single (nested) polynomial is used for many or even all units (30-degree Chebyshev polynomials achieve good results).
A convolution layer is used in Convolutional Neural Networks (CNN), mainly for image recognition and classification. Usually, this layer performs a dot product over a (commonly) n×n square of data points (pixels), in order to calculate local features. The convolution layer performs multiplication and addition, which are directly translated into a polynomial.
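As an illustration (a toy sketch with assumed values, not taken from the specification), the dot product performed by a convolution window is exactly a degree-1 polynomial in its input pixels, with the kernel entries as coefficients:

```python
# Sketch: a "valid" 2D convolution. Each output element is a degree-1
# polynomial in the input pixels, with kernel entries as coefficients.
def conv2d_valid(image, kernel):
    n, m = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(n - k + 1):
        row = []
        for j in range(m - k + 1):
            # degree-1 polynomial: sum of kernel[a][b] * pixel
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            row.append(s)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```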
Max and Mean pooling compute the corresponding functions of a set of units. Those functions are frequently used in CNN following the convolution layers. Reference [16] suggested replacing max-pooling with a scaled mean-pooling, which is trivially represented by a polynomial. However, this requires the replacement to be done during the training stage.
For networks that did not replace max pooling with mean pooling, and as an alternative, the max function can be approximated by:
When d=1 the approximation is reduced to a scaled mean pooling function, i.e., without division by the number of elements.
A simple and practical variation of the equation (1) is:
The function provides an approximation near any values of x and y, which is an advantage over Taylor or Chebyshev approximations, which are developed around a specific point. Despite its simplicity, equation (2) provides a relatively good approximation, as shown in
Using a two-variable function for the max pooling layer of k inputs requires chaining of the max functions:
max(x_1, x_2, ..., x_k) = max(x_1, max(x_2, ..., max(x_(k−1), x_k))).
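Since the concrete approximations of equations (1) and (2) are not reproduced here, the chaining pattern itself can be sketched with a pluggable two-argument function standing in for any such approximation (a hypothetical stand-in, not the equations of the specification):

```python
from functools import reduce

# Sketch: chain a two-argument max (or a two-argument polynomial
# approximation of max) over k inputs, as
# max(x1, ..., xk) = max2(x1, max2(x2, ..., max2(x_{k-1}, xk))).
def chained_max(values, max2):
    return reduce(lambda acc, x: max2(x, acc),
                  reversed(values[:-1]), values[-1])

# With the exact max, the chaining is exact; an approximation inherits
# the accumulated error of its k-1 nested applications.
print(chained_max([3, 7, 2, 5], max))  # 7
```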
Alternatively, the optimization sequence is interrupted at the max-pooling layer, which will require an MPC protocol for the max function calculation, as described, for example, in [15].
A protocol that meets the above requirements is introduced as part of a full MPC scheme. For the calculation of the polynomial, the Add.split procedure is used, shown below in Algorithm 1 for completeness.
Add.split procedure: given an element s ∈ Z_p, where Z_p is a finite field with a prime number p of elements, the procedure returns k additive secret shares whose sum is s.
Input: a prime p, a secret to share s ∈ Z_p, and k ∈ N. Y_1, ..., Y_(k−1) are chosen uniformly at random from Z_p, and Y_k ← s − Σ_(i=1..k−1) Y_i. Output: (Y_1, ..., Y_k), a sequence of secret shares of s.
Specifically, the Add.split procedure is a perfectly-secure secret sharing scheme with threshold N−1. Each party calculates the polynomial known to it and sends the result to the other parties. The sum of all results will be the result of the polynomial activation on the given input.
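A minimal sketch of the Add.split procedure (the prime and secret below are illustrative values, not from the specification):

```python
import random

# Sketch of Add.split: additive secret sharing of s in Z_p into k
# shares whose sum is s (mod p). Any k-1 shares are uniformly random
# and reveal nothing about s.
def add_split(s, k, p):
    shares = [random.randrange(p) for _ in range(k - 1)]
    shares.append((s - sum(shares)) % p)
    return shares

p = 2**31 - 1          # an illustrative prime defining the field Z_p
s = 123456             # the secret
shares = add_split(s, 5, p)
assert sum(shares) % p == s
```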
Add.split protocol generates a set of secret shares which are distributed among the k parties. The polynomial is calculated as follows:
given a polynomial p of degree d and input value x, the protocol calculates the value of p(x) using k participating parties: C1, . . . , Ck, such that no party learns p.
The protocol generates additive secret shares for every polynomial coefficient and distributes those shares among the participating parties. This round can be done once for several calculations. The next step is sending the input x to the parties and receiving the output of the polynomial activation of each party i: pi(x). The final result is the sum of the received outputs.
This algorithm requires two rounds of communications per input and an additional round of secret sharing. The amount of data transferred by the algorithm is linear with respect to the polynomial degree, which makes the algorithm very efficient.
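The whole protocol can be sketched as follows (an illustrative single-process simulation of the k parties; the polynomial, prime, and input are assumed values):

```python
import random

# Sketch: each coefficient of p is additively shared among k parties;
# every party evaluates "its" polynomial on the public input x, and the
# sum of the k local results equals p(x) mod p, with no interaction
# between the parties during the computation.
def add_split(s, k, p):
    shares = [random.randrange(p) for _ in range(k - 1)]
    shares.append((s - sum(shares)) % p)
    return shares

def share_polynomial(coeffs, k, p):
    # returns one coefficient-share vector per party
    per_coeff = [add_split(c, k, p) for c in coeffs]
    return [[per_coeff[i][j] for i in range(len(coeffs))]
            for j in range(k)]

def local_eval(coeff_shares, x, p):
    # each party's local evaluation on the public input x
    return sum(c * pow(x, i, p) for i, c in enumerate(coeff_shares)) % p

p = 2**31 - 1
coeffs = [5, 0, 3]                 # p(x) = 5 + 3x^2 (illustrative)
parties = share_polynomial(coeffs, 4, p)
x = 10
result = sum(local_eval(ps, x, p) for ps in parties) % p
assert result == (5 + 3 * x * x) % p   # 305
```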
LSTM is a subset of Recurrent Neural Network (RNN) architecture, whose goal is to learn sequences of data. LSTM networks are used for speech recognition, video processing, time sequences, etc.
There are many different variations of LSTM units with a usual structure, including several gates or functions, which enable the unit to remember values over several cell activations. A common activation function of LSTM units is the logistic sigmoid function.
As an approximation of multiple layers increases the degree of the polynomial function, the present invention provides possible techniques to use (intermediate) communication to keep the degree low. The DNN may be approximated with a single polynomial on a single computing node. Since the approximation exists for all the common activation functions, it is possible to combine multiple layers into a single polynomial function according to the connectivity of the layers.
This is done by creating a polynomial for the “flow” of the data in the network instead of approximating every single neural unit with a polynomial.
In this example, the network consists of an input layer (I) on the left, two dense hidden layers (U1 and U2), and one output layer O, which is implemented by the softmax function. The units are marked as u_li, where l is the hidden layer number and i is the number of the unit in the layer. It is assumed that the activation functions of the hidden layers are ReLu (or any other function that can be approximated by a polynomial function).
A unit u_11 calculates a function which is approximated by a polynomial. Assuming that the ReLu activation functions are approximated using a polynomial of degree d:
ReLu(Σ_i w_i I_i) ≈ P_11 = Pol_11(Σ_i w_i I_i).   (3)
Unit u_21 receives P_11 and P_12 as inputs and calculates the "nested" polynomial function:
P_21 = Pol_21(Σ_i w_i P_1i).   (4)
Generally, assuming dense layers, the nested polynomials are defined as:
P_lj = Pol_lj(Σ_i w_i P_(l−1)i).   (5)
In this simple case, the result of the network's evaluation can be calculated by evaluating two polynomials of degree d^2, P_21 and P_22, and calculating the output layer function on their outputs. Overall, by approximating softmax by Pol_sm, the following polynomial for the entire network is obtained:
DNN(x) = Pol_sm(w^o_1 P_21 + w^o_2 P_22)
= Pol_sm(w^o_1 Pol_21(w^21_1 P_11 + w^21_2 P_12) + w^o_2 Pol_22(w^22_1 P_11 + w^22_2 P_12))
= Pol_sm(w^o_1 Pol_21(w^21_1 Pol_11(w^11_1 I_1 + w^11_2 I_2) + w^21_2 Pol_12(w^12_1 I_1 + w^12_2 I_2)) + w^o_2 Pol_22(w^22_1 Pol_11(w^11_1 I_1 + w^11_2 I_2) + w^22_2 Pol_12(w^12_1 I_1 + w^12_2 I_2)))   (6)
Note that P_11 and P_12 are calculated twice, as they are used as inputs for both the u_21 and u_22 units.
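This redundancy can be illustrated with a toy sketch of the fully expanded equation (6) (hypothetical weights, and a hypothetical degree-2 activation Pol standing in for a ReLu approximation):

```python
# Sketch: evaluating the naively expanded nested polynomial of a
# 2-2-2 network. A call counter shows Pol_11 and Pol_12 are each
# evaluated twice, once per second-layer unit.
calls = {"Pol11": 0, "Pol12": 0}

def Pol(z):
    # illustrative degree-2 stand-in for a ReLu approximation
    return 0.5 * z + 0.25 * z * z

def Pol11(I1, I2):
    calls["Pol11"] += 1
    return Pol(0.3 * I1 + 0.7 * I2)

def Pol12(I1, I2):
    calls["Pol12"] += 1
    return Pol(0.5 * I1 - 0.1 * I2)

I1, I2 = 1.0, -2.0
P21 = Pol(1.0 * Pol11(I1, I2) + 0.2 * Pol12(I1, I2))
P22 = Pol(-0.4 * Pol11(I1, I2) + 0.6 * Pol12(I1, I2))
assert calls["Pol11"] == calls["Pol12"] == 2   # redundant evaluations
```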
Non-dense layers are approximated in a similar way, but with only the corresponding inputs from the previous layer. An example of such architectures is CNN, commonly used for image recognition. CNN layers have a topographic structure, where neurons are associated with a fixed two-dimensional position that corresponds to a location in the input image. Thus, each neuron in the convolutional layer receives its inputs from a subset of neurons from the previous layer, those that belong to the corresponding rectangular patch. For the polynomial approximation of such networks, the polynomial approximating the unit depends only on the relevant units from the previous layer. In the case where the interconnections of the network, part of the architecture, are part of the secret as well, the network is treated as dense, but the weights of the "pseudo"-connections are set to zero, thereby achieving the same effect as not connecting the units at all.
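A small sketch of this zero-weight masking (illustrative values): the locally connected unit and its "dense" form with zero weights on the missing connections produce the same output, while the connectivity pattern stays hidden in the coefficients.

```python
# Sketch: a non-dense unit expressed as a dense unit whose missing
# connections carry zero weights.
def unit(inputs, weights):
    return sum(w * x for w, x in zip(weights, inputs))

inputs = [4.0, 9.0, 2.0]
sparse = unit(inputs[:2], [0.5, 0.5])       # unit wired to I1, I2 only
dense = unit(inputs, [0.5, 0.5, 0.0])       # same unit, "dense" form
assert sparse == dense == 6.5
```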
A neural unit's calculation is the most common operation in the DNN feed-forward operation. The approximation of the operation with a polynomial significantly increases the complexity of the activation function. For instance, the ReLu function is, in its essence, a simple if condition, yet it is approximated with a 30-degree polynomial.
The impact is less severe than expected, as the approximation is applied to the networks after their training phase, which is extremely computationally intensive. However, it is still significant if many inference calculations are performed. In a multi-layer polynomial approximation, the degree of polynomials increases leading to an increase of performance overhead as well.
In order to limit polynomial degree at layer l, the polynomials from l−1 layer are not explicitly expanded into a single function (see Equation 5), but rather kept as a sum of lower-degree polynomials. In this way, each polynomial Pij is calculated only once in the process of the calculation of each multi-layer polynomial. This will limit the degree of the polynomial and eliminate redundant calculations.
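This layer-wise strategy can be sketched as follows (hypothetical toy weights and a hypothetical degree-2 activation): evaluating the network layer by layer computes every unit polynomial exactly once, instead of re-evaluating the shared subexpressions of the expanded nested polynomial.

```python
# Sketch: layer-by-layer evaluation, each P_ij computed exactly once.
def Pol(z):                      # hypothetical low-degree activation
    return 0.5 * z + 0.25 * z * z

calls = {"n": 0}

def unit(inputs, weights):
    calls["n"] += 1
    return Pol(sum(w * x for w, x in zip(weights, inputs)))

I = [1.0, -2.0]
layers = [[[0.3, 0.7], [0.5, -0.1]],    # hidden layer 1 weights
          [[1.0, 0.2], [-0.4, 0.6]]]    # hidden layer 2 weights
values = I
for W in layers:
    values = [unit(values, ws) for ws in W]
assert calls["n"] == 4           # 2 units x 2 layers, no re-computation
```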
In case the network architecture is in itself valuable, it might be desired to conceal the correct architecture from the cloud providers. The naïve multi-layer polynomial approximation hides the network architecture somewhat, even though the degree of the polynomial might be a telling factor.
A way to conceal the exact network architecture is to add “pseudo”-nodes to the network. Those nodes will not contribute to the network inference but will add noise to the network architecture.
The location of the units is randomized and the number of the units depends on the need to hide the original network architecture.
The goal of the MPC calculations is to protect the published model from exposure to the participating cloud providers. The model is trained by the data provider and has two components: the architecture, which includes the layout, type, and interconnection of the neural units, and the weights, which were refined during the training of the network, i.e., during the back-propagation phase.
It is required to protect the weights, which were obtained by a costly process of training. While the architecture might also hold ingenious insights, it is considered less of a secret and may be exposed to the cloud providers.
Any MPC protocol can be used, preferably if it is compatible with the following requirements:
A number of existing MPC protocols meet those requirements [2, 4]. MPC protocols based on Shamir's secret sharing [14] can cope with a minority of semi-honest parties, and even with up to a third of the parties being malicious. The BGW protocol [2] provides perfect security, and [4] provides statistical security with any desirable certainty. In this case, the input is not a multi-variable secret that is secret-shared; rather, the weights and coefficients of the network are the secrets.
In a first scenario, the input is revealed to all participating parties. In this case, the secrets are the weights of the trained network. The input values can be considered as numerical constants for the MPC calculation and thus, communication rounds can be eliminated completely (see BGW [2] algorithm where additive “gates” are calculated locally without any communication).
Given a secret-share of a coefficient a: s = [s_1, s_2], the polynomial p(x) can be calculated as p(x) = p_1(x) + p_2(x), where p_1(x) and p_2(x) use the corresponding secret share.
In the second scenario, the input values are protected as well, and thus they are distributed by secret sharing. As the input values are raised up to the polynomial degree k, secret sharing is done on the set of values X = [x, x^2, ..., x^k]. Multiplication of secret shares generally requires communication rounds; still, when secret sharing every element of X, it is possible to eliminate the communications altogether using the method described in [3]. Nested polynomials cannot be used in this case, since the polynomial terms have to be regrouped for nesting, and secret-sharing of the inputs prevents that.
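A simplified sketch of this scenario (illustrative values; this is not the full method of [3]): when every power of x is additively shared, evaluating p(x) becomes a local linear combination of shares, so no multiplication of shares, and hence no communication round, is needed.

```python
import random

# Sketch: secret-share the powers X = [x, x^2, ..., x^k]; each party
# computes a local linear combination of its shares, and the sum of
# the local results equals p(x) mod p.
def add_split(s, k, p):
    shares = [random.randrange(p) for _ in range(k - 1)]
    shares.append((s - sum(shares)) % p)
    return shares

p, parties = 2**31 - 1, 3
x = 7
coeffs = [4, 1, 0, 2]                        # p(x) = 4 + x + 2x^3
power_shares = [add_split(pow(x, i, p), parties, p)
                for i in range(1, len(coeffs))]
local = [(coeffs[0] if j == 0 else 0)        # constant term held by one party
         + sum(coeffs[i + 1] * power_shares[i][j]
               for i in range(len(power_shares)))
         for j in range(parties)]
assert sum(local) % p == (4 + x + 2 * x**3) % p   # 697
```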
The present invention also provides techniques for blindly computing a polynomial (i.e., with some of its coefficients being secret shares of zero), to obtain blind execution of a DNN. Since neural network activation functions are not limited to a specific set, there might be networks that cannot be approximated. However, the majority of networks use a rather small set of functions and architectures.
Once the neural network is represented by a single polynomial, it can be calculated without a single communication round (apart from the input distribution and output gathering) when the inputs are revealed, or with half the communication rounds when the inputs are secret. Therefore, the data owner can train DNN models, pre-process them, and share them with multiple cloud providers. The providers can then collaboratively calculate inference of the network on common or secret-shared inputs without ever communicating with each other, thereby reducing the attack surface even further, even for multi-layer networks.
Moreover, the data owner may sell the rights on the data model to others, similarly to a Non-Fungible Token (NFT: a non-interchangeable unit of data stored on a blockchain, a form of digital asset on a ledger, that can be sold and traded). This, in turn, can be regarded as an executable cryptocoin (a cryptocoin is a digital currency designed to work as a medium of exchange through a computer network), a DNNCoin, where the digital asset is a DNN network and the service it can provide: a cryptocoin that provides useful output and has a blockchain-registered owner. Other executable NFTs and cryptocoins can be based on the executable-NFT/executable-cryptocoin framework proposed by the present invention and can be traded on the (crypto) market: for example, a stock recommendation executable-NFT/SRCoin, where the coin provides a stock recommendation service; a psychological advisor executable-NFT/PACoin, where the coin provides a psychological advising service; and even an entertaining executable-NFT/JCCoin, where an animated creature, such as a jumping cat, reacts to requests.
The present invention also provides a method for managing a trained data model of a neural network, by allowing a blockchain registered data owner to sell the rights on the data model services and/or at least a part of the ownership to other parties by representing the owned data as an executable cryptocoin (such as DNNCoin, SRCoin, PACoin, JCCoin, stock recommendation executable, psychological advisor executable, entertaining jumping cat executable), in order to provide beneficial reaction to requests.
In this embodiment, the proposed polynomial neural network representation facilitates an efficient execution of the inference by an untrusted third party, without revealing the machine learning (big) data, the queries, and the results. The reduction of neural networks to nested polynomials facilitates inference over encrypted polynomial coefficients and encrypted inputs using computationally secure (unlike the perfect information-theoretic security of the other scheme proposed here) Fully Homomorphic Encryption (FHE) [8]. The nested polynomial that represents fully connected layers can still be calculated in polynomial time (the total number of connections is quadratic in the number of neurons in every two adjacent layers), so some of the encrypted coefficients (or edge weights) can be an encrypted zero, which in fact yields an (unrevealed) subset of the neural network.
Accordingly, by using FHE and the nested polynomial, it is possible to perform delegation of machine learning to a third party (e.g., a cloud provider) without revealing anything about the (big) data (collected), the inputs/queries, and the outputs. For example, when the neuron computes the max function, the nested polynomial can integrate actual FHE computation of the max over the inputs arriving from the previous layer, rather than a polynomial over these inputs. A neuron is computed as polynomial over input polynomials (values), and two (or more) results can be computed for each neuron: one a polynomial over the inputs to the neuron and one an FHE max value over the input. Then an encrypted bit(s) is used to blindly choose among the results, i.e. between polynomial or “direct” FHE calculation of the neuron activation function.
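The blind-selection step can be sketched in plain arithmetic (a real deployment would perform the same additions and multiplications under FHE, on ciphertexts): an encrypted selector bit b chooses between the two candidate results using only addition and multiplication, the operations FHE supports.

```python
# Sketch: blind selection (multiplexer) using only + and *, so the
# same expression can be evaluated homomorphically over ciphertexts.
def blind_select(b, direct_result, poly_result):
    # b = 1 -> "direct" FHE computation of the activation
    # b = 0 -> polynomial approximation
    return b * direct_result + (1 - b) * poly_result

poly_out, fhe_max_out = 0.93, 1.0   # illustrative candidate results
assert blind_select(1, fhe_max_out, poly_out) == 1.0
assert blind_select(0, fhe_max_out, poly_out) == 0.93
```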
The computation costs are increasing linearly with the polynomial degree (data not shown), where the original ReLu is similar to d=1 degree polynomial. Thus, it makes sense to choose the lowest degree that still provides consistent and accurate results.
The present invention is also directed to a Statistical Information Theoretic Secure (SITS) system utilizing Chinese Remainder Theorem (CRT—a theorem that states that if one knows the remainders of the Euclidean division of an integer n by several integers, then one can determine uniquely the remainder of the division of n by the product of these integers, under the condition that the divisors are pairwise coprime, such that no two divisors share a common factor other than 1) coupled with Fully Homomorphic Encryption (FHE) for Distributed Communication-less Secure Multiparty Computation (DCLSMPC) of any Distributed Unknown Finite State Machine (DUFSM). Accordingly, secret shares of the input(s) and output(s) are passed to/from the computing parties, while there is no communication between them throughout the computation.
The present invention also provides a transition table representation and a polynomial representation for arithmetic circuit evaluation, joined with a CRT secret sharing scheme and FHE, to achieve SITS communication-less execution of a DUFSM within a computationally secure setting. An FHE implementation has a single-server limitation when coping with a malicious or Byzantine server. Several distributed memory-efficient solutions that are significantly better than the majority vote in replicated state machines are used, where each participant maintains an FHE replica. A DUFSM is achieved when the transition table is secret shared, or when the (possibly zero-valued) coefficients of the polynomial are secret shared, implying communication-less SMPC of an unknown finite state machine.
The processing of encrypted information where the computation program is unknown is an important task that can be solved using communication among several participants. However, this communication reveals the participants to each other and incurs a non-negligible overhead for the communication between them. Computationally secure communication-less approaches can also be suggested, either for the case of a known automaton and global inputs, or for the case of computational security alone. Here, the first communication-less solution that is statistical information-theoretically secure, combined with an (FHE-based) computationally secure scheme, is presented.
Distributed computing uses replicated state machines, which are implemented based on distributed consensus [22]. The present invention also provides a sharing scheme that is based on a secret shared transition function or a unique polynomial over a finite ring for implementing, e.g., a Boolean function, a state machine transition, control of RAM, or control of a Turing Machine.
For any state machine, this polynomial encodes the information of all the transitions from a state x and input y to the next state z. The information may also contain the encoding of the output. Once the polynomial is adjusted, it is described by an arithmetic circuit that can be evaluated distributively by the SMPC participants. Each participant evaluates the arithmetic circuit using the CRT-SITS secret sharing scheme where the shares are encrypted using FHE. Consequently, the possibility for (value secured) additions and multiplications with no communication is achieved. This polynomial representation of the transition function keeps the actual computed function private by using secret shares for all (zero and non-zero) coefficients of the polynomial, while revealing only a bound on the maximal degree k of the polynomial.
The CRT representation allows independent additions and multiplications of the respective components of two (or more) numbers over a finite ring. This way, it is possible to compute arithmetic circuits in a distributed fashion, where each participant performs calculations over a finite ring defined by the (relatively) prime number they are in charge of. Thus, a distributed polynomial evaluation is obtained, where several participants do not need to communicate with each other.
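The componentwise arithmetic described above can be sketched in a few lines of Python. This is a plaintext illustration only: in the actual scheme each residue would be held by a different party and FHE-encrypted, and the primes 5, 7, 11 are chosen here purely for the example.

```python
PRIMES = [5, 7, 11]  # pairwise coprime moduli; the ring size is 5*7*11 = 385

a, b = 12, 9
sa = [a % p for p in PRIMES]  # residue vector (CRT representation) of 12
sb = [b % p for p in PRIMES]  # residue vector of 9

# Each party adds/multiplies its own component, with no communication.
summed = [(x + y) % p for x, y, p in zip(sa, sb, PRIMES)]
product = [(x * y) % p for x, y, p in zip(sa, sb, PRIMES)]

# The local results agree with the residues of the true sum and product.
print(summed == [(a + b) % p for p in PRIMES])   # True
print(product == [(a * b) % p for p in PRIMES])  # True
```

Each party only ever sees numbers bounded by its own modulus, which is what makes the per-party memory footprint small.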
The transition function of a state machine may be represented by a bi-variate polynomial from the current state and the input to the next state (and output). Namely, a bi-variate polynomial can be defined by the desired points that define the transition from the current state (x) and the input (y) to the next state (z), which may encode the output, as well. Alternatively, a univariate polynomial can be defined by using the most significant digits of (x+y) to encode the state (x) and the least significant digits, to encode the input (y). The output state (z) occupies the same digits of (x) that serve to encode the next state, while the rest of the digits in (z) are zeros. Thus, the next input can be added to the previous result and be used in computing the next transition, and so forth.
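The univariate digit-packing described above can be sketched as follows. The scale factor F and the concrete codes are illustrative assumptions, not values fixed by the invention.

```python
F = 100  # scale factor: the input occupies the two least-significant digits

def pack(state_code, input_code):
    """Combine the state (high digits) and the input (low digits) into one value."""
    assert 0 < input_code < F  # inputs must stay below the scale factor
    return state_code * F + input_code

# A next-state value z keeps only the high (state) digits and zeros elsewhere,
# so the next input can simply be added before computing the next transition.
z = 300        # hypothetical next-state encoding (state 3, low digits zero)
nxt = z + 7    # add the next input, encoded as 7
print(nxt == pack(3, 7))  # True
```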
By using the scheme of the present invention, a more efficient version of the replicated state machine (and Blockchain) can be implemented, with only a logarithmic-sized memory compared to the legacy replicated state machine.
Naturally, several known error correction techniques that rely on features of the CRT (depicted in [19],[20]) can eliminate the influence of Byzantine participants. These schemes are not designed to preserve the fully homomorphic property of CRT secret sharing, just as the CRT threshold secret sharing does not support additions and multiplications as the values can exceed the global maximal value the (original, with no additional error-correcting values) mutual primes can represent. Still, when using FHE, a computation can be designed to never exceed this maximal value and be error corrected.
A distributed secure multiparty computation may be preferred when FHE is executed over a single server (since the server can be Byzantine).
Recently, extensive work on computationally secure communication-less computation has been done, see [14, 9] and in references therein. However, the computation security
is only based on the belief that one-way functions exist [10]. Several other works in the scope of perfect information-theoretically secure schemes were presented in [13, 15, 12, 5, 16]. Unfortunately, none of them can compute all possible functions, and they require either communication or exponential resources to maintain continuous functioning. Function secret sharing (FSS) is described in depth in [8] and provides an efficient solution for MPC. However, the suggested scheme relies on secret keys and random masks that are applied to the inputs. While both techniques reflect computational difficulty, there is no backup in the form of SITS in case the secret key is revealed or the Pseudo Random Generator (PRNG) is not sufficient. The suggested scheme is an alternative to a replicated state machine with no communication, while improving the communication overhead of the secret shared random-access machine presented in and the secret shared Turing machine presented in [13].
This SITS within FHE approach can also be used in implementations of distributed efficient databases [3], Accumulating Automata [15] with no communication, or even for ALU operations in the communication-less RAM implementation [17].
Let p_1 < p_2 < . . . < p_k, where the p_i are relatively prime, and let a ≡ a_i (mod p_i) for 1 ≤ i ≤ k, k > 0, be a set of congruence equations, where the a_i are remainders. The original form of the CRT states that this given set of congruence equations always has exactly one solution modulo Π_{i=1}^{k} p_i.
The most important feature of the CRT is the possibility of independently adding and multiplying two vectors of congruence values, thereby performing fully homomorphic (addition and multiplication) operations on CRT-based secret shares. Unlike perfectly secure secret sharing (such as the schemes of Shamir [27] and Blakley [6]), CRT-based secret sharing that supports homomorphic additions and multiplications (unlike [2]) is only statistically secure. The present invention uses FHE to computationally mitigate information leakage from the individual CRT shares.
The effectiveness of a joint secure operation is detailed in [21], introducing a series of arithmetic calculations, done over a finite field. The solution is perfect information-theoretic secure but requires communication among the participants to support polynomial degree reduction after a multiplication.
The CRT-based Secure Multiparty Computation proposed by the present invention is only statistical information-theoretic secure, but at the same time, uses significantly less memory per participant and enables communication-less operations.
The calculation results of each participant can be collected and recovered into a unique result in Z_K, where K = Π_{i=1}^{k} p_i. The task of reducing all the results into a single solution can be performed by some known algorithms, such as Garner's Algorithm [24], which is used by the present invention.
In the case of two n-bit numbers x, y that are multiplied distributively among k parties, the series of calculations is only one multiplication long, so the result is bounded by 2^{2n}. First, k primes whose product is large enough are found (line 2), so that the CRT recovery is possible. Then, all k primes are distributed to the parties, such that every party holds a different prime modulus (line 2). In this example, the calculation results are collected synchronously. Actual multiplication results are recovered using Garner's algorithm (line 13).
The square of x = 14 can be distributively computed by a group of 3 parties holding the moduli 5, 7, and 11. Each party calculates the multiplication of x with itself over its own residue, such that:

4·4 ≡ 1 (mod 5), 0·0 ≡ 0 (mod 7), 3·3 ≡ 9 (mod 11) (1)
the result of Garner's algorithm is y = 196 ∈ Z/385Z, where:

y = 196 = 1 + 4·5 + 5·(5·7) (2)
which satisfies the following, as expected:

y ≡ 1 (mod 5), y ≡ 0 (mod 7), y ≡ 9 (mod 11) (3)
The result of the calculation did not exceed the size of Z/385Z. Specifically, in case the result of a calculation does overflow the ring bound (the maximal product value of the moduli, 5·7·11 = 385 in this example), an unexpected result may occur in the recovery step. This result is guaranteed to be the modulo reduction of the correct one with respect to the moduli product value, thus it might still be useful. Nevertheless, this problem can be resolved by adding more parties to the computation, or by choosing larger primes. While both options increase the ring's bound, thereby preventing a calculation overflow, the first option is preferable as it has no penalty on the total memory usage.
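The worked example can be reproduced with a short Garner-style recovery sketch. This is plaintext Python, assuming the same moduli 5, 7, 11; in the actual scheme the local squaring would be performed under FHE.

```python
PRIMES = [5, 7, 11]  # product 385 bounds the result: 14**2 = 196 < 385

def garner(remainders, moduli):
    """Garner's algorithm: rebuild x from its residues via mixed-radix digits."""
    x, base = 0, 1
    for r, p in zip(remainders, moduli):
        digit = ((r - x) * pow(base, -1, p)) % p  # makes x agree modulo p
        x += digit * base
        base *= p
    return x

x = 14
shares = [x % p for p in PRIMES]                         # [4, 0, 3]
squared = [(s * s) % p for s, p in zip(shares, PRIMES)]  # [1, 0, 9]
print(garner(squared, PRIMES))  # 196
```

The mixed-radix digits produced internally are 1, 4, and 5, matching the decomposition y = 1 + 4·5 + 5·(5·7) shown above.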
Therefore, in the general case where a distributed calculation is carried out for some operator/function, one may follow the dealer-worker scheme. In this scheme, there is a single party that is responsible for the assignment of jobs and collection of the results while other parties have no responsibility besides the calculation itself. The first party is denoted as the “dealer” in this scheme and the other parties as “workers”.
The following Algorithm 2 and Algorithm 3 respectively describe their procedures. Initially, the dealer generates the appropriate primes (line 2) and distributes them to the workers (line 2). Throughout the computation, the dealer manages a queue that is shared with the workers in such a manner that every time an input arrives, it is pushed to the queue (line 2) and popped in turn by the workers. Thanks to this queue, the dealer can start and stop each worker asynchronously (line 2), and by that can be more efficient. The dealer ultimately recovers the result using a recovery function of their choice (line 2).
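The dealer-worker flow can be sketched in a minimal single-process simulation. The queue stands in for the shared network queue, multiplication stands in for the delegated operator, and the recovery is done with a direct CRT formula (Garner's algorithm would serve equally well); none of this reproduces Algorithms 2 and 3 verbatim.

```python
from queue import Queue

PRIMES = [5, 7, 11]  # one relatively prime modulus per worker

def worker(prime, jobs, results):
    # A worker pops input shares from its queue and applies the operator
    # locally, modulo its own prime; it never talks to the other workers.
    while not jobs.empty():
        a, b = jobs.get()
        results.append((prime, (a * b) % prime))

def dealer(a, b):
    results = []
    for p in PRIMES:
        jobs = Queue()            # per-worker job queue, filled by the dealer
        jobs.put((a % p, b % p))  # push the input shares
        worker(p, jobs, results)
    # The dealer alone recovers the result from the collected residues.
    K = 1
    for p in PRIMES:
        K *= p
    x = 0
    for p, r in results:
        Ni = K // p
        x = (x + r * Ni * pow(Ni, -1, p)) % K
    return x

print(dealer(13, 12))  # 156
```

In a real deployment each worker runs on its own machine and pops jobs asynchronously; the sequential loop here only demonstrates the division of responsibility.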
The operation is always executed with respect to the unique modulus, such that there is no risk of overflow, or exceeding the finite field by the computation. The computation's limit is defined by the maximal number that the CRT shares represent, thus keeping the whole memory footprint small during the process.
The present invention provides a DFSM approach that copes with several of the RSM drawbacks. To increase the privacy of the computation implied by this approach, a local FHE-based arithmetic circuit is suggested, which keeps the memory efficiency while protecting the data.
An arithmetic circuit is based on additions and multiplications, which support the implementation of any FSM transition function or table. One convenient way to do so is by representing each bit in the circuit as a vector of two different bits (just as a quantum bit is represented). Namely, the bit 0 is represented by 01, and the bit 1 by 10. Each directed edge in the transition function graph is represented as a tuple CurrentState, Input → NextState, Output. Then, given a (possibly secret shared) transition function, this structure allows the table to be secret shared among different participants, possibly even padded with additional never-used tuples. CurrentState, Input, and NextState are represented by a sequence of 2-bit vectors. Thus, the logarithmic number of bits needed for the binary representation is doubled, rather than using a linear number of bits in the unary representation (optimized for small-degree polynomials, secret shares, and multiplication outcomes) as used in [13].
In order to blindly compute the next state and output, given the current state and input, a participant multiplies each bit of the shared secret (in its 2-bit vector representation) with the bits of each line of the transition table. Then, they sum up each resulting 2-bit vector into a single bit. For example, for the binary representation of the current state 110, the 2-bit vector representation is 101001.
Example of transition function representation:
010101,01→010101,442
010101,10→101001,065
101001,01→101001,542
101001,10→010101,324 (4)
Note that only two inputs are possible in the example here: either 0, represented by 01, or 1, represented by 10. Furthermore, the output can be agreed to be represented in binary, expressing a number inside the finite ring of the CRT secret sharing. For example, when three participants use the primes 3 < 11 < 19, then the finite ring being used for the secret sharing is Z_627. While the state and input representations are optimized for logical matching through arithmetic operations, the output representation can benefit from being memory efficient.
In case the current secret shared state and input are 101001 and 01, respectively, the next state and output are found by multiplication of every bit of the 2-bit vectors with each line of the table. Namely, the first two bits 10 of the current state are multiplied by the first two bits 01 in the table, resulting in 1·0=0 and 0·1=0, obtaining together 00. Then, by summing the two resulting bits, 0 is obtained, which (blindly) indicates that there is no match. However, the third line in the table yields a match, as a sum of 1 is obtained from the first two bits (10 in the current state and 10 in the table), and the same holds for the next two bits 10 and the last two bits 01, altogether yielding the desired output. Finally, if the input is 01, then the third line matches completely, as the input also matches by yielding 1. Therefore, only the third line of the table yields results consisting of only 1 bits that, when (blindly) multiplied among themselves, result in 1.
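The blind matching can be sketched in plaintext Python over the example table (4). Under the actual scheme every bit below would be an FHE ciphertext and all operations homomorphic; only additions and multiplications are used, so the logic carries over directly.

```python
# Transition table from the example (4): each line is
# (state 2-bit vector, input 2-bit vector, next-state 2-bit vector, output).
TABLE = [
    ((0,1,0,1,0,1), (0,1), (0,1,0,1,0,1), 442),
    ((0,1,0,1,0,1), (1,0), (1,0,1,0,0,1), 65),
    ((1,0,1,0,0,1), (0,1), (1,0,1,0,0,1), 542),
    ((1,0,1,0,0,1), (1,0), (0,1,0,1,0,1), 324),
]

def match_line(state, inp, line):
    """Blind match indicator: 1 iff every 2-bit pair agrees, using only + and *."""
    ref_state, ref_inp, _, _ = line
    ind = 1
    for vec, ref in ((state, ref_state), (inp, ref_inp)):
        for i in range(0, len(vec), 2):
            # pairwise multiply-and-sum yields 1 for a matching pair, 0 otherwise
            ind *= vec[i] * ref[i] + vec[i + 1] * ref[i + 1]
    return ind

def blind_transition(state, inp):
    """Sum every line weighted by its 0/1 match indicator."""
    nxt, out = [0] * 6, 0
    for line in TABLE:
        ind = match_line(state, inp, line)
        nxt = [n + ind * b for n, b in zip(nxt, line[2])]
        out += ind * line[3]
    return tuple(nxt), out

print(blind_transition((1,0,1,0,0,1), (0,1)))  # ((1, 0, 1, 0, 0, 1), 542)
```

Exactly one line ever yields indicator 1, so the weighted sum selects the correct next state and output without the evaluator learning which line matched.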
An FHE scheme is an encryption scheme that allows the evaluation of arbitrary functions on encrypted data. The problem was first suggested by Rivest, Adleman, and Dertouzos [25], and thirty years later, implemented in [18]. A major application of FHE is in cloud computing. This is because a user can store data on a remote server that has more storage capability and computing power than their own. However, the user might not trust the remote server, as the data might be sensitive, so they send the encrypted data to the remote server and expect it to perform some arithmetic operations on it, without learning anything about the original raw data. The present invention uses an FHE scheme to preserve the privacy among the participants, each being a remote server, blindly following the computation process.
The dealer's procedure described in Algorithm 3 is extended to support FHE behavior. The dealer initializes an FHE context with which they encrypt both the initial value and the incoming inputs (lines 6, 11). From this point, they continue in the same way as before (lines 7, 12), except for a decryption step at the end (line 16) and scheduled bootstrapping steps during the computation. For the sake of generality, the bootstrapping step is omitted but can be regarded as the assignment of the first share of the input to be the share of the initial state. After completing all of the decryptions, the results are reassembled by the CRT into a single solution, as shown before.
Equally, the participants (workers) are each dealt a plaintext modulus in which they operate. Keeping the modulus in the clear leaks no meaningful information and aids the participant in carrying out the computation with respect to their finite field. As before, after a worker is initialized, they start receiving encrypted inputs and apply the operator to them (line 3). As opposed to the operator application in a general field, these blind applications are expected to be done in a finite field that is typically different from the binary field in computers (e.g., 8 bits for a BYTE or 32/64 bits for a computer WORD). Therefore, the worker performs a dedicated balancing step after each iteration (line 4). Namely, they perform a blind modulo reduction on the result, thus keeping it inside the field. This step is possible due to a unique feature of FHE bitwise calculations that allows a blindly conditioned output.
One popular library that supports this feature is IBM's HElib [28]. This implementation is based on an aggregation of the condition results. Namely, if one wishes to blindly increment a number i by 1 in case it is negative, or otherwise blindly decrement it, they should first implement an indicator function:

I(x) = 1 if x < 0, and I(x) = 0 otherwise (5)
Then, this function is used in a context such that i's value changes correctly:
F(x)=x+1·I(x)−1·(1−I(x)) (6)
A suggested implementation is outlined in the following algorithm. Line 3 creates an unknown bit and line 3 reflects a conditioned output based on that bit. The subtraction is aggregated by using the differences computed in line 3.
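The blind conditional of Equation (6) can be sketched in plaintext Python. Under FHE, I(x) would be an encrypted bit derived from the sign of x, and the server would evaluate F without ever learning which branch contributed.

```python
def indicator(x):
    """I(x) = 1 when x is negative, 0 otherwise (the encrypted bit in the scheme)."""
    return 1 if x < 0 else 0

def blind_step(x):
    """F(x) = x + 1*I(x) - 1*(1 - I(x)): both branches are always evaluated,
    and the 0/1 bit selects which contribution survives."""
    i = indicator(x)
    return x + 1 * i - 1 * (1 - i)

print(blind_step(-3))  # -2 (incremented, since x was negative)
print(blind_step(5))   # 4  (decremented)
```

The same aggregation pattern implements the blind modulo reduction of the worker's balancing step: every possible correction is computed, and encrypted bits select the right one.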
Utilizing this feature is essential during the procedure of a worker in the proposed CRT based approach as the worker should be oblivious to the fact they carry out the same procedure only on encrypted data. As long as they know how to perform homomorphic operations such as additions and multiplications, while staying within the boundaries of the computer's binary representation, the homomorphism of the operations over the CRT secret shares is preserved.
This embodiment further improves the secrecy of the transition function of the FSM that is based on polynomial representation.
It is often useful (e.g., [26]) to estimate the value of a function y = f(x) at a certain point x based on some known values of the function, i.e., f(x_0), f(x_1), . . . , f(x_n). These values are evaluated at a set of n+1 points a = x_0, x_1, . . . , x_n = b in the range {a . . . b}. One way to carry out this operation is to approximate the function f(x) by an n-th degree polynomial:

P_n(x) = a_0 + a_1·x + a_2·x^2 + . . . + a_n·x^n (7)

where the coefficients a_0, . . . , a_n are obtained from the n+1 given points. Specifically, to find the coefficients of P_n(x), the polynomial is required to pass through all the points {(x_i, y_i = f(x_i)) | i = 0, . . . , n}, so that the following n+1 linear equations hold:

P_n(x_i) = y_i, for i = 0, . . . , n
This polynomial Pn(x) can be obtained by the interpolation polynomial in the Lagrange form. This polynomial is defined as a linear combination denoted by:
L_n(x) = Σ_{i=0}^{n} y_i·l_i(x) (8)
where l_i(x) is the Lagrange basis polynomial of degree n that, together with the other basis polynomials, spans the space of all n-th degree polynomials:

l_i(x) = Π_{0≤j≤n, j≠i} (x − x_j)/(x_i − x_j) (9)
When x = x_j for j = 0, . . . , n, then: l_i(x_j) = 1 if i = j, and l_i(x_j) = 0 otherwise.
This way, the polynomial L_n(x) passes through all n+1 points, because: L_n(x_j) = Σ_{i=0}^{n} y_i·l_i(x_j) = y_j (10)
This polynomial is only an approximation of f(x) in certain points. So, in fact, at any other point x≠xj for j=0, . . . , n, the polynomial value is unpredictable.
Conceptually, polynomial interpolation in finite rings should not differ from polynomial interpolation in general rings such as Z. That is because modular arithmetic can be used instead of regular arithmetic, thereby following a standard interpolation algorithm.
In the case of using the Lagrange interpolation, it is essential to choose the parameter M > 0 of the ring Z/MZ wisely; otherwise, the interpolation fails. Since it is not guaranteed that every number x ∈ Z/MZ is invertible (e.g., zero), and the denominators in the basis polynomials are comprised of differences between two numbers, the required divisions might not be possible. Therefore, for a set of points {(x_i, y_i) | i = 0, . . . , n}, it is crucial to choose a ring Z/MZ where all the differences x_i − x_j are invertible.
Given a set of points in the finite ring Z/KZ, such that K = Π_{i=1}^{n} p_i for relatively prime p_i, to successfully interpolate this set of points using Lagrange's method, it is required to verify that none of the differences has a prime factor in {p_1, . . . , p_n}.
If there exist i ≠ j such that the difference d = x_i − x_j has a prime factor p_d ∈ {p_1, . . . , p_n}, then Lagrange's polynomial interpolation is not possible.
Proof:
By contradiction: if such p_d exists, then d = p_d·m for some integer m, so gcd(d, K) ≥ p_d > 1 and d is not invertible in Z/KZ. Hence, the denominator of the corresponding Lagrange basis polynomial cannot be inverted, and the interpolation fails.
Algorithm 8 is used to choose the relatively prime moduli p_1, . . . , p_k before starting the interpolation process. First, all the differences that might not be invertible are found and factorized (line 5). Once all the factors to be avoided are obtained, primes that are coprime to these factors are found (line 12). Lastly, the prime set whose product is large enough (line 15) is returned.
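Lagrange interpolation over Z/MZ can be sketched as follows. This is a plaintext sketch assuming the moduli were already chosen (as by Algorithm 8) so that all pairwise differences are invertible; the sample points are illustrative.

```python
def poly_mul_linear(poly, r, M):
    """Multiply a polynomial (low-degree-first coefficients) by (x - r), mod M."""
    out = [0] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k] = (out[k] - r * c) % M
        out[k + 1] = (out[k + 1] + c) % M
    return out

def lagrange_mod(points, M):
    """Coefficients of the interpolating polynomial over Z/MZ."""
    n = len(points)
    coeffs = [0] * n
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul_linear(basis, xj, M)  # numerator prod (x - x_j)
                denom = (denom * (xi - xj)) % M
        scale = (yi * pow(denom, -1, M)) % M  # requires denom invertible mod M
        coeffs = [(c + scale * b) % M for c, b in zip(coeffs, basis)]
    return coeffs

def evaluate(coeffs, x, M):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule, reduced modulo M at each step
        acc = (acc * x + c) % M
    return acc

M = 385  # 5 * 7 * 11; all pairwise differences below are coprime to M
pts = [(1, 10), (2, 25), (4, 300)]
P = lagrange_mod(pts, M)
print([evaluate(P, x, M) for x, _ in pts])  # [10, 25, 300]
```

If any difference x_i − x_j shared a factor with M, the `pow(denom, -1, M)` call would fail, which is exactly the failure mode the lemma below formalizes.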
If a given FSM is represented by a truth table, the relations between the different states and the possible inputs or outputs are of interest. The present invention proposes a (non-perfect) encoding scheme that allows this FSM to be represented completely by polynomials. First, the different states and transitions are encoded in some grid-compatible representation, where a transition, in that context, is a 2-tuple e = (u, v) such that the state u has a valid input that leads to v. One simple encoding is through a positive-integer representation. Given a set of states V and a set of transitions E, a unique 2-D point encoding of them is calculated, as follows in Alg. 9.
Since the y value of a point is comprised only of a state encoding, the decoding process is simple. It is, however, not guaranteed for the x value, as it is comprised of an encoded summation that might overlap other encoded values. One possible way to deal with this is to simply work on different scales; more specifically, a factor f = 10^t, where t > 0, is used to choose the integers in line 4 from the range {f+1 . . . |V|·f}. Also, considering that there might be enough transitions to cause an overflow between the scales, the parameter t needs to be bounded such that t > log |E| and f = 10^t > |E| holds.
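The 2-D point encoding can be sketched as follows. Algorithm 9 itself is not reproduced in this text, so the sketch only assumes the structure described above: states mapped to multiples of the factor f, inputs mapped to small integers below f, and each transition turned into the point (enc(state) + enc(input), enc(next state)). The tiny state machine used here is hypothetical.

```python
def make_encoding(states, inputs, f=10):
    # Assumption: f is a power of ten larger than the number of inputs and,
    # per the bound above, larger than the number of transitions |E|.
    state_enc = {s: (i + 1) * f for i, s in enumerate(states)}
    input_enc = {a: i + 1 for i, a in enumerate(inputs)}
    return state_enc, input_enc

def transition_points(table, state_enc, input_enc):
    """One 2-D point per transition: x = enc(state) + enc(input), y = enc(next)."""
    return [(state_enc[u] + input_enc[a], state_enc[v])
            for (u, a), v in table.items()]

states = ["e", "n", "na"]
inputs = ["n", "a", "x"]
table = {("e", "n"): "n", ("n", "a"): "na", ("n", "n"): "n"}
se, ie = make_encoding(states, inputs)
pts = transition_points(table, se, ie)
print(pts)  # [(11, 20), (22, 30), (21, 20)]
```

Because every state code is a multiple of f and every input code is below f, the x value decodes unambiguously back into its (state, input) pair.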
Since the polynomials are both encrypted and already evaluated in a specific field, the only information a participant can learn stems from the encryption parameters and the finite field modulus assigned to him beforehand. By keeping the modulus clear, the assignment process is simplified, while not revealing any meaningful data to the participants, as all the other data they receive is encrypted. The encryption parameters, however, including the public key, might hint at the computational security of the scheme, in case the participant is interested in breaking it. The Homomorphic Encryption Standard [1] may assist in choosing recommended parameters for implementation.
In practice, the proposed process provides the participants with a reduced polynomial in some finite field, but the actual operation does not consider that fact. The result can be maintained in the respected finite field by applying a blind modulo operation on each polynomial evaluation. This can be done by the previous method described in Algorithm 7. Moreover, to successively evaluate the polynomial without consuming all noise budget of the FHE scheme, one can utilize a bootstrapping method, thus allowing the computation to carry on endlessly.
For the sake of simplicity and readability, the task of searching for the word “nano” in a (possibly unbounded, streaming) large text is considered. In fact, any state machine computation can be supported where the current state is maintained only in (FHE CRT) secret shared form, thereby eliminating the need for the computation-delegating client to protect the security and privacy of the current state (avoiding a single point of failure) and to carry out the computation of the transition function. This problem can be presented as a distributed computation problem, where text characters are sent to each participant one after the other, and the participants are expected to yield a positive result if and only if the sequence “nano” was detected among the received characters. The negative or positive results are collected from all participants, and the correct result is decided by majority, thus eliminating any Byzantine errors. In this scenario, both the string to search and the text itself are shared in clear-text with all participants, along with the string-matching algorithm. The RSM solution reflects a “naive” approach to error correction (as with repetition codes), whereas there are codes with equivalent Hamming distance that in total use a smaller number of bits [4]. In turn, a different scenario is considered, where all inputs are kept secret and the algorithm is unknown to the parties participating in the distributed computation.
One can build a simple automaton for that specific task, disregarding any preprocessing operations as done in the Boyer Moore algorithm [7]. In this simple automaton, there are only five states—an “empty” state denoted by ϵ, and four other states, each of them representing a valid substring of “nano”. For the sake of clarity, the transition table is detailed in Table 1.
For the sake of convenience, the state machine is also detailed in
Following this data, all the possible letters (inputs) are encoded as integers and later used to simulate a transition from one state to another. The most straightforward method for this encoding is a map of each character to its real ASCII value. However, the English alphabet values are located in the ASCII table in a non-continuous manner; namely, the values have undesired gaps between them. This is a disadvantage, mainly in the case when an interpolating polynomial is created for the state machine. Also, using the ASCII table introduces a limitation to a single text encoding, solely for the English language. It is possible to work around this limitation by creating a map of characters to values for every possible encoding. However, this solution might be unscalable for different texts, as texts are encoded differently, and creating as many tables as there are text encodings requires undesired preprocessing work.
A different, hybrid method is to use these two techniques together. Namely, it is possible to define an encoding where each character is mapped to an integer v from the respective text encoding table, only this time it is reduced by an offset value off, such that the minimum possible value v_min becomes v_min − off = 1. This way, although the unwanted gaps are not eliminated, each character value is minimized, while allowing an application of this method for any text encoding.
Once the encoding of the input is completed, the states of this machine are encoded as integers. This is done using a set of integers with a constant distance d between them, such that the highest value of an input V_max holds V_max < d. In that way, it is guaranteed that there are no overlaps between states while computing the transitions (recall the algorithm of state and transition encoding, where the encoding of a state is summed together with the encoding of an input). Finally, based on the simple state machine from before, it is possible to build an encoded FSM, as shown in
The automaton is demonstrated using a decimal base, while in practice a binary base is more efficient. The five states above are encoded as integers with a factor of f = 10^2 and, similarly, a range of {1 . . . 5}, such that: 1·10^2 = 100, 2·10^2 = 200, 3·10^2 = 300, 4·10^2 = 400, 5·10^2 = 500. Also, all inputs are encoded as integers in the range {1 . . . 29} to represent the English alphabet and punctuation signs such as spaces, dots, and newlines.
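The five-state automaton can be sketched as follows. Table 1 is not reproduced in this text, so the sketch assumes the transitions follow the usual longest-prefix-suffix fallback rule of string matching, with the states encoded as 100 . . . 500 as above.

```python
PATTERN = "nano"
# states e, n, na, nan, nano encoded as 100, 200, 300, 400, 500
STATE_CODE = {i: (i + 1) * 100 for i in range(len(PATTERN) + 1)}

def next_state(state, ch):
    """Extend the current match by ch, or fall back to the longest
    suffix of the scanned text that is still a prefix of the pattern."""
    prefix = PATTERN[:state] + ch
    while prefix and not PATTERN.startswith(prefix):
        prefix = prefix[1:]  # drop leading characters until a prefix matches
    return len(prefix)

def scan(text):
    state = 0
    for ch in text:
        state = next_state(state, ch)
        if state == len(PATTERN):  # reached the accepting state ("nano")
            return True
    return False

print(scan("banano"))  # True
print(scan("nanano"))  # True  (overlap handled by the fallback, nan + a -> na)
print(scan("nanno"))   # False
```

In the secure setting the same transitions would be carried out blindly over the encoded states, with neither the pattern nor the current state visible to the workers.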
Following Algorithm 9, a list of points is built, each representing a transition in the state machine from one state to another with some input. This list is sparse, namely, it does not include invalid or non-existent transitions. Therefore, when the polynomial interpolation is applied, invalid values will result in invalid or unexpected results (see Appendix B for the complete list of encoded points).
Next, following Alg. 8, an interpolating polynomial P(x) is built such that P(x) ∈ Z/KZ[X]. Namely, besides all the points detailed above fitting the polynomial, all of the polynomial's coefficients are in Z/KZ for some relatively prime p_1, . . . , p_k and the product K = Π_{i=1}^{k} p_i.
As a result of the large number of encoded points, this polynomial has a high degree. However, this is acceptable, as it is only evaluated over some finite field and there is no risk of overflowing or exceeding memory resources. As soon as the interpolation step is completed, modulo reduction is applied with respect to each participant's modulus, and the reduced polynomials are distributed to start the computation.
The above examples and description have of course been provided only for the purpose of illustration, and are not intended to limit the invention in any way. As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways, employing more than one technique from those described above, all without exceeding the scope of the invention.
The type of field that has a finite number of elements is first introduced. This number is a power p^n of some prime number p. In fact, for any prime number p and any natural number n there exists a unique field of p^n elements, denoted by GF(p^n) or by F_{p^n}.
CRT Arithmetic. Let p_1 < p_2 < . . . < p_k, where the p_i are relatively prime, and let a ≡ a_i (mod p_i) for 1 ≤ i ≤ k, k > 0, be a set of congruence equations, where the a_i are remainders. The original form of the CRT states that this given set of congruence equations always has exactly one solution modulo Π_{i=1}^{k} p_i.
Similar to the previous notation, this theorem is often restated as:
Z/aZ ≅ Z/p_1Z × . . . × Z/p_kZ, where a = Π_{i=1}^{k} p_i (11)
This means that when performing a sequence of arithmetic operations in Z/aZ, one may do the same computation independently in each Z/p_iZ and then get the result by applying the isomorphism from right to left. This operation is referred to as the recovery process later on.
The integer m=14 can be represented as a set of these congruence equations:
14 ≡ 2 (mod 3), 14 ≡ 4 (mod 5), 14 ≡ 0 (mod 7), 14 ≡ 3 (mod 11) (12)
More significantly, m=14 is the exact and only solution modulo 3·5·7·11=1155.
This feature of the CRT allows representing big numbers using a small array of integers. Namely, when performing arithmetic operations on big numbers, this feature assists in preserving memory resources.
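The worked example above can be checked mechanically; the brute-force scan is only there to exhibit the uniqueness claim of the CRT.

```python
moduli = [3, 5, 7, 11]
m = 14
residues = [m % p for p in moduli]
print(residues)  # [2, 4, 0, 3], matching Eq. (12)

K = 3 * 5 * 7 * 11  # 1155
# m = 14 is the one and only value in [0, 1155) with exactly these residues.
candidates = [x for x in range(K) if [x % p for p in moduli] == residues]
print(candidates)  # [14]
```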
The assumption that every participant in a distributed system correctly follows the algorithm may not hold in reality, either because of faults or malicious takeovers. Therefore, distributed systems are often designed to tolerate a portion (typically less than ⅓) of the participants acting as if controlled by a malicious adversary. These participants are called Byzantine participants.
To further demonstrate the effectiveness of a joint operation as detailed in [21], consider a series of arithmetic calculations done over t-bit numbers of the form b_1, . . . , b_t. The additions in this series might add up to 1 bit between each calculation, but the multiplications might double the number of bits. Thus, if the whole operation is completed individually by a single party, it might require up to t^2 bits in the worst case. However, if the CRT representation is used and the moduli are split among several parties, each calculation is performed individually while achieving a bounded number of bits per party (a bound as large as implied by the largest modulus). Namely, to support calculations of up to t^2 bits, i.e., numbers up to 2^{t^2}, the product of the moduli must be at least 2^{t^2}.
This observation leads to the conclusion that one may decide whether to use a few large primes or many small primes to carry out the same series of calculations. This decision might change according to the availability of more parties and the amount of memory resources to be consumed.
As previously explained, the calculation result of each participant can be collected and recovered into a unique result in Z_K, where K = Π_{i=1}^{k} p_i.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2022/050241 | 3/3/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63155751 | Mar 2021 | US | |
63155754 | Mar 2021 | US | |
63174052 | Apr 2021 | US |