Machine learning models provide important decision-making features for applications across a wide variety of fields. Given their ubiquity, greater importance has been placed on understanding the implications of machine learning model design and training data set choices on machine learning model performance. Systems and techniques that can provide greater adoption of machine learning models are, therefore, highly desirable.
Parameter permutation is performed for federated learning to train a machine learning model. A client system of an aggregation server may update parameters of a machine learning model based on encrypted parameters received from the aggregation server and decrypted by the client system using a secret key of a public-secret key pair obtained at the client system. Local updates to the parameters of the machine learning model may be computed by the client system according to a machine learning technique using local training data. The parameters of the federated machine learning model may be randomized at the client system. An intra-model shuffling may be applied to the randomized parameters according to a shuffling pattern at the client system. The shuffled parameters may be encoded at the client system using the secret key. The client system may provide the encoded parameters to the aggregation server of the federated machine learning system using one or more Private Information Retrieval (PIR) queries generated according to the shuffling pattern that allow the aggregation server to retrieve each of the encoded local parameters privately during aggregation.
While the disclosure is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the disclosure is not limited to embodiments or drawings described. It should be understood that the drawings and detailed description hereto are not intended to limit the disclosure to the particular form disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word “may” is used in a permissive sense (e.g., meaning having the potential to) rather than the mandatory sense (e.g. meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation for that unit/circuit/component.
This specification includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment, although embodiments that include any combination of the features are generally contemplated, unless expressly disclaimed herein. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
Various techniques for private and robust federated learning by parameter permutation are described herein. Federated Learning (FL) is a distributed, collaborative machine learning paradigm that enables mutually untrusting clients to collaboratively train a common machine learning model. Client data privacy is paramount in FL scenarios. At the same time, the machine learning model should also be protected from poisoning attacks from adversarial clients. While some techniques address these technical challenges in isolation, as described in detail below, various embodiments of parameter permutation for federated learning may address both these challenges, combining an intra-model parameter shuffling technique that amplifies data privacy, with Private Information Retrieval (PIR) based techniques that permit cryptographic aggregation of clients' machine learning model updates. The combination of these techniques further enables a federation server (sometimes referred to as an aggregation server) to constrain parameter updates from clients so as to curtail effects of model poisoning attacks by adversarial clients. Additionally, in various embodiments, the hyperparameters of parameter permutation techniques for federated learning as described below can be used to effectively trade off computation overheads with model utility.
In various embodiments, federated learning training involves a server (or collection of multiple computing devices that act as a server) that aggregates, using an aggregation rule (AGR), machine learning model updates that clients participating in federated machine learning compute using their local private data. The aggregated global machine learning model is thereafter broadcast by the server to a subset of the clients. This process repeats for several rounds until convergence or until a threshold number of rounds is reached.
Though highly promising, federated learning faces multiple technical challenges to its practical deployment. As noted above, two of these technical challenges are (i) providing data privacy for clients' training data, and (ii) providing robustness of the global machine learning model in the presence of malicious clients. The data privacy challenge emerges from the fact that raw model updates of federation clients are susceptible to privacy attacks by an adversarial aggregation server. Two classes of approaches can address this problem in significantly different ways. First, Local Differential Privacy (LDP) enforces a strict theoretical privacy guarantee on the model updates of clients. The guarantee is enforced by applying carefully calibrated noise to the clients' local model updates using a local randomizer R. While this technique provides a strict privacy guarantee and can defend against poisoning attacks by malicious clients, the noise added to the clients' local model updates to provide the LDP guarantee significantly degrades model utility.
The other approach to enforce client data privacy is secure aggregation (sometimes referred to as “sAGR”), where model update aggregation is done using cryptographic techniques such as partially homomorphic encryption or secure multi-party computation. sAGR protects the privacy of clients' data from an adversarial aggregation server because the server sees just the encrypted version of clients' model updates. Moreover, this privacy is enforced without compromising global model utility. However, the encrypted model updates themselves provide the perfect cover for a malicious client to poison the global model—the server cannot tell the difference between an honest model update and a poisoned one, since both are encrypted.
In various embodiments, an efficient federated learning algorithm that achieves local privacy for participating clients at a low utility cost, while ensuring robustness of the global model from malicious clients may be highly desirable as it addresses many of the technical challenges to federated learning, including those noted above.
The starting point of parameter permutation for federated learning techniques implemented and described in various embodiments below is privacy amplification by shuffling, which enables stronger (e.g., amplified) privacy with little model perturbation (using randomizer R) at each client. This technique differs from other techniques because intra-model parameter shuffling is applied rather than the inter-model parameter shuffling done previously.
Next, each parameter permutation for federated learning client chooses its shuffling pattern uniformly at random for each round of federated learning, which is private to that client. To aggregate the shuffled (and perturbed) model parameters, a client may utilize computational Private Information Retrieval (sometimes referred to as “cPIR”) to generate a set of PIR queries for its shuffling pattern that allows the server to retrieve each parameter privately during aggregation. All that the server observes is the shuffled parameters of the model update for each participating client, and a series of PIR queries (e.g., the encrypted version of the shuffling patterns). The server can aggregate the PIR queries and their corresponding shuffled parameters for multiple clients to get the encrypted aggregated model. The aggregated model is decrypted independently at each client.
The combination of LDP enforcement at each client and intra-model parameter shuffling achieves enough privacy amplification to let parameter permutation for federated learning preserve high model utility, such that the noise added for privacy does not degrade the predictive performance of the model when deployed to make predictions (sometimes referred to as “inferences”) as part of a system, service, or application that uses a machine learning model trained according to the below described federated learning techniques. At the same time, availability of the shuffled parameters at the federation server lets the federation server control a client's model update contribution by checking and enforcing norm-bounding, which is known to be highly effective against model poisoning attacks.
In various embodiments, parameter permutation for federated learning utilizes cPIR, which may rely on homomorphic encryption and can be computationally expensive, particularly for large models. However, hyperparameters may be implemented for the federated learning techniques, in some embodiments, that allow a computation/utility trade off in parameter permutation for federated learning, enabling a tradeoff between computational efficiency and model utility. For example, one or more hyperparameters can be specified or changed to adjust the computation burden for a given utility goal by altering the size and number of shuffling patterns for the parameter permutation for federated learning clients. Such hyperparameters allow various embodiments to provide LDP-federated learning guarantees at low model utility cost. In another example, hyperparameters can create shuffling windows whose size can be reduced to drastically cut computation overheads, but at the cost of reducing model utility due to lower privacy amplification (given a fixed privacy budget). In some embodiments, hyperparameter configurations can be set to provide “light” or “heavy” parameter permutation. For a hyperparameter configuration that provides a light version of parameter permutation for federated learning, where client encryption and server aggregation need to complete within a limited amount of training time (e.g., 52.2 seconds and 21 minutes respectively), the result of federated learning is a trained model that still provides reasonably accurate results (e.g., 32.85% test accuracy) while still providing client data privacy and protection against poisoning attacks. For a hyperparameter configuration with larger time allowances for client encryption and server aggregation (e.g., 32.1 minutes and 16.4 hours respectively), greater model accuracy can be provided (e.g., 72.38% test accuracy), again while providing client data privacy and protection against poisoning attacks. The choice of hyperparameters allows techniques for parameter permutation for federated learning to fit within the resources (e.g., time, computing resource utilization, etc.) allotted to performing federated learning.
The discussion that follows provides various examples of the terminology that can be used when discussing federated learning and the implementation of privacy preserving techniques. Non-Private Federated Learning is a starting point for the discussion, which may culminate in the techniques for parameter permutation for federated learning, in various embodiments. For example, in a federated learning technique N clients collaborate to train a global machine learning model without directly sharing their data. In round t, the federation server (also referred to as the “aggregation server” or the “server”) samples n out of N total clients and sends them the most recent global model θt. Each client re-trains θt on its private data using a machine learning technique, such as stochastic gradient descent (SGD), and sends back the model parameter updates (xi for the ith client) to the server. The server then aggregates (e.g., averages) the collected parameter updates and updates the global model for the next round.
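For concreteness, the following is a minimal sketch of one such non-private federated averaging round; the function names, the use of NumPy arrays for model parameters, and the simple mean aggregation are illustrative assumptions rather than a definitive implementation of any particular embodiment:

```python
import numpy as np

def federated_round(global_model, clients, n, local_train, rng):
    """One illustrative round of non-private federated averaging."""
    sampled = rng.choice(len(clients), size=n, replace=False)
    updates = []
    for i in sampled:
        # Each sampled client retrains the current global model on its private
        # data (e.g., with SGD) and returns its parameter update x_i.
        local_model = local_train(global_model.copy(), clients[i])
        updates.append(local_model - global_model)
    # The server aggregates (here: averages) the updates and applies them.
    return global_model + np.mean(updates, axis=0)
```

Repeating this round until convergence, or until a threshold number of rounds is reached, yields the non-private baseline against which the privacy-preserving variants discussed below can be compared.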
Formally, consider adjacent datasets (X, X′∈ℝ^(n×d)) that differ from each other by the data of one federation client. The following provides a description of implementing differential privacy: A randomized mechanism M:X→Y is said to be (ε, δ)-differentially private if for any two adjacent datasets X, X′∈X and any set of outputs Y⊆Y:

Pr[M(X)∈Y] ≤ e^ε Pr[M(X′)∈Y] + δ   (1)
where ε is the privacy budget (lower the ε, higher the privacy), and δ is the failure probability.
Another approach to implementing privacy in federated learning is illustrated in
The following discussion provides a formal description of LDP. A randomized mechanism R:X→Y is said to be (εl, δl)-locally differentially private if for any two inputs x, x′∈X and any output y∈Y:

Pr[R(x)=y] ≤ e^εl Pr[R(x′)=y] + δl
In LDP-federated learning, each client perturbs its local update, (xi), with (εl, δl)-LDP. Unfortunately, LDP hurts the utility, especially for high dimensional vectors. Its mean estimation error is bounded by
meaning that for better utility, a higher privacy budget or a larger number of users in each round may be implemented.
Another technique that utilizes privacy amplification as part of providing client data privacy is illustrated in
Federated learning frameworks based on shuffling clients' updates may include three constituent processes: M=A∘S∘R. Specifically, they introduce a shuffler S, which sits between the FL clients and the FL server and shuffles the locally perturbed updates (produced by randomizer R) before sending them to the server for aggregation (A). More specifically, given a parameter index i, S randomly shuffles the ith parameters of the model updates received from the n participant clients. The shuffler thus detaches the model updates from their origin clients (e.g., anonymizes them).
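A minimal sketch of such an inter-model shuffler is shown below; the per-coordinate shuffling across clients follows the description above, while the function name and the NumPy representation of the updates are illustrative assumptions:

```python
import numpy as np

def inter_model_shuffle(perturbed_updates, rng):
    """Shuffler S: for each parameter index i, randomly permute the ith
    parameters across the n clients' locally perturbed updates, detaching
    each value from its origin client before server-side aggregation."""
    shuffled = np.array(perturbed_updates)          # shape (n, d)
    for i in range(shuffled.shape[1]):
        shuffled[:, i] = rng.permutation(shuffled[:, i])
    return shuffled
```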
In a shuffle model, if R is εl-LDP, where
M is (εl, δl)-DP with
where ‘∧’ denotes the minimum function.
From the above corollary describing a shuffle model, the privacy amplification has a direct relationship with √n, where n is the number of selected clients for aggregation (e.g., increasing the number of clients will increase the privacy amplification). Note that in parameter permutation for federated learning, the clients are responsible for shuffling, and instead of shuffling the n clients' updates (inter-model shuffling as depicted in
When considering the following description of techniques for parameter permutation for federated learning, Naïve and Strong Composition may be understood as follows. (Naive Composition) ∀ε≥0, t∈ℕ, the family of ε-DP mechanisms satisfies tε-DP under t-fold adaptive composition.
(Strong Composition) ∀ε, δ, δ′>0, t∈ℕ, the family of (ε, δ)-DP mechanisms satisfies (ε√(2t ln(1/δ′))+tε(e^ε−1), tδ+δ′)-DP under t-fold adaptive composition.
As discussed above, multiple threats to federated learning may exist. In
In some embodiments, parameter permutation for federated learning utilizes computational Private Information Retrieval (sometimes referred to as “cPIR”) for secure aggregation at the federation server. Algorithm 1, depicted in
In various embodiments, Paillier is a partial HE (PHE) algorithm that may be implemented as part of performing parameter permutation for federated learning. Paillier relies on a public key encryption scheme. Since Paillier is employed to protect client updates from a curious federation server, parameter permutation for federated learning may use an independent key server that generates a pair of public and secret homomorphic keys (Pk, Sk) (as depicted in
In the tth round, the server randomly samples n clients from the total of N clients. Each sampled client locally retrains a copy of the global model it receives from the server (θgt), optimizing the model using its local data and local learning rate η (Algorithm 1,
Randomizing Update Parameters: After computing local updates θut, client u clips the update using threshold C and normalizes the parameters to the range [0, 1] (Algorithm 1,
Shuffling: After clipping and perturbing the local update, each sampled client shuffles the parameters yut using the random shuffling pattern πu (Algorithm 1, lines 9-10). This shuffling step amplifies the privacy budget εd.
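The client-side steps described so far (clipping, normalizing to [0, 1], perturbing with the randomizer R, and intra-model shuffling with the private pattern πu) might be sketched as follows. The [0, 1] normalization mapping, the per-parameter Laplace scale of d/εd (implied by splitting εd uniformly over the d parameters, as discussed later), and the scatter convention for the shuffle are assumptions made for illustration:

```python
import numpy as np

def randomize_and_shuffle(x_u, C, eps_d, rng):
    """Clip, normalize, perturb (randomizer R), and intra-model shuffle one
    client's local update x_u (a length-d vector of parameter updates)."""
    d = x_u.size
    # Clip the update to l2-norm threshold C, then map parameters into [0, 1]
    # (one possible normalization; the exact mapping may differ by embodiment).
    x_u = x_u * min(1.0, C / (np.linalg.norm(x_u) + 1e-12))
    y_u = (x_u + C) / (2.0 * C)
    # Perturb each parameter with Laplace noise (per-parameter budget eps_d / d).
    y_u = y_u + rng.laplace(scale=d / eps_d, size=d)
    # Choose a private shuffling pattern pi_u and scatter: shuffled[pi_u[j]] = y_u[j].
    pi_u = rng.permutation(d)
    y_tilde = np.empty(d)
    y_tilde[pi_u] = y_u
    return y_tilde, pi_u
```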
Generating PIR queries: Now the client encodes the shuffle indices πu using a PIR protocol. This process comprises two steps, as indicated at local encrypt 146a, 146b, and 146c: first creating a binary mask of the shuffled index, and then encrypting it using the HE public key that the client received in the first step (Algorithm 1,
b⃗_j = [0 0 . . . 1 . . . 0 0]   (3)
If the PIR client does not care about privacy, it would send b⃗_j to the PIR server, and the server would generate the client's response by multiplying the binary mask into the database matrix θ (θ_j = b⃗_j × θ). A PIR technique allows the client to obtain this response without revealing b⃗_j to the PIR server. For example, the PIR client may use HE to encrypt b⃗_j element by element before sending it to the PIR server. During the data recovery phase, the client extracts its target record by decrypting the component of E
A client creates cPIR queries to retrieve each parameter privately. In some embodiments, techniques may be implemented to reduce the number of cPIR queries. In some embodiments, the shuffled parameters (ỹ_u^t) are the dataset located at the PIR server and each shuffled index in πu is the secret record row number (e.g., jth in the above) that the PIR client is interested in. Client u first creates b_u^t, which is a collection of d binary masks of the shuffled indices in πu, similar to PIR query b⃗_j in Equation 3. Then the client encrypts the binary masks and sends the shuffled parameters and the PIR query (encrypted binary masks) to the server for aggregation.
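As a sketch of this query-construction step, the d binary masks can be formed as rows of an identity matrix indexed by the shuffling pattern, consistent with the scatter convention in the earlier client-side sketch; the `encrypt` callable stands in for element-wise Paillier encryption under the public key Pk and is a hypothetical placeholder:

```python
import numpy as np

def build_pir_queries(pi_u, encrypt):
    """Build one client's PIR query: d one-hot binary masks, one per parameter.
    Row j is one-hot at pi_u[j], so (masks @ y_tilde)[j] recovers parameter j."""
    d = pi_u.size
    masks = np.eye(d, dtype=int)[pi_u]      # b_j = one-hot at shuffled index pi_u[j]
    # In the actual scheme every 0/1 entry is encrypted under Pk before upload.
    return [[encrypt(int(bit)) for bit in row] for row in masks]
```

For testing the plumbing without encryption, `build_pir_queries(pi_u, encrypt=lambda b: b)` returns the plaintext mask matrix.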
Correctness: Note that for every client u and every round t, decrypting the multiplication of the encrypted binary masks with the shuffled parameters produces the original unshuffled parameters; that is, y_u^t=DEC(c_u^t×ỹ_u^t). So for any (ỹ, c) there is:
Server: norm bounding. After collecting all the local updates (ỹ_u^t, c_u^t) for the selected clients in round t, the parameter permutation for federated learning server first applies ℓ2-norm bounding with the threshold M to the shuffled parameters ỹ_u^t (Algorithm 1,
Server: Aggregation. The server then aggregates all the updates into a global update
The expression c_u^t×ỹ_u^t has the effect of “unshuffling” client u's parameters. At the same time, the resulting vector is encrypted and thus kept hidden from the server.
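A plaintext sketch of these server-side steps (norm bounding on the shuffled parameters followed by the masked aggregation that unshuffles each client's update) is shown below. For readability the Paillier encryption of the mask matrices is elided, so the multiply-and-sum is shown over plaintext values; in the actual scheme these would be homomorphic operations over ciphertexts, and the final averaging could equally be deferred to the clients after decryption:

```python
import numpy as np

def server_aggregate(shuffled_updates, mask_matrices, M):
    """Norm-bound and aggregate n clients' (y_tilde_u, c_u) pairs.

    shuffled_updates: list of length-d vectors (shuffled, perturbed parameters)
    mask_matrices:    list of d x d one-hot matrices (encrypted in practice)
    M:                l2-norm bound enforced on each shuffled update
    """
    total = None
    for y_tilde, c_u in zip(shuffled_updates, mask_matrices):
        # Norm bounding works directly on the shuffled parameters, since the
        # l2 norm is invariant to the order of the coordinates.
        y_tilde = y_tilde * min(1.0, M / (np.linalg.norm(y_tilde) + 1e-12))
        # c_u @ y_tilde "unshuffles" client u's parameters; under encryption the
        # server only ever sees ciphertexts of the unshuffled values.
        unshuffled = c_u @ y_tilde
        total = unshuffled if total is None else total + unshuffled
    return total / len(shuffled_updates)
```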
Correctness of Aggregation: In Equation 5, it is shown that ∀t∈[T], u∈U, y_u^t=DEC(c_u^t×ỹ_u^t).
Updating Global Model. The parameter permutation for federated learning server aggregates the local updates (ỹ_u^t, c_u^t) without knowing the true position of the parameters, as the parameters are detached from their positions. The result of the aggregation is a vector of encrypted parameters, which may have to be decrypted to be used for updating the global model (Algorithm 1,
Each parameter permutation for federated learning client perturbs its local update (a vector xi containing d parameters) with randomizer Rd, which is εd-LDP, and then shuffles its parameters. In some embodiments, a Laplace mechanism may be used as the randomizer. Based on the naive composition theorem above, the client perturbs each parameter value with R, which satisfies εwd-LDP where
The corollary below shows the privacy amplification from εd-LDP to (εl, δl)-DP after the parameter shuffling. This is derived by substituting the number of participating clients n with the number of parameters d in the model:
Corollary: If R is εwd-LDP, where
then the federated permutation F=A∘S∘Rd satisfies (εl, δl)-DP with
Thus, the larger the number of parameters in the model, the greater the privacy amplification. With large models containing millions or billions of parameters, the privacy amplification can be immense. However, the model dimensionality also affects the computation (and communication) cost in parameter permutation for federated learning. Each parameter permutation for federated learning client can generate a d-dimensional PIR query for every parameter in the model, resulting in a PIR query matrix containing d² entries. In various embodiments, techniques for parameter permutation for federated learning may use additional hyperparameters in some scenarios to configure federated learning performance, trading off computation/communication overheads against model utility.
Instead of shuffling all the d parameters, the parameter permutation for federated learning client can partition its parameters into several identically sized windows, and shuffle the parameters in each window with the same shuffling pattern. Thus, instead of creating a very large random shuffling pattern π with d indices (e.g., π=R
The window size k1 is one hyperparameter for parameter permutation for federated learning that can be used, in some embodiments, to control the computation/communication and model utility trade off. Once the size of the shuffling pattern is set to k1, each client may perform d·k1 encryptions and consume O(d⋅k1) network bandwidth to send its PIR queries to the server.
Superwindow: A shuffling window size of k1 partitions each parameter permutation for federated learning client u's local update xu (d parameters) into w=d/k1 windows, each containing k1 parameters. Each parameter permutation for federated learning client, independently from other parameter permutation for federated learning clients, chooses its shuffling pattern π uniformly at random with indices ∈[1, k1], and shuffles each window with this pattern. This means that every position j (1≤j≤k1) in each window k (1≤k≤w) will have the same permutation index (πj). Thus all of the jth positioned parameters (xu(k,j) for 1≤k≤w) will contain the value from the πjth slot in window k. For a given index j (1≤j≤k1), a superwindow may be defined as the set of all of the parameters xu(k,j) for 1≤k≤w. If the parameter vector xu is structured (with d parameters) as k
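An illustrative sketch of windowed shuffling with a single pattern of size k1 follows; it reshapes the d-parameter update into w=d/k1 windows and applies the same pattern to every window, which is equivalent to shuffling the superwindows defined above (the assumption that k1 evenly divides d is for simplicity):

```python
import numpy as np

def superwindow_shuffle(x_u, k1, rng):
    """Shuffle a length-d update with one shuffling pattern of size k1,
    applied identically to every window of k1 parameters."""
    d = x_u.size
    w = d // k1                      # number of windows (assumes k1 divides d)
    pi = rng.permutation(k1)         # single private pattern with k1 indices
    windows = x_u.reshape(w, k1)     # row k = window k; column j = superwindow j
    shuffled = np.empty_like(windows)
    shuffled[:, pi] = windows        # same scatter pattern in every window
    return shuffled.reshape(d), pi
```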
Shuffling of superwindows, instead of individual parameters, leads to a significant reduction in the computation (and communication) overheads for parameter permutation for federated learning clients. However, this comes at the cost of smaller privacy amplification. The privacy amplification of parameter permutation for federated learning goes from εd-LDP to (εl, δl)-DP after superwindow shuffling (with window size k1). After applying the randomizer R that is εd-LDP on the local parameters, each superwindow is εw-LDP where
Since the superwindows are being shuffled, the amplified privacy for parameter permutation for federated learning may be derived by setting the shuffling pattern size to k1. Accordingly:
For F=A∘S∘Rw with window size k1, where Rw is εw-LDP and
the amplified privacy is
Another way to adjust the computation/communication vs. utility tradeoff is by using multiple shuffling patterns, which may be implemented in some embodiments. Each parameter permutation for federated learning client chooses k2 shuffling patterns {π1, . . . , πk2}, each with k1 indices, and shuffles each window k with shuffling pattern πi such that i=k mod k2. In this case, each parameter permutation for federated learning client may perform k2·k1² encryptions to generate the PIR queries.
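A sketch extending the windowed shuffle to k2 patterns, with window k shuffled by pattern k mod k2, might look like the following (again an illustrative assumption rather than a definitive implementation):

```python
import numpy as np

def multi_pattern_shuffle(x_u, k1, k2, rng):
    """Shuffle each window of k1 parameters with one of k2 patterns (k mod k2)."""
    d = x_u.size
    w = d // k1                                     # number of windows
    patterns = [rng.permutation(k1) for _ in range(k2)]
    windows = x_u.reshape(w, k1)
    shuffled = np.empty_like(windows)
    for k in range(w):
        shuffled[k, patterns[k % k2]] = windows[k]  # scatter with pattern k mod k2
    return shuffled.reshape(d), patterns
```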
When there are k2 shuffling patterns and each shuffling pattern has k1 indices, the size of each superwindow is w=d/(k1k2). Therefore, each client perturbs each superwindow with a randomizer Rw that satisfies εw-LDP where
Taking εw on the superwindows to find the amplified local privacy, and then using the strong composition discussed above, parameter permutation for federated learning with Sk
For F=A∘Sk
the amplified privacy is
The relationship between the variables k1, k2, εd, and d and the privacy amplification in parameter permutation for federated learning is discussed below. The privacy amplification effect from εd-LDP to (εl, δl)-DP for the local model updates after shuffling with k2 shuffling patterns, each with size k1, may be determined. For each client, using larger shuffling patterns (e.g., increasing k1) or more shuffling patterns (e.g., increasing k2) may obtain larger privacy amplification. A tradeoff to consider is that the privacy amplification increase imposes more computation/communication burden on the clients to create the PIR queries, as they may have to encrypt k2×k1² values and send them to the server, and it also imposes higher computation on the server as it multiplies larger matrices. Accordingly, to provide a certain privacy level for larger models, the values of k1 or k2 may have to be increased.
Client encryption time: In parameter permutation for federated learning, each client may perform k1²·k2 encryptions for its query; therefore, client encryption time may have a quadratic and a linear relationship with the window size (k1) and the number of shuffling patterns (k2), respectively, in some scenarios. Increasing k1 has more impact on the privacy amplification than increasing k2. This means that if more computation resources are available at the clients to perform more encryptions, greater privacy amplification may be obtained by parameter shuffling. For instance, if k1 is increased from 100 to 200 (while fixing k2=1), the average client encryption time increases, for example, from 3.4 to 13.1 seconds for d=7850 parameters. While fixing k1=100 and increasing the number of shuffling patterns from 1 to 10, the encryption time may increase, for example, from 3.4 to 32.7 seconds. If the values of k1 and k2 are fixed, the number of encryptions at the clients is fixed, so the encryption time would remain constant even if the number of parameters (d) is increased each round.
Client decryption time: Changing k1, k2, and n may not have any impact on decryption time, as each client should decrypt d parameters. There may be a linear relationship between decryption time and the number of parameters. For instance, by increasing the number of parameters from 10^5 to 10^6, the decryption time may increase from 1.01 to 9.91 seconds.
Server aggregation time: In parameter permutation for federated learning, the server first multiplies the encrypted binary mask by the corresponding shuffled model parameters for each client participating in the training round, and then sums the encrypted unshuffled parameters to compute the encrypted global model. In some embodiments, different software libraries may be implemented to perform parallel matrix multiplication over superwindows (e.g., using joblib in Python). Thus, the larger the superwindows, the greater the parallelism. However, as k1 and/or k2 are increased, the superwindow size goes down, consequently reducing the parallelism, which leads to an increase in running time. Moreover, increasing n or d may have a linear relationship with server aggregation time. For instance, when n is increased from 5 to 10, the server aggregation time increases from 157.47 to 326.72 seconds for d=7850, k1=100, and k2=1.
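As an illustration of that parallelism, one possible decomposition is sketched below using joblib: because the same k1×k1 mask applies independently to every window, the per-window multiplications that make up one client's unshuffling can run in parallel. The encryption is again elided, and the function and variable names are assumptions:

```python
import numpy as np
from joblib import Parallel, delayed

def unshuffle_client_parallel(mask_k1, y_tilde, k1, n_jobs=-1):
    """Unshuffle one client's update window by window, in parallel."""
    windows = y_tilde.reshape(-1, k1)               # w = d / k1 windows
    rows = Parallel(n_jobs=n_jobs)(
        delayed(np.matmul)(mask_k1, win) for win in windows
    )
    return np.concatenate(rows)
```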
As discussed above, techniques for parameter permutation for federated learning may address threats from an aggregation server in addition to client-based threats.
One threat model may be Honest-but-Curious Aggregator. In this threat model, the aggregation server correctly follows the aggregation algorithm, but may try to learn clients' private information by inspecting the model updates sent by the participants. For creating the PIR queries, Paillier homomorphic encryption may be used (as discussed above). Before starting parameter permutation for federated learning, a key server may be used to generate and distribute the keys for the homomorphic encryption (HE). A key server generates a pair of public and secret homomorphic keys (Pk, Sk), sends them to the clients, and sends only the public key to the server. Either a trusted external key server or a leader client can be responsible for this role. For the leader client, before the training starts, the aggregation server randomly selects a client as the leader. The leader client then generates the keys and distributes them to the clients and the server as above.
For example, as illustrated in
Another threat model that can be addressed using parameter permutation for federated learning techniques may be Curious and Colluding Clients. In this threat model, some clients may collude with the FL server to get private information about a victim client by inspecting its model update. For this threat model, thresholded Paillier may be used. In the thresholded Paillier scheme, the secret key is divided into multiple shares, and each share is given to a different client. For this threat model, an external key server may generate the keys, send (Pk, Ski) to each client, and send the public key to the server. Now each client can partially decrypt an encrypted message, but if fewer than a threshold number, say t, of clients combine their partially decrypted values, they cannot get any information about the real message. On the other hand, if t or more partially decrypted values are combined, the secret can be recovered.
The following discussion provides further details for performing local randomization using a Laplace mechanism. In parameter permutation for federated learning techniques, each client i applies a Laplace mechanism as a randomizer R on its local model update (xi). A Laplace mechanism may be described for a function f:X^n→ℝ^k such that the Laplace mechanism implementing randomizer R is
R(X)=f(X)+[Y1, . . . , Yk] (8)
Where the Yi's are drawn independently and identically distributed (i.i.d.) from a Laplace(Δ(f)/ε) random variable. This distribution has density p(x)=(1/(2b)) exp(−|x|/b),
where b is the scale and equal to Δ(f)/ε.
Each model update contains d parameters in the range of [0, 1], so the sensitivity of the entire input vector is d. It means that applying εd-DP on the vector xi is equal to applying εwd=εd/d on each parameter independently. Therefore, applying the εd-DP randomizer R on the vector xi means adding noise from a Laplace distribution with scale
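Following that reasoning, a minimal sketch of the per-parameter noise calibration is given below; the final scale value d/εd is an inference from the stated per-parameter budget εd/d and a per-parameter sensitivity of 1, and is marked as such:

```python
import numpy as np

def laplace_randomizer(x_i, eps_d, rng):
    """Apply an eps_d-LDP Laplace randomizer R to a d-parameter update in [0, 1]."""
    d = x_i.size
    per_param_eps = eps_d / d        # naive composition: eps_d split across d parameters
    sensitivity = 1.0                # each parameter lies in [0, 1]
    b = sensitivity / per_param_eps  # Laplace scale Delta(f)/eps = d / eps_d (inferred)
    return x_i + rng.laplace(scale=b, size=d)
```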
The following discussion provides further details for the PIR techniques discussed above. PIR is a technique to provide query privacy to users when fetching sensitive records from untrusted database servers. That is, PIR enables users to query and retrieve specific records from untrusted database server(s) in a way that the servers cannot identify the records retrieved. There are two major types of PIR protocols. The first type is computational PIR (cPIR), in which the security of the protocol relies on the computational difficulty of solving a mathematical problem in polynomial time by the servers, e.g., factorization of large numbers. Most cPIR protocols are designed to be run by a single database server, and therefore, to minimize privacy leakage, they perform their heavy computations on the whole database (even if a single entry has been queried). Most of these protocols use homomorphic encryption to make their queries private. The second major class of PIR is information-theoretic PIR (ITPIR). ITPIR protocols provide information-theoretic security; however, existing designs may have to be run on more than one database server, and they may have to assume that the servers do not collude.
The following discussion provides further details of Homomorphic Encryption (HE) that may be used in various techniques. Homomorphic encryption (HE) allows application of certain arithmetic operations (e.g., addition or multiplication) on ciphertexts without decrypting them. In some embodiments, partial HE, which only supports addition, may be implemented to make the federated learning aggregation more secure. The following are two examples of HE that can be used in various embodiments.
Paillier. An additively homomorphic encryption system provides the following property:
E_Pk(m1)∘E_Pk(m2)=E_Pk(m1+m2)
where ∘ is a defined function on top of the ciphertexts.
The clients encrypt their updates and send them to the server; the server can then calculate their sum (using the ∘ operation) and send back the encrypted results to the clients. Now, the clients can decrypt the global model locally and update their models. Using HE in this scenario does not produce any accuracy loss because no noise is added to the model parameters during the encryption and decryption process.
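A self-contained toy sketch of this additive property follows. It uses tiny hard-coded primes and the common simplification g=n+1 purely to make the homomorphism visible; it is not a secure implementation, and a real deployment would use an established cryptographic library with moduli of at least 2048 bits:

```python
import math, random

def lcm(a, b):
    return a * b // math.gcd(a, b)

def keygen(p=1789, q=1867):
    # Toy primes for illustration only.
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                    # standard simplification g = n + 1
    mu = pow(lam, -1, n)         # inverse of L(g^lam mod n^2) = lam (mod n)
    return (n, g), (lam, mu, n)

def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    lam, mu, n = sk
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

pk, sk = keygen()
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)
# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
assert decrypt(sk, (c1 * c2) % (pk[0] ** 2)) == 42
```

Here the ∘ operation above corresponds to ciphertext multiplication modulo n², which is what lets the server sum encrypted client updates without decrypting them.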
Thresholded Paillier. In the thresholded Paillier scheme, the secret key is divided into multiple shares, and each share is given to a different client. Now each client can partially decrypt an encrypted message, but if fewer than a threshold number, say t, of clients combine their partially decrypted values, they cannot get any information about the real message. On the other hand, if t or more partially decrypted values are combined, the secret can be recovered. In this system, the computations are in group n
Key generation: First, find two primes p and q that satisfy p=2p′+1 and q=2q′+1, where p′ and q′ are also prime. Now set n=pq and m=p′q′. Pick d such that d=0 mod m and d=1 mod n². Now the key server creates a polynomial f(x)=Σ_{i=0}^{k−1} a_i x^i mod n²m, where the a_i are chosen randomly from n
Encryption: For message M, a random number r is chosen from n
Share decryption: The ith shareholder computes c_i=c^{2Δs_i}
Share combining: After collecting k partial decryptions, the server can combine them into the original plaintext message M by c′=Π_{i∈[k]} c_i^{2λ_i}.
The server then uses c′ to generate M.
Utilizing recursion in cPIR. A solution to reduce the number of encryptions and the upload bandwidth at the clients would be using recursion in cPIR. In this technique, the dataset is represented by a k3-dimensional hypercube, and this allows the PIR client to prepare and send
encrypted values, where k3 would be another hyperparameter. This technique can be used to reduce the number of encryptions, which makes the upload bandwidth consumption lower too. For instance, if there were one shuffling pattern (k2=1), the number of encryptions decreases from k1·d to
Plugging in a newer PIR protocol. Parameter permutation for federated learning techniques may utilize cPIR for private aggregation, in some embodiments. However, any other cPIR protocol can be used in parameter permutation for federated learning. For example, SealPIR can be used to reduce the number of encryptions at the client. SealPIR is based on SEAL, which is a lattice-based fully homomorphic encryption scheme.
In various embodiments, these techniques may be combined with an external client shuffler for more privacy amplification. For further privacy amplification, an external shuffler that shuffles the n sampled clients' updates, similar to FLAME, can be used. Double amplification may be achieved by first shuffling the parameters at the clients (e.g., detaching the parameter values from their positions) and then shuffling the clients' updates at the external shuffler (e.g., detaching the updates from their clients' IDs).
The specification next discusses example implementations of a federated machine learning system that can implement the above parameter permutation techniques for federated learning. Then, various exemplary flowcharts illustrating methods and techniques, which may be implemented by these machine learning systems or other systems or applications, are discussed. Finally, an example computing system upon which various embodiments may be implemented is discussed.
After receiving a current version of the machine learning model 612, individual ones of the federated model user systems 620, 630, and 640 may independently generate locally updated versions of the machine learning models 622, 632, and 642 by training the model using local training data sets 624, 634, and 644. Individual ones of the federated model user systems 620, 630, and 640 may independently alter, by clipping and applying noise, their local model parameter updates to generate modified model parameter updates, where the altering provides or ensures privacy of their local training data sets 624, 634, and 644, in some embodiments. Intra-model parameter shuffling may also be performed, as discussed above.
Upon receipt of the collective modified model parameter updates, the federation server 610 may then aggregate the respective modified model parameter updates to generate aggregated model parameter updates 614. For example, as discussed above and below with regard to
Various different systems, services, or applications may implement the techniques discussed above. For example,
As indicated at 720, local updates to the parameters of the machine learning model may be computed according to a machine learning technique using local training data, in some embodiments. For example, a stochastic gradient descent technique may be applied in some embodiments. Other machine learning techniques can be used in other embodiments.
As indicated at 730, parameters of the machine learning model may be randomized, in some embodiments. For example, a Laplace mechanism, as discussed above, may be used to ensure that the noise added by randomization fits within a privacy budget.
As indicated at 740, intra-model shuffling may be applied to the randomized parameters according to a shuffling pattern. For example, as discussed above with regard to
As indicated at 750, the shuffled parameters may be encoded using the secret key, in some embodiments. As indicated at 760, the encoded parameters may be provided to the aggregation server using one or more Private Information Retrieval (PIR) queries generated according to the shuffling pattern that allow the aggregation server to retrieve each of the encoded local parameters privately during aggregation. Different PIR techniques may be used, such as cPIR or SealPIR, as discussed above.
As indicated at 820, a norm bounding technique may be applied as part of averaging the respective encoded parameters to generate an updated set of encoded parameters, in some embodiments. For example, as discussed above, the ℓ2-norm may be bounded for the respective encoded parameters as a whole, irrespective of their order. As indicated at 830, the updated set of encoded parameters may be provided to the client systems.
The mechanisms for implementing federated learning by parameter permutation, as described herein, may be provided as a computer program product, or software, that may include a non-transitory, computer-readable storage medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to various embodiments. A non-transitory, computer-readable storage medium may include any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable storage medium may include, but is not limited to, magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; electrical, or other types of medium suitable for storing program instructions. In addition, program instructions may be communicated using optical, acoustical or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.)
In various embodiments, computer system 1000 may include one or more processors 1070; each may include multiple cores, any of which may be single or multi-threaded. Each of the processors 1070 may include a hierarchy of caches, in various embodiments. The computer system 1000 may also include one or more persistent storage devices 1060 (e.g. optical storage, magnetic storage, hard drive, tape drive, solid state memory, etc.) and one or more system memories 1010 (e.g., one or more of cache, SRAM, DRAM, RDRAM, EDO RAM, DDR 10 RAM, SDRAM, Rambus RAM, EEPROM, etc.). Various embodiments may include fewer or additional components not illustrated in
The one or more processors 1070, the storage device(s) 1050, and the system memory 1010 may be coupled to the system interconnect 1040. One or more of the system memories 1010 may contain program instructions 1020. Program instructions 1020 may be executable to implement various features described above, including a machine learning system 1022 as discussed above with regard to
In one embodiment, Interconnect 1090 may be configured to coordinate I/O traffic between processors 1070, storage devices 1070, and any peripheral devices in the device, including network interfaces 1050 or other peripheral interfaces, such as input/output devices 1080. In some embodiments, Interconnect 1090 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1010) into a format suitable for use by another component (e.g., processor 1070). In some embodiments, Interconnect 1090 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of Interconnect 1090 may be split into two or more separate components, such as a north bridge and a south bridge, for example. In addition, in some embodiments some or all of the functionality of Interconnect 1090, such as an interface to system memory 1010, may be incorporated directly into processor 1070.
Network interface 1050 may be configured to allow data to be exchanged between computer system 1000 and other devices attached to a network, such as other computer systems, or between nodes of computer system 1000. In various embodiments, network interface 1050 may support communication via wired or wireless general data networks, such as any suitable type of Ethernet network, for example; via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks; via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
Input/output devices 1080 may, in some embodiments, include one or more display terminals, keyboards, keypads, touchpads, scanning devices, voice or optical recognition devices, or any other devices suitable for entering or retrieving data by one or more computer system 1000. Multiple input/output devices 1080 may be present in computer system 1000 or may be distributed on various nodes of computer system 1000. In some embodiments, similar input/output devices may be separate from computer system 1000 and may interact with one or more nodes of computer system 1000 through a wired or wireless connection, such as over network interface 1050.
Those skilled in the art will appreciate that computer system 1000 is merely illustrative and is not intended to limit the scope of the techniques for federated learning by parameter permutation as described herein. In particular, the computer system and devices may include any combination of hardware or software that may perform the indicated functions, including computers, network devices, internet appliances, PDAs, wireless phones, pagers, etc. Computer system 1000 may also be connected to other devices that are not illustrated, or instead may operate as a stand-alone system. In addition, the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components. Similarly, in some embodiments, the functionality of some of the illustrated components may not be provided and/or other additional functionality may be available.
Those skilled in the art will also appreciate that, while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components may execute in memory on another device and communicate with the illustrated computer system via inter-computer communication. Some or all of the system components or data structures may also be stored (e.g., as instructions or structured data) on a computer-accessible medium or a portable article to be read by an appropriate drive, various examples of which are described above. In some embodiments, instructions stored on a computer-accessible medium separate from computer system 1000 may be transmitted to computer system 1000 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link. Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present invention may be practiced with other computer system configurations.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This application claims benefit of priority to U.S. Provisional Application Ser. No. 63/371,379, entitled “Federated Learning by Parameter Permutation,” filed Aug. 12, 2022, and which is incorporated herein by reference in its entirety.