The computational requirements for machine learning have become increasingly demanding as both models and the diversity of training data have grown in complexity. Distributed learning techniques, in which multiple devices contribute to a single aggregated machine learning model, have emerged as one solution to this problem. However, distributed machine learning requires multiple devices to share information with one another, introducing a risk that the privacy of device information can be compromised. For example, the privacy of the data used to compute each device's share of the aggregate model is at risk from a model inversion attack, in which an attacker reconstructs the training data from model parameters received or intercepted from a device. This is particularly troublesome when a device is training a model using private or protected information.
The systems and methods of this technical solution solve these and other issues by improving upon federated learning techniques to protect the privacy of device models. In addition, the techniques described herein eliminate the computationally costly mask-reconstruction operations utilized in other techniques, thereby effecting a significant increase in computational performance without sacrificing device privacy. For example, other techniques may provide secure aggregation but suffer from an overhead of $O(N^2)$ or $O(N \log N)$. In addition, other techniques often provide weaker privacy and dropout guarantees compared to the techniques described herein. In addition, the systems and methods described herein improve over other techniques by not requiring a trusted third party, by requiring much less randomness generation, and by imposing a much smaller storage cost on each participating device. In addition, the systems and methods described herein can be applied to any aggregation-based approach (e.g., FedNova, FedProx, FedOpt, etc.), as well as to personalized FL frameworks (e.g., pFedMe, Ditto, Per-FedAvg, etc.).
At least one aspect of the disclosure relates to a system to generate a model based on a subset of models generated at remote devices. The system can include a first device operatively coupled with a second device. The first device can include a processor and memory. The processor and memory can generate, based on a model parameter and data restricted to the first device, a first model via machine learning. The processor and memory can partition the first model into a plurality of local mask shares each including a distinct portion of the first model. The processor and memory can encode one or more of the plurality of local mask shares into a corresponding first plurality of encoded shares. The processor and memory can generate an aggregation of encoded shares including a first encoded share having a first index among the first plurality of encoded shares and a second encoded share having the first index among a second plurality of encoded shares. The second encoded share can include a distinct portion of a second model generated by the second device via machine learning.
In some implementations, the first device can transmit, to the second device and based on a device index of the second device, the first plurality of encoded shares.
In some implementations, the first device can receive, from the second device and based on a device index of the first device, the second plurality of encoded shares.
In some implementations, the first device can generate a plurality of random masks each corresponding to one or more of the distinct portions of the first model. In some implementations, the first device can partition, based on the plurality of random masks, the plurality of local mask shares.
In some implementations, the first device can be remote from the second device.
In some implementations, the first device can transmit, to a server operatively coupled with the first device, the aggregation of encoded shares. In some implementations, the first device can transmit, to the server, the first plurality of encoded shares.
In some implementations, the first device can cause, in response to the transmission to the server, the server to generate, in response to a determination that the second device satisfies a dropout condition and based on the first plurality of encoded shares and the aggregation of encoded shares, an aggregate model corresponding to a machine learning model comprising the first model and the second model.
In some implementations, the first device can cause, in response to the transmission to the server, the server to determine that the second device satisfies the dropout condition by a determination of an absence of transmission, from the second device, of a second plurality of encoded shares each including a distinct portion of the second model generated by the second device.
In some implementations, the first device can receive, from a server operatively coupled with the first device, an instruction to generate via machine learning the first model based on the model parameter and the data restricted to the first device.
At least one aspect of the disclosure relates to a method. The method can generate a model based on a subset of models generated at remote devices. The method can include generating, based on a model parameter and data restricted to a first device operatively coupled with a second device, a first model via machine learning. The method can include partitioning the first model into a plurality of local mask shares each including a distinct portion of the first model. The method can include encoding one or more of the plurality of local mask shares into a corresponding first plurality of encoded shares. The method can include generating an aggregation of encoded shares including a first encoded share having a first index among the first plurality of encoded shares and a second encoded share having the first index among a second plurality of encoded shares. The second encoded share can include a distinct portion of a second model generated by the second device via machine learning.
In some implementations, the method can include transmitting, to the second device and based on a device index of the second device, the first plurality of encoded shares.
In some implementations, the method can include receiving, from the second device and based on a device index of the first device, the second plurality of encoded shares.
In some implementations, the method can include generating a plurality of random masks each corresponding to one or more of the distinct portions of the first model. In some implementations, the method can include partitioning, based on the plurality of random masks, the plurality of local mask shares.
In some implementations, the first device can be remote from the second device.
In some implementations, the method can include transmitting, to a server operatively coupled with the first device, the aggregation of encoded shares; and transmitting, to the server, the first plurality of encoded shares.
In some implementations, the method can include causing, in response to the transmission to the server, the server to generate, in response to a determination that the second device satisfies a dropout condition and based on the first plurality of encoded shares and the aggregation of encoded shares, an aggregate model corresponding to a machine learning model comprising the first model and the second model.
In some implementations, the method can include causing, in response to the transmission to the server, the server to determine that the second device satisfies the dropout condition by a determination of an absence of transmission, from the second device, of a second plurality of encoded shares each including a distinct portion of the second model generated by the second device.
In some implementations, the method can include receiving, from a server operatively coupled with the first device, an instruction to generate via machine learning the first model based on the model parameter and the data restricted to the first device.
At least one implementation relates to a computer readable medium. The computer readable medium can include one or more instructions stored thereon. The instructions can be executable by a processor. The instructions can be executable by the processor to generate, by the processor and based on a model parameter and data restricted to a first device, a first model via machine learning. The instructions can be executable by the processor to partition, by the processor, the first model into a plurality of local mask shares each including a distinct portion of the first model. The instructions can be executable by the processor to encode, by the processor, one or more of the plurality of local mask shares into a corresponding first plurality of encoded shares. The instructions can be executable by the processor to generate, by the processor, an aggregation of encoded shares including a first encoded share having a first index among the first plurality of encoded shares and a second encoded share having the first index among a second plurality of encoded shares, the second encoded share including a distinct portion of a second model generated by a second device via machine learning.
The computer readable medium can include one or more instructions executable by the processor to transmit, to the second device and based on a device index of the second device, the first plurality of encoded shares.
At least one other aspect of the present disclosure is directed to a method of securely aggregating masked model parameters generated by client devices. The method can be performed, for example, by one or more processors coupled to memory. The method can include transmitting a set of initial model parameters to a plurality of client devices participating in a distributed machine learning technique. The method can include receiving, from each of a first subset of the plurality of client devices, a respective set of masked model parameters updated using a machine learning technique executed at each of the subset of the plurality of client devices. The method can include identifying a second subset of the plurality of client devices that satisfy a dropout condition. The method can include transmitting, to each client device of the first subset of the plurality of client devices, a request for a local mask share corresponding to each client device in the second subset of the plurality of client devices. The method can include receiving, from each client device of the first subset, a set of local mask shares each corresponding to a respective client device in the first subset of the plurality of client devices. The method can include generating an aggregate model based on the respective set of masked model parameters received from the first subset and each set of local mask shares of the first subset of the plurality of client devices.
At least one aspect of the present disclosure is directed to a method. The method can be performed, for example, by a client device comprising one or more processors coupled to memory. The method can include generating, based on an initial set of model parameters, a set of updated model parameters using training data associated with the client device. The method can include generating a set of masked model parameters based on a generated local mask and the set of updated model parameters. The method can include generating, based on the local mask, a plurality of mask shares that each correspond to a respective one of a plurality of client devices participating in a distributed machine learning technique with the client device. The method can include transmitting each mask share of the plurality of mask shares to a respective client device of the plurality of client devices. The method can include receiving a plurality of encoded masks from the plurality of client devices. The method can include transmitting the set of masked model parameters to a machine learning system.
In some implementations, the method can include receiving a request for a subset of the plurality of encoded masks corresponding to a subset of the plurality of client devices that satisfy a participation condition. In some implementations, the method can include transmitting the subset of the plurality of encoded masks to the machine learning system.
These and other aspects and implementations are discussed in detail below. The foregoing information and the following detailed description include illustrative examples of various aspects and implementations, and provide an overview or framework for understanding the nature and character of the claimed aspects and implementations. The drawings provide illustration and a further understanding of the various aspects and implementations, and are incorporated in and constitute a part of this specification. Aspects can be combined and it will be readily appreciated that features described in the context of one aspect of the technical solution can be combined with other aspects. Aspects can be implemented in any convenient form. For example, by appropriate computer programs, which may be carried on appropriate carrier media (computer readable media), which may be tangible carrier media (e.g. disks) or intangible carrier media (e.g. communications signals). Aspects may also be implemented using suitable apparatus, which may take the form of programmable computers running computer programs arranged to implement the aspect. As used in the specification and in the claims, the singular form of ‘a’, ‘an’, and ‘the’ include plural referents unless the context clearly dictates otherwise.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
For purposes of reading the description of the various embodiments below, the following descriptions of the sections of the specification and their respective contents may be helpful:
Section A describes embodiments of systems and methods for improved secure aggregation in federated learning;
Section B includes experimental data and proofs of theorems relating to features described in Section A; and
Section C describes a network environment and computing environment which may be useful for practicing embodiments described herein.
Federated learning (FL) has emerged as a promising approach to enable distributed training over a large number of devices while protecting the privacy of each device. The approach of FL is to keep each device's data on the device and instead train a local model at each device. The locally trained models are then aggregated via a server to update a global model, which is then pushed back to the devices. Due to model inversion attacks, a critical consideration in FL design is to also ensure that the server does not learn the locally trained model of each device during model aggregation. Furthermore, model aggregation should be robust to likely device dropouts (due to poor connectivity, low battery, unavailability, etc.) in FL systems.
Other secure aggregation protocols rely on two main principles: (1) pairwise random-seed agreement between devices to generate masks that hide devices' models while having an additive structure that allows their cancellation when added at the server, as in SecAgg and SecAgg+; and (2) secret sharing of the random seeds to enable the reconstruction and cancellation of masks belonging to dropped devices. However, such approaches are severely bottlenecked by the number of mask reconstructions at the server, which grows substantially as more devices are dropped. This causes a major computational bottleneck and greatly reduces the efficiency of federated learning. Other approaches aimed at reducing this computational bottleneck either significantly increase round or communication complexity, as in TurboAgg (thereby further reducing security and computational performance), or compromise the dropout and privacy guarantees, thereby reducing the efficiency of aggregate-model reconstruction. Other approaches, such as FastSecAgg, provide weaker privacy and dropout guarantees compared with other implementations.
The systems and methods described herein provide secure model aggregation in FL by performing a one-shot reconstruction of the aggregate mask of surviving devices, rather than a pairwise random-seed reconstruction of dropped devices as in SecAgg or SecAgg+. Using these techniques, the systems and methods described herein provide optimal privacy and dropout-resiliency guarantees while substantially reducing the aggregation and run-time complexity. This provides a significant improvement to federated learning techniques. In addition, the systems and methods described herein are an improvement over other implementations because the techniques described herein do not require a trusted third party to prepare device dropout patterns.
Using the techniques described herein, each device protects its local model using a locally generated random mask. This mask is then encoded and shared with other devices in such a way that the aggregate mask of any sufficiently large set of surviving devices (e.g., devices still connected to the server and performing training) can be directly reconstructed at the server. In contrast to other techniques, such as SecAgg or SecAgg+, in this approach the server only needs to reconstruct one mask in the recovery phase, independent of the number of dropped devices. The systems and methods described herein further provide system-level optimizations to improve overall run-time by taking advantage of the fact that the generation of random masks is independent of the computation of the local model; hence, each device can parallelize these two operations via multi-threaded processing, which is beneficial to all evaluated secure aggregation protocols in reducing the total running time.
The systems and methods described herein can be utilized to train any type of machine learning model, including logistic regression, convolutional neural networks (CNNs), MobileNetV3, and EfficientNet-B0, for image classification over datasets of different image sizes: low-resolution images (FEMNIST, CIFAR-100) and high-resolution images (Google Landmarks Dataset 23k), among others. The example results described herein show that the present techniques provide a significant speedup in running time over other approaches for all considered FL training tasks, and in some implementations achieve a performance gain of 12.7×. Compared to other approaches, the techniques described herein can even sustain and speed up the training of large deep neural network models on high-resolution image datasets.
The systems and methods described herein can be utilized with any type of federated learning technique. As described herein, federated learning is a distributed training framework for machine learning in mobile networks that preserves the privacy of device information and training data. One goal of federated learning is to learn the parameters of the global model $x$ with dimension $d$, using data held at mobile devices. This can be represented by minimizing a global objective function $F$: $F(x) = \sum_{i=1}^{N} p_i F_i(x)$, where N is the total number of devices, $F_i$ is the local objective function of device i, and $p_i \ge 0$ is a weight parameter assigned to device i to specify the relative impact of each device such that $\sum_{i=1}^{N} p_i = 1$. For example, all devices can have equal-sized datasets, e.g., $p_i = \frac{1}{N}$ for all i∈[N].
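To make the weighted objective concrete, the following is a minimal sketch (not taken from the original disclosure) of evaluating $F(x) = \sum_i p_i F_i(x)$ with equal weights $p_i = 1/N$; the quadratic local objectives here are illustrative placeholders:

```python
# Hedged sketch: weighted global objective F(x) = sum_i p_i * F_i(x).
import numpy as np

def global_objective(x, local_objectives, weights):
    # F(x) = sum_i p_i * F_i(x); the weights must sum to 1.
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(p * F(x) for p, F in zip(weights, local_objectives))

# Example: N = 3 devices, each with a simple quadratic local objective.
N = 3
locals_ = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in range(N)]
weights = [1.0 / N] * N          # equal-sized datasets: p_i = 1/N
print(global_objective(np.zeros(4), locals_, weights))
```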
Training can be performed through an iterative process in which mobile devices interact through the central server to update the global model. At each iteration, the server shares the current state of the global model, denoted by $x^{(t)}$, with the mobile devices. Each device i creates a local update $x_i^{(t)}$. The local models of the N devices are sent to the server and then aggregated by the server. Using the aggregated models, the server updates the global model $x^{(t+1)}$ for the next iteration. In FL, some devices may drop from the learning procedure due to unreliable communication connections. The goal of the server is to obtain the sum of the surviving devices' local models. This update equation is given by $x^{(t+1)} = \sum_{i \in \mathcal{U}^{(t)}} x_i^{(t)}$, where $\mathcal{U}^{(t)}$ denotes the set of surviving devices at iteration t. Then, the server pushes the updated global model $x^{(t+1)}$ to the mobile devices.
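The iterative update with dropouts can be sketched as follows; this is a hedged, non-secure illustration of the aggregation step only, with the `local_update` and `survives` callbacks as hypothetical stand-ins:

```python
# Hedged sketch of the plain (non-secure) FL update: at iteration t the server
# aggregates local models from the surviving set U_t and pushes the result back.
import numpy as np

def fl_round(x_global, devices, local_update, survives):
    # Each device i computes its local update x_i(t) from the global model.
    locals_ = {i: local_update(i, x_global) for i in devices}
    surviving = [i for i in devices if survives(i)]       # U_t
    # x(t+1) = sum of x_i(t) over the surviving devices.
    return sum(locals_[i] for i in surviving), surviving

# Toy usage with 4 devices, one of which drops.
rng = np.random.default_rng(0)
update = lambda i, x: x + rng.standard_normal(3)
x_next, U_t = fl_round(np.zeros(3), range(4), update, survives=lambda i: i != 1)
```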
The local models carry extensive information about the local datasets stored at the devices; e.g., the private data can be reconstructed from the local models by using a model inversion attack. To address such privacy leakage from the local models, secure aggregation has been introduced. A secure aggregation protocol enables the computation of the aggregation operation while ensuring that the server learns no information about the local models $x_i^{(t)}$ beyond their aggregated model. In particular, the goal is to securely evaluate the aggregate of the local models $y = \sum_{i \in \mathcal{U}} x_i$, where the iteration index t is omitted for simplicity. Since secure aggregation protocols build on cryptographic primitives that require all operations to be carried out over a finite field, it is assumed that the elements of $x_i$ and $y$ are from a finite field $\mathbb{F}_q$ for some field size q. The performance of a secure aggregation protocol for FL is evaluated through the following two key guarantees.
The systems and methods described herein provide a privacy guarantee. A threat model is considered in which the devices and the server are honest but curious. It is assumed that up to T devices can collude with each other, as well as with the server, to infer the local models of other devices. The secure aggregation protocol has to guarantee that nothing can be learned beyond the aggregate model, even if up to T devices cooperate with each other. Privacy is considered in the information-theoretic sense: for every subset of devices $\mathcal{T} \subseteq [N]$ of size at most T, there must be mutual information $I(\{x_i\}_{i \in [N]}; Y \mid \sum_{i \in \mathcal{U}} x_i, Z_{\mathcal{T}}) = 0$, where $Y$ is the collection of information at the server and $Z_{\mathcal{T}}$ is the collection of information at the devices in $\mathcal{T}$.
The systems and methods described herein provide a dropout-resiliency guarantee. In FL, devices may be dropped or delayed at any time during protocol execution for various reasons, e.g., poor wireless channel conditions, low battery, etc. It is assumed that there can be at most D dropped devices during the execution of the protocol, i.e., there are at least N−D surviving devices after potential dropouts. The protocol has to guarantee that the server can correctly recover the aggregated models of the surviving devices, even if up to D devices drop.
The systems and methods described herein provide an efficient and scalable secure aggregation protocol that simultaneously achieves strong privacy and dropout-resiliency guarantees scaling linearly with the number of devices N, e.g., simultaneously achieves a privacy guarantee of $T = \frac{N}{2}$ and a dropout-resiliency guarantee of $D = pN$ for any $p < \frac{1}{2}$.
It is noted that SecAgg, like other secure aggregation protocols, requires the server to compute a PRG function on each of the reconstructed seeds to recover the aggregated masks, which incurs an overhead of $O(N^2)$ and dominates the overall execution time of the protocol. SecAgg+ reduces the overhead of mask reconstruction from $O(N^2)$ to $O(N \log N)$ by replacing the complete communication graph of SecAgg with a sparse random graph of degree $O(\log N)$ to reduce both communication and computation loads. Reconstructing pairwise random masks in SecAgg and SecAgg+ poses a major bottleneck in scaling to a large number of devices. To overcome this computational bottleneck, the systems and methods described herein enable the server to recover the aggregate mask of all surviving devices in one shot, while maintaining the same privacy and dropout-resiliency guarantees. SecAgg, SecAgg+, and other secure aggregation protocols lack such one-shot mask recovery, making the techniques described herein a significant improvement over other secure aggregation techniques in overall computational performance while preserving the privacy guarantees and dropout resiliency.
Referring now to
The server 11 can include at least one processor and a memory, e.g., a processing circuit. The memory can store processor-executable instructions that, when executed by the processor, cause the processor to perform one or more of the operations described herein. The processor may include a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), etc., or combinations thereof. The memory may include, but is not limited to, electronic, optical, magnetic, or any other storage or transmission device capable of providing the processor with program instructions. The memory may further include a floppy disk, CD-ROM, DVD, magnetic disk, memory chip, ASIC, FPGA, read-only memory (ROM), random-access memory (RAM), electrically erasable programmable ROM (EEPROM), erasable programmable ROM (EPROM), flash memory, optical media, or any other suitable memory from which the processor can read instructions. The instructions may include code from any suitable computer programming language. The server 11 can include one or more computing devices or servers that can perform various functions as described herein. The server 11 can include any or all of the components and perform any or all of the functions of the computer system 900 described herein in conjunction with
Each device 150 (e.g., device 1, device 2, . . . , device N) (sometimes referred to herein as “client device(s) 150”) can include at least one processor and a memory, e.g., a processing circuit. The memory can store processor-executable instructions that, when executed by the processor, cause the processor to perform one or more of the operations described herein. The processor may include a microprocessor, an ASIC, an FPGA, etc., or combinations thereof. The memory may include, but is not limited to, electronic, optical, magnetic, or any other storage or transmission device capable of providing the processor with program instructions. The memory may further include a floppy disk, CD-ROM, DVD, magnetic disk, memory chip, ASIC, FPGA, ROM, RAM, EEPROM, EPROM, flash memory, optical media, or any other suitable memory from which the processor can read instructions. The instructions may include code from any suitable computer programming language. Each device 150 (e.g., device 1, device 2, . . . , device N) can include one or more computing devices or servers that can perform various functions as described herein. Each device 1, 2, . . . , N can include any or all of the components and perform any or all of the functions of the computer system 900 described herein in conjunction with
The communication network via which the server 11 and the devices 1, 2, . . . , N communicate can include computer networks such as the Internet, local, wide, metro or other area networks, intranets, satellite networks, other computer networks such as voice or data mobile phone communication networks, and combinations thereof. The server 11 can communicate via the network, for instance with one or more of the devices 1, 2, . . . , N. The network may be any form of computer network that can relay information between the server 11, the one or more devices 1, 2, . . . , N, and one or more information sources, such as web servers or external databases, amongst others. In some implementations, the network may include the Internet and/or other types of data networks, such as a local area network (LAN), a wide area network (WAN), a cellular network, a satellite network, or other types of data networks. The network may also include any number of computing devices (e.g., computers, servers, routers, network switches, etc.) that can be configured to receive and/or transmit data within the network. The network may further include any number of hardwired and/or wireless connections. Any or all of the computing devices described herein (e.g., the server 11, the devices 1, 2, . . . , N, the computer system 900, etc.) may communicate wirelessly (e.g., via WiFi, cellular, radio, etc.) with a transceiver that is hardwired (e.g., via a fiber optic cable, a CAT5 cable, etc.) to other computing devices in the network. Any or all of the computing devices described herein (e.g., the server 11, the devices 1, 2, . . . , N, the computer system 900, etc.) may also communicate wirelessly with the computing devices of the network via a proxy device (e.g., a router, network switch, or gateway). In some implementations, the network can form a part of a cloud computing system.
As shown in phase 105 of the diagram 100, each device 150 generates and holds a local model. One of the devices 150 may be referred to herein as device i∈{1,2,3}. Device i∈{1,2,3} holds (e.g., stores in a memory, etc.) a local model $x_i \in \mathbb{F}_q^d$. Each device 150 first generates a single mask. Each device's mask is encoded and shared with the other devices. Each device's local model is protected by its generated mask. Suppose that device 1 drops during the execution of the protocol. The server 11 directly recovers the aggregate mask in one shot. In this example, the techniques described herein (sometimes referred to as “LightSecAgg,” the “LightSecAgg protocol,” or the “present techniques”) reduce the computational cost at the server 11 from $4d$ (e.g., of other protocols) to $d$. Before providing a general description of the techniques described herein, aspects of the protocol can be described through the 3-device example shown in
The first phase 105 of the present techniques includes offline encoding and sharing of local masks between client devices participating in the distributed machine learning technique. Device i∈{1,2,3} randomly picks $z_i$ and $n_i$ from $\mathbb{F}_q^d$. Device i∈{1,2,3} creates the masked versions $\tilde{z}_{i,j}$ of $z_i$ by encoding $z_i$ and $n_i$ (e.g., using the T-private MDS encoding described below), and device i sends $\tilde{z}_{i,j}$ to each device j∈{1,2,3}. Device j∈{1,2,3} may refer to one of the devices 150 that is not device i∈{1,2,3}. Thus, device i receives $\tilde{z}_{j,i}$ for j∈{1,2,3}. In this case, this procedure provides robustness against 1 dropped device, shown in
The next phase 110 of the present techniques includes masking and uploading of local models. To keep each individual model private, each surviving device 175 (e.g., device i∈{1,2,3}) masks its local model as $\tilde{x}_i = x_i + z_i$ and sends its masked model to the server 11, as shown by models 160 and 165 in
The third phase 115 of the present techniques includes one-shot aggregate-model recovery. Suppose that device 1 drops in the previous phase (i.e., device 1 is a dropped device 170). To recover the aggregate of models $x_2 + x_3$, the server 11 only needs to know the aggregated mask $z_2 + z_3$. To recover $z_2 + z_3$, the surviving devices 175 (e.g., device 2 and device 3) send $\tilde{z}_{2,2} + \tilde{z}_{3,2}$ and $\tilde{z}_{2,3} + \tilde{z}_{3,3}$ to the server 11, respectively. After receiving the messages from the surviving devices 175 (e.g., device 2 and device 3), the server 11 can directly recover the aggregated mask $z_2 + z_3$ via a one-shot computation, by decoding the two received values, which are two MDS-encoded combinations of $z_2 + z_3$ and $n_2 + n_3$.
Then, the server 11 recovers the aggregate model $x_2 + x_3$ by subtracting $z_2 + z_3$ from $\tilde{x}_2 + \tilde{x}_3$. As opposed to other techniques, such as SecAgg, which has to reconstruct the dropped random seeds, the present techniques enable the server to reconstruct the desired aggregate of masks via a direct one-shot recovery. Compared with SecAgg, for example, the present techniques reduce the server's computational cost from $4d$ to $d$ in this simple example, which is a significant technical improvement. Although the foregoing has been described in the context of only three devices, it should be understood that the present techniques can be expanded to accommodate any number of client devices 150 participating in a distributed learning protocol, such as federated learning.
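The 3-device walkthrough above can be reproduced numerically. The following hedged sketch assumes a specific Vandermonde-style MDS encoding $\tilde{z}_{i,j} = z_i + a_j n_i$ over a prime field with evaluation points $a_j \in \{1,2,3\}$ (the disclosure does not fix these constants); it checks that the server recovers $x_2 + x_3$ after device 1 drops:

```python
# Hedged numeric sketch of the 3-device example (N=3, D=1, so U=2, T=1).
import numpy as np

q = 2**31 - 1                      # prime field size (assumption)
d = 4                              # model dimension
a = {1: 1, 2: 2, 3: 3}             # distinct evaluation points (assumption)
rng = np.random.default_rng(0)

z = {i: rng.integers(0, q, d) for i in (1, 2, 3)}   # local masks z_i
n = {i: rng.integers(0, q, d) for i in (1, 2, 3)}   # private randomness n_i
x = {i: rng.integers(0, q, d) for i in (1, 2, 3)}   # quantized local models x_i

# Offline: device i sends the encoded share z~_{i,j} to device j.
zt = {(i, j): (z[i] + a[j] * n[i]) % q for i in (1, 2, 3) for j in (1, 2, 3)}

# Upload: each device sends its masked model; device 1 then drops.
xt = {i: (x[i] + z[i]) % q for i in (1, 2, 3)}

# Recovery: surviving devices 2 and 3 send the aggregates of the shares they hold.
s = {j: (zt[(2, j)] + zt[(3, j)]) % q for j in (2, 3)}

# One-shot decode: s_j = (z2+z3) + a_j*(n2+n3); eliminating the noise term gives
# z2+z3 = (a_3*s_2 - a_2*s_3) / (a_3 - a_2), and here a_3 - a_2 = 1.
z23 = (a[3] * s[2] - a[2] * s[3]) % q
x23 = (xt[2] + xt[3] - z23) % q
assert np.array_equal(x23, (x[2] + x[3]) % q)       # aggregate model recovered
```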
Referring now to
The present techniques can include three phases. First, as shown in
As described herein above, and as shown in
With the randomly picked $[n_i]_k$ for $k \in \{U-T+1, \ldots, U\}$, device i∈[N] encodes the sub-masks $[z_i]_k$ as $[\tilde{z}_i]_j = ([z_i]_1, \ldots, [z_i]_{U-T}, [n_i]_{U-T+1}, \ldots, [n_i]_U) \cdot W_j$, where $W_j$ is the j-th column of a T-private MDS matrix $W \in \mathbb{F}_q^{U \times N}$ of dimension U×N. In particular, an MDS matrix is T-private if the submatrix consisting of its rows $\{U-T+1, \ldots, U\}$ is also MDS. A T-private MDS matrix guarantees $I(\tilde{z}_i; \{[\tilde{z}_i]_j\}_{j \in \mathcal{T}}) = 0$ for any i∈[N] and any $\mathcal{T} \subseteq [N]$ of size T, provided the $[n_i]_k$ values are jointly uniformly random. T-private MDS matrices can be calculated for any U, N, and T. As described herein, the values U, N, D, and T can be predetermined, specified by an operator of the server, or received from another computing device via the communication network. Each device i∈[N] sends $[\tilde{z}_i]_j$ to device j∈[N]\{i}. At the end of offline encoding and sharing 220 of local masks, each device i∈[N] has $[\tilde{z}_j]_i$ from j∈[N]. A T-private MDS matrix guarantees privacy against any T colluding devices.
As described herein, another aspect of the present techniques includes masking and uploading of local models. To protect the local models, each device i masks its local model as $\tilde{x}_i = x_i + z_i$ and sends it to the server 11. Since some devices may drop in this phase, the server 11 identifies the set of surviving devices, denoted by $\mathcal{U}_1 \subseteq [N]$. The server 11 intends to recover $\sum_{i \in \mathcal{U}_1} x_i$.
As described herein, another aspect of the present techniques includes one-shot aggregate-model recovery. After identifying the surviving devices in the previous phase, each device $j \in \mathcal{U}_1$ is informed to send its aggregated encoded sub-masks $\sum_{i \in \mathcal{U}_1} [\tilde{z}_i]_j$ to the server 11 for the purpose of one-shot recovery. It is noted that each $\sum_{i \in \mathcal{U}_1} [\tilde{z}_i]_j$ is an encoded version of $\sum_{i \in \mathcal{U}_1} [z_i]_k$ for $k \in [U-T]$ using the MDS matrix $W$ (see more details in Section B). Thus, the server 11 is able to recover the aggregated sub-masks from a set of any U messages from the participating devices, where this set is denoted by $\mathcal{U}_2$, $|\mathcal{U}_2| = U$. The server 11 obtains the aggregated mask $\sum_{i \in \mathcal{U}_1} z_i$ by concatenating the recovered $\sum_{i \in \mathcal{U}_1} [z_i]_k$ values 250. Lastly, the server 11 recovers the desired aggregate of models 260 for the set of participating devices $\mathcal{U}_1$ by subtracting $\sum_{i \in \mathcal{U}_1} z_i$ from $\sum_{i \in \mathcal{U}_1} \tilde{x}_i$. In other words, the server 11 can recover $\sum_{i \in \mathcal{U}_1} z_i$ from any U aggregated encoded masks, providing robustness against D dropped devices 235.
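The general scheme can likewise be sketched end to end. The following is a hedged illustration, assuming a Vandermonde construction for the T-private MDS matrix W and a prime field size (neither is fixed by the disclosure); `solve_mod` is a simple Gauss-Jordan solver over $\mathbb{F}_q$ used for the one-shot decode:

```python
# Hedged end-to-end sketch: partition the mask into U-T sub-masks, pad with T
# random pieces, encode with a Vandermonde MDS matrix W, and let the server
# decode the aggregate mask in one shot from any U aggregated shares.
import numpy as np

q = 2**31 - 1                                    # prime field (assumption)
N, D, T = 6, 2, 2                                # devices, dropouts, collusion
U = N - D                                        # design parameter: T < U <= N-D
d = 8                                            # model dim, divisible by U-T
seg = d // (U - T)
rng = np.random.default_rng(1)

# Vandermonde W[r, j] = (j+1)^r: any U columns, and the bottom T rows of any
# T columns, are invertible mod q, so W is MDS and T-private.
W = np.array([[pow(j + 1, r, q) for j in range(N)] for r in range(U)],
             dtype=object)

def encode_mask(z_i):
    # Segments: U-T sub-masks of z_i followed by T uniformly random pieces.
    S = np.vstack(np.split(z_i, U - T) +
                  [rng.integers(0, q, seg) for _ in range(T)]).astype(object)
    return [np.dot(W[:, j], S) % q for j in range(N)]   # share j -> device j

def solve_mod(A, b, q):
    # Gauss-Jordan elimination over F_q (A is U x U, b is U x seg, objects).
    A, b = A.copy(), b.copy()
    n = len(A)
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] % q)
        A[[c, piv]], b[[c, piv]] = A[[piv, c]], b[[piv, c]]
        inv = pow(int(A[c][c]), -1, q)
        A[c], b[c] = (A[c] * inv) % q, (b[c] * inv) % q
        for r in range(n):
            if r != c and A[r][c] % q:
                f = A[r][c]
                A[r], b[r] = (A[r] - f * A[c]) % q, (b[r] - f * b[c]) % q
    return b

z = [rng.integers(0, q, d) for _ in range(N)]    # local masks
shares = [encode_mask(z[i]) for i in range(N)]   # shares[i][j] held by device j
dropped = {0, 3}
survivors = [i for i in range(N) if i not in dropped]

# Each surviving device j sends the aggregate of the survivors' shares it holds.
msgs = {j: sum(shares[i][j] for i in survivors) % q for j in survivors[:U]}

# One-shot recovery: solve for the U aggregated segments, keep the first U-T.
A = np.array([[W[r][j] for r in range(U)] for j in msgs], dtype=object)
B = solve_mod(A, np.array([msgs[j] for j in msgs], dtype=object), q)
agg_mask = np.concatenate(B[: U - T]) % q
assert np.array_equal(agg_mask, sum(z[i] for i in survivors) % q)
```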
Theorem 1. Consider a secure aggregation problem in federated learning with N devices. The proposed LightSecAgg protocol can simultaneously achieve (1) a privacy guarantee against up to any T colluding devices, and (2) a dropout-resiliency guarantee against up to any D dropped devices, for any pair of privacy guarantee T and dropout-resiliency guarantee D such that T+D<N.
The proof of Theorem 1 is presented in Section B.
Remark 1. Theorem 1 provides a trade-off between privacy and dropout-resiliency guarantees, i.e., LightSecAgg can increase the privacy guarantee by reducing the dropout-resiliency guarantee and vice versa. LightSecAgg achieves the worst-case dropout-resiliency guarantee. That is, for any privacy guarantee T and number of dropped devices D<N−T, LightSecAgg ensures that any set of dropped devices of size D in secure aggregation can be tolerated. In contrast, SecAgg+, FastSecAgg, and TurboAgg relax the worst-case constraint to random dropouts and provide a probabilistic dropout-resiliency guarantee, i.e., the desired aggregate model can be correctly recovered with high probability. Therefore, LightSecAgg is an improvement over other secure aggregation techniques such as SecAgg, SecAgg+, FastSecAgg, and TurboAgg.
The storage cost, communication load, and computation load of the present techniques can be measured in units of elements or operations in $\mathbb{F}_q$. Recall that U is a design parameter chosen such that N−D≥U>T. The offline storage cost of the present techniques is as follows. Each device i independently generates a random mask $z_i$ of length d. Also, each device i stores a coded mask $[\tilde{z}_j]_i$ of size $\frac{d}{U-T}$ for each $j = 1, \ldots, N$. Hence, the total offline storage cost at each device is $\left(1 + \frac{N}{U-T}\right) d$.
The offline communication and computation loads of the present techniques are as follows. For each iteration of secure aggregation, before the local model is computed, each device prepares offline coded random masks and distributes them to the other devices. Specifically, each device encodes U local data segments, each of size $\frac{d}{U-T}$, into N coded segments and distributes each of them to one of the N devices. Hence, the offline computation and communication loads of LightSecAgg at each device are $O\!\left(\frac{d N U}{U-T}\right)$ (for straightforward matrix-based encoding) and $\frac{(N-1)\, d}{U-T}$, respectively.
The communication load of the present techniques during aggregation is as follows. While each device uploads a masked model of length d, in the phase of aggregate-model recovery, no matter how many devices drop, each surviving device in $\mathcal{U}_1$ sends a coded mask of size $\frac{d}{U-T}$. The server 11 is guaranteed to recover the aggregate model of the devices in $\mathcal{U}_1$ after receiving messages from any U devices. The total required communication load at the server 11 in the phase of mask recovery is therefore $\frac{U d}{U-T}$.
The computation load of the present techniques during aggregation is as follows. For LightSecAgg, the major computation bottleneck is the decoding process to recover $\sum_{i \in \mathcal{U}_1} z_i$ at the server 11. This involves decoding a dimension-U MDS code from U coded symbols, which can be performed with $O(U \log U)$ operations on elements in $\mathbb{F}_q$ per symbol, hence a total computation load of $O\!\left(\frac{d\, U \log U}{U-T}\right)$.
The communication and computation complexities of LightSecAgg can be compared with baseline protocols. In particular, the case is considered where secure aggregation protocols aim at simultaneously providing a privacy guarantee of $T = \frac{N}{2}$ and a dropout-resiliency guarantee of $D = pN$, for some $0 \le p < \frac{1}{2}$.
As shown in Table 1, by choosing U=(1−p)N, LightSecAgg significantly improves the computation efficiency at the server during aggregation. SecAgg and SecAgg+ incur total computation loads of $O(dN^2)$ and $O(dN \log N)$, respectively, at the server, while the server complexity of LightSecAgg remains almost constant with respect to N. This is expected to substantially reduce the overall aggregation time for a large number of devices, which has been bottlenecked by the server's computation in SecAgg. More detailed discussions, as well as a comparison with another recently proposed secure aggregation protocol that achieves similar server complexity to LightSecAgg, are provided in Section B.
The performance of the present techniques is evaluated against two baseline protocols, SecAgg and SecAgg+, in a realistic FL framework with up to N=200 devices training various machine learning models, as follows.
In an example, the present techniques can be implemented on a distributed platform to train various machine learning models for image classification tasks, and their total running time for one global iteration can be examined with respect to the two baseline protocols. To model an FL framework, m3.medium instances over the Amazon EC2 cloud can be used, with communication implemented using the MPI4Py message passing interface on Python. To provide comprehensive coverage of realistic FL settings, four machine learning models can be trained over datasets of different sizes, as summarized in Table 2. The hyper-parameter settings are provided in Section B.
The dropout and privacy modeling of an example is as follows. To model the dropped devices, pN devices can be randomly selected, where p is the dropout rate. This can be considered the worst-case scenario, in which the selected pN devices artificially drop after uploading the masked model. All three protocols provide a privacy guarantee of $T = \frac{N}{2}$ and resiliency for three different dropout rates, p=0.1, p=0.3, and p=0.5.
The system-level optimization of an example is as follows. Three protocols can be implemented, LightSecAgg, SecAgg, and SecAgg+, and the following system-level optimizations can be applied to speed up their executions. The parallelization of the offline phase and model training for an example is as follows. In all three protocols, the communication and computation time to generate and exchange the random masks in the offline phase can be overlapped with model training. That is, each device locally trains the model and carries out the offline phase simultaneously by running two parallel threads.
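A hedged sketch of the two-thread overlap described above is shown below; the function bodies are placeholders, and because the offline phase is communication-bound, Python threads can overlap it with training compute despite the GIL:

```python
# Hedged sketch: run local training and the offline mask phase concurrently.
import threading

def train_local_model():
    pass  # forward/backward passes over the local dataset

def offline_mask_phase():
    pass  # generate z_i, encode sub-masks with W, exchange shares with peers

training = threading.Thread(target=train_local_model)
offline = threading.Thread(target=offline_mask_phase)
training.start(); offline.start()
training.join(); offline.join()
```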
The amortized communication for an example is as follows. LightSecAgg can further speed up the communication time in the offline phase by parallelizing the transmission and reception of the $[\tilde{z}_i]_j$ via multi-threading.
An example performance evaluation is as follows. For the performance analysis, the total running time for a single round of a global iteration is measured, which includes model training and secure aggregation with each protocol, while gradually increasing the number of devices N for different device dropouts. Example results for training the CNN on the FEMNIST dataset are depicted in
The total running time of SecAgg and SecAgg+ increases monotonically with the dropout rate. This is because their total running time is dominated by the mask recovery at the server, which increases quadratically with the number of devices. As shown in
As shown in
Although LightSecAgg requires more communication and computation in the offline phase than the baseline protocols, the overlapped implementation mitigates this extra cost, enabling the overall speedup. LightSecAgg provides the smallest speedup in the most training-intensive task, training EfficientNet-B0 on the GLD-23K dataset. This is due to the fact that training time is dominant in this task, and training takes almost the same time in LightSecAgg and the baseline protocols. In particular, LightSecAgg provides a significant speedup of the aggregate-model recovery phase at the server over the baseline protocols in all considered tasks.
In an example, the present techniques incur the smallest running time for the case of p=0.3, which is almost identical to the case of p=0.1. Recall that LightSecAgg can select the design parameter U between T=0.5N and N−D=(1−p)N. Within this range, while increasing U reduces the size of the symbol to be decoded, it also increases the complexity of decoding each symbol. The example experimental results suggest that one choice for the cases of p=0.1 and p=0.3 can be $U = \lfloor 0.7N \rfloor$ for both, which leads to a faster execution than the case of p=0.5, where U can only be chosen as U=0.5N+1.
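The choice of U can be illustrated with a small hypothetical helper that clamps a target fraction of N into the valid range T < U ≤ N−D:

```python
# Hedged helper illustrating the design-parameter range T < U <= N - D; the
# experiments above pick U near 0.7 * N when that falls inside the range.
def choose_U(N, T, D, target=0.7):
    lo, hi = T + 1, N - D          # valid range for U
    assert lo <= hi, "requires T + D < N"
    return min(max(int(target * N), lo), hi)

print(choose_U(N=200, T=100, D=60))   # -> 140, i.e., 0.7 * 200
```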
The systems and methods described herein provide an improved approach for secure aggregation in federated learning. Compared with other secure aggregation protocols, the present techniques reduce the overhead of model aggregation in FL by leveraging one-shot aggregate-mask reconstruction of the active devices, while providing the same privacy and dropout-resiliency guarantees. In a realistic FL framework, extensive empirical results show that the present techniques can provide substantial speedup over baseline protocols for training diverse machine learning models with a performance gain, for example, of 12.7×, which is a significant improvement over other secure aggregation protocols.
The systems and methods described herein provide the societal benefit of protecting device privacy in FL, where hundreds of devices can jointly train a global machine learning model without revealing information about their individual datasets. In addition, the systems and methods described herein can be combined with Byzantine-resilient aggregation protocols in FL to address potential active/Byzantine adversaries that mount model/data poisoning attacks.
Referring now to
In further detail of the method 300A, at step 302, the server (e.g., the server 11, etc.) can transmit a set of initial model parameters to client devices (e.g., one or more of the devices 150, such as the device 1, 2, i, . . . , N described herein in connection with
In some implementations, at the start of a round of federated learning, the server can determine which client devices 150 will participate in the round of federated learning. To do so, the server 11 can attempt to initiate one or more communication sessions with each client device in a list of candidate client devices that are eligible to participate in the federated learning protocol. In some implementations, the list of candidate client devices can be received from an external computing device, read from a local configuration file, or generated by the server based on messages received from candidate client devices. To identify which client devices in the list will participate in a round, the server 11 can, for example, attempt to establish a communication session with each client device in the list, and transmit the set of initial model parameters for the round of federated learning to the client device via the communication session.
At step 304, the server can receive a set of masked model parameters from a device. As described herein, each of the client devices 150 can perform a training procedure using the initial model parameters and local training data to generate a local model. However, simply transmitting the locally trained models to the server exposes the models to potential model inversion attacks and impacts the privacy of the local training data at each client device 150. To protect the privacy of the local models at each client device 150, the server and the client devices 150 can participate in the secure aggregation techniques described herein. In doing so, after each client device 150 trains its local model, the client device 150 masks the local model and transmits the masked model to the server. The server 11 can receive the masked model, for example, in response to a request transmitted by the server 11 to the client device 150. Masking local model parameters can include adding information (e.g., the mask $z_i$) to the model to preserve the privacy of the model. In masking the updated model parameters, a client device “masks,” or obfuscates, the model parameters such that the unmasked updated model parameters cannot be obtained without subtracting the mask from the model. Using the processes described herein, the server can generate a one-shot reconstruction of the model using local mask shares generated by dropped and active client devices.
In some implementations, one or more client devices 150 participating in the techniques described herein may “drop out,” or become unresponsive to communications. In such implementations, the server 11 can attempt to communicate with a client device for a predetermined amount of time before determining whether to cease communications with the client device 150, and can store a flag in association with an identifier of the client device 150 indicating that the client device 150 has dropped from the secure aggregation techniques described herein. In some implementations, a client device 150 can be identified as a “dropped” client device in response to other conditions (e.g., a detected signal quality between the server and the client device being below a predetermined threshold, etc.). If the server 11 can establish a valid communication session with a client device 150, the server 11 can transmit a request for the masked updated model parameters to the client device 150, and then receive the masked model parameters from the client device 150 in response to the request. In some implementations, client devices themselves can attempt to establish a communication session with the server 11 upon completing generation of the updated masked model parameters, and transmit the updated masked model parameters to the server without the need for a corresponding request. The server 11 can store an indication that a client device 150 is active if the server 11 receives valid masked model parameters from the client device 150, and likewise store an indication that a client device 150 is dropped if the server 11 does not receive valid masked model parameters.
At step 306, the server 11 can determine whether parameters have been received from a first subset of devices. As described herein, the server 11 can operate the secure aggregation techniques using a predetermined parameter U, which indicates the number of client devices that must remain active (i.e., not dropped from the federated learning round) in order for secure aggregation to be successful. The parameter U can be specified, for example, prior to starting the round of federated learning, and can likewise be shared with each of the client devices in step 302, with the initial model parameters for the round. To continue with secure aggregation, the server can determine whether updated and masked model parameters have been received from at least U client devices (e.g., a first subset of client devices). If updated and masked model parameters have not been received from at least U non-dropped client devices, the server can continue to execute step 304 of the method 300A and continue to request or wait for information to be received from client devices participating in the federated learning process. If updated and masked model parameters have been received from at least U client devices, the server can execute step 308 of the method 300A. In some implementations, the server may execute step 308 even when more than U client devices have provided updated and masked model parameters.
At step 308, the server 11 can identify a second subset of dropped client devices. At this stage in the secure aggregation techniques described herein, the server 11 can determine which of the client devices 150 have dropped from the federated learning round. Using this information, the server 11 can request aggregated local mask share data from the remaining U client devices 150 to reconstruct the aggregate model in a one-shot reconstruction, as described herein. The server 11 can determine which of the client devices 150 have dropped by accessing the list of client device identifiers to identify client devices that have dropped from the federated learning protocol. Because the identifiers of client devices that failed to provide updated and masked model parameters can be stored in association with a flag, the server 11 can identify the dropped client devices (e.g., the second subset) by iterating through the list and identifying any “dropped” flags. The server 11 can then generate a list of dropped client devices that includes identifiers of each client device 150 stored in association with a “dropped” flag. D represents the number of dropped devices.
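The bookkeeping of steps 304-308 can be sketched as follows; the response map and client identifiers are hypothetical stand-ins:

```python
# Hedged sketch: flag non-responders as dropped and require >= U survivors.
def partition_clients(responses, all_clients, U):
    # responses maps client id -> masked model (None if missing or invalid).
    survivors = [c for c in all_clients if responses.get(c) is not None]
    dropped = [c for c in all_clients if responses.get(c) is None]
    if len(survivors) < U:
        raise RuntimeError(f"only {len(survivors)} survivors; need U={U}")
    return survivors, dropped

survivors, dropped = partition_clients(
    {1: "masked_1", 2: "masked_2", 4: "masked_4"}, all_clients=[1, 2, 3, 4], U=3)
assert survivors == [1, 2, 4] and dropped == [3]
```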
At step 310, the server 11 can request the aggregate of local mask shares from the first subset of devices. Once the dropped client devices have been identified, the server 11 can request, from each of the active client devices in the first subset, the aggregate of the local mask shares corresponding to the active client devices. In some implementations, the request can include an identification of each of the non-dropped client devices (e.g., client devices that provided the masked and updated model parameters). Upon receiving the request, each of the client devices 150 can determine the aggregate of mask shares for the active clients identified in the request, and transmit the aggregate of the local mask shares to the server 11 for secure one-shot aggregation of the machine learning model. The server 11 can store each of the received sets of local mask shares in one or more data structures in the memory of the server.
At step 312, the server can generate an aggregate model from the masked model parameters and local mask shares. Once the updated and masked model parameters and each set of local mask shares have been received from the first subset of the client devices 150, the server 11 can generate a one-shot reconstruction of the model. To do so, the server 11 can aggregate the local mask shares received from the active client devices in $\mathcal{U}_1$ to calculate $\sum_{i \in \mathcal{U}_1} z_i$ by concatenating the recovered $\sum_{i \in \mathcal{U}_1} [z_i]_k$ values. As described herein, the sum of each set of local mask shares $\sum_{i \in \mathcal{U}_1} [\tilde{z}_i]_j$ is an encoded version of $\sum_{i \in \mathcal{U}_1} [z_i]_k$ for $k \in [U-T]$ using the MDS matrix $W$. The proof for this theorem is described in Section B of the present disclosure. Once the aggregate mask is calculated by concatenating the sums of the local mask shares, the server 11 can perform one-shot model reconstruction to calculate the aggregate model $\sum_{i \in \mathcal{U}_1} x_i$ by subtracting the aggregate mask $\sum_{i \in \mathcal{U}_1} z_i$ from the sum of the masked model parameters $\sum_{i \in \mathcal{U}_1} \tilde{x}_i$. In doing so, the server 11 is able to recover an aggregate model using any U encoded masks, which provides robustness against D dropped devices and T colluding devices.
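The final unmasking of step 312 reduces to one subtraction over the field once the aggregate mask has been decoded; a minimal hedged sketch with placeholder values:

```python
# Hedged sketch of step 312, assuming the aggregate mask has already been
# decoded from the survivors' aggregated shares (see the MDS sketch above).
import numpy as np

q = 2**31 - 1
masked_models = [np.array([5, 7]), np.array([11, 13])]  # x~_i from survivors
agg_mask = np.array([9, 2])                             # decoded sum of z_i
aggregate_model = (sum(masked_models) - agg_mask) % q   # equals sum of x_i
```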
Referring now to
In further detail of the method 300B, at step 322, the client device 150 (e.g., any of the devices 1, 2, i, . . . , N described herein in connection with
At step 324, the client device 150 can generate masked parameters based on the updated model parameters. To make the models resistant to model inversion attacks that attempt to reconstruct training data from trained model parameters, the client device i can generate a mask $z_i$ for its share of the model $x_i$. By summing the mask $z_i$ and the set of model parameters $x_i$, the client device can effectively mask the true model parameters trained by the client device to protect against model inversion attacks. To generate the mask $z_i$, the client device selects a random set of values (e.g., each corresponding to a respective parameter in the set of model parameters $x_i$) from $\mathbb{F}_q^d$, where the local model $x_i \in \mathbb{F}_q^d$. In some implementations, randomly selecting values can include executing a random number generator that returns random values from $\mathbb{F}_q$. In some implementations, the random number generator can be a pseudo-random number generator that utilizes a suitable pseudo-random number generation process. To generate an updated and masked set of model parameters $\tilde{x}_i$, the client device can calculate the sum of the set of model parameters $x_i$ and the generated mask $z_i$, such that $\tilde{x}_i = x_i + z_i$, where i corresponds to the respective client device performing the method 300B in the secure aggregation protocol. In some implementations, the generation of the mask and the generation and sharing of local mask shares (e.g., steps 324-330) can be executed prior to generating the updated model parameters $x_i$ from the initial model parameters received from the server 11.
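A minimal sketch of the masking in step 324, assuming the model has been flattened and quantized to $\mathbb{F}_q$ (the quantization itself is omitted, and the field size is an assumption):

```python
# Hedged sketch: sample a uniformly random mask z_i and add it to x_i over F_q.
import numpy as np

q = 2**31 - 1                      # prime field size (assumption)
d = 10                             # flattened model dimension
rng = np.random.default_rng()
x_i = rng.integers(0, q, d)        # quantized local model parameters
z_i = rng.integers(0, q, d)        # uniformly random local mask
x_masked = (x_i + z_i) % q         # x~_i = x_i + z_i over F_q
```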
At step 326, the client device 150 can generate local mask shares for the other client devices. Once the local mask $z_i$ is generated by the client device, the local mask is partitioned into U−T sub-masks $[z_i]_k$, $k \in [U-T]$, of equal size, where U is a predetermined number of client devices that must survive (e.g., be able to communicate information with the server at the aggregation phase of the federated learning round), and T is the predetermined number of devices that can collude without compromising the privacy of any individual local model. To protect against T colluding devices, the client device 150 can randomly select values $[n_i]_k$ from $\mathbb{F}_q^{\frac{d}{U-T}}$ for $k \in \{U-T+1, \ldots, U\}$. Device i∈[N] encodes the sub-masks $[z_i]_k$ as $[\tilde{z}_i]_j = ([z_i]_1, \ldots, [z_i]_{U-T}, [n_i]_{U-T+1}, \ldots, [n_i]_U) \cdot W_j$, where N is the number of client devices participating in the federated learning round. The value N can be provided by the server to the client device, for example, when the server transmits the initial model parameters to the client device. In the formula above, $W_j$ is the j-th column of a T-private MDS matrix $W \in \mathbb{F}_q^{U \times N}$ of dimension U×N. An MDS matrix is considered T-private if the submatrix consisting of its rows $\{U-T+1, \ldots, U\}$ is also MDS. A T-private MDS matrix guarantees $I(\tilde{z}_i; \{[\tilde{z}_i]_j\}_{j \in \mathcal{T}}) = 0$ for any i∈[N] and any $\mathcal{T} \subseteq [N]$ of size T, if the $[n_i]_k$ values are jointly uniformly random. The values $[n_i]_k$ can be selected or generated by the client device to be jointly uniformly random. T-private MDS matrices can be calculated for any U, N, and T.
At step 328, the client device 150 can determine whether local mask shares have been generated for each other client device. The client device can calculate each value $[\tilde{z}_i]_j$ for j∈[N], thereby generating an encoded mask share for each of the N devices, where i represents the client device executing the method 300B. The client device can perform these calculations iteratively, for example, sequentially or in parallel. The client device can determine whether the full set of local mask shares has been generated by incrementing a counter register j each time the value $[\tilde{z}_i]_j$ is generated. If the counter register j is equal to the number of participating client devices N, the client device can execute step 330 of the method 300B to share the local mask shares with the other client devices participating in the federated learning round. If the counter register j is not equal to the number of participating client devices N, the client device can increment the counter register j and execute step 326.
At step 330, the client device can transmit the local mask shares to the other client devices. After computing the mask shares $[\tilde{z}_i]_j$ for each of the $N$ client devices, the client device $i$ can transmit $[\tilde{z}_i]_j$ to each device $j \in [N] \setminus \{i\}$. At the end of the offline encoding and sharing of local masks, the client device $i$, and each client device $i \in [N]$, has local mask shares $[\tilde{z}_j]_i$ from each client device $j \in [N]$. Transmitting the mask shares to each client device can include, for example, initiating a communication session between each of the client devices. Once each client device $j$ has received the mask share $[\tilde{z}_i]_j$ generated by the client device $i$, each client device $j$ can transmit a confirmation message to the client device $i$ indicating that the mask share $[\tilde{z}_i]_j$ has been received.
At step 332, the client device can receive encoded masks from the other client devices. Likewise, the client device $i$ can receive the local mask shares $[\tilde{z}_j]_i$ generated by each of the client devices $j$. As such, at the end of the offline encoding and sharing of local masks, the client device $i$, and each client device $i \in [N]$, holds a local mask share $[\tilde{z}_j]_i$ from each client device $j \in [N]$. Upon receiving all of the local mask shares $[\tilde{z}_j]_i$ from a client device $j$, the client device $i$ can transmit a confirmation message to the client device $j$ indicating that all of the local mask shares generated by the client device $j$ have been received. As described herein above, in some implementations, mask share generation and sharing can occur prior to generating the updated model parameters in step 322.
At step 334, the client device can transmit the masked model parameters to the server. Once the masked model parameters have been generated by the client device, and the local mask shares have been transmitted to the other client devices participating in the federated learning round, the client device $i$ can transmit the updated and masked model parameters $\tilde{x}_i$ to the server 11. As described herein above, the server 11 can utilize the updated and masked model parameters from each client device to perform a secure one-shot aggregation of the updated model, as described at least in connection with the method 300A.
The client device 150 can receive the request from the server 11. The request can identify a subset of the participating client devices that have not been dropped from the federated learning round. Client devices 150 can be dropped from a federated learning round by satisfying a drop condition. For example, the server may be unable to establish a communication session between the server and a client device, and subsequently determine that the client device should be dropped. In some implementations, the drop condition can be a connection quality metric between the server and a client device that falls below a predetermined connectivity threshold.
In response to the request, the client device 150 can transmit, to the server 11, the aggregate of the local mask shares that correspond to the surviving client devices identified in the request. In some implementations, the client device 150 can aggregate the subset of the local mask shares prior to transmission, for example, by computing the sum $\sum_{j \in \mathcal{U}_1} [\tilde{z}_j]_i$, where $\mathcal{U}_1$ represents the subset of surviving client devices and $i$ represents the client device performing the method 300B. Using the aggregated subsets of the local mask shares, the server can perform a one-shot reconstruction of the aggregate model of all surviving client devices, as described herein in connection with the method 300B.
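A hedged, end-to-end sketch of this one-shot recovery follows: every surviving device $i$ returns the single aggregate $\sum_{j \in \mathcal{U}_1} [\tilde{z}_j]_i$, and the server decodes $\sum_{j \in \mathcal{U}_1} z_j$ from any $U$ such aggregates by inverting a $U \times U$ submatrix of $W$ over $\mathbb{F}_q$. The field, the Vandermonde construction, and all helper names are assumptions carried over from the earlier sketch, not identifiers from the disclosure.

```python
# Toy simulation: mask, encode, drop one device, then one-shot recovery.
import numpy as np

q = 2_147_483_647
N, U, T = 5, 4, 2                         # tolerates D = 1 dropout (T + D < N)
m = 2                                     # sub-mask length; d = (U - T) * m
rng = np.random.default_rng(1)
W = [[pow(j + 1, u, q) for j in range(N)] for u in range(U)]

def encode(coeffs):                       # coeffs: U rows of length m
    return [[sum(coeffs[u][t] * W[u][j] for u in range(U)) % q
             for t in range(m)] for j in range(N)]

def rand_row():
    return [int(v) for v in rng.integers(0, q, size=m)]

x = [rand_row() + rand_row() for _ in range(N)]          # toy models, d = 4
z = [rand_row() + rand_row() for _ in range(N)]          # masks
x_masked = [[(a + b) % q for a, b in zip(x[i], z[i])] for i in range(N)]
shares = [encode([z[i][:m], z[i][m:]] + [rand_row() for _ in range(T)])
          for i in range(N)]              # shares[i][j] is [z~_i]_j

U1 = [0, 1, 2, 3]                         # device 4 dropped
agg = {j: [sum(shares[i][j][t] for i in U1) % q for t in range(m)] for j in U1}

def solve(A, B):                          # Gaussian elimination over F_q
    n = len(A); A = [r[:] for r in A]; B = [r[:] for r in B]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] % q)
        A[c], A[p], B[c], B[p] = A[p], A[c], B[p], B[c]
        inv = pow(A[c][c], q - 2, q)      # Fermat inverse, q prime
        A[c] = [v * inv % q for v in A[c]]; B[c] = [v * inv % q for v in B[c]]
        for r in range(n):
            if r != c and A[r][c]:
                f = A[r][c]
                A[r] = [(A[r][k] - f * A[c][k]) % q for k in range(n)]
                B[r] = [(B[r][k] - f * B[c][k]) % q for k in range(m)]
    return B

cols = U1[:U]                             # any U aggregate shares suffice
dec = solve([[W[u][j] for u in range(U)] for j in cols], [agg[j] for j in cols])
z_sum = dec[0] + dec[1]                   # first U - T decoded rows = sum of z_j
x_sum = [(sum(x_masked[i][t] for i in U1) - z_sum[t]) % q for t in range(2 * m)]
assert x_sum == [sum(x[i][t] for i in U1) % q for t in range(2 * m)]
```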
For any pair of privacy guarantee $T$ and dropout-resiliency guarantee $D$ such that $T + D < N$, an arbitrary $U$ is selected such that $N - D \ge U > T$. In the following, it is shown that LightSecAgg with chosen design parameters $T$, $D$, and $U$ can simultaneously achieve (1) a privacy guarantee against up to any $T$ colluding devices, and (2) a dropout-resiliency guarantee against up to any $D$ dropped devices. The concatenation of $\{[n_i]_k\}_{k \in \{U-T+1, \ldots, U\}}$ is denoted by $n_i$ for $i \in [N]$. Since each device encodes its sub-masks with the same MDS matrix $W$, each $[\tilde{z}_i]_j$ is an encoded version of $[z_i]_k$ for $k \in [U-T]$ and $[n_i]_k$ for $k \in \{U-T+1, \ldots, U\}$ as follows:
$$[\tilde{z}_i]_j = ([z_i]_1, \ldots, [z_i]_{U-T}, [n_i]_{U-T+1}, \ldots, [n_i]_U) \cdot W_j.$$
Lemma 1 is presented below; its proof is provided in Section B.
Lemma 1. For any $\mathcal{T} \subseteq [N]$ of size $T$ and any $\mathcal{U}_1 \subseteq [N]$ with $|\mathcal{U}_1| \ge U$, such that $U \ge T$, if the random masks $[n_i]_k$'s are jointly uniformly random:
The worst case is that all the messages sent from the devices are received by the server during the execution of LightSecAgg, e.g., the devices identified as dropped are merely delayed. Thus, the server receives $x_i + z_i$ from each device $i \in [N]$ and $\sum_{j \in \mathcal{U}_1} [\tilde{z}_j]_i$ from each device $i \in \mathcal{U}_1$. It is now shown that LightSecAgg provides privacy guarantee $T$, i.e., for an arbitrary set of colluding devices $\mathcal{T}$ of size $T$, the following holds:
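The display equation for this guarantee is not reproduced in this text; a hedged rendering consistent with the quantities defined above is sketched below, and the exact conditioning should be treated as an assumption.

```latex
% Assumed form of the T-privacy statement: everything the server and the
% colluding set \mathcal{T} observe reveals nothing about the individual
% models beyond the aggregate of the surviving devices.
\[
I\Bigl(\{x_i\}_{i\in[N]} \;;\;
  \{x_i + z_i\}_{i\in[N]},\,
  \Bigl\{\sum_{j\in\mathcal{U}_1}[\tilde{z}_j]_i\Bigr\}_{i\in\mathcal{U}_1}
  \;\Bigm|\;
  \sum_{i\in\mathcal{U}_1} x_i,\,
  \{x_i, z_i\}_{i\in\mathcal{T}}\Bigr) = 0.
\]
```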
The proof is as follows:
As shown in Table 3, compared with the SecAgg protocol, LightSecAgg significantly improves the computation efficiency at the server during aggregation. SecAgg requires the server to retrieve $T+1$ secret shares of a secret key for each of the $N$ devices, and to compute a single PRG function if the device survives, or $N-1$ PRG functions to recover $N-1$ pairwise masks if the device drops out, yielding a total computation load of $O(N^2 d)$ at the server. In contrast, for $U = O(N)$, LightSecAgg incurs an almost constant $O(d \log N)$ computation load at the server. This admits a scalable design and is expected to achieve a much faster end-to-end execution for a large number of devices, given that the overall execution time of SecAgg is dominated by the server's computation. SecAgg has a smaller storage overhead than LightSecAgg, since the secret shares of keys it stores are small (e.g., as small as an integer) and the model size $d$ is much larger than the number of devices $N$ in typical FL scenarios. This also allows SecAgg to have a smaller communication load in the phase of aggregate-model recovery. Finally, it is noted that another advantage of LightSecAgg over SecAgg is the reduced dependence on cryptographic primitives such as public key infrastructure and key agreement mechanisms, which further simplifies the implementation of the protocol. SecAgg+ improves both the communication and the computation load of SecAgg by considering a sparse random graph of degree $O(\log N)$, reducing the complexity by a factor of $N / \log N$. However, SecAgg+ still incurs an $O(dN \log N)$ computation load at the server, which is much larger than the $O(d \log N)$ computation load at the server in LightSecAgg when $U = O(N)$.
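As a rough, illustrative sanity check of these asymptotic shapes (all constants and lower-order terms dropped; the values of $N$ and $d$ below are arbitrary toy choices, not taken from any experiment in this disclosure):

```python
# Back-of-the-envelope comparison of the server computation loads quoted
# above; only the asymptotic growth is meaningful here.
import math

N, d = 1_000, 1_000_000                       # toy values (assumptions)
print(f"SecAgg      ~ N^2 * d     = {N**2 * d:.2e}")
print(f"SecAgg+     ~ d * N log N = {d * N * math.log2(N):.2e}")
print(f"LightSecAgg ~ d * log N   = {d * math.log2(N):.2e}")
```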
In other techniques, all randomness is generated at an external trusted party, and for each subset $\mathcal{U}_1$ of size $|\mathcal{U}_1| \ge U$ the trusted party needs to generate $T$ random symbols, which accounts for a total amount of randomness that increases exponentially with $N$. LightSecAgg does not require a trusted third party; instead, each device locally generates a set of $T$ random symbols. This significantly improves the practicality of LightSecAgg in maintaining model security, and further reduces the total amount of needed randomness to scale linearly with $N$. Consequently, the local offline storage of each device in LightSecAgg scales linearly with $N$, as opposed to scaling exponentially.
In this section, example experiment details are provided, in addition to the results on training a CNN on the FEMNIST dataset described above.
For an arbitrary set of colluding devices $\mathcal{T}$ of size $T$:
The $T$-private MDS matrix used in LightSecAgg guarantees $I(z_i; \{[\tilde{z}_i]_j\}_{j \in \mathcal{T}}) = 0$. Thus,
where equation (25) follows from the chain rule, equation (26) follows from the independence of the $z_i$'s and from $I(z_i; \{[\tilde{z}_i]_j\}_{j \in \mathcal{T}}) = 0$, and equation (27) follows from the fact that joint entropy is less than or equal to the sum of the individual entropies, combined with the non-negativity of mutual information.
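The generic information-theoretic facts invoked in this step can be stated in standalone form as follows ($X$, $Y$, $Z$, and $Y_1, \ldots, Y_k$ are arbitrary discrete random variables); this restates textbook identities rather than the numbered equations themselves, which are not reproduced here.

```latex
\begin{align*}
I(X; Y, Z) &= I(X; Y) + I(X; Z \mid Y) && \text{(chain rule)} \\
H(Y_1, \ldots, Y_k) &\le \sum_{\ell=1}^{k} H(Y_\ell) && \text{(subadditivity of entropy)} \\
I(X; Y) &\ge 0 && \text{(non-negativity)}
\end{align*}
```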
Having discussed specific embodiments of the present solution, it may be helpful to describe aspects of the operating environment as well as associated system components (e.g., hardware elements) in connection with the methods and systems described herein. The details of an embodiment of each device (e.g., the devices participating in the federated learning technique, the server, etc.) are described in greater detail below.
Network connections between the devices or servers described herein may include any type and/or form of network, including any of the following: a point-to-point network, a broadcast network, a telecommunications network, a data communication network, or a computer network. The topology of the network may be a bus, star, or ring network topology. The network may be of any such network topology as is known to those of ordinary skill in the art and capable of supporting the operations described herein. In some embodiments, different types of data may be transmitted via different protocols. In other embodiments, the same types of data may be transmitted via different protocols.
The device(s) or server(s) described herein may be deployed as and/or executed on any type and form of computing device, such as a computer, network device or appliance capable of communicating on any type and form of network and performing the operations described herein.
The central processing unit 921 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 922. In many embodiments, the central processing unit 921 is provided by a microprocessor unit, such as: those manufactured by Intel Corporation of Mountain View, California; those manufactured by International Business Machines of White Plains, New York; or those manufactured by Advanced Micro Devices of Sunnyvale, California. The computing device 900 may be based on any of these processors, or any other processor capable of operating as described herein.
Main memory unit 922 may be one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 921, such as any type or variant of Static random access memory (SRAM), Dynamic random access memory (DRAM), Ferroelectric RAM (FRAM), NAND Flash, NOR Flash and Solid State Drives (SSD). The main memory 922 may be based on any of the above described memory chips, or any other available memory chips capable of operating as described herein.
A wide variety of I/O devices 930a-930n may be present in the computing device 900. Input devices include keyboards, mice, trackpads, trackballs, microphones, dials, touch pads, touch screens, and drawing tablets. Output devices include video displays, speakers, inkjet printers, laser printers, projectors and dye-sublimation printers. The I/O devices may be controlled by an I/O controller 923.
Furthermore, the computing device 900 may include a network interface 918 to interface to the network 904 through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, 56 kb, X.25, SNA, DECNET), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET), wireless connections, or some combination of any or all of the above. Connections can be established using a variety of communication protocols (e.g., TCP/IP, IPX, SPX, NetBIOS, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), RS232, IEEE 802.11, IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, IEEE 802.11ac, IEEE 802.11ad, CDMA, GSM, WiMax and direct asynchronous connections). In one embodiment, the computing device 900 communicates with other computing devices 900′ via any type and/or form of gateway or tunneling protocol such as Secure Socket Layer (SSL) or Transport Layer Security (TLS). The network interface 918 may include a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing device 900 to any type of network capable of communication and performing the operations described herein.
In some embodiments, the computing device 900 may include or be connected to one or more display devices 924a-924n. As such, any of the I/O devices 930a-930n and/or the I/O controller 923 may include any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of the display device(s) 924a-924n by the computing device 900. For example, the computing device 900 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display device(s) 924a-924n. In one embodiment, a video adapter may include multiple connectors to interface to the display device(s) 924a-924n. In other embodiments, the computing device 900 may include multiple video adapters, with each video adapter connected to the display device(s) 924a-924n. In some embodiments, any portion of the operating system of the computing device 900 may be configured for using multiple displays 924a-924n. One ordinarily skilled in the art will recognize and appreciate the various ways and embodiments that a computing device 900 may be configured to have one or more display devices 924a-924n.
In further embodiments, an I/O device 930 may be a bridge between the system bus 950 and an external communication bus, such as a USB bus, an Apple Desktop Bus, an RS-232 serial connection, a SCSI bus, a FireWire bus, a FireWire 800 bus, an Ethernet bus, an AppleTalk bus, a Gigabit Ethernet bus, an Asynchronous Transfer Mode bus, a FibreChannel bus, a Serial Attached small computer system interface bus, a USB connection, or an HDMI bus.
The computer system 900 can be any workstation, telephone, desktop computer, laptop or notebook computer, server, handheld computer, mobile telephone or other portable telecommunications device, media playing device, a gaming system, mobile computing device, or any other type and/or form of computing, telecommunications or media device that is capable of communication. The computer system 900 has sufficient processor power and memory capacity to perform the operations described herein.
In some embodiments, the computing device 900 may have different processors, operating systems, and input devices consistent with the device. For example, in one embodiment, the computing device 900 is a smart phone, mobile device, tablet or personal digital assistant. In still other embodiments, the computing device 900 is an Android-based mobile device, an iPhone smart phone manufactured by Apple Computer of Cupertino, California, or a Blackberry or WebOS-based handheld device or smart phone, such as the devices manufactured by Research In Motion Limited. Moreover, the computing device 900 can be any workstation, desktop computer, laptop or notebook computer, server, handheld computer, mobile telephone, any other computer, or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.
At 1010, the method 1000 can generate a first model via machine learning. For example, the method 1000 can generate a first model via machine learning based on a model parameter and data restricted to a first device operatively coupled with a second device. The method 1000 can then continue to 1020.
At 1020, the method 1000 can generate a mask corresponding to the first model. For example, the method 1000 can generate a plurality of random masks each corresponding to one or more of the distinct portions of the first model. For example, a random mask can be generated based on a random or pseudorandom number generation process. The method 1000 can then continue to 1022. At 1022, the method 1000 can partition, based on the mask, the shares. For example, the method 1000 can partition, based on the plurality of random masks, the plurality of local mask shares. The method 1000 can then continue to 1024. At 1024, the method 1000 can partition the first model into a plurality of shares each including a portion of the first model. For example, the method 1000 can partition the first model into a plurality of local mask shares each including a distinct portion of the first model. For example, a distinct portion can include a portion that contains content unique to the portion and not present in another portion. The method 1000 can then continue to 1030.
At 1030, the method 1000 can encode shares into a first plurality of encoded shares. For example, the method 1000 can encode one or more of the plurality of local mask shares into a corresponding first plurality of encoded shares. For example, each local mask share can be encoded into a particular and distinct encoded share. The method 1000 can then continue to 1040.
At 1040, the method 1000 can transmit to a second device the first plurality of encoded shares. For example, the method 1000 can transmit, to the second device and based on a device index of the second device, the first plurality of encoded shares. The method 1000 can then continue to 1050.
At 1050, the method 1000 can receive from the second device the second plurality of encoded shares. For example, the method 1000 can receive, from the second device and based on a device index of the first device, the second plurality of encoded shares. The method 1000 can then continue to 1060.
At 1060, the method 1000 can generate an aggregation of encoded shares including a first encoded share and a second encoded share. For example, the method 1000 can generate an aggregation of encoded shares including a first encoded share having a first index among the first plurality of encoded shares and a second encoded share having the first index among a second plurality of encoded shares, the second encoded share including a distinct portion of a second model generated by a second device via machine learning. The method 1000 can then continue to 1070.
At 1070, the method 1000 can transmit the aggregation of encoded masks. For example, the method 1000 can transmit, to a server operatively coupled with the first device, the aggregation of encoded masks. The method 1000 can then continue to 1072. At 1072, the method 1000 can transmit the first plurality of encoded shares. For example, the method 1000 can transmit, to the server, the first plurality of encoded shares.
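A condensed, hypothetical sketch of the client-side flow of the method 1000 follows, with toy stand-ins for training and encoding; every identifier here is illustrative, and the fixed two-share "encoding" map is a placeholder for the MDS encoding described earlier, not the scheme itself.

```python
# Toy walk-through of method 1000 for two devices over F_q; train_local and
# the 2 x 2 invertible "encoding" map are placeholders, not the real scheme.
import random

q = 2**31 - 1

def train_local(seed: int) -> list[int]:         # 1010: stand-in "model"
    rnd = random.Random(seed)
    return [rnd.randrange(q) for _ in range(4)]

def encode_shares(mask: list[int]) -> list[list[int]]:
    # 1022/1024/1030 collapsed: split the mask into two half-size shares and
    # mix them with a fixed invertible 2 x 2 map over F_q.
    a, b = mask[:2], mask[2:]
    return [[(x + y) % q for x, y in zip(a, b)],
            [(x + 2 * y) % q for x, y in zip(a, b)]]

models = [train_local(s) for s in (0, 1)]        # 1010 on both devices
masks = [[random.randrange(q) for _ in range(4)] for _ in range(2)]  # 1020
encoded = [encode_shares(mk) for mk in masks]    # 1030
# 1040/1050: device 0 sends encoded[0][1] to device 1, receives encoded[1][0]
# 1060: aggregation of the shares holding the same index (index 0 here)
aggregation = [(u + v) % q for u, v in zip(encoded[0][0], encoded[1][0])]
# 1070/1072: the aggregation and the device's own encoded shares go to a server
print("aggregation of index-0 encoded shares:", aggregation)
```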
At 1110, the method 1100 can transmit to a first device a first instruction to generate a first model. For example, the method 1100 can transmit, to a first device, a first instruction to generate via machine learning a first model based on a model parameter and data restricted to the first device. A model parameter can include, for example, a constraint on execution of training of a model, a constraint on or identification of data input to or output by the model, or any combination thereof. The method 1100 can then continue to 1112. At 1112, the method 1100 can transmit to a second device a second instruction to generate a second model. For example, the method 1100 can transmit, to the second device, a second instruction to generate via machine learning a second model based on the model parameter and data restricted to the second device. The method 1100 can then continue to 1120.
At 1120, the method 1100 can receive, from the first device, encoded shares of the first model. For example, the method 1100 can receive, from the first device, a plurality of encoded shares each including a distinct portion of the first model generated by the first device. The method 1100 can then continue to 1122. At 1122, the method 1100 can receive, from the first device, an aggregation of encoded shares including a first encoded share and a second encoded share. For example, the method 1100 can receive, from the first device, an aggregation of encoded shares including a first encoded share having a first index among the plurality of encoded shares and a second encoded share having the first index among the second plurality of encoded shares, the second encoded share including a distinct portion of the second model generated by the second device via machine learning. For example, an index can correspond to a relative position or an absolute position in a sequence, vector or the like. The method 1100 can then continue to 1130.
At 1130, the method 1100 can determine that the second device satisfies a dropout condition by a determination of an absence of transmission from the second device. For example, the method 1100 can determine that the second device satisfies the dropout condition by a determination of an absence of transmission, from the second device, of the second plurality of encoded shares each including a distinct portion of the second model generated by the second device. A dropout condition can include, for example, a timeout condition indicating that at least a portion of a data object or model has not been received within a predetermined time period or by a predetermined timestamp or the like. The method 1100 can then continue to 1140.
At 1140, the method 1100 can generate an aggregate model corresponding to a machine learning model with the first model and the second model. For example, the method 1100 can generate, in response to a determination that the second device satisfies a dropout condition and based on the first plurality of encoded shares and the first aggregation of encoded shares, an aggregate model corresponding to a machine learning model comprising the first model and the second model.
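The dropout determination at 1130 can be pictured as a receive-with-deadline loop, as in the hypothetical sketch below; the in-process queue stands in for the real transport, and the deadline value and all names are assumptions.

```python
# Hypothetical sketch of the dropout condition in method 1100: a device
# that has not transmitted its encoded shares before the deadline is
# treated as dropped. queue.Queue stands in for the real transport.
import queue

DEADLINE_S = 0.1                                  # illustrative timeout

def collect_shares(inboxes: dict) -> tuple[dict, list]:
    received, dropped = {}, []
    for dev_id, inbox in inboxes.items():
        try:
            received[dev_id] = inbox.get(timeout=DEADLINE_S)  # 1120/1122
        except queue.Empty:
            dropped.append(dev_id)                # 1130: absence of transmission
    return received, dropped

boxes = {1: queue.Queue(), 2: queue.Queue()}
boxes[1].put([123, 456])                          # device 1 transmits shares
got, dropped = collect_shares(boxes)
print(got, dropped)                               # {1: [123, 456]} [2]
# 1140: the server would then reconstruct the aggregate model from the first
# device's shares and aggregation, as in the recovery sketch given earlier.
```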
Although the disclosure may reference one or more “users,” such “users” may refer to user-associated devices or stations (STAs), for example, consistent with the terms “user” and “multi-user” typically used in the context of a multi-user multiple-input and multiple-output (MU-MIMO) environment.
Although examples of communications systems described above may include devices and APs operating according to an 802.11 standard, it should be understood that embodiments of the systems and methods described can operate according to other standards and use wireless communications devices other than devices configured as devices and APs. For example, multiple-unit communication interfaces associated with cellular networks, satellite communications, vehicle communication networks, and other non-802.11 wireless networks can utilize the systems and methods described herein to achieve improved overall capacity and/or link quality without departing from the scope of the systems and methods described herein.
It should be noted that certain passages of this disclosure may reference terms such as “first” and “second” in connection with devices, mode of operation, transmit chains, antennas, etc., for purposes of identifying or differentiating one from another or from others. These terms are not intended to merely relate entities (e.g., a first device and a second device) temporally or according to a sequence, although in some cases, these entities may include such a relationship. Nor do these terms limit the number of possible entities (e.g., devices) that may operate within a system or environment.
It should be understood that the systems described above may provide multiple ones of any or each of those components and these components may be provided on either a standalone machine or, in some embodiments, on multiple machines in a distributed system. In addition, the systems and methods described above may be provided as one or more computer-readable programs or executable instructions embodied on or in one or more articles of manufacture. The article of manufacture may be a floppy disk, a hard disk, a CD-ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape. In general, the computer-readable programs may be implemented in any programming language, such as LISP, PERL, C, C++, C#, PROLOG, or in any byte code language such as JAVA. The software programs or executable instructions may be stored on or in one or more articles of manufacture as object code.
While the foregoing written description of the methods and systems enables one of ordinary skill to make and use what is considered presently to be the best mode thereof, those of ordinary skill will understand and appreciate the existence of variations, combinations, and equivalents of the specific embodiment, method, and examples herein. The present methods and systems should therefore not be limited by the above described embodiments, methods, and examples, but by all embodiments and methods within the scope and spirit of the disclosure.
This application is a National Stage Entry under 35 U.S.C. § 371 of International Application No. PCT/US2022/040805, filed Aug. 18, 2022, which claims the benefit of priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application Ser. No. 63/235,015, entitled “SYSTEMS AND METHODS FOR IMPROVED SECURE AGGREGATION IN FEDERATED LEARNING,” filed Aug. 19, 2021, the contents of such applications being hereby incorporated by reference in their entireties and for all purposes as if completely and fully set forth herein.
This invention was made with government support under Grant Number CCF-1763673, awarded by the National Science Foundation (NSF). The Government has certain rights in the invention.