The present invention relates to methods and systems for secure federated learning in a machine learning model.
Machine learning (ML) has gradually made its way into many day-to-day applications. Due to the widespread use of machine-learning technologies, deployed machine learning models are often extensively trained. Training allows machine learning systems to provide accurate results on a diverse set of inputs. In particular, a large and variegated training dataset is needed to obtain an accurate and versatile model for the machine learning system.
Machine learning algorithms typically rely on training data being directly and centrally accessible. The entity that operates the model training therefore has access to every training sample. As a result, most machine-learning-as-a-service applications that benefit from users' own data (to enrich the training dataset) explicitly require users to reveal their data. This produces an inherent tension between functionality and privacy: to benefit from the advantages of machine learning technologies, users must potentially compromise their privacy by sharing private data.
Systems that attempt to federate the machine learning process, by letting training be performed in a distributed manner without the need to centralize private data, suffer from numerous potential security and privacy issues.
In an embodiment, a method for performing federated learning is provided. The method includes initializing, by a server, a global model G^0. The server shares G^0 with a plurality of participants (N) using a secure communications channel. The server selects n out of N participants, according to filtering criteria, to contribute training for a round r. The server partitions the selected participants n into s groups and informs each participant about the other participants belonging to the same group. The server obtains aggregated group updates AU^1, . . . , AU^g from each group and compares the aggregated group updates and identifies suspicious aggregated group updates. The server combines the aggregated group updates by excluding the updates identified as suspicious, to obtain an aggregated update U^final. The server derives a new global model G^r from the previous model G^(r−1) and the aggregated update U^final and shares G^r with the plurality of participants.
The present invention will be described in even greater detail below based on the exemplary figures. The invention is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the invention. The features and advantages of various embodiments of the present invention will become apparent by reading the following detailed description with reference to the attached drawings which illustrate the following:
A problem unique to computer systems, and machine learning systems in particular, and solved by embodiments of the present invention is to provide security and privacy to participants in a machine learning system. Embodiments provide a federated learning (FL) system. The federated learning system is a collaborative machine learning method which alleviates privacy and security concerns by performing the machine learning training process in a distributed manner, without the need to centralize private data. In some embodiments, federated learning may run on tens of millions of clients that are coordinated by a central entity. The federated learning system trains a model through a distributed protocol, run among the clients and the central entity. The system lets each client submit its contribution in a privacy-preserving manner, thus doing away with the need to collect data centrally. In this way, the security and privacy of the computer-implemented machine learning system are improved.
In an embodiment, the invention provides a method that improves security and privacy in a federated learning system. More specifically, a federated learning protocol generates a global model (G) throughout several rounds of training. In a round (r), the central entity randomly selects a subset of clients to contribute to that round. Each of these clients is expected to derive a local model (L) by training the current global model G^(r−1) with their own training data. Hence, the selected clients send updates to the central server for improving the global model based on their local data. These updates are then aggregated to derive a new global model (G^r) for the next round. In an example implementation, an entity can coordinate the training of millions of mobile phones, and then aggregate their locally trained models in a centralized fashion.
The federated learning system preserves the data privacy of users contributing to the training process. In addition to limiting or preventing leakage of private data, the system may account for the presence of possibly malicious clients, so-called “sybils”, which may try to subvert the trained model. In fact, the distributed nature of federated learning exposes the system to sybils which may deviate from the protocol to their benefit, and which are difficult to detect due to the lack of accountability in the system.
It is a challenge to secure a distributed system like a federated learning system due to several competing goals. Since the system aims to be open (to benefit as much as possible from variegated data) while doing it at scale, the participants are generally not vetted. Being privacy friendly also precludes auditing of participants and building a history of their updates. These requirements make it difficult to validate the computations of the participants and thus limit the damage done by rogue nodes.
Embodiments of the invention provide a novel security mechanism to protect federated learning against malicious participants (sybils) who attempt to subvert the global model by submitting specially crafted updates. Embodiments provide protection to the server and honest participants against sybils and can be combined with existing security systems.
In an embodiment, the present invention provides a method for performing federated learning, comprising: initializing, by a server, a global model G^0; sharing, by the server, G^0 with a plurality of participants (N) using a secure communications channel; selecting, by the server, n out of N participants, according to filtering criteria, to contribute training for a round r; partitioning, by the server, the selected participants n into s groups; informing, by the server, each participant about the other participants belonging to the same group; obtaining, by the server, aggregated group updates AU^1, . . . , AU^g from each group; comparing, by the server, the aggregated group updates and identifying suspicious aggregated group updates; combining, by the server, the aggregated group updates by excluding the updates identified as suspicious, to obtain an aggregated update U^final; deriving, by the server, a new global model G^r from the previous model G^(r−1) and the aggregated update U^final; and sharing, by the server, G^r with the plurality of participants.
In the same or other embodiment, the aggregated group updates are obtained from a locally trained model L_i using local training data by each participant; relevant updates U_i needed to obtain a local model L_i from the previous global model G^(r−1) are generated by each participant; and all participants belonging to the same group k ∈ {1, . . . , g}, execute a secure aggregation protocol to combine all of their updates U_1^k, . . . , U_s^k, to obtain an aggregated group update AU^k.
In the same or other embodiment, the method further comprises: locally training, by each participant, a model L_i using local training data; generating, by each participant, relevant updates U_i needed to obtain a local model L_i from the previous global model G^(r−1); and running, by all participants belonging to the same group k ∈ {1, . . . , g}, a secure aggregation protocol to combine all of their updates U_1^k, . . . , U_s^k, to obtain an aggregated group update AU^k.
In the same or other embodiment, identifying suspicious aggregated group updates is performed using a machine learning algorithm.
In the same or other embodiment, the plurality of participants is greater than 10,000.
In the same or other embodiment, the secure communications channel is a pairwise secure channel.
In the same or other embodiment, all communication between the server and a participant is authenticated and confidential.
In the same or other embodiment, the s groups contain differing numbers of participants.
In the same or other embodiment, identifying suspicious aggregated group updates is performed deterministically.
In the same or other embodiment, the method further comprises aggregating the received updates by averaging all contributions.
In an embodiment, the present invention provides a server comprising one or more processors which, alone or in combination, are configured to provide for performance of the following steps: initializing a global model G^0; sharing G^0 with a plurality of participants (N) using a secure communications channel; selecting n out of N participants, according to filtering criteria, to contribute training for a round r; partitioning the selected participants n into s groups; informing each participant about the other participants belonging to the same group; obtaining aggregated group updates AU^1, . . . , AU^g from each group; comparing the aggregated group updates and identifying suspicious aggregated group updates; combining the aggregated group updates by excluding the updates identified as suspicious, to obtain an aggregated update U^final; deriving a new global model G^r from the previous model G^(r−1) and the aggregated update U^final; and sharing G^r with the plurality of participants.
In the same or other embodiment, the aggregated group updates are obtained from a locally trained model L_i using local training data by each participant; relevant updates U_i needed to obtain a local model L_i from the previous global model G^(r−1) are generated by each participant; and all participants belonging to the same group k ∈ {1, . . . , g}, execute a secure aggregation protocol to combine all of their updates U_1^k, . . . , U_s^k, to obtain an aggregated group update AU^k.
In an embodiment, the present invention provides a non-transitory computer readable medium storing instructions that when executed by a processor cause the following steps to be performed: initializing a global model G^0; sharing G^0 with a plurality of participants (N) using a secure communications channel; selecting n out of N participants, according to filtering criteria, to contribute training for a round r; partitioning the selected participants n into s groups; informing each participant about the other participants belonging to the same group; obtaining aggregated group updates AU^1, . . . , AU^g from each group; comparing the aggregated group updates and identifying suspicious aggregated group updates; combining the aggregated group updates by excluding the updates identified as suspicious, to obtain an aggregated update U^final; deriving a new global model G^r from the previous model G^(r−1) and the aggregated update U^final; and sharing G^r with the plurality of participants.
In the same or other embodiment, the aggregated group updates are obtained from a locally trained model L_i using local training data by each participant; relevant updates U_i needed to obtain a local model L_i from the previous global model G^(r−1) are generated by each participant; and all participants belonging to the same group k ∈ {1, . . . , g}, execute a secure aggregation protocol to combine all of their updates U_1^k, . . . , U_s^k, to obtain an aggregated group update AU^k.
In the same or other embodiment, the secure communications channel is a pairwise secure channel.
The federated learning system 100 includes a central server (S) 102 and a number (N) of participants P1, . . . , PN (clients). N can be any number. Examples include 1,000, 10,000, 1 million and so on. The goal of the federated learning process is to train a “global” machine learning (ML) model G for classification. G should assign to samples x ∈ X the correct label y ∈ Y, using as a training set the union of private datasets Di, for i=1, . . . , N, provided by the various participants P1, . . . , PN.
To initialize the federated learning system, each of the various participants P1, . . . , PN engages with the server S 102 in a (two-party) authenticated key exchange protocol, and establishes a pairwise secure channel with the server S. Therefore, all messages exchanged between the server S and any of the participants P1, . . . , PN are authenticated and confidential. Further, the server S prepares a global model G^0 and makes the model available to all participants.
The various operations and protocol steps to be executed in a generic round of the federated learning process are explained below. Let G=G^(r−1) be the “current” global model prior to entering round r, and let G′=G^r denote the updated global model upon completion of the considered round. The mechanism to update G into G′ is explained below.
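The selection and grouping steps recited in the embodiments above can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes uniformly random selection among participants that pass a filter, and round-robin dealing into g groups. The names `select_and_partition`, `group_notices` and `eligible` are hypothetical.

```python
import random


def select_and_partition(participants, n, g, eligible=lambda p: True, rng=None):
    """Pick n eligible participants at random and split them into g groups."""
    rng = rng or random.Random()
    # Filtering criteria are modelled as a simple predicate (assumption).
    pool = [p for p in participants if eligible(p)]
    if n > len(pool):
        raise ValueError("not enough eligible participants")
    if not 1 <= g <= n:
        raise ValueError("need 1 <= g <= n")
    chosen = rng.sample(pool, n)  # random order, no repeats
    # Deal contributors round-robin so group sizes differ by at most one.
    return [chosen[k::g] for k in range(g)]


def group_notices(groups):
    """Map each participant to the other members of its own group."""
    return {p: [q for q in group if q != p] for group in groups for p in group}
```

In this sketch the server would send each participant its entry of `group_notices(groups)` over the pairwise secure channel. Groups of deliberately different sizes would need a different dealing rule.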
Next, the system performs local training (LT). Each of the selected contributors Pi retrieves the current global model G, and proceeds with training a local model Li starting from G and using their own dataset Di. Let di=|Di| be the number of labeled samples (x, y) contained in Di. Finally, the contributor prepares the model updates Ui, which may include the size di of the used training set, to turn model G into Li. The updates {Ui}, for i=1, . . . , n, constitute the input data of contributors for the next step.
The system next performs group-specific secure aggregation (SA). The contributors of each group {P_1^k, . . . , P_s^k} run a secure aggregation protocol, possibly with the support of the server S, to combine their local updates in a privacy-preserving manner. Upon completion of this step, the server obtains the aggregated group-updates AU1, . . . , AUg resulting from combining the updates {U_1^k, . . . , U_s^k} of the group members, for each group k ∈ {1, . . . , g}.
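The description does not fix a particular secure aggregation protocol. One common construction, shown here only as a toy sketch, has each pair of group members share a random mask that one adds and the other subtracts, so the masks cancel in the sum and the server learns only the group total. The modulus `Q`, the integer-valued updates and all function names are assumptions for illustration. A deployed protocol would derive the masks from pairwise key agreement and handle dropouts.

```python
import secrets

Q = 2**61 - 1  # modulus for the masked arithmetic (illustrative choice)


def pairwise_masks(s, dim):
    """One random mask vector per pair (i, j) with i < j, for a group of s members."""
    return {(i, j): [secrets.randbelow(Q) for _ in range(dim)]
            for i in range(s) for j in range(i + 1, s)}


def mask_update(i, update, masks, s):
    """Member i hides its integer update: add masks shared with
    higher-indexed peers, subtract masks shared with lower-indexed peers."""
    out = [x % Q for x in update]
    for j in range(s):
        if j == i:
            continue
        mask = masks[(min(i, j), max(i, j))]
        sign = 1 if i < j else -1
        out = [(x + sign * m) % Q for x, m in zip(out, mask)]
    return out


def aggregate_group(masked_updates):
    """Sum the masked vectors; the pairwise masks cancel, leaving the group total."""
    total = [sum(col) % Q for col in zip(*masked_updates)]
    # Map residues back to signed integers.
    return [t - Q if t > Q // 2 else t for t in total]
```

For example, three members holding `[3, -1, 4]`, `[1, 5, -9]` and `[2, 6, 5]` yield the group total `[6, 10, 0]`, while each masked vector on its own looks uniformly random to the server.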
Next, the system performs final aggregation. The server combines the group updates AU′1, . . . , AU′t that survived the filtering step with the previous global model G. The system therefore obtains an updated global model G′.
In one embodiment, the global model G is a deep-learning (DL) model, and can thus be represented through a function G: X×V→Y, y:=G(x, v), where v is a high-dimensional vector which parametrizes the function.
The local training, performed by each selected participant Pi given a description of the current model <G>=v, consists of finding a new parameter-vector v′i which minimizes the loss function associated with the DL model on the participant's own data. In this case, the model updates Ui derived by Pi can be described as a vector ui=v′i−v, namely the difference between the “current” and “previous” parameter-vectors. In every round, the vectors v′i are meant to be an improvement of the previous model parameter v.
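The relation between the locally trained parameters and the update can be sketched as follows. As a stand-in for the deep-learning model, the sketch assumes a linear model with squared-error loss trained by plain gradient descent. The function name `local_update` and the hyperparameters `lr` and `epochs` are hypothetical.

```python
def local_update(v, dataset, lr=0.1, epochs=50):
    """Train locally starting from the global parameters v.

    Returns (u, d): the update u = v' - v and the number d of local samples.
    """
    w = list(v)  # start from the current global parameter-vector
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in dataset:
            # Prediction error of the linear stand-in model on one sample.
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(dataset)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    # The update is the difference between new and previous parameters.
    u = [wj - vj for wj, vj in zip(w, v)]
    return u, len(dataset)
```

Only `u` (and optionally `d`) leaves the participant in this sketch. The dataset itself is never transmitted.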
Updates are aggregated in a privacy-preserving manner. The secure aggregation protocol may be used within each group. Let AUk=uk, for k=1, . . . , g, denote the updated parameter-vector obtained by aggregating all contributions of the participants belonging to the k-th group.
Upon receiving aggregated updates {u1, . . . , ug} from each group, the server identifies possibly malicious updates by comparing the vectors uk directly, or by training a (local) classifier C to discern “regular” updates from “suspicious” ones.
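Direct comparison of the group vectors can take many forms. The sketch below shows one simple possibility and is not the specific test used by the embodiments: it measures each group update's Euclidean distance to the coordinate-wise median and flags those that lie much farther away than is typical. The function name `flag_suspicious` and the threshold `factor` are assumptions.

```python
from math import dist
from statistics import median


def flag_suspicious(group_updates, factor=3.0):
    """Return the indices k whose update lies unusually far from the
    coordinate-wise median of all group updates."""
    center = [median(col) for col in zip(*group_updates)]
    distances = [dist(u, center) for u in group_updates]
    typical = median(distances)
    # Floor avoids a zero cutoff when most updates coincide exactly.
    cutoff = factor * max(typical, 1e-12)
    return [k for k, d in enumerate(distances) if d > cutoff]
```

A median-based rule of this kind only helps while fewer than half of the groups carry poisoned contributions, which is one reason the grouping of participants matters.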
Finally, the server may aggregate the received updates by averaging all contributions, making sure to assign a low weight (possibly zero) to the vectors u* that were identified as suspicious, and adding the resulting aggregated update ufinal to the previous parameter-vector v to obtain the new parameter v′=v+ufinal.
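The final aggregation step can be sketched as a weighted average followed by the update v′=v+ufinal. The sketch assumes equal weight for regular group updates and a configurable low weight (zero by default) for suspicious ones. The name `apply_round` is hypothetical, and keeping the model unchanged when every update is rejected is a design choice of this sketch.

```python
def apply_round(v, group_updates, suspicious, low_weight=0.0):
    """Average the group updates, down-weighting suspicious ones,
    and return the new parameter-vector v' = v + u_final."""
    weights = [low_weight if k in suspicious else 1.0
               for k in range(len(group_updates))]
    total = sum(weights)
    if total == 0:
        # Assumption: with no trusted update, keep the previous model.
        return list(v)
    u_final = [sum(w * u[j] for w, u in zip(weights, group_updates)) / total
               for j in range(len(v))]
    return [vj + uj for vj, uj in zip(v, u_final)]
```

With `v = [1.0, 1.0]`, group updates `[0.2, 0.0]`, `[0.4, 0.2]` and a flagged `[100.0, -100.0]`, the result is approximately `[1.3, 1.1]`: the flagged vector contributes nothing.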
Secure federated learning has a number of applications in security-critical scenarios. For example, federated learning may be used to collaboratively train a shared prediction model by mobile devices without sharing their private data. The model makes predictions to complete sentences typed by users. In this scenario, a malicious device may try to poison the model such that a chosen business or restaurant is suggested when someone searches for food in a particular street. For instance, “best burger in 5th Avenue . . . ” can be completed by inputting a chosen business, rather than the business with the best burger. In such a scenario, the model must be robust for people to trust it.
In another example, an application may be classifying apps as benign or malicious. A backdoored model can let carefully designed apps through despite their being malicious. Hence, robustness plays a critical role here as well. Additionally, machine-learning-based facial recognition is widely used in consumer devices for authentication. Poisoning such models can allow the adversary to take over devices.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those of ordinary skill within the scope of the following claims. In particular, the present invention covers further embodiments with any combination of features from different embodiments described above and below. Additionally, statements made herein characterizing the invention refer to an embodiment of the invention and not necessarily all embodiments.
The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.