This non-provisional application claims priority under 35 U.S.C. § 119(a) on Patent Application No(s). 202210677971.4 filed in China on Jun. 15, 2022, the entire contents of which are hereby incorporated by reference.
The present disclosure relates to federated learning, and more particularly to a federated learning system and method using data digest.
Federated Learning (FL) addresses many privacy and data-sharing issues through cross-device, distributed learning under central orchestration. Existing FL methods mostly assume a collaborative setting in which clients tolerate only temporary disconnection from the moderator.
In practice, however, extended client absence or departure can happen due to business competition or other non-technical reasons. The resulting performance degradation can be severe when the data are unbalanced, skewed, or non-independent-and-identically-distributed (non-IID) across clients.
Another issue arises when the moderator needs to evaluate and release the model to consumers. As private client data are not accessible to the moderator, the representative data are lost when clients cease to collaborate, resulting in a largely biased FL gradient update and long-term training degradation. The naive approach of memorizing gradients during training is not a suitable solution, as gradients become unrepresentative very quickly as iterations progress.
Overall, current federated learning still fails to perform well in the following three scenarios and their combinations: (1) unreliable clients, (2) training after removing clients, and (3) training after adding clients.
Accordingly, the present disclosure provides a federated learning system and method using data digest: a federated learning framework that addresses client absence by synthesizing representative client data at the moderator. The present disclosure further addresses the privacy issues introduced by the digest and proposes a feature-mixing solution to reduce the privacy concerns.
According to an embodiment of the present disclosure, a federated learning method using data digest comprises: sending a general model to each of a plurality of client devices by a moderator; executing a digest producer by each of the plurality of client devices to generate a plurality of encoded features according to a plurality of raw data; performing a training procedure by each of the plurality of client devices, wherein the training procedure comprises: updating the general model to generate a client model according to the plurality of raw data, the plurality of encoded features, a plurality of labels corresponding to the plurality of encoded features, and a present client loss function; selecting at least two of the plurality of encoded features to compute a feature weighted sum, selecting at least two of the plurality of labels to compute a label weighted sum, and sending the feature weighted sum and the label weighted sum to the moderator as a digest when receiving a digest request; and sending an update parameter of the client model to the moderator; determining an absent client and a present client among the plurality of client devices by the moderator; generating a replacement model according to the general model, the digest of the absent client and an absent client loss function by the moderator; performing an aggregation to generate an aggregation model according to the update parameter of the client model of the present client and an update parameter of the replacement model of the absent client by the moderator; and training the aggregation model to update the general model according to a moderator loss function by the moderator.
According to an embodiment of the present disclosure, a federated learning system using data digest comprises a plurality of client devices and a moderator. Each of the plurality of client devices comprises: a first processor configured to execute a digest producer to generate a plurality of encoded features according to a plurality of raw data, further configured to update a general model to generate a client model according to the plurality of raw data, the plurality of encoded features, a plurality of labels corresponding to the plurality of encoded features, and a present client loss function, and further configured to select at least two of the plurality of encoded features to compute a feature weighted sum and select at least two of the plurality of labels to compute a label weighted sum when receiving a digest request; and a first communication circuit electrically connected to the first processor and configured to send the feature weighted sum and the label weighted sum as a digest and send an update parameter of the client model. The moderator is communicably connected to each of the plurality of client devices, and comprises: a second communication circuit configured to send the general model to each of the plurality of client devices; and a second processor electrically connected to the second communication circuit, wherein the second processor is configured to determine an absent client and a present client among the plurality of client devices, generate a replacement model according to the digest of the absent client and an absent client loss function, perform an aggregation to generate an aggregation model according to the update parameter of the client model of the present client and an update parameter of the replacement model of the absent client, and train the aggregation model to update the general model according to a moderator loss function.
The present disclosure will become more fully understood from the detailed description given hereinbelow and the accompanying drawings which are given by way of illustration only and thus are not limitative of the present disclosure and wherein:
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. According to the description, claims and the drawings disclosed in the specification, one skilled in the art may easily understand the concepts and features of the present invention. The following embodiments further illustrate various aspects of the present invention, but are not meant to limit the scope of the present invention.
The detailed description of the embodiments of the present disclosure includes a plurality of technical terms, and the following are the definitions of these technical terms:
The present disclosure proposes a federated learning system using data digest (also called FedDig framework) and a federated learning method using data digest.
The hardware architecture of each of the client devices Ci, Cj is basically the same. Each client device includes a first processor i1, a first communication circuit i2, and a first storage circuit i3; the client device Ci is described below as a representative example.
The client device Ci is configured to collect raw data. The raw data includes a private portion and a non-private portion other than the private portion. For example, when the raw data is an integrated circuit diagram, the private portion is a key circuit design in the diagram; when the raw data is a product design layout, the private portion is the product logo; when the raw data is text, the private portion is personal information such as names, phone numbers, and addresses.
The first processor i1 is configured to execute a digest producer, thereby generating a plurality of encoded features according to the plurality of raw data.
In an embodiment, the federated learning system adopts an appropriate neural network model as the digest producer according to the type of raw data. For example, EfficientNetV2 may be adopted as the digest producer when the raw data is CIFAR-10 (Canadian Institute for Advanced Research), and VGG16 may be adopted as the digest producer when the raw data is EMNIST (Extended Modified National Institute of Standards and Technology).
In an embodiment, the raw data is directly inputted to the digest producer to generate the encoded features. In another embodiment, the first processor i1 preprocesses the private portion of the raw data before the raw data is inputted to the digest producer. For example, when the raw data is an image, the preprocessing crops out the private portion from the image; when the raw data is text, the preprocessing removes the specified field or masks the specific string. The digest producer converts one piece of raw data into one encoded feature. In general, the dimension of the raw data is greater than the dimension of the encoded features.
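As a concrete sketch of this encoding step (not the disclosure's actual digest producer, which would be a trained network such as EfficientNetV2 or VGG16), the toy code below uses a fixed random projection with a ReLU as a stand-in encoder; all names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 32 * 32))        # K = 4 raw samples, e.g. flattened 32x32 images
W = rng.normal(size=(32 * 32, 16)) / 32.0  # fixed projection standing in for a trained encoder

def digest_producer(x):
    """Map raw data to lower-dimensional encoded features (one feature per sample)."""
    return np.maximum(x @ W, 0.0)          # ReLU discards sign information

encoded = digest_producer(raw)             # shape (4, 16): dimension 1024 reduced to 16
```

The dimension reduction and the ReLU both discard information, which is what makes recovering the raw data from an encoded feature difficult.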
If the number of samples of the raw data is K, after the digest producer generates K encoded features according to the K pieces of raw data, the first processor i1 updates the general model from the moderator Mo to generate the client model according to the K pieces of raw data, the K encoded features, the K labels corresponding to the K encoded features, and a present client loss function.
In an embodiment, the present client loss function is shown in the following Equation 1:

Lpresent = ce(Fi(Ri, dRi), yi) (Equation 1),

where Lpresent is the present client loss function, ce is the cross entropy, Fi is the client model of the client device Ci, Ri is the raw data, dRi is the encoded features, Fi(Ri, dRi) = ỹi represents the predicted result, and yi is the actual result (also called the label). The condition for the general model to complete training is that the output of the present client loss function is smaller than a certain threshold. The general model trained at the client device Ci is called the client model Fi and is sent to the moderator Mo.
In addition, when the first communication circuit i2 receives a digest request from the moderator Mo, the first processor i1 is further configured to select at least two of the encoded features dRi to compute a feature weighted sum, and select at least two of the labels yi to compute a label weighted sum.
In an embodiment, the feature weighted sum is shown in the following Equation 2, and the label weighted sum is shown in the following Equation 3:
DR = Σk=1..SpD wk·dk (Equation 2),

Dy = Σk=1..SpD wk·yk (Equation 3),

where DR is the feature weighted sum, Dy is the label weighted sum, wk is the weight, dk is the encoded feature, yk is the label, and SpD represents the number of samples included in each digest (Samples per Digest). In other words, one digest D is a pair of the feature weighted sum DR and the label weighted sum Dy. In an embodiment, the weights wk are set uniformly. For example, if SpD=4, then w1=0.25, w2=0.25, w3=0.25, w4=0.25. However, the present disclosure does not limit the setting of the weights wk.
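The mixing in Equations 2 and 3 can be sketched as follows on synthetic data; the grouping of consecutive samples and the uniform weights are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.normal(size=(6, 8))                    # six encoded features d1..d6, dimension 8
y = np.eye(10)[rng.integers(0, 10, size=6)]    # one-hot labels y1..y6 over 10 classes

SpD = 3                                        # samples per digest
w = np.full(SpD, 1.0 / SpD)                    # uniform weights w1 = w2 = w3 = 1/3

# Equation 2 / Equation 3: weighted sums over each group of SpD samples.
D_R = np.stack([w @ d[k:k + SpD] for k in range(0, len(d), SpD)])
D_y = np.stack([w @ y[k:k + SpD] for k in range(0, len(y), SpD)])
# Each digest is the pair (D_R[m], D_y[m]); individual samples are no longer recoverable.
```

With uniform weights each digest is simply the group average, so the label part D_y becomes a soft label whose entries still sum to one.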
The first processor i1 multiplies the 6 encoded features d1-d6 by 6 default weights w1-w6 respectively, then adds the 3 multiplication results corresponding to d1-d3 to generate the feature weighted sum DR1, and adds the 3 multiplication results corresponding to d4-d6 to generate the feature weighted sum DR2. The present disclosure does not limit how the first processor i1 selects the multiplication results that meet the SpD value for the addition.
In an embodiment, one of the following devices may be employed as the first processor i1: Application Specific Integrated Circuit (ASIC), Digital Signal Processor (DSP), Field Programmable Gate Array (FPGA), system-on-a-chip (SoC), and deep learning accelerator.
The first communication circuit i2 is configured to send the feature weighted sum DR and the label weighted sum Dy as the digest D to the moderator Mo, and send an update parameter of the client model to the moderator Mo. The first communication circuit i2 is further configured to receive the general model and the updated general model from the moderator Mo. In an embodiment, the first communication circuit i2 performs the aforementioned transmission and reception tasks through a wired network or a wireless network.
The first storage circuit i3 is configured to store the raw data Ri, the digest D, the general model, and the client model. In an embodiment, one of the following devices may be employed as the first storage circuit i3: Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), flash memory, and hard disk.
The moderator Mo is communicably connected to each of the client devices Ci, Cj. The moderator Mo includes a second processor M1, a second communication circuit M2, and a second storage circuit M3. The second processor M1 is electrically connected to the second communication circuit M2, and the second storage circuit M3 is electrically connected to the second processor M1 and the second communication circuit M2. The hardware implementation of the moderator Mo and its internal components M1, M2, M3 may refer to that of the client device Ci and its internal components i1, i2, i3, and thus the details are not repeated here. The second processor M1 is configured to determine one or more absent clients and one or more present clients among the plurality of client devices Ci, Cj. In an embodiment, the second processor M1 checks the communication connection between the second communication circuit M2 and each of the client devices Ci, Cj, and thereby determines whether one or more of the client devices Ci, Cj are disconnected. A client device (such as Ci) keeping the connection is called a present client, while a client device (such as Cj) breaking the connection is called an absent client.
The second processor M1 is configured to execute a guidance producer, thereby generating a piece of guidance G according to the digest D of the absent client. In the initial training stage of federated learning, each client device Ci, Cj converts the raw data R into the digest D and sends the digest D to the moderator Mo. Therefore, the guidance G recovered from the digest D is equivalent to the representative part of the raw data R, and the guidance G does not include the private portion of the raw data R. When the moderator Mo updates the general model, the guidance producer is trained together with the general model, and the details are described later.
In the initial training stage of federated learning, the second processor M1 is further configured to initialize the general model, and send the general model to each of the client devices Ci, Cj through the second communication circuit M2. During the training progress of federated learning, if the second processor M1 determines that there is an absent client (such as Cj), the second processor M1 generates a replacement model according to the general model, the digest DRj of the absent client Cj, and an absent client loss function.
In an embodiment, the absent client loss function is shown in the following Equation 4:

Labsent = ce(F̂j(G(DRj), DRj), Dyj) (Equation 4),

where Labsent is the absent client loss function, ce is the cross entropy, F̂j is the replacement model (assuming the absent client is the client device Cj), G is the guidance producer, DRj is the digest corresponding to the absent client Cj, G(DRj) = Gj represents the guidance, F̂j(G(DRj), DRj) = ỹj represents the predicted result of the replacement model F̂j, and Dyj is the actual result. The condition for the replacement model F̂j to complete training is that the output of the absent client loss function is smaller than a certain threshold. The general model completing this training is called the replacement model F̂j.
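As an illustrative sketch of training a replacement model on stored digests, the toy example below uses a softmax classifier and takes the guidance producer to be the identity (the disclosure instead learns the guidance producer jointly with the general model); all dimensions and data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
n, feat, classes = 8, 16, 10
D_R = rng.normal(size=(n, feat))                    # stored feature weighted sums
D_y = np.eye(classes)[rng.integers(0, classes, n)]  # stored label weighted sums

W = np.zeros((feat, classes))   # replacement model, initialized from the general model

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def absent_loss(W):
    """Equation 4 with the identity as guidance producer: cross entropy on digest pairs."""
    p = softmax(D_R @ W)
    return -np.mean(np.sum(D_y * np.log(p + 1e-12), axis=1))

loss_before = absent_loss(W)                         # log(10) at initialization
for _ in range(500):                                 # plain gradient descent on Equation 4
    W -= 0.2 * D_R.T @ (softmax(D_R @ W) - D_y) / n
loss_after = absent_loss(W)                          # stops once below a chosen threshold
```

The stopping condition in the disclosure is the loss falling below a threshold; the fixed iteration count here is only for the sketch.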
Overall, if the client device is not an absent client, the client device trains the client model based on the general model and the raw data. In contrast, if the client device becomes an absent client, the moderator trains the general model on behalf of the absent client, based on the digest representing the raw data, to generate the replacement model.
The second processor M1 is further configured to perform an aggregation to generate an aggregation model according to the general model, the update parameter of the client model of the present client Ci, and the update parameter of the replacement model of the absent client Cj. In an embodiment, the update parameter of a model may be, for example, a gradient or a weight. In an embodiment, the aggregation is shown in the following Equation 5:
M̂t = Mt + Σi wti·∇ti + Σj wtj·∇tj (Equation 5),

where M̂t is the aggregation model, Mt is the general model (t represents the t-th iteration), wti is the weight corresponding to the present client Ci, ∇ti is the update parameter of the client model of the present client Ci, wtj is the weight corresponding to the absent client Cj, and ∇tj is the update parameter of the replacement model of the absent client Cj.
In an embodiment, the weight wti corresponding to the present client Ci and the weight wtj corresponding to the absent client Cj satisfy the following Equation 6:

Σi wti + Σj wtj = 1 (Equation 6).
In other embodiments, the aggregation may be FedAvg, FedProx, or FedNova, and the present disclosure does not limit thereof.
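A minimal sketch of the aggregation of Equations 5 and 6, using numpy arrays as stand-ins for model update parameters (the weights and values are illustrative):

```python
import numpy as np

def aggregate(general, present_updates, absent_updates, present_w, absent_w):
    """Equation 5: add the weighted client and replacement updates to the general model."""
    assert abs(sum(present_w) + sum(absent_w) - 1.0) < 1e-9   # Equation 6 constraint
    total = np.zeros_like(general)
    for w, g in zip(present_w, present_updates):
        total += w * g     # present clients' update parameters
    for w, g in zip(absent_w, absent_updates):
        total += w * g     # replacement models' update parameters
    return general + total

# Example: one present client and one absent client with equal weights.
M_t = np.array([1.0, 1.0, 1.0])
agg = aggregate(M_t, [np.array([0.2, 0.0, -0.2])], [np.array([0.0, 0.4, 0.0])], [0.5], [0.5])
# agg == [1.1, 1.2, 0.9]
```

FedAvg-style weights proportional to client sample counts would equally satisfy the Equation 6 constraint.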
The second processor M1 is further configured to train the aggregation model to update the general model according to the moderator loss function. In an embodiment, the moderator loss function is shown in the following Equation 7:

Lserver = ce(M̂t(G(DR), DR), Dy) (Equation 7),

where Lserver is the moderator loss function, ce is the cross entropy, M̂t is the aggregation model, G is the guidance producer, DR is the feature weighted sum of all client devices, and Dy is the label weighted sum of all client devices. The condition for the aggregation model M̂t to complete training is that the output of the moderator loss function Lserver is smaller than a certain threshold. In addition, as the output of the moderator loss function Lserver decreases during training, the guidance producer G is trained at the same time.
The second communication circuit M2 is configured to send the general model and the digest producer to each of the client devices Ci, Cj. In other words, the moderator Mo and each of the client devices Ci, Cj have an identical digest producer. In addition, in the initial training stage of federated learning, the second processor M1 controls the second communication circuit M2 to send the digest request to each of the client devices Ci, Cj, and then to receive the digest D returned from each of the client devices Ci, Cj.
The second storage circuit M3 is configured to store the digests D of all client devices Ci, Cj, and further store the digest producer, the guidance G, the general model, and the replacement model.
While all client devices remain connected, the moderator Mo receives the digests DRi, DRj from the client devices Ci, Cj and stores them. The moderator Mo also receives the update parameters of the client models from the client devices Ci, Cj, performs the aggregation according to these update parameters, and thereby updates the general model. Finally, the trained general model may be deployed on the device of the consumer U.
When one of the client devices later becomes absent, the moderator Mo generates the replacement model according to the stored digest of the absent client and continues the training on its behalf.
In this way, regardless of whether a given client device remains in the federation, the training of the federated learning system using data digest proposed by the present disclosure is not interrupted.
The training of federated learning includes a plurality of iterations, and steps S3-S7 are performed in each of the iterations.
In an embodiment, step S1 is performed in the first iteration of federated learning. In step S1, the moderator initializes a general model, and sends the general model to each client device. In addition, the moderator sends the digest producer to each client device to ensure that all client devices have the identical digest producer. A fixed digest producer allows the digest generated by the client device to remain fixed in each iteration.
In step S2, each client device inputs the plurality of raw data into the digest producer to generate the plurality of encoded features, selects some of the plurality of encoded features to mix according to the specified number (SpD), and thereby generates the digest to send to the moderator. In an embodiment, step S2 is performed in the first iteration of the federated learning. In another embodiment, step S2 is performed whenever the client device receives the digest request from the moderator.
In step S3, each client device performs the training procedure, which includes the following steps. In step S31, the client device updates the general model to generate the client model according to the plurality of raw data, the plurality of encoded features, the plurality of labels, and the present client loss function.
In step S32, the client device determines whether a digest request has been received. Step S33 is performed if the determination is “yes”. Step S35 is performed if the determination is “no”. In step S33, the client device selects at least two encoded features from the plurality of encoded features to compute a feature weighted sum and selects at least two labels from the plurality of labels to compute a label weighted sum. In step S34, the client device sends the feature weighted sum and the label weighted sum as the digest to the moderator. In step S35, the client device sends the update parameter of the client model to the moderator.
In step S4, the moderator detects the connection between itself and each client device, and thereby classifies each client device that keeps the connection as a present client and each client device that breaks the connection as an absent client.
In step S5, the moderator generates the replacement model according to the general model, the digest of the absent client, and the absent client loss function, as described above. In step S6, the moderator performs the aggregation to generate the aggregation model according to the update parameter of the client model of the present client and the update parameter of the replacement model of the absent client. In step S7, the moderator trains the aggregation model according to the moderator loss function, thereby updating the general model.
The pseudo code of the federated learning method using data digest according to an embodiment of the present disclosure adopts the following notation:
M is the general model, G is the guidance producer, t is the number of iterations, Mt is the general model at the t-th iteration, dRi is the encoded feature, Ri is the raw data, n is the number of client devices, Fi is the client model of the present client Ci, yi is the actual result (also called the label), Lpresent is the present client loss function, Di is the digest of the present client Ci, SpD represents the number of encoded features per digest mix, F̂j is the replacement model of the absent client Cj, Dj is the digest of the absent client Cj, Labsent is the absent client loss function, M̂t is the aggregation model, ∇ti is the update parameter of the client model of the present client Ci, ∇tj is the update parameter of the replacement model of the absent client Cj, and Lserver is the moderator loss function.
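Since the pseudo code itself did not survive reproduction here, the following runnable sketch simulates the described per-iteration flow on toy data, with tiny softmax models and two clients, one of which is absent; every name, dimension, and weight is illustrative rather than taken from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(2)
FEAT, CLASSES, LR, SpD = 8, 4, 0.2, 2

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def update_param(W, X, Y):
    """One cross-entropy gradient step; plays the role of the uploaded update parameter."""
    return -LR * X.T @ (softmax(X @ W) - Y) / len(X)

def ce(W, X, Y):
    p = softmax(X @ W)
    return -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))

# Two clients with already-encoded features and labels; client 1 will be absent.
clients = []
for _ in range(2):
    X = rng.normal(size=(32, FEAT))
    Y = np.eye(CLASSES)[rng.integers(0, CLASSES, 32)]
    clients.append((X, Y))

# Initial stage (steps S1-S2): clients mix SpD samples per digest and upload the digests.
w = np.full(SpD, 1.0 / SpD)
digests = [(np.stack([w @ X[k:k + SpD] for k in range(0, len(X), SpD)]),
            np.stack([w @ Y[k:k + SpD] for k in range(0, len(Y), SpD)]))
           for X, Y in clients]

M = np.zeros((FEAT, CLASSES))                    # general model
objective_before = 0.5 * ce(M, *clients[0]) + 0.5 * ce(M, *digests[1])
for t in range(100):                             # steps S3-S7 repeated per iteration
    updates = []
    for i, (X, Y) in enumerate(clients):
        if i == 1:                               # absent: moderator trains on its digest
            updates.append(update_param(M, *digests[i]))
        else:                                    # present: client trains on raw data
            updates.append(update_param(M, X, Y))
    M = M + sum(0.5 * u for u in updates)        # Equation 5 with weights 0.5 + 0.5 = 1
objective_after = 0.5 * ce(M, *clients[0]) + 0.5 * ce(M, *digests[1])
```

The guidance producer is omitted (taken as the identity) to keep the sketch short; the essential point is that the absent client's digest keeps contributing gradients so training is not interrupted.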
In view of the above, the present disclosure provides a federated learning method using data digest. This is a federated learning framework that can address client absence by synthesizing representative client data at the moderator. The present disclosure proposes a data memorizing mechanism to handle the client's absence effectively. Specifically, the present disclosure handles the following three scenarios: (1) unreliable clients, (2) training after removing clients, and (3) training after adding clients.
The way the present disclosure deals with potential client absence during FL training is to encode and aggregate information of the raw data and corresponding labels as data digests. When clients leave, the moderator may recover information from these digests to generate training guidance that can mitigate the catastrophic forgetting caused by the absent data. Since digests may be shared and stored at the moderator for training use, information that can lead to data privacy infringement should not be recoverable from the digests. To increase the privacy protection of the proposed data digest, the present disclosure introduces sample disturbance by mixing features extracted from the raw data. Furthermore, the present disclosure introduces a trainable guidance producer into the ordinary FL training process, such that the moderator may learn to extract information and generate training guidance from the digests automatically. The digest and guidance proposed by the present disclosure are adaptable to most FL systems.
In the training process of FL, the following four training scenarios are common: (1) a client temporarily leaves during the FL training, (2) a client leaves the training forever, (3) all clients leave the FL training sequentially, and (4) multiple client groups join the FL training in different time slots.
In FedDig, the moderator must transmit the digest producer to participating clients for training use. If a malicious attacker can monitor the transmission and obtain a pseudo-inverse of the digest producer, data in the raw sample domain can be recovered. To test the resilience of FedDig against an attack that recovers samples from the digests, and to investigate to what extent the feature-mixing digests proposed by the present disclosure can protect data privacy under malicious attacks, the present disclosure simulates this attack by training an autoencoder with the network structures of the digest producer and its pseudo-inverse. Specifically, the trained encoder serves as the digest producer and the trained decoder serves as the pseudo-inverse; the present disclosure then trains the guidance producer and the general model following the regular FedDig training process and visualizes the guidance produced under this attack.
The present disclosure also uses the CIFAR-10 data to repeat the above experiment. The experimental result shows that a pseudo-inverse capable of recovering raw data from the digest cannot be obtained from training, even if SpD is set to one. This result also suggests that training a pseudo-inverse function of a complex digest producer is not straightforward. Overall, recovering raw samples from the digest is difficult due to the permanent information loss during feature mixing.