This application is based upon and claims the benefit of priority from Japanese patent application No. 2022-092301, filed on Jun. 7, 2022, the disclosure of which is incorporated herein in its entirety by reference.
The present invention relates to a server apparatus, a calculation method, and a client apparatus.
From the viewpoint of privacy protection and the like, a technique called federated learning is known in which a plurality of clients cooperate to perform machine learning without directly exchanging training data.
A literature describing federated learning is, for example, Patent Literature 1. Patent Literature 1 describes a machine learning system that includes a plurality of client terminals and an integrated server. According to Patent Literature 1, the client terminal executes machine learning of a training target local model by using data existing in the system to which the client terminal belongs as training data in accordance with an instruction of received distribution information. The client terminal then transmits the result of learning of the local model to the integrated server. The integrated server transmits distribution information to the respective client terminals, receives the learning results from the respective client terminals, and integrates the received learning results to update a master model.
Further, although an averaged model is obtained in general federated learning, a technique called personalized federated learning is known as a federated learning approach of obtaining not an averaged model but a model optimized for each client.
A literature describing personalized federated learning is, for example, Non-Patent Literature 1. For example, Non-Patent Literature 1 describes a method in which a client receives local models from other clients, gives a larger weight to a local model that fits the client's own data, adds the weighted local models to the client's own local model, and thereby updates the local model.
Patent Literature 1: WO2021/193815
Non-Patent Literature 1: Zhang, Michael, et al. “Personalized Federated Learning with First Order Model Optimization.”, ICLR2021
In the technique described by Non-Patent Literature 1, a client needs to obtain local models of other clients, and the local models are shared among all the clients. Therefore, the risk of information leakage is high. Thus, there is a problem that it is difficult to obtain a model appropriate for each client while suppressing the risk of information leakage.
Accordingly, an object of the present invention is to provide a server apparatus, a calculation method and a client apparatus that solve the abovementioned problem.
In order to achieve the object, a server apparatus as an aspect of the present disclosure includes: at least one memory configured to store instructions; and at least one processor configured to execute the instructions. The processor is configured to: receive, from each of a plurality of client apparatuses performing federated learning of a neural network model having multiplex branches capable of performing different operations on a common input, a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches; calculate a parameter of a global model based on the received local model parameter and weight; and transmit the calculated parameter to the client apparatuses.
Further, a calculation method as another aspect of the present disclosure is a calculation method by an information processing apparatus, and includes: receiving, from each of a plurality of client apparatuses performing federated learning of a neural network model having multiplex branches capable of performing different operations on a common input, a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches; calculating a parameter of a global model based on the received local model parameter and weight; and transmitting the calculated parameter to the client apparatuses.
Further, a client apparatus as another aspect of the present disclosure includes: at least one memory configured to store instructions; and at least one processor configured to execute the instructions. The processor is configured to: learn, using training data owned by the client apparatus, a local model parameter of each of multiplex branches included by a neural network model having the multiplex branches capable of performing different operations on a common input and a weight for each branch used in superposing outputs from the respective multiplex branches; and transmit the learned local model parameter and weight to a server apparatus that generates a global model based on the local model parameter.
With the respective configurations as described above, the abovementioned problem can be solved.
A first example embodiment of the present disclosure will be described with reference to
In the first example embodiment of the present disclosure, a learning system 100 that performs federated learning in which a plurality of client apparatuses 200 and a server apparatus 300 learn in cooperation will be described. As illustrated in
Further, in the case of this example embodiment, in the machine learning model learned by the learning system 100, a linear transformation can be multiplexed. For example, as illustrated in
Further, in the case of this example embodiment, in the machine learning model learned by the learning system 100, outputs from the respective multiplex branches are superposed using weights for the respective branches. For example, in the machine learning model, by calculating a weighted sum that is adding the results of multiplying the outputs from the branches by weights corresponding to the branches, the outputs from the respective branches are superposed using the weights for the respective branches. For example, in the case illustrated in
The respective branches perform a common operation using parameters that are common among the client apparatuses 200. In other words, the parameters of the respective branches are learned by federated learning by the client apparatuses 200 and the server apparatus 300. On the other hand, a weight for each branch used in superposing the outputs from the respective multiplex branches, a normalization parameter used in a normalization layer, and the like are learned by each client apparatus 200. Therefore, the weight, the normalization parameter, and the like may vary for each client apparatus 200. By learning the weight and the like for each client apparatus 200, each client apparatus 200 can learn a local model appropriate for its own data.
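As a reference, the following is a minimal sketch of the superposition described above, assuming that each branch is a linear transformation; the function and variable names (multiplex_branch_layer, alphas, and so on) are illustrative and not part of the present disclosure.

```python
import numpy as np

def multiplex_branch_layer(x, branch_params, alphas):
    """Apply each branch's operation (here, a linear transformation) to the
    common input x, and superpose the branch outputs as a weighted sum
    using the per-branch weights alphas."""
    outputs = [W @ x for W in branch_params]             # different operations on a common input
    return sum(a * y for a, y in zip(alphas, outputs))   # superposition by weighted sum

# Example: one layer with two branches
x = np.ones(4)
branch_params = [np.random.randn(3, 4), np.random.randn(3, 4)]  # learned by federated learning
alphas = np.array([0.7, 0.3])                                   # learned per client apparatus
y = multiplex_branch_layer(x, branch_params, alphas)
```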
The client apparatus 200 is an information processing apparatus that updates a parameter and the like received from the server apparatus 300 by using training data of the client apparatus 200.
The operation input unit 210 is formed of an operation input device such as a keyboard and a mouse. The operation input unit 210 detects an operation by an operator who operates the client apparatus 200 and outputs the operation to the operation processing unit 250.
The screen display unit 220 is formed of a screen display device such as an LCD (Liquid Crystal Display). The screen display unit 220 can display on a screen a variety of information stored in the storing unit 240 in accordance with an instruction from the operation processing unit 250.
The communication I/F unit 230 is formed of a data communication circuit and the like. The communication I/F unit 230 performs data communication with an external apparatus such as the server apparatus 300 connected via a communication line.
The storing unit 240 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), and memory. The storing unit 240 stores therein processing information necessary for a variety of processing by the operation processing unit 250 and a program 243. The program 243 is loaded and executed by the operation processing unit 250 to realize various processing units. The program 243 is loaded in advance from an external device or a recording medium via a data input/output function such as the communication I/F unit 230, and is stored in the storing unit 240. Major information stored in the storing unit 240 includes training data information 241, local model information 242, and the like.
The training data information 241 includes training data used when the learning unit 252 to be described later performs learning. For example, the training data information 241 is acquired in advance by a method such as acquiring from an external device via the communication I/F unit 230 or inputting with the operation input unit 210 and the like, and is stored in the storing unit 240. For example, the training data included by the training data information 241 may vary for each client apparatus 200. In this example embodiment, a specific content of the training data is not particularly limited. The training data information 241 may include any training data.
The local model information 242 includes information indicating various parameters and values configuring a local model, such as a parameter used in an operation corresponding to each branch (for example, a local model parameter), a weight for each branch used in superposing the outputs from the respective branches, and a normalization parameter. For example, the local model information 242 is updated in accordance with various processes such as reception of the parameters of the respective branches from the server apparatus 300 and learning with the training data information 241 by a learning unit 252 to be described later.
The operation processing unit 250 has an arithmetic logic unit such as a CPU (Central Processing Unit) and a peripheral circuit thereof. By loading the program 243 from the storing unit 240 and executing the program 243, the operation processing unit 250 makes the abovementioned hardware and the program 243 cooperate and realizes various processing units. Major processing units realized by the operation processing unit 250 include a parameter receiving unit 251, a learning unit 252, a transmitting unit 253, and the like.
The operation processing unit 250 may have, instead of the abovementioned CPU, a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), a MPU (Micro Processing Unit), a FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination thereof.
The parameter receiving unit 251 receives, from the server apparatus 300, a parameter corresponding to each of the branches in each of the layers configuring the training target neural network. For example, the parameter receiving unit 251 receives weight values used in performing an operation such as a convolution operation as parameters. Moreover, the parameter receiving unit 251 stores the received parameters as the local model information 242 into the storing unit 240.
The learning unit 252 performs machine learning using the training data included by the training data information 241 on a model having the parameter received by the parameter receiving unit 251, and thereby updates a parameter of each branch, a weight for each branch, and the like. In other words, the learning unit 252 performs machine learning using the training data included by the training data information 241 and updates the parameter, the weight, and the like for each branch, and thereby generates a local model having a new local model parameter. For example, the learning unit 252 can update, for each of the layers configuring the neural network, a parameter for each branch, a weight for each branch, and the like. In other words, the parameters of the respective branches and the weights for the respective branches may vary for each layer. In addition to the parameters for the respective branches and the weights for the respective branches, the learning unit 252 may learn a normalization parameter and the like. For example, the learning unit 252 may perform the abovementioned machine learning using a known method such as stochastic gradient descent.
For example, the learning unit 252 performs the machine learning using the training data, and thereby updates the parameter received by the parameter receiving unit 251 and calculates a new local model parameter. That is to say, a target parameter for update by the learning unit 252 is a value received by the parameter receiving unit 251, and is common among the client apparatuses 200. On the other hand, the learning unit 252 performs the machine learning using the training data, and thereby updates the weight calculated in the previous local model parameter calculation and calculates a new weight. That is to say, a target weight for update by the learning unit 252 is a value previously calculated by each client apparatus 200, and can vary for each client apparatus 200.
The learning unit 252 can separate a phase of learning a parameter of each branch and a phase of learning a weight corresponding to each branch as illustrated in
The transmitting unit 253 transmits a local model parameter, which is the parameter updated by the learning unit 252, and the weight for each branch to the server apparatus 300. In other words, the transmitting unit 253 in this example embodiment transmits not only the local model parameter but also the weight corresponding to each branch to the server apparatus 300.
The above is an example of the configuration of the client apparatus 200. Meanwhile, the configuration of the client apparatus 200 is not limited to the case illustrated above. For example, the operation processing unit 250 of the client apparatus 200 may execute the program 243 and thereby realize a converting unit that converts a plurality of branches into one branch by using the parameters of the respective branches and the weights corresponding to the respective branches. Moreover, the operation processing unit 250 of the client apparatus 200 may execute the program 243 and thereby realize an inferring unit that performs inference using a local model determined in accordance with the parameter (local model parameter), the weight, and the like indicated by the local model information 242. As described above, the client apparatus 200 may have a configuration other than the one illustrated above.
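As a reference for such a converting unit, the following sketch assumes that each branch is a linear transformation, in which case the weighted sum of the branch parameter matrices realizes the same mapping as the multiplex branches; the function names are illustrative and not part of the present disclosure.

```python
import numpy as np

def merge_branches(branch_params, alphas):
    """Collapse the multiplex branches of one layer into a single branch.
    Because each branch is a linear transformation and the outputs are
    superposed by a weighted sum, the weighted sum of the parameter
    matrices gives a single branch with the same input-output mapping."""
    return sum(a * W for a, W in zip(alphas, branch_params))

# The merged single branch gives the same output as the multiplex branches:
x = np.random.randn(4)
branch_params = [np.random.randn(3, 4) for _ in range(2)]
alphas = np.array([0.6, 0.4])
merged = merge_branches(branch_params, alphas)
assert np.allclose(merged @ x,
                   sum(a * (W @ x) for a, W in zip(alphas, branch_params)))
```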
The server apparatus 300 is an information processing apparatus that calculates the parameter of a global model by using the local model parameters and the weights received from the respective client apparatuses 200.
The operation input unit 310 is formed of an operation input device such as a keyboard and a mouse. The operation input unit 310 detects operation of an operator who operates the server apparatus 300, and outputs the operation to the operation processing unit 350.
The screen display unit 320 is formed of a screen display device such as an LCD. The screen display unit 320 can display on a screen a variety of information stored in the storing unit 340 in accordance with an instruction from the operation processing unit 350.
The communication I/F unit 330 is formed of a data communication circuit, and the like. The communication I/F unit 330 performs data communication with an external apparatus such as the client apparatus 200 connected via a communication line.
The storing unit 340 is a storage device such as an HDD, an SSD, and memory. The storing unit 340 stores therein processing information necessary for a variety of processing by the operation processing unit 350 and a program 343. The program 343 is loaded and executed by the operation processing unit 350 and thereby realizes various processing units. The program 343 is loaded in advance from an external device or a recording medium via a data input/output function such as the communication I/F unit 330, and is stored in the storing unit 340. Major information stored in the storing unit 340 includes, for example, reception information 341 and global model information 342.
The reception information 341 includes information indicating local model parameters and weights received from the respective client apparatuses 200. For example, the reception information 341 is updated when a receiving unit 351 receives information indicating local model parameters and weights from the client apparatuses 200 via the communication I/F unit 330.
The global model information 342 includes information indicating a model parameter of a global model, calculated based on the reception information 341. For example, the global model information 342 is updated when a calculating unit 352 to be described later calculates a parameter based on the reception information 341.
In the storing unit 340, information other than illustrated above may be stored. For example, information indicating the number of training data owned by each of the client apparatuses 200 included by the learning system 100 can be stored in the storing unit 340.
The operation processing unit 350 has an arithmetic logic unit such as a CPU and a peripheral circuit thereof. The operation processing unit 350 loads the program 343 from the storing unit 340 and executes the program 343, and thereby makes the abovementioned hardware and the program 343 cooperate and realizes various processing units. Major processing units realized by the operation processing unit 350 include, for example, the receiving unit 351, the calculating unit 352, and a parameter transmitting unit 353. As in the case of the operation processing unit 250 included by the client apparatus 200, the operation processing unit 350 may have, instead of the CPU, a GPU or the like.
The receiving unit 351 receives a local model parameter of each branch and a weight corresponding to each branch from each client apparatus 200. Moreover, the receiving unit 351 stores the received local model parameter and weight as the reception information 341 into the storing unit 340.
The calculating unit 352 uses the local model parameter and the weight received by the receiving unit 351 to calculate and update the parameter of the global model for each branch, for each of the layers configuring the neural network. For example, the calculating unit 352 calculates the parameter of the global model by performing weighting with the weights and taking the average of the local model parameters. The calculating unit 352 may calculate the parameter of the global model by using the local model parameters, the weights, the number of the training data owned by the client apparatuses 200, and the like.
For example, as shown in
where n denotes the number of data owned by the corresponding client apparatus 200, k denotes an identifier for identifying the client apparatus 200, K denotes the number of the client apparatuses 200 included by the learning system 100, Wi,j denotes the parameter of a jth branch of an ith layer, and α denotes a weight.
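Since the equation itself is not reproduced here, one plausible form of Equation 1 that is consistent with the above definitions is the following normalized weighted average; the exact normalization is an assumption.

```latex
W_{i,j}^{\mathrm{global}} =
  \frac{\sum_{k=1}^{K} n_k \,\alpha_{k,i,j}\, W_{k,i,j}}
       {\sum_{k=1}^{K} n_k \,\alpha_{k,i,j}}
```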
The parameter transmitting unit 353 transmits the parameter of the global model calculated by the calculating unit 352 to the client apparatus 200.
The above is an example of the configuration of the server apparatus 300. Subsequently, an example of operation of the learning system 100 will be described with reference to
The learning unit 252 performs machine learning using training data included by the training data information 241 on a model having the parameter received by the parameter receiving unit 251, and thereby updates a parameter of each of the branches, a weight for each of the branches, and the like. For example, the learning unit 252 updates a weight α in a state where a parameter of each of the branches is fixed (step S102). Moreover, the learning unit 252 updates the parameter of each of the branches by using the updated weight α in a state where the weight α is fixed, and thereby calculates a new local model parameter (step S103). Meanwhile, the learning unit 252 may update the weight and the parameter simultaneously.
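The following is a minimal sketch of such alternating updates for a single multiplex layer with a squared-error loss; the analytic gradients, learning rate, and function names are illustrative assumptions and do not reproduce the exact learning procedure of this example embodiment.

```python
import numpy as np

def forward(x, Ws, alphas):
    # Superpose the branch outputs with the per-branch weights (weighted sum)
    return sum(a * (W @ x) for a, W in zip(alphas, Ws))

def local_update(x, t, Ws, alphas, lr=0.01, steps=100):
    """Alternate the two learning phases on one training pair (x, t):
    first update the weights alpha with the branch parameters fixed
    (cf. step S102), then update the branch parameters with alpha fixed
    (cf. step S103)."""
    Ws = [W.copy() for W in Ws]
    alphas = alphas.astype(float)
    for _ in range(steps):                      # phase 1: update alpha only
        err = forward(x, Ws, alphas) - t
        alphas -= lr * np.array([err @ (W @ x) for W in Ws])
    for _ in range(steps):                      # phase 2: update branch parameters only
        err = forward(x, Ws, alphas) - t
        Ws = [W - lr * a * np.outer(err, x) for a, W in zip(alphas, Ws)]
    return Ws, alphas

# Example with two branches
x, t = np.ones(4), np.zeros(3)
Ws = [np.random.randn(3, 4) for _ in range(2)]
Ws, alphas = local_update(x, t, Ws, np.array([0.5, 0.5]))
```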
The transmitting unit 253 transmits a local model parameter, which is the parameter updated by the learning unit 252, and the weight for each branch to the server apparatus 300 (step S104).
The above is an example of the operation of the client apparatus 200. Subsequently, with reference to
The calculating unit 352 calculates and updates the parameter of the global model for each branch, for each of the layers configuring the neural network, by using the local model parameters and the weights received by the receiving unit 351 (step S202). For example, the calculating unit 352 performs weighting with the weights and takes the average of the local model parameters, and thereby calculates the parameter of the global model. The calculating unit 352 may calculate the parameter of the global model by using the local model parameters, the weights, the number of training data owned by the client apparatuses 200, and the like.
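As a reference, the following is a minimal sketch of step S202 for one branch of one layer, assuming the normalized weighted average shown above as a plausible form of Equation 1; the function and variable names are illustrative and not part of the present disclosure.

```python
import numpy as np

def aggregate_branch(local_Ws, local_alphas, n_data):
    """Compute the global-model parameter of one branch of one layer from the
    local model parameters W_k, the per-branch weights alpha_k, and the
    numbers of training data n_k received from the K client apparatuses."""
    num = sum(n * a * W for n, a, W in zip(n_data, local_alphas, local_Ws))
    den = sum(n * a for n, a in zip(n_data, local_alphas))
    return num / den

# Example with K = 3 client apparatuses, for the j-th branch of the i-th layer
local_Ws = [np.random.randn(3, 4) for _ in range(3)]   # W_{k,i,j}
local_alphas = [0.8, 0.5, 0.2]                         # alpha_{k,i,j}
n_data = [1000, 400, 250]                              # n_k
global_W = aggregate_branch(local_Ws, local_alphas, n_data)
```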
The parameter transmitting unit 353 transmits the parameter of the global model calculated by the calculating unit 352 to the client apparatuses 200 (step S203).
The above is an example of the operation of the server apparatus 300. In the learning system 100, a series of steps as illustrated with reference to
Thus, the server apparatus 300 has the receiving unit 351, the calculating unit 352, and the parameter transmitting unit 353. With such a configuration, the calculating unit 352 can calculate and update the parameter of the global model by using the local model parameters and the weights received by the receiving unit 351. The parameter transmitting unit 353 can transmit the parameter calculated by the calculating unit 352 to the client apparatuses 200. As a result, each of the client apparatuses 200 can update the local model parameter and the weight by using the parameter received from the server apparatus 300. Consequently, without sharing the local model between the client apparatuses 200, each of the client apparatuses 200 can learn the weight and thereby train the local model appropriate for its own data. As a result, it is possible to reduce the risk of information leakage.
Furthermore, in the case of the server apparatus 300 described in this example embodiment, the calculating unit 352 can calculate and update the parameter of the global model by using the local model parameters and the weights received by the receiving unit 351. As a result, the contribution of each client apparatus 200 to the update of the global model can be weighted in accordance with the weight for each branch, and the contribution ratio of each client apparatus can be taken into account at the time of update of the global model. Consequently, it is possible to make the learning stable and to increase the accuracy as compared with the case of calculating the parameter of the global model without a weight. It also becomes possible to shorten the learning time.
For example,
where, as in Equation 1, n denotes the number of data owned by the corresponding client apparatus 200, k denotes an identifier for identifying the client apparatus 200, K denotes the number of the client apparatuses 200 included by the learning system 100, and Wi,j denotes the parameter of a jth branch of an ith layer.
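Since the compared equation is not reproduced here either, a plausible form of Equation 2 consistent with these definitions, namely an average weighted only by the numbers of training data, is the following; the exact form is an assumption.

```latex
W_{i,j}^{\mathrm{global}} =
  \frac{\sum_{k=1}^{K} n_k \, W_{k,i,j}}
       {\sum_{k=1}^{K} n_k}
```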
Further, the client apparatus 200 has the learning unit 252 and the transmitting unit 253. With such a configuration, the transmitting unit 253 can transmit the local model parameter, which is the parameter updated by the learning unit 252, and a weight for each of the branches to the server apparatus 300. As a result, the server apparatus 300 can perform update of the global model using the weight.
Further, the client apparatus 200 can be configured to update the weight in a state where the parameter is fixed, and thereafter update the parameter in a state where the weight is fixed. By thus determining the optimum weight and then updating the parameter, it is possible to appropriately evaluate the degree of update of the parameter in each of the client apparatuses 200.
Next, a second example embodiment of the present disclosure will be described with reference to
In the second example embodiment of the present disclosure, an example of the configuration of the server apparatus 400 that performs learning in cooperation with the client apparatus 500 and an example of the configuration of the client apparatus 500 will be described.
The server apparatus 400 may use, instead of the abovementioned CPU, a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), a MPU (Micro Processing Unit), a FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination thereof.
Further, the server apparatus 400 can realize functions as a receiving unit 421, a calculating unit 422 and a transmitting unit 423 shown in
The receiving unit 421 receives, from a plurality of client apparatuses 500 that perform federated learning of a neural network having multiplex branches capable of performing different operations on a common input, a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches.
The calculating unit 422 calculates a parameter of a global model based on the local model parameter and the weight received by the receiving unit 421.
The transmitting unit 423 transmits the parameter calculated by the calculating unit 422 to the client apparatus 500.
Thus, the server apparatus 400 has the receiving unit 421, the calculating unit 422, and the transmitting unit 423. With such a configuration, the calculating unit 422 can calculate the parameter of the global model based on the local model parameter and the like received by the receiving unit 421. Moreover, the transmitting unit 423 can transmit the parameter calculated by the calculating unit 422 to the client apparatus 500. As a result, the client apparatus 500 can update the local model parameter and the weight using the received parameter, for example. Consequently, without sharing the local model among the client apparatuses 500, each of the client apparatuses 500 can learn a local model appropriate for its own data by learning a weight for each of the client apparatuses 500, for example. As a result, the risk of information leakage can be reduced.
Furthermore, the server apparatus 400 described in this example embodiment calculates the parameter of the global model based on the local model parameters and the weights. As a result, the contribution of each client apparatus 500 to the update of the global model can be weighted in accordance with the weight for each of the branches, and the contribution ratio of each of the client apparatuses can be taken into account when updating the global model. Consequently, it is possible to make learning more stable as compared with the case of using no weight, and it is possible to increase the accuracy.
The server apparatus 400 described above can be realized by installation of a predetermined program in an information processing apparatus such as the server apparatus 400. Specifically, a program as another aspect of the present invention is a computer program for causing an information processing apparatus such as the server apparatus 400 to realize processes to: receive, from a plurality of client apparatuses that perform federated learning of a neural network having multiplex branches capable of performing different operations on a common input, a local model parameter of each of the multiplex branches and a weight for each of the branches used in superposing outputs from the respective multiplex branches; calculate a parameter of a global model based on the received local model parameter and weight; and transmit the calculated parameter to the client apparatus.
Further, a calculation method executed by an information processing apparatus such as the server apparatus 400 described above is a method by an information processing apparatus such as the server apparatus 400 of: receiving, from a plurality of client apparatuses that perform federated learning of a neural network having multiplex branches capable of performing different operations on a common input, a local model parameter of each of the multiplex branches and a weight for each of the branches used in superposing outputs from the respective multiplex branches; calculating a parameter of a global model based on the received local model parameter and weight; and transmitting the calculated parameter to the client apparatus.
The inventions of a program, a computer-readable recording medium having a program recorded thereon and a calculation method with the abovementioned configurations also have the same actions and effects as the server apparatus 400 described above, and therefore, can achieve the abovementioned object of the present invention.
Further, the client apparatus 500 that transmits a local model parameter and the like to the server apparatus 400 can realize functions as a learning unit 521 and a transmitting unit 522 shown in
The learning unit 521 learns, using training data thereof, a local model parameter of each branch included by a neural network having multiplex branches capable of performing different operations on a common input and a weight for each branch used in superposing outputs from the respective multiplex branches.
The transmitting unit 522 transmits the local model parameter and the weight learned by the learning unit 521 to a server apparatus that generates a global model based on local model parameters.
Thus, the client apparatus 500 has the learning unit 521 and the transmitting unit 522. With such a configuration, the transmitting unit 522 can transmit the local model parameter and the weight learned by the learning unit 521 to the server apparatus 400. As a result, the server apparatus 400 can calculate the parameter of the global model using the weight. Consequently, it is possible to increase the accuracy.
The client apparatus 500 described above can be realized by installation of a predetermined program in an information processing apparatus such as the client apparatus 500. Specifically, a program as another aspect of the present invention is a computer program for causing an information processing apparatus such as the client apparatus 500 to realize processes to: learn, by using training data included by the client apparatus, a local model parameter of each branch included by a neural network having multiplex branches capable of performing different operations on a common input and a weight for each branch used in superposing outputs from the respective multiplex branches; and transmit the learned local model parameter and weight to a server apparatus that generates a global model based on the local model parameter.
Further, a learning method executed by an information processing apparatus such as the client apparatus 500 described above is a method by an information processing apparatus such as the client apparatus 500 of: learning, by using training data included by the client apparatus, a local model parameter of each branch included by a neural network having multiplex branches capable of performing different operations on a common input and a weight for each branch used in superposing outputs from the respective multiplex branches; and transmitting the learned local model parameter and weight to a server apparatus that generates a global model based on the local model parameter.
The inventions of a program, a computer-readable recording medium having a program recorded thereon and a learning method with the abovementioned configurations also have the same actions and effects as the client apparatus 500 described above, and therefore, can achieve the abovementioned object of the present invention.
The whole or part of the example embodiments disclosed above can be described as the following supplementary notes. Below, the overview of a server apparatus and the like according to the present invention will be described. However, the present invention is not limited to the following configurations.
A server apparatus comprising:
The server apparatus according to Supplementary Note 1, wherein
The server apparatus according to Supplementary Note 2, wherein
The server apparatus according to Supplementary Note 2, wherein
The server apparatus according to Supplementary Note 1, wherein
A calculation method by an information processing apparatus, the calculation method comprising:
A computer program comprising instructions for causing an information processing apparatus to realize processes to:
A client apparatus comprising:
The client apparatus according to Supplementary Note 8, comprising
A learning method by an information processing apparatus, the learning method comprising:
Although the present invention has been described above with reference to the example embodiments, the present invention is not limited to the abovementioned example embodiments. The configurations and details of the present invention can be changed in various manners that can be understood by one skilled in the art within the scope of the present invention.
100 learning system
200 client apparatus
210 operation input unit
220 screen display unit
230 communication I/F unit
240 storing unit
241 training data information
242 local model information
243 program
250 operation processing unit
251 parameter receiving unit
252 learning unit
253 transmitting unit
300 server apparatus
310 operation input unit
320 screen display unit
330 communication I/F unit
340 storing unit
341 reception information
342 global model information
343 program
350 operation processing unit
351 receiving unit
352 calculating unit
353 parameter transmitting unit
400 server apparatus
401 CPU
402 ROM
403 RAM
404 programs
405 storage device
406 drive device
407 communication interface
408 input/output interface
409 bus
410 recording medium
411 communication network
421 receiving unit
422 calculating unit
423 transmitting unit
500 client apparatus
521 learning unit
522 transmitting unit
| Number | Date | Country | Kind |
|---|---|---|---|
| 2022-092301 | Jun 2022 | JP | national |