This application is based upon and claims the benefit of priority from Japanese patent application No. 2022-092302, filed on Jun. 7, 2022, the disclosure of which is incorporated herein in its entirety by reference.
The present invention relates to a server apparatus, a calculation method, and a recording medium.
From the viewpoint of privacy protection and the like, a technique called federated learning is known in which a plurality of clients cooperate to perform machine learning without directly exchanging training data.
A literature describing federated learning is, for example, Patent Literature 1. Patent Literature 1 describes a machine learning system that includes a plurality of client terminals and an integrated server. According to Patent Literature 1, the client terminal executes machine learning of a training target local model by using data existing in the system to which the client terminal belongs as training data in accordance with an instruction of received distribution information. The client terminal then transmits the result of learning of the local model to the integrated server. The integrated server transmits distribution information to the respective client terminals, receives the learning results from the respective client terminals, and integrates the received learning results to update a master model.
Although an averaged model is obtained in general federated learning, a technique called personalized federated learning is also known as a federated learning approach of obtaining not an averaged model but a model optimized for each client.
A literature describing personalized federated learning is, for example, Non-Patent Literature 1. For example, Non-Patent Literature 1 describes a method in which a client updates its own local model by receiving local models from other clients, giving a large weight to a local model that fits the data of the client, and adding the weighted local models to the local model of the client.
Patent Literature 1: WO2021/193815
Non-Patent Literature 1: Zhang, Michael, et al., "Personalized Federated Learning with First Order Model Optimization," ICLR 2021.
In the technique described in Non-Patent Literature 1, a client needs to obtain the local models of other clients, and the local models are shared among all the clients. Therefore, the risk of information leakage is high. Thus, there is a problem that it is difficult to obtain a model appropriate for each client while reducing the risk of information leakage.
Accordingly, an object of the present invention is to provide a server apparatus, a calculation method and a recording medium that solve the abovementioned problem.
In order to achieve the object, a server apparatus as an aspect of the present disclosure includes: at least one memory configured to store instructions; and at least one processor configured to execute the instructions. The processor is configured to: receive, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches; calculate a degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses; calculate a parameter of a global model based on the local model parameter selected based on a result of the calculation of the degree of similarity; and transmit the calculated parameter of the global model to the client apparatus.
Further, a calculation method as another aspect of the present disclosure is a calculation method by an information processing apparatus, and includes: receiving, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches; calculating a degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses; calculating a parameter of a global model based on the local model parameter selected based on a result of the calculating; and transmitting the calculated parameter to the client apparatus.
Further, a recording medium as another aspect of the present disclosure is a non-transitory computer-readable recording medium having a program recorded thereon, and the program includes instructions for causing an information processing apparatus to realize process to: receive, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches; calculate a degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses; calculate a parameter of a global model based on the local model parameter selected based on a result of the calculation; and transmit the calculated parameter to the client apparatus.
With the respective configurations as described above, the abovementioned problem can be solved.
A first example embodiment of the present disclosure will be described with reference to
In the first example embodiment of the present disclosure, a learning system 100 that performs federated learning in which a plurality of client apparatuses 200 and a server apparatus 300 learn in cooperation will be described. As illustrated in
Further, in the case of this example embodiment, in the machine learning model learned by the learning system 100, a linear transformation can be multiplexed. For example, as illustrated in
Further, in the case of this example embodiment, in the machine learning model learned by the learning system 100, outputs from the respective multiplex branches are superposed using weights for the respective branches. For example, in the machine learning model, by calculating a weighted sum that is adding the results of multiplying the outputs from the branches by weights corresponding to the branches, the outputs from the respective branches are superposed using the weights for the respective branches. For example, in the case illustrated in
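As an illustration of this superposition, a minimal sketch in Python (using NumPy) is given below; the function name multiplexed_linear, the array shapes, and the number of branches are assumptions made only for this sketch and are not part of the example embodiment.

```python
import numpy as np

def multiplexed_linear(x, branch_params, branch_weights):
    """Superpose the outputs of multiplexed linear branches.

    x:              common input vector, shape (d_in,)
    branch_params:  list of branch weight matrices, each of shape (d_out, d_in)
    branch_weights: per-branch mixing weights, shape (num_branches,)
    """
    # Each branch applies its own linear transformation to the common input.
    branch_outputs = [W @ x for W in branch_params]
    # The outputs are superposed as a weighted sum over the branches.
    return sum(a * y for a, y in zip(branch_weights, branch_outputs))

# Example with three branches acting on a 4-dimensional input.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
branch_params = [rng.normal(size=(2, 4)) for _ in range(3)]
branch_weights = np.array([0.5, 0.3, 0.2])  # learned by each client apparatus
print(multiplexed_linear(x, branch_params, branch_weights))
```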
The respective branches perform a common operation using parameters common in each client apparatus 200. In other words, the parameters of the respective branches are learned by federated learning by the client apparatuses 200 and the server apparatus 300. On the other hand, a weight for each branch used in superposing the outputs from the respective multiplex branches, a normalization parameter used in a normalization layer, and the like, are learned by each client apparatus 200. Therefore, the weight, the normalization parameter, and the like, may vary for each client apparatus 200. By learning the weight and the like for each client apparatus 200, each client apparatus 200 can learn a local model appropriate for its own data.
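The separation between the parameters shared through federated learning and the parameters kept local to each client apparatus 200 could be expressed, for example, as follows; this is only a sketch, and the dictionary keys and the helper function split_parameters are hypothetical.

```python
def split_parameters(local_model):
    """Partition a client's parameters into the part shared by federated learning
    (branch parameters) and the part kept local to the client apparatus
    (per-branch weights, normalization parameters)."""
    shared = {k: v for k, v in local_model.items()
              if ".branch" in k and k.endswith(".weight")}
    local = {k: v for k, v in local_model.items() if k not in shared}
    return shared, local

# Hypothetical keys, used only to illustrate the separation described above.
example = {
    "layer1.branch0.weight": None,  # shared: learned by federated learning
    "layer1.branch1.weight": None,  # shared: learned by federated learning
    "layer1.branch_weights": None,  # local: weight for each branch
    "layer1.norm.scale": None,      # local: normalization parameter
}
shared, local = split_parameters(example)
```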
The client apparatus 200 is an information processing apparatus that updates a parameter and the like received from the server apparatus 300 by using training data of the client apparatus 200.
The operation input unit 210 is formed of an operation input device such as a keyboard and a mouse. The operation input unit 210 detects an operation by an operator who operates the client apparatus 200 and outputs the operation to the operation processing unit 250.
The screen display unit 220 is formed of a screen display device such as an LCD (Liquid Crystal Display). The screen display unit 220 can display on a screen a variety of information stored in the storing unit 240 in accordance with an instruction from the operation processing unit 250.
The communication I/F unit 230 is formed of a data communication circuit and the like. The communication I/F unit 230 performs data communication with an external apparatus such as the server apparatus 300 connected via a communication line.
The storing unit 240 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), and a memory. The storing unit 240 stores therein processing information necessary for a variety of processing by the operation processing unit 250 and a program 243. The program 243 is loaded and executed by the operation processing unit 250 to realize various processing units. The program 243 is loaded in advance from an external device or a recording medium via a data input/output function such as the communication I/F unit 230 and is stored in the storing unit 240. Major information stored in the storing unit 240 includes training data information 241, local model information 242, and the like.
The training data information 241 includes training data used when a learning unit 252 to be described later performs learning. For example, the training data information 241 is acquired in advance by a method such as acquiring from an external device via the communication I/F unit 230 or inputting with the operation input unit 210 and the like, and is stored in the storing unit 240. For example, the training data included by the training data information 241 may vary for each client apparatus 200. In this example embodiment, a specific content of the training data is not particularly limited. The training data information 241 may include any training data.
The local model information 242 includes information indicating various parameters and values configuring a local model, such as a parameter used in an operation corresponding to each branch (for example, a local model parameter), a weight for each branch used in superposing the outputs from the respective branches, and a normalization parameter. For example, the local model information 242 is updated in accordance with various processes such as reception of the parameters of the respective branches from the server apparatus 300 and learning with the training data information 241 by the learning unit 252 to be described later.
The operation processing unit 250 has an arithmetic logic unit such as a CPU (Central Processing Unit) and a peripheral circuit thereof. By loading the program 243 from the storing unit 240 and executing the program 243, the operation processing unit 250 makes the abovementioned hardware and the program 243 cooperate and realizes various processing units. Major processing units realized by the operation processing unit 250 include a parameter receiving unit 251, a learning unit 252, a parameter transmitting unit 253, and the like.
The operation processing unit 250 may have, instead of the abovementioned CPU, a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination thereof.
The parameter receiving unit 251 receives a parameter corresponding to each of the branches in each of the layers configuring the training target neural network, from the server apparatus 300. For example, the parameter receiving unit 251 receives weight values used in performing an operation such as a convolution operation as parameters. Moreover, the parameter receiving unit 251 stores the received parameters as the local model information 242 into the storing unit 240.
The learning unit 252 performs machine learning using the training data included by the training data information 241 on a model having the parameter received by the parameter receiving unit 251, and thereby updates a parameter of each branch, a weight for each branch, and the like. In other words, the learning unit 252 performs machine learning using the training data included by the training data information 241 and updates the parameter, the weight and the like for each branch, and thereby generates a local model having a new local model parameter. Other than the parameters for the respective branches, the weights for the respective branches, and the like, the learning unit 252 may learn a normalization parameter, and the like. For example, the learning unit 252 may perform the abovementioned machine learning using a known method such as stochastic gradient descent.
For example, the learning unit 252 performs the machine learning using the training data, and thereby updates the parameter received by the parameter receiving unit 251 and calculates a new local model parameter. That is to say, a target parameter for update by the learning unit 252 is a value received by the parameter receiving unit 251, and is common in each client apparatus 200. On the other hand, the learning unit 252 performs the machine learning using the training data, and thereby updates the weight calculated in previous local model parameter calculation and calculates a new weight. That is to say, a target weight for update by the learning unit 252 is a value calculated before by each client apparatus 200, and can vary for each client apparatus 200.
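As a rough sketch of such an update, the following Python code performs one stochastic-gradient step on a model that superposes linear branches, updating both the branch parameters and the per-branch weights; the squared-error loss, the analytically computed gradients, and the function name sgd_step are assumptions made only for illustration, and an actual implementation would typically rely on an automatic-differentiation framework.

```python
import numpy as np

def sgd_step(x, y, branch_params, branch_weights, lr=0.01):
    """One stochastic-gradient step on a single training example (x, y) for a
    model that superposes linear branches with per-branch weights."""
    branch_outputs = [W @ x for W in branch_params]
    y_pred = sum(a * out for a, out in zip(branch_weights, branch_outputs))
    err = y_pred - y  # gradient of 0.5 * squared error with respect to y_pred
    # Update the local model parameter of each branch (shared via the server).
    new_params = [W - lr * a * np.outer(err, x)
                  for W, a in zip(branch_params, branch_weights)]
    # Update the weight for each branch (kept local to the client apparatus).
    new_weights = np.array([a - lr * float(err @ out)
                            for a, out in zip(branch_weights, branch_outputs)])
    return new_params, new_weights
```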
The parameter transmitting unit 253 transmits a local model parameter, which is the parameter updated by the learning unit 252, to the server apparatus 300. In other words, the parameter transmitting unit 253 in this example embodiment transmits the local model parameter to the server apparatus 300, whereas it does not transmit the weight corresponding to each of the branches.
The above is an example of the configuration of the client apparatus 200. Meanwhile, the configuration of the client apparatus 200 is not limited to the case illustrated above. For example, the operation processing unit 250 of the client apparatus 200 may execute the program 243 and thereby realize a converting unit that converts a plurality of branches to one branch by using the parameters of the respective branches and the weights corresponding to the respective branches. Moreover, the operation processing unit 250 of the client apparatus 200 may execute the program 243 and thereby realize an inferring unit that performs inference using a local model determined in accordance with the parameter (local model parameter), the weight, and the like, indicated by the local model information 242. For example, as described above, the client apparatus 200 may have a configuration other than that illustrated above.
The server apparatus 300 is an information processing apparatus that calculates the parameter of a global model by using the local model parameters received from the respective client apparatuses 200.
The operation input unit 310 is formed of an operation input device such as a keyboard and a mouse. The operation input unit 310 detects an operation by an operator who operates the server apparatus 300, and outputs the operation to the operation processing unit 350.
The screen display unit 320 is formed of a screen display device such as an LCD. The screen display unit 320 can display on a screen a variety of information stored in the storing unit 340 in accordance with an instruction from the operation processing unit 350.
The communication I/F unit 330 is formed of a data communication circuit, and the like. The communication I/F unit 330 performs data communication with an external apparatus such as the client apparatus 200 connected via a communication line.
The storing unit 340 is a storage device such as an HDD, an SSD, and a memory. The storing unit 340 stores therein processing information necessary for a variety of processing by the operation processing unit 350 and a program 343. The program 343 is loaded and executed by the operation processing unit 350 and thereby realizes various processing units. The program 343 is loaded in advance from an external device or a recording medium via a data input/output function such as the communication I/F unit 330 and is stored in the storing unit 340. Major information stored in the storing unit 340 includes, for example, reception information 341 and global model information 342.
The reception information 341 includes information indicating local model parameters received from the respective client apparatuses 200. For example, the reception information 341 is updated when a parameter receiving unit 351 receives information indicating local model parameters from the client apparatuses 200 via the communication I/F unit 330.
The global model information 342 includes information indicating a model parameter of a global model, calculated based on the reception information 341. For example, the global model information 342 is updated when a parameter calculating unit 354 to be described later calculates a parameter based on the reception information 341.
In the storing unit 340, information other than illustrated above may be stored. For example, information indicating the number of training data owned by each of the client apparatuses 200 included by the learning system 100 can be stored in the storing unit 340.
The operation processing unit 350 has an arithmetic logic unit such as a CPU and a peripheral circuit thereof. The operation processing unit 350 loads the program 343 from the storing unit 340 and executes the program 343, and thereby makes the abovementioned hardware and the program 343 cooperate and realizes various processing units. Major processing units realized by the operation processing unit 350 include, for example, the parameter receiving unit 351, the similarity degree calculating unit 352, a permutating unit 353, a parameter calculating unit 354, and a parameter transmitting unit 355. As in the case of the operation processing unit 250 included by the client apparatus 200, the operation processing unit 350 may have, instead of the CPU, a GPU or the like.
The parameter receiving unit 351 receives a local model parameter of each branch of each layer from each client apparatus 200. Moreover, the parameter receiving unit 351 stores the received local model parameter as the reception information 341 into the storing unit 340.
The similarity degree calculating unit 352 calculates the degree of similarity between local model parameters corresponding to each branch, received from different client apparatuses 200. For example, the similarity degree calculating unit 352 performs a process of calculating the degree of similarity for each of the layers configuring the neural network.
For example, by repeating a process of calculating the degree of similarity between local model parameters received from two client apparatuses, the similarity degree calculating unit 352 can calculate the degree of similarity between local model parameters received from the respective client apparatuses 200. In other words, the similarity degree calculating unit 352 sequentially solves bipartite matching problems to calculate the degrees of similarity between the respective local model parameters.
Referring to
Subsequently, the similarity degree calculating unit 352 calculates the degree of similarity between the local model parameter corresponding to each of the branches of the client apparatus 200-2 and the local model parameter corresponding to each of the branches of the client apparatus 200-3. After that, the similarity degree calculating unit 352 sequentially executes the same bipartite matching so that the similarity degree calculation process is performed once or twice between one client apparatus 200 and another client apparatus 200 included by the learning system 100. For example, as illustrated in
The similarity degree calculating unit 352 may calculate the degree of similarity by any method. For example, the similarity degree calculating unit 352 can calculate a norm as the degree of similarity as shown by Equation 1. For example, Equation 1 shows an example of calculation of a norm in the case of calculating the degree of similarity between vector u as a local model parameter and vector v as a local model parameter. In the case illustrated by Equation 1, the smaller the value, the higher the degree of similarity.
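Equation 1 itself is not reproduced here; a standard p-norm of the difference between the two parameter vectors, consistent with the description above, would be

\[
\lVert u - v \rVert_p = \Bigl( \sum_{i} \lvert u_i - v_i \rvert^{p} \Bigr)^{1/p}
\]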
where p may be any value such as 1 or 2.
Further, the similarity degree calculating unit 352 may calculate, instead of a norm, a cosine similarity as shown by Equation 2. As shown by Equation 2, the similarity degree calculating unit 352 can calculate a cosine similarity by dividing the inner product of vector u and vector v by the magnitude of vector u and vector v. In the case shown by Equation 2, the larger the value, the higher the degree of similarity.
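Equation 2 is likewise not reproduced here; the cosine similarity described above corresponds to the standard form

\[
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
\]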
The similarity degree calculating unit 352 may calculate the degree of similarity between the local model parameters by a method other than the examples illustrated above. Moreover, the similarity degree calculating unit 352 may calculate the degree of similarity by a method other than bipartite matching. For example, the similarity degree calculating unit 352 may calculate the degree of similarity for all combinations of client apparatuses 200 or all combinations of branches for which the similarity degree calculation is to be performed.
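A minimal sketch of such a similarity calculation between the branch parameters of two client apparatuses is given below in Python; the cosine similarity is used as an example, and the function name similarity_matrix is hypothetical.

```python
import numpy as np

def similarity_matrix(params_a, params_b):
    """Cosine similarity between every pair of branch parameters of two clients.

    params_a, params_b: lists of flattened branch parameter vectors, one per branch.
    Returns a matrix S where S[i, j] is the degree of similarity between
    branch i of the first client and branch j of the second client.
    """
    S = np.zeros((len(params_a), len(params_b)))
    for i, u in enumerate(params_a):
        for j, v in enumerate(params_b):
            S[i, j] = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return S
```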
The permutating unit 353 performs a permutation process of permutating the branches based on the degrees of similarity calculated by the similarity degree calculating unit 352. For example, it is assumed that the parameter calculating unit 354, which will be described later, calculates a parameter of a global model based on local model parameters corresponding to branches with the same sequential number. In this case, the permutating unit 353 permutates the branches based on the degrees of similarity so that the parameter calculating unit 354 performs the process of calculating the parameter of the global model in a combination of branches with the highest degree of similarity.
For example, it is assumed that the permutating unit 353 leaves the order of the branches unchanged in the client apparatus 200-1. That is to say, the permutating unit 353 performs identity permutation. Next, the permutating unit 353 focuses on the client apparatus 200-1 and the client apparatus 200-2, and permutates the branches corresponding to the client apparatus 200-2 so that branches with a high degree of similarity have the same sequential number. After that, the permutating unit 353 focuses on the client apparatus 200-2 and the client apparatus 200-3, and permutates the branches corresponding to the client apparatus 200-3. After that, the permutating unit 353 performs the same permutation process for each combination of the client apparatuses 200 for which the similarity degree calculating unit 352 has calculated the degree of similarity.
For example, as illustrated in
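A sketch of the permutation process is given below; it treats the bipartite matching as an assignment problem solved with scipy.optimize.linear_sum_assignment (the Hungarian method), reuses the similarity_matrix function sketched above, and the function name align_branches and the chaining order are assumptions made only for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_branches(client_params):
    """Permutate branches so that similar branches share the same sequential number.

    client_params: list over client apparatuses; each element is a list of
                   flattened branch parameter vectors for one layer.
    Returns one permutation (index array) per client apparatus.
    """
    num_branches = len(client_params[0])
    perms = [np.arange(num_branches)]  # identity permutation for the first client
    for prev, curr in zip(client_params, client_params[1:]):
        # Reorder the previous client's branches by the permutation already chosen.
        prev_aligned = [prev[j] for j in perms[-1]]
        S = similarity_matrix(prev_aligned, curr)
        # Bipartite matching: maximize total similarity by minimizing its negative.
        _, col_ind = linear_sum_assignment(-S)
        perms.append(col_ind)
    return perms
```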
The parameter calculating unit 354 calculates a parameter of the global model based on local model parameters selected based on the result of calculation of the degree of similarity by the similarity degree calculating unit 352. For example, the parameter calculating unit 354 selects branches with the same sequential number as a target for calculation based on the result of the permutation process by the permutating unit 353, and calculates a parameter of the global model. Moreover, the parameter calculating unit 354 stores the calculated parameter of the global model as the global model information 342 into the storing unit 340.
For example, referring to
Specifically, for example, the parameter calculating unit 354 calculates a parameter of the global model based on a plurality of local model parameters by performing weighting using the number of training data owned by the client apparatus 200 and then calculating the average of the local model parameters. For example, the parameter calculating unit 354 can calculate a parameter of the global model based on a plurality of local model parameters by solving an equation shown in Equation 3.
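Equation 3 itself is not reproduced here; using the notation defined below, the weighted average described above can be written in the following form, where the superscript (k) and the subscript k of σ are added here only for clarity and denote, respectively, the local model parameter received from the client apparatus k and the permutation applied to its branches:

\[
W_{ij} = \sum_{k=1}^{K} \frac{n_k}{n} \, W^{(k)}_{i\,\sigma_k^{-1}(j)}
\]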
where n indicates the total number of training data owned by all the client apparatuses, n_k indicates the number of data owned by a client apparatus k, K is the total number of the client apparatuses 200 included by the learning system 100, W_ij indicates a parameter of the j-th branch of the i-th layer, and σ^{-1}(j) indicates the branch before permutation that becomes the branch j after permutation.
The parameter calculating unit 354 may calculate a parameter of the global model by a method other than illustrated above. For example, the parameter calculating unit 354 may be configured to, without performing weighting using the number of training data, calculate the average of the respective local model parameters. Moreover, the parameter calculating unit 354 may be configured to select a combination of branches with a high degree of similarity based on the result of similarity degree calculation by the similarity degree calculating unit 352, and calculate a parameter of the global model using the selected combination.
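A minimal sketch of this parameter calculation in Python is given below; it implements the weighted average of the aligned branches, reuses the output of the align_branches sketch above, and the function name compute_global_params is hypothetical.

```python
def compute_global_params(client_params, perms, num_data):
    """Weighted average of aligned branch parameters (a sketch of Equation 3).

    client_params: list over client apparatuses; each element is a list of
                   branch parameter arrays for one layer.
    perms:         one permutation per client apparatus, as returned by align_branches.
    num_data:      number of training data n_k owned by each client apparatus.
    """
    n_total = float(sum(num_data))
    num_branches = len(client_params[0])
    global_params = []
    for j in range(num_branches):
        # Average the branch parameters that were matched to sequential number j.
        acc = sum((n_k / n_total) * params[perm[j]]
                  for params, perm, n_k in zip(client_params, perms, num_data))
        global_params.append(acc)
    return global_params
```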
The parameter transmitting unit 355 transmits the parameter of the global model calculated by the parameter calculating unit 354 to the client apparatus 200. The parameter transmitting unit 355 may return the branches of the global model to the branches before permutation and thereafter transmit them to the client apparatus 200.
The above is an example of the configuration of the server apparatus 300. Subsequently, an example of operation of the learning system 100 will be described with reference to
The learning unit 252 performs machine learning using training data included by the training data information 241 on a model having the parameter received by the parameter receiving unit 251, and thereby updates a parameter of each of the branches, a weight for each of the branches, and the like (step S102). For example, the learning unit 252 may perform the machine learning by a known method such as stochastic gradient descent.
The parameter transmitting unit 253 transmits a local model parameter, which is the parameter updated by the learning unit 252, to the server apparatus 300 (step S103).
The above is an example of the operation of the client apparatus 200. Subsequently, with reference to
The similarity degree calculating unit 352 calculates a degree of similarity between local model parameters corresponding to each branch that are received from different client apparatuses 200 (step S202). For example, by repeatedly executing a process of calculating a degree of similarity between local model parameters received from two client apparatuses, the similarity degree calculating unit 352 can calculate the degrees of similarity between the local model parameters received from the respective client apparatuses. In other words, the similarity degree calculating unit 352 sequentially solves bipartite matching problems to calculate the degrees of similarity between the respective local model parameters.
The permutating unit 353 performs a permutation process of permutating the branches based on the degrees of similarity calculated by the similarity degree calculating unit 352 (step S203). For example, the permutating unit 353 permutates the branches based on the degrees of similarity so that a process of calculating the parameter of the global model by the parameter calculating unit 354 is performed in a combination of branches with a high degree of similarity.
The parameter calculating unit 354 calculates the parameter of the global model based on the local model parameters selected based on the result of the similarity degree calculation by the similarity degree calculating unit 352 (step S204). For example, the parameter calculating unit 354 selects branches with the same sequential number as a calculation target based on the result of the permutation process by the permutating unit 353, and calculates the parameter of the global model.
The parameter transmitting unit 355 transmits the parameter of the global model calculated by the parameter calculating unit 354 to the client apparatuses 200 (step S205). Meanwhile, the parameter transmitting unit 355 may return the branches of the global model to the branches before permutation and transmit them to the client apparatuses 200.
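Putting the above steps together, one server-side round could be sketched as follows; the functions align_branches and compute_global_params are the hypothetical sketches given earlier, and the overall function name server_round is likewise an assumption.

```python
def server_round(received_params, num_data):
    """One round of the server-side processing (steps S201 to S205), as a sketch.

    received_params: local model parameters for one layer, received from the
                     respective client apparatuses (step S201).
    num_data:        number of training data owned by each client apparatus.
    Returns the global model parameters to be transmitted back (step S205).
    """
    # Steps S202 and S203: similarity degree calculation and permutation of branches.
    perms = align_branches(received_params)
    # Step S204: calculate the parameter of the global model by weighted averaging.
    return compute_global_params(received_params, perms, num_data)
```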
The above is an example of the operation of the server apparatus 300. In the learning system 100, a series of steps as illustrated with reference to
Thus, the server apparatus 300 has the parameter calculating unit 354 and the parameter transmitting unit 355. With such a configuration, the parameter transmitting unit 355 can transmit the parameter calculated by the parameter calculating unit 354 to the client apparatuses 200. As a result, each of the client apparatuses 200 can update the local model parameter and the weight by using the received parameter. Consequently, without sharing the local model between the client apparatuses 200, for example, each of the client apparatuses 200 can learn the weight and thereby train the local model appropriate for its own data. As a result, it is possible to reduce the risk of information leakage.
Furthermore, the server apparatus 300 described in this example embodiment has the similarity degree calculating unit 352 and the parameter calculating unit 354. With such a configuration, the parameter calculating unit 354 can calculate the parameter of the global model based on the local model parameters selected based on the result of similarity degree calculation by the similarity degree calculating unit 352. Conventionally, the average has been taken between branches with the same sequential number at all times, without calculating the degree of similarity. As a result, learning may become unstable when parameters that are far apart from each other have the same sequential number. In the case of the server apparatus 300 described in this example embodiment, as mentioned above, the parameter of the global model is calculated by averaging the similar parameters. As a result, more stable learning is possible as compared with the conventional case, and more accurate learning is possible.
Next, a second example embodiment of the present disclosure will be described with reference to
In the second example embodiment of the present disclosure, an example of the configuration of the server apparatus 400 that is an information processing apparatus performing learning in cooperation with an external apparatus such as a client apparatus will be described.
Referring to
a CPU (Central Processing Unit) 401 (arithmetic logic unit),
a ROM (Read Only Memory) 402 (memory unit),
a RAM (Random Access Memory) 403 (memory unit),
programs 404 loaded to the RAM 403,
a storage device 405 for storing the programs 404,
a drive device 406 that reads from and writes into a recording medium 410 outside the information processing apparatus,
a communication interface 407 connected to a communication network 411 outside the information processing apparatus,
an input/output interface 408 that inputs and outputs data, and
a bus 409 that connects the respective components.
The server apparatus 400 may use, instead of the abovementioned CPU, a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination thereof.
Further, the server apparatus 400 can realize functions as a receiving unit 421, a similarity degree calculating unit 422, a parameter calculating unit 423, and a parameter transmitting unit 424 shown in
The receiving unit 421 receives, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches.
The similarity degree calculating unit 422 calculates the degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses.
The parameter calculating unit 423 calculates a parameter of a global model based on a local model parameter selected based on the result of calculation by the similarity degree calculating unit 422.
The parameter transmitting unit 424 transmits the parameter calculated by the parameter calculating unit 423 to the client apparatus.
Thus, the server apparatus 400 has the parameter calculating unit 423 and the parameter transmitting unit 424. With such a configuration, the parameter transmitting unit 424 can transmit a parameter calculated by the parameter calculating unit 423 to the client apparatus. As a result, the client apparatus can update a local model parameter and a weight using the received parameter, for example. Consequently, without sharing the local model among the client apparatuses, each client apparatus can learn a local model appropriate for its own data by learning a weight for each client apparatus, for example. As a result, the risk of information leakage can be reduced.
Furthermore, the server apparatus 400 described in this example embodiment has the similarity degree calculating unit 422 and the parameter calculating unit 423. With such a configuration, the parameter calculating unit 423 can calculate a parameter of a global model based on the local model parameter selected based on the result of similarity degree calculation by the similarity degree calculating unit 422. As a result, it is possible to make learning more stable as compared with a case where selection based on the degree of similarity is not performed, and it is possible to perform learning with higher accuracy.
The server apparatus 400 described above can be realized by installation of a predetermined program in an information processing apparatus such as the server apparatus 400. Specifically, a program as another aspect of the present invention is a computer program for causing an information processing apparatus such as the server apparatus 400 to realize processes to: receive, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches; calculate the degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses; and calculate a parameter of a global model based on a local model parameter selected based on the result of calculation.
Further, a calculation method executed by an information processing apparatus such as the server apparatus 400 described above is a method by an information processing apparatus such as the server apparatus 400 of: receiving, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches; calculating the degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses; and calculating a parameter of a global model based on a local model parameter selected based on the result of calculation.
The inventions of a program, a computer-readable recording medium having a program recorded thereon, and a calculation method with the abovementioned configurations also have the same actions and effects as the server apparatus 400 described above, and therefore, can achieve the abovementioned object of the present invention.
The whole or part of the example embodiments disclosed above can be described as the following supplementary notes. Below, an overview of a server apparatus and the like according to the present invention is described. However, the present invention is not limited to the following configurations.
A server apparatus comprising:
a receiving unit configured to receive, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches;
a similarity degree calculating unit configured to calculate a degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses;
a parameter calculating unit configured to calculate a parameter of a global model based on the local model parameter selected based on a result of calculation by the similarity degree calculating unit; and
a parameter transmitting unit configured to transmit the parameter calculated by the parameter calculating unit to the client apparatus.
The server apparatus according to Supplementary Note 1, wherein the similarity degree calculating unit is configured to, by repeatedly executing a process of calculating the degree of similarity between the local model parameters received from two client apparatuses, calculate the degrees of similarity between the local model parameters received from the plurality of client apparatuses.
The server apparatus according to Supplementary Note 1, wherein the similarity degree calculating unit is configured to calculate the degree of similarity between the local model parameter corresponding to each of the branches received from a first client apparatus among the plurality of client apparatuses and the local model parameter corresponding to each of the branches received from a second client apparatus different from the first client apparatus, and thereafter calculate the degree of similarity between the local model parameter corresponding to each of the branches received from the second client apparatus and the local model parameter corresponding to each of the branches received from a third client apparatus different from the second client apparatus.
The server apparatus according to Supplementary Note 1, wherein the parameter calculating unit is configured to select the branches corresponding to the respective client apparatuses so as to combine the branches with highest similarity degree based on the result of calculation by the similarity degree calculating unit.
The server apparatus according to Supplementary Note 1, comprising a permutating unit configured to permutate the branches based on the result of calculation by the similarity degree calculating unit,
wherein the parameter calculating unit is configured to select the branches to be a parameter calculation target based on a result of permutation by the permutating unit.
The server apparatus according to Supplementary Note 5, wherein the permutating unit is configured to permutate the branches so as to combine the branches with highest similarity degree.
The server apparatus according to Supplementary Note 5, wherein the parameter calculating unit is configured to calculate the parameter of the global model by calculating an average value of the branches with a same sequential number after permutation by the permutating unit.
A calculation method by an information processing apparatus, the method comprising:
receiving, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches;
calculating a degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses;
calculating a parameter of a global model based on the local model parameter selected based on a result of the calculating; and
transmitting the calculated parameter to the client apparatus.
The calculation method according to Supplementary Note 8, comprising, by repeatedly executing a process of calculating the degree of similarity between the local model parameters received from two client apparatuses, calculating the degrees of similarity between the local model parameters received from the plurality of client apparatuses.
A computer program comprising instructions for causing an information processing apparatus to realize process to:
receive, from a plurality of client apparatuses that perform federated learning of a neural network model having multiplex branches capable of performing different operations on a common input and thereby learn a local model parameter of each of the multiplex branches and a weight for each branch used in superposing outputs from the respective multiplex branches, the local model parameters corresponding to each of the branches;
calculate a degree of similarity between the local model parameters corresponding to each of the branches, received from different client apparatuses;
calculate a parameter of a global model based on the local model parameter selected based on a result of the calculation; and
transmit the calculated parameter to the client apparatus.
Although the present invention has been described above with reference to the example embodiments, the present invention is not limited to the abovementioned example embodiments.
The configurations and details of the present invention can be changed in various manners that can be understood by one skilled in the art within the scope of the present invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2022-092302 | Jun 2022 | JP | national |