The present invention relates to a federated learning technique, and particularly, to a technique for efficiently registering a local model in a local model management table used to compute a global model from local models.
As a technique for performing learning without aggregating training data into one device, there is a federated learning technique. As a federated learning technique, there is FedAVG described in Non Patent Literature 1, for example.
When the federated learning technique is used, training data is never taken out of a local model training device, which eliminates concerns about moving the data, and parallel learning also speeds up training. However, if the parameters of a model being trained are traced and leaked from the communication between the local model training devices 100_1, . . . , 100_M and the global model computation device 900, for example, there is a risk that the training data will be inferred. To avoid this risk, secure computation can be used to compute the global model.
Secure computation is a method of obtaining results of a designated arithmetic operation without restoring encrypted numerical values (refer to Reference Non Patent Literature 1, for example). In the method of Reference Non Patent Literature 1, a numerical value is encrypted by distributing a plurality of pieces of information from which the value can be restored to three secure computation devices. The results of addition/subtraction, constant summation, multiplication, constant multiplication, logical operations (negation, logical product, logical sum, and exclusive logical sum), and data format conversion (integer, binary) can then be kept distributed among the three secure computation devices without restoring the numerical values, that is, in an encrypted state. In general, the number of distributions is not limited to 3 and can be N (N is an integer of 3 or more), and a protocol for realizing secure computation by cooperative computation by N secure computation devices is called a multiparty protocol.
(Reference Non Patent Literature 1: Koji Chida, Koki Hamada, Dai Igarashi, and Katsumi Takahashi, “Reconsideration of Lightweight Verifiable 3-Party Concealment Function Computation,” In CSS, 2010.)
[NPL 1] McMahan, B., E. Moore, D. Ramage, et al., “Communication-efficient learning of deep networks from decentralized data,” In Artificial Intelligence and Statistics, pp. 1273-1282, 2017.
However, if a global model is computed by managing parameters of local models using a local model management table having the same structure as that of
Accordingly, an object of the present invention is to provide a technique for efficiently registering local models in a local model management table used when a global model is computed from local models in federated learning.
One aspect of the present invention is a secure global model computation device in a federated learning system including M local model training devices for training local models using training data and a secure global model computation system composed of N secure global model computation devices for secure computation of a global model from M local models, wherein M and K are integers of 2 or more and N is an integer of 3 or more, a local model is defined as a neural network composed of K layers, and a local model management table is defined as a table including an attribute having a set (m, k) (1≤m≤M, 1≤k≤K) of an identifier m for identifying a local model and an identifier k for identifying a layer as an attribute value and an attribute having shares of parameters of the local model as an attribute value, the secure global model computation device including: a transmission/reception unit configured to receive shares of parameters of a local model (hereinafter referred to as an m-th local model) trained by one local model training device (hereinafter referred to as an m-th local model training device (where m satisfies 1≤m≤M)) among the M local model training devices; and a parameter share registration unit configured to register the shares of the parameters of the m-th local model in a local model management table using K records having a set (m, k) of identifiers and shares of parameters of a k-th layer (1≤k≤K) of the m-th local model as one record.
One aspect of the present invention is a secure global model computation device in a federated learning system including M local model training devices for training local models using training data and a secure global model computation system composed of N secure global model computation devices for secure computation of a global model from M local models, wherein M and K are integers of 2 or more and N is an integer of 3 or more, a local model is defined as a model represented using K vectors, and a local model management table is defined as a table including an attribute having a set (m, k) (1≤m≤M, 1≤k≤K) of an identifier m for identifying a local model and an identifier k for identifying a vector constituting the local model as an attribute value and an attribute having shares of parameters of the local model as an attribute value, the secure global model computation device including: a transmission/reception unit configured to receive shares of parameters of a local model (hereinafter referred to as an m-th local model) trained by one local model training device (hereinafter referred to as an m-th local model training device, where m satisfies 1≤m≤M) among the M local model training devices; and a parameter share registration unit configured to register the shares of the parameters of the m-th local model in a local model management table using K records having a set (m, k) of identifiers and shares of parameters included in a k-th vector (1≤k≤K) of the m-th local model as one record.
According to the present invention, it is possible to efficiently register local models in the local model management table used when a global model is computed from local models in the federated learning.
The following describes embodiments of the present invention in detail. Note that constituent elements having the same function will be denoted by the same reference numerals and redundant description thereof will be omitted.
A notation method used in this specification will be described before the embodiments are described.
^ (caret) denotes a superscript. For example, x^y^z indicates that y^z is a superscript to x, and x_y^z indicates that y^z is a subscript to x. In addition, _ (underscore) denotes a subscript. For example, x^y_z indicates that y_z is a superscript to x, and x_y_z indicates that y_z is a subscript to x. Superscripts “^” and “˜” as in ^x and ˜x for a certain character x would normally be written directly above “x,” but are written as ^x or ˜x here due to restrictions on notation in this specification.
Secure computation in the present invention is constructed using a combination of arithmetic operations in existing secure computation. Arithmetic operations necessary for the secure computation include, for example, concealment, addition, subtraction, multiplication, division, logical operations (negation, logical product, logical sum, and exclusive logical sum), and comparison operations (=, <, >, ≤, and ≥). Several operations and their notation will be described below.
[[x]] is assumed to be a value obtained by concealing x by secret sharing (hereinafter referred to as a share of x). Any method can be used as a secret sharing method. For example, Shamir secret sharing on GF(2^61-1) and replicated secret sharing on Z_2 can be used.
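As an illustration of concealment and restoration, the following is a minimal sketch of Shamir secret sharing on GF(2^61-1). The function names `shamir_share` and `shamir_restore`, the evaluation points 1, . . . , n, and the default threshold t=2 are assumptions made for this example and are not taken from the specification or from the cited literature.

```python
import secrets

# Field modulus: the Mersenne prime 2^61 - 1 mentioned in the text.
P = 2**61 - 1


def shamir_share(x, n=3, t=2):
    """Split x into n shares so that any t of them restore x (illustrative names)."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [x % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    # The share for party i is the polynomial evaluated at the point i.
    return [
        (i, sum(c * pow(i, j, P) for j, c in enumerate(coeffs)) % P)
        for i in range(1, n + 1)
    ]


def shamir_restore(shares):
    """Restore the secret by Lagrange interpolation at 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8 or later).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total
```

With these assumptions, `shamir_restore(shamir_share(42)[:2])` restores the plaintext from any two of the three shares, while a single share alone reveals nothing about x.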
A plurality of secret sharing methods may be used in combination in one certain algorithm. In this case, it is assumed that they can be interconverted as appropriate.
Further, it is assumed that [[→x]]=([[x_1]], . . . , [[x_N]]) for an N-dimensional vector →x=(x_1, . . . , x_N). That is, [[→x]] is a vector having a share [[x_n]] of an n-th element x_n of →x as an n-th element. Similarly, for an M×N matrix A=(a_{m,n}) (1≤m≤M, 1≤n≤N), [[A]] is assumed to be a matrix having a share [[a_{m,n}]] of an (m, n)-th element of A as an (m, n)-th element.
Note that x is referred to as plaintext of [[x]].
As a method of obtaining [[x]] from x (concealment) and a method of obtaining x from [[x]] (restoration), specifically, there are methods described in Reference Non Patent Literature 1 and Reference Non Patent Literature 2.
(Reference Non Patent Literature 2: Shamir, A, “How to share a secret,” Communications of the ACM, Vol. 22, No. 11, pp. 612-613, 1979.)
Addition [[x]]+[[y]] according to secure computation has [[x]] and [[y]] as inputs and [[x+y]] as an output. Subtraction [[x]]−[[y]] according to secure computation has [[x]] and [[y]] as inputs and [[x−y]] as an output. Multiplication [[x]]×[[y]] (which may be represented as mul([[x]], [[y]])) according to secure computation has [[x]] and [[y]] as inputs and [[x×y]] as an output. Division [[x]]/[[y]] (which may be represented as div([[x]], [[y]])) according to secure computation has [[x]] and [[y]] as inputs and [[x/y]] as an output.
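To make the input/output relationship of addition and subtraction concrete, the sketch below uses additive secret sharing modulo a prime, which is one possible scheme and is chosen here only for brevity. The names `share`, `restore`, `add`, and `sub` are hypothetical. Multiplication and division require interaction between the devices (see the literature cited in this specification) and are deliberately not shown.

```python
import secrets

P = 2**61 - 1  # modulus assumed for this example


def share(x, n=3):
    """Additively share x among n devices: the shares sum to x modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]


def restore(shares):
    """Restore the plaintext from all n shares."""
    return sum(shares) % P


def add(xs, ys):
    """[[x]] + [[y]]: each device adds its own two shares, no communication."""
    return [(a + b) % P for a, b in zip(xs, ys)]


def sub(xs, ys):
    """[[x]] - [[y]]: each device subtracts its own two shares."""
    return [(a - b) % P for a, b in zip(xs, ys)]
```

In this sketch, `restore(add(share(5), share(7)))` yields 12 although no device ever sees 5 or 7 in the clear.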
As specific methods of addition, subtraction, multiplication and division, there are methods described in Reference Non Patent Literature 3 and Reference Non Patent Literature 4.
(Reference Non Patent Literature 3: Ben-Or, M., Goldwasser, S. and Wigderson, A., “Completeness theorems for non-cryptographic fault-tolerant distributed computation,” Proceedings of the twentieth annual ACM symposium on Theory of computing, ACM, pp. 1-10, 1988.)
(Reference Non Patent Literature 4: Gennaro, R., Rabin, M. O. and Rabin, T., “Simplified VSS and fast-track multiparty computations with applications to threshold cryptography,” Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, ACM, pp. 101-111, 1998.)
Negation not([[x]]) according to secure computation has [[x]] as an input and [[not(x)]] as an output. Logical product and([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs and [[and(x, y)]] as an output. Logical sum or([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs and [[or(x, y)]] as an output. Exclusive logical sum xor([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs and [[xor(x, y)]] as an output.
Note that logical operations can be easily constructed by combining addition, subtraction, multiplication, and division.
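One common way to build the logical operations from arithmetic, for bits x and y in {0, 1}, is shown below. The specification does not state which construction it intends, so these identities are an assumption. For readability the functions operate on plaintext bits; in secure computation each +, − and × would be replaced by the corresponding operation on shares.

```python
def not_(x):
    """not(x) = 1 - x"""
    return 1 - x


def and_(x, y):
    """and(x, y) = x * y"""
    return x * y


def or_(x, y):
    """or(x, y) = x + y - x * y"""
    return x + y - x * y


def xor_(x, y):
    """xor(x, y) = x + y - 2 * x * y"""
    return x + y - 2 * x * y
```

Each identity uses at most one multiplication of the two inputs, which matters in practice because multiplication is the operation that requires communication between the secure computation devices.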
Comparison Operations
Equality determination =([[x]], [[y]]) (which may be represented as equal([[x]], [[y]])) according to secure computation has [[x]] and [[y]] as inputs, [[1]] as an output when x=y, and [[0]] as an output in other cases. Comparison <([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs, [[1]] as an output when x<y, and [[0]] as an output in other cases. Comparison >([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs, [[1]] as an output when x>y, and [[0]] as an output in other cases. Comparison ≤([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs, [[1]] as an output when x≤y, and [[0]] as an output in other cases. Comparison ≥([[x]], [[y]]) according to secure computation has [[x]] and [[y]] as inputs, [[1]] as an output when x≥y, and [[0]] as an output in other cases.
Note that comparison operations can be easily constituted by combining logical operations.
As described in [Technical Problem], it is very inefficient to compute a global model using the local model management table shown in
Hereinafter, a federated learning system 10 will be described with reference to
As shown in
As shown in
The secure global model computation system 20 realizes secure computation of a global model which is a multiparty protocol according to cooperative computation by N secure global model computation devices 200n. Therefore, training start condition determination means 220 (not shown) of the secure global model computation system 20 is composed of training start condition determination units 220_1, . . . , 220_N, and global model computation means 230 (not shown) is composed of global model computation units 230_1, . . . , 230_N.
Hereinafter, an operation of the local model training device 100m will be described with reference to
In S110m, the local model training unit 110m trains the m-th local model using training data recorded in the recording unit 190m. In the first training of the m-th local model, the local model training unit 110m may set initial values of parameters of the m-th local model using initial values recorded in advance in the recording unit 190m or may set the initial values of the parameters of the m-th local model using initial values generated using random numbers. In the second and subsequent training of the m-th local model, the local model training unit 110m sets initial values of the parameters of the m-th local model using a global model acquired in S130m which will be described later.
In S120m, the parameter share computation unit 120m computes shares of the parameters of the m-th local model from the parameters of the m-th local model trained in S110m. When the computation is finished, the parameter share computation unit 120m transmits the shares of the parameters of the m-th local model to the secure global model computation devices 200_1, . . . , 200_N using the transmission/reception unit 180m.
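The share computation in S120m can be sketched as follows. This is a minimal illustration under several assumptions that the specification does not state: real-valued parameters are encoded as fixed-point integers with a hypothetical scale `SCALE`, additive secret sharing modulo 2^61-1 is used, and the model is passed as a list of K lists of floats. All function names are illustrative.

```python
import secrets

P = 2**61 - 1
SCALE = 2**16  # hypothetical fixed-point scale factor


def encode(v):
    """Encode a real-valued parameter as a field element (fixed point)."""
    return round(v * SCALE) % P


def decode(e):
    """Decode a field element; residues above P // 2 represent negatives."""
    if e > P // 2:
        e -= P
    return e / SCALE


def share_value(e, n):
    """Additively share one field element among n devices."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(e - sum(parts)) % P]


def share_model(layers, n):
    """Return n per-device copies of the model, each holding K lists of shares.

    layers: list of K lists of floats (parameters of each layer).
    """
    per_device = [[[] for _ in layers] for _ in range(n)]
    for k, layer in enumerate(layers):
        for v in layer:
            for dev, s in enumerate(share_value(encode(v), n)):
                per_device[dev][k].append(s)
    return per_device
```

The element `per_device[n]` is what would be transmitted to the n-th secure global model computation device. No single device can recover a parameter from its own shares.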
In S130m, the global model acquisition unit 130m acquires shares of parameters of the global model from the secure global model computation devices 200_1, . . . , 200_N using the transmission/reception unit 180m after the end of processing of S120m or after the elapse of a predetermined time from the end of processing of S150m.
In S140m, the parameter computation unit 140m computes parameters of the global model from the shares of the parameters of the global model acquired in S130m. The parameter computation unit 140m records the computed parameters of the global model in the recording unit 190m. Note that, in the recording unit 190m, at least two sets of the parameters of the global model, that is, the parameters of the global model obtained through the current computation and the parameters of the global model obtained through the previous computation are recorded.
In S150m, the training start condition determination unit 150m compares the parameters of the global model computed in S140m with the parameters of the global model obtained in the previous computation, executes processing of S110m upon determining that a training start condition is satisfied in a case in which the two sets of the parameters of the global model are different, and returns to processing of S130m upon determining that the training start condition is not satisfied in other cases.
Hereinafter, the operation of the secure global model computation system 20 will be described with reference to
In S210, the parameter share registration unit 210n of the secure global model computation device 200n (1≤n≤N) takes the shares of the parameters of the m-th local model trained by the m-th local model training device 100m, received using the transmission/reception unit 280n, as inputs and registers the shares of the parameters of the m-th local model in the local model management table using K records having a set (m, k) of identifiers and shares of parameters of a k-th layer (1≤k≤K) of the m-th local model as one record.
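The registration in S210 can be pictured with the following sketch, in which the local model management table held by one secure global model computation device is modeled as a Python dictionary keyed by the identifier set (m, k). The dictionary representation and the function name `register_local_model` are assumptions made for illustration, not the data structure prescribed by the specification.

```python
def register_local_model(table, m, layer_shares):
    """Register the m-th local model as K records, one per layer.

    table:        dict mapping (m, k) -> list of shares of the k-th layer
    m:            identifier of the local model (1 <= m <= M)
    layer_shares: list of K lists; the k-th list holds this device's shares
                  of the parameters of the k-th layer
    """
    for k, shares in enumerate(layer_shares, start=1):
        # One record per layer: key (m, k), value the whole vector of shares.
        table[(m, k)] = list(shares)
    return table
```

Registering a model with K layers therefore writes K records regardless of how many parameters each layer contains, and a later registration for the same m overwrites the earlier records.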
In S220, the training start condition determination means 220 determines that a training start condition is satisfied when the number of local models newly registered after the previous global model computation exceeds a predetermined value (a value of 1 or more and M or less), or is equal to or greater than the predetermined value, and executes processing of S230. In other cases, it determines that the training start condition is not satisfied and returns to processing of S210.
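A minimal sketch of the determination in S220 follows. It assumes that "newly registered local models" are counted as distinct identifiers m registered since the previous global model computation. The class name `StartCondition` and the `strict` flag, which selects between "exceeds" and "is equal to or greater than", are hypothetical.

```python
class StartCondition:
    """Tracks local models registered since the last global model computation."""

    def __init__(self, threshold, num_models, strict=False):
        # The predetermined value must be between 1 and M inclusive.
        if not 1 <= threshold <= num_models:
            raise ValueError("threshold must satisfy 1 <= threshold <= M")
        self.threshold = threshold
        self.strict = strict
        self._new = set()

    def register(self, m):
        """Record that the m-th local model was (re)registered."""
        self._new.add(m)

    def satisfied(self):
        """True when the count exceeds (strict) or reaches the threshold."""
        count = len(self._new)
        return count > self.threshold if self.strict else count >= self.threshold

    def reset(self):
        """Call after the global model has been computed (S230)."""
        self._new.clear()
```

Setting the threshold to M waits for every local model training device, while a smaller threshold lets the global model be updated even if some devices are slow.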
In S230, the global model computation means 230 computes shares of parameters of the global model using the shares of the parameters of the local models managed by the local model management table. The global model computation means 230 sets an average of shares of corresponding parameters from the first local model to the M-th local model as the shares of parameters of the global model, for example. Note that processing speed can be increased by representing shares of parameters of each model using a vector and performing various operations.
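The averaging in S230 can be sketched per device as below. The sketch assumes additive shares modulo 2^61-1 stored in a dictionary keyed by (m, k), and it simplifies the procedure: each device computes only the element-wise sum of its shares, and the division by M is applied after restoration. The specification instead describes obtaining the average as shares, which would require a secure division or an equivalent step that is not shown here. The function name `sum_layer_shares` is illustrative.

```python
P = 2**61 - 1  # modulus assumed for additive sharing


def sum_layer_shares(table, M, K):
    """For each layer k, sum this device's shares over models 1..M.

    table: dict mapping (m, k) -> list of shares (all models share one shape)
    Returns a dict mapping k -> list of shares of the summed parameters.
    """
    result = {}
    for k in range(1, K + 1):
        # zip(*...) walks the M vectors of layer k position by position, so
        # each layer is processed as a whole vector rather than by parameter.
        columns = zip(*(table[(m, k)] for m in range(1, M + 1)))
        result[k] = [sum(col) % P for col in columns]
    return result
```

Because addition of shares is local, this step needs no communication between the secure global model computation devices. Handling each layer as one vector reflects the remark above that vector representations can speed up processing.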
Although a local model is described as a neural network composed of K layers in the first embodiment, a local model may, in general, be a model represented using K vectors. In this case, the local model management table is a table including an attribute having a set (m, k) (1≤m≤M, 1≤k≤K) of an identifier m for identifying a local model and an identifier k for identifying a vector constituting the local model as an attribute value and an attribute having shares of parameters of the local model as an attribute value. Further, in S210, the parameter share registration unit 210n of the secure global model computation device 200n (1≤n≤N) takes the shares of the parameters of the m-th local model trained by the m-th local model training device 100m, received using the transmission/reception unit 280n, as inputs and registers the shares of the parameters of the m-th local model using K records having a set (m, k) of identifiers and shares of parameters included in a k-th vector (1≤k≤K) of the m-th local model as one record.
According to the embodiment of the present invention, it is possible to efficiently register local models in the local model management table used when a global model is computed from local models in federated learning.
The processing of each unit of each device described above may be implemented by a computer, and in this case, the processing details of the functions that each device should have are described by a program. In addition, various types of processing functions in each device described above are realized on a computer by causing this program to be read by a recording unit 2020 of a computer 2000 shown in
Each device of the present invention includes, as a single hardware entity, for example, an input unit to which a signal can be input from the outside of the hardware entity, an output unit through which a signal can be output to the outside of the hardware entity, a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (Central Processing Unit, which may include a cache memory, a register, or the like) serving as an arithmetic processing unit, a RAM and a ROM serving as memories, an external storage device such as a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM, and the external storage device such that data can be exchanged therebetween. As necessary, a device (drive) capable of reading/writing data from/to a recording medium such as a CD-ROM may be provided in the hardware entity. An example of a physical entity including such hardware resources is a general-purpose computer.
The external storage device of the hardware entity stores programs necessary for realizing the functions described above, data necessary for processing the programs, and the like (not limited to the external storage device, for example, the program may be stored in a ROM which is a read-only storage device). In addition, data and the like obtained by processing of these programs are appropriately stored in the RAM, the external storage device, or the like.
In the hardware entity, each program stored in the external storage device (or the ROM or the like) and data necessary for processing each program are read into a memory as necessary, and are appropriately interpreted, executed, and processed by the CPU. As a result, the CPU realizes a predetermined function (each component represented by the aforementioned unit, . . . means, or the like). That is, each component of the embodiment of the present invention may be configured as processing circuitry.
As described above, when the processing functions of the hardware entity (the device according to the present invention) described in the above-described embodiments are implemented by a computer, the details of processing of the functions that the hardware entity should have are described by a program. Then, by executing this program on the computer, the processing functions of the above-described hardware entity are implemented on the computer.
A program describing the details of processing can be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a non-transitory recording medium, and specifically a magnetic recording device, an optical disc, or the like.
Further, the program is distributed, for example, by sales, transfer, or lending of a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. In addition, the distribution of the program may be performed by storing the program in advance in a storage device of a server computer and transferring the program from the server computer to another computer via a network.
A computer executing such a program is configured to, for example, first, temporarily store a program recorded on a portable recording medium or a program transferred from a server computer in an auxiliary recording unit 2025 which is its own non-transitory storage device. When executing the processing, the computer reads the program stored in the auxiliary recording unit 2025 into the recording unit 2020, and executes the processing according to the read program. As another mode of executing the program, the computer may directly read the program from the portable recording medium into the recording unit 2020 and execute processing according to the program. Alternatively, each time the program is transferred from the server computer to the computer, the processing according to the received program may be executed sequentially. In addition, the processing may be executed by means of a so-called ASP (Application Service Provider) type service which does not transfer a program from the server computer to the computer and implements processing functions only by execution instructions and acquisition of the results. It is assumed that the program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (e.g., data that is not a direct command to the computer but has the property of defining the processing of the computer).
In addition, although the present device is configured by executing a predetermined program on the computer in this form, at least a part of details of processing may be implemented by hardware.
The present invention is not limited to the above-described embodiment, and appropriate changes can be made without departing from the spirit of the present invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2022/016505 | 3/31/2022 | WO | |