This specification discloses a neural network model learning method.
Real-world data essential for enhancing intelligent services is distributed across numerous edge devices (e.g., IoT devices, personal smartphones, or data storage spaces in different organizations). Deep learning may benefit from large data sets gathered by bulk collection, but as security concerns and privacy regulations increase, a server may be prohibited from acquiring data from edge devices. This places limitations on centralized training of deep neural network models.
Federated learning (FL), which allows edge devices to collaboratively train models without sharing data with a central server, has emerged as a viable option to meet these requirements. In particular, the Federated Averaging (FedAvg) algorithm has emerged as an approach for model training in distributed environments with data privacy concerns. In FedAvg, each edge device trains a local model with its own data and then sends the trained parameters to the server. The server aggregates the received parameters into a single global model that inherits the functions learned by the local models.
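As an illustration only, a minimal sketch of this server-side aggregation step might look as follows in Python; the function name fedavg_aggregate and the parameter-dictionary layout are assumptions for illustration, with each client weighted by its local data set size:

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Weighted average of client parameter dictionaries (FedAvg server step).

    client_params: list of dicts mapping parameter name -> np.ndarray
    client_sizes:  list of local dataset sizes, used as aggregation weights
    """
    total = float(sum(client_sizes))
    global_params = {}
    for name in client_params[0]:
        global_params[name] = sum(
            (size / total) * params[name]
            for params, size in zip(client_params, client_sizes)
        )
    return global_params

# Example: two clients with different amounts of local data
clients = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
]
print(fedavg_aggregate(clients, client_sizes=[100, 300]))
```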
In an FL system, substantial data heterogeneity can occur because the local data of each device varies depending on the characteristics and operation of the device. Heterogeneous data pose a major problem in federated learning, causing slow convergence and suboptimal model performance.
In order to solve the problems described above, embodiments of the present disclosure propose measures for preventing model performance degradation due to heterogeneous data in federated learning.
In order to solve the problem described above, this specification discloses a model learning method performed by a terminal according to an embodiment. The model learning method according to an embodiment includes obtaining at least one model of a previous model and a global model, obtaining a representation of the obtained model, and updating a current model using the representation. The terminal may be configured to include a memory and a processor.
According to the embodiment, the representation may be obtained for each intermediate layer constituting the obtained model.
According to the embodiment, a previous model, a global model and a current model may be determined, a representation may be obtained for the previous model, the global model and the current model, and the current model may be updated based on the representation obtained for the previous model, the global model, and the current model.
According to the embodiment, the current model may be updated based on a representation loss, and the representation loss may be determined based on at least one of a similarity between the representation obtained from the previous model and the representation obtained from the current model and a similarity between the representation obtained from the current model and the representation obtained from the global model.
According to the embodiment, the representation loss may be determined for each layer constituting the current model.
According to the embodiment, the current model may be updated by applying a weight to the representation loss, and the weight may be determined for each layer constituting the current model.
According to the embodiment, the representation of the obtained model may be determined by performing a computation that calculates a predetermined value from an intermediate-layer output of the obtained model.
According to the embodiment, the representation loss may be determined as a value that lowers the similarity between the representation obtained from the previous model and the representation obtained from the current model, and increases the similarity between the representation obtained from the current model and the representation obtained from the global model.
According to the embodiment, the weight may be determined based on the similarity between the representation obtained from the current model and the representation obtained from the global model.
Further, in order to solve the above problem, this specification discloses a terminal including a memory and a processor according to another embodiment. In the terminal, the processor may obtain at least one model of a previous model and a global model, obtain a representation of the obtained model, and update a current model using the representation.
Further, in order to solve the above problem, this specification discloses a model learning method performed by a server according to still another embodiment. The model learning method performed by the server includes transmitting a global model to a terminal, and receiving a local parameter from the terminal. The local parameter may be determined by the terminal obtaining at least one model of a previous model and a global model, obtaining a representation of the obtained model, and updating a current model using the representation. The server may be configured to include a communicator and a processor.
The method performed by the terminal and/or server described above may be provided by being recorded on a computer-readable recording medium in the form of a computer program for performing the method.
This specification discloses measures for introducing a regularization term into a local training process of federated learning as a simple and effective method for preventing model performance degradation due to heterogeneous data in federated learning. The regularization term can be calculated based on a representation extracted from an intermediate layer of a deployed model. For example, utilizing representations of all intermediate layers and assigning an appropriate weight to respective contributions thereof can provide more granular regularization to the training process.
FedIntR disclosed herein can be implemented by incorporating regularization into the local training process. Self-supervised learning with additional loss across the intermediate layers can improve the performance of the model on downstream tasks. By incorporating the intermediate representation into the FL process, more effective regularization for the data heterogeneity problem can be implemented.
Further, FedIntR can be considered a general approach that does not require manual selection of layers to be included in regularization, because FedIntR automatically determines the contribution of different intermediate layers to regularization based on the similarity between local and global representations.
With this, distributed edge devices (e.g., IoT devices, smartphones, data storage space, etc.) can jointly train models that can provide intelligent services using a federated learning mechanism having greater tolerance to data heterogeneity problems.
Hereinafter, a specific embodiment of the present disclosure will be described with reference to the drawings. The following detailed description is provided to aid in a comprehensive understanding of the methods, apparatus and/or systems described herein. However, this is illustrative only, and the present disclosure is not limited thereto.
In describing the embodiments of the present disclosure, when it is determined that a detailed description of related known technologies may unnecessarily obscure the subject matter of the present disclosure, the detailed description thereof will be omitted. In addition, the terms described below are defined in consideration of their functions in the present disclosure, and may vary according to the intention or custom of users or operators. Therefore, their definitions should be made based on the contents throughout this specification. The terms used in the detailed description are only for describing embodiments of the present disclosure and are not intended to be limiting. Unless explicitly used otherwise, expressions in the singular form include the meaning of the plural form. In this description, expressions such as “comprising” or “including” are intended to refer to certain features, numbers, steps, actions, elements, or some or a combination thereof, and are not to be construed as excluding the presence or possibility of one or more other features, numbers, steps, actions, elements, or some or combinations thereof, other than those described.
In the following description, “transfer,” “communication,” “transmission,” “reception,” of a signal or information and other terms having similar meaning include not only direct transmission of a signal or information from one component to another component, but also transmission of the signal or information through another component. In particular, “transferring” or “transmitting” a signal or information to a component indicates a final destination of the signal or information and does not mean a direct destination. This is the same for “receiving” a signal or information. In addition, in this specification, the fact that two or more pieces of data or information are “related” means that if one data (or information) is acquired, at least part of the other data (or information) can be obtained based on it.
In addition, terms such as first, second, etc. may be used to describe various components, but the components should not be limited by the terms. Terms may be used for the purpose of distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component without departing from the scope of the present disclosure.
Utilizing the representation of at least one intermediate layer and assigning an appropriate weight to each contribution can provide more fine-grained regularization to the training process. Hereinafter, the idea of using representations extracted from intermediate layers to overcome the performance degradation caused by non-independent and identically distributed (non-IID) data in FL will be described. In this specification, a method of enhancing the similarity between the intermediate layer representations of the local model and the global model is presented. This is referred to as the Federated learning Intermediate Representations (FedIntR) algorithm (hereinafter referred to as “FedIntR”). In one embodiment, FedIntR can calculate a regularization term based on a contrastive loss using the local and global intermediate representations. FedIntR can also automatically calculate layer-wise weights that determine the degree to which each intermediate layer contributes to the regularization term, assigning a greater contribution weight to a layer having a higher similarity between its global and local representations.
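In FedAvg, the server seeks a global model $w$ that minimizes a weighted sum of the clients' local losses; a standard formulation consistent with the notation below is, for example:

$$\min_{w} \; \sum_{i=1}^{N} \frac{|D_i|}{|D|} \, \ell_i(w) \tag{1}$$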
Here, $\ell_i$ is the local loss of the $i$-th client, $D_i$ is the data set of the $i$-th client, $N$ is the total number of clients, and $|D| = \sum_{i=1}^{N} |D_i|$. Besides the cross-entropy loss, FedIntR adds a regularization term calculated with the help of the intermediate representations of the global model $w_g^t$ and the local model $w_i^{t-1}$ from the previous round.
FedIntR according to an embodiment may be implemented by incorporating regularization into the local training stage of FedAvg. For example, FedIntR can be implemented by incorporating regularization into the second stage (local training) of FedAvg, that is, by modifying the local training process of vanilla FedAvg. Vanilla FedAvg is a known algorithm, and thus a detailed description thereof is omitted.
Referring to the accompanying drawings, the FedIntR algorithm is described below.
FedIntR can calculate the representation loss for the k-th layer using the equation below.
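For example, the following contrastive formulation is consistent with the description that follows, where $z_k$, $z_p^k$, and $z_g^k$ denote the local, previous-round local, and global representations of the $k$-th layer:

$$\ell_k = -\log \frac{\exp\!\big(\mathrm{sim}(z_k, z_g^k)/\tau\big)}{\exp\!\big(\mathrm{sim}(z_k, z_g^k)/\tau\big) + \exp\!\big(\mathrm{sim}(z_k, z_p^k)/\tau\big)} \tag{2}$$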
Here, $\tau$ is a temperature parameter and $\mathrm{sim}(\cdot)$ is a similarity function. The cosine similarity function below can be used as the similarity function.
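For two vectors $u$ and $v$, the cosine similarity is given by:

$$\mathrm{sim}(u, v) = \frac{u \cdot v}{\|u\|\,\|v\|} \tag{3}$$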
Equation (2) will be described in more detail. The layer-wise representation loss $\ell_k$ encourages the local representation $z_k$ to move further away from the previous local representation $z_p^k$ and closer to the global representation $z_g^k$. Through this, the regularization term can maximize the distance between the local representation $z_k$ and the previous local representation $z_p^k$, and minimize the distance between the local representation $z_k$ and the global representation $z_g^k$.
The layer-wise representation loss $\ell_k$ is calculated for the $k$-th layer of the model, and may have a different importance for each layer. Therefore, a different weight $\alpha$ may be assigned to each layer-wise representation loss $\ell_k$. A layer-wise weight $\alpha_k$, which represents the contribution of $\ell_k$ to the regularization term, is determined based on the similarity of $z_k$ and $z_g^k$ using the softmax function. For example, if the $k$-th layer has a higher similarity between $z_k$ and $z_g^k$ than other layers, the representation loss $\ell_k$ of that layer may be assigned a higher weight. Specifically, $\alpha_k$ can be calculated as follows.
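For example, the following softmax over the layer-wise similarities is consistent with this description:

$$\alpha_k = \frac{\exp\!\big(\mathrm{sim}(z_k, z_g^k)\big)}{\sum_{j=1}^{K} \exp\!\big(\mathrm{sim}(z_j, z_g^j)\big)} \tag{4}$$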
Here, $\sum_{k=1}^{K} \alpha_k = 1$.
FedIntR calculates $\alpha_k$ and $\ell_k$ for all layers $k \in \{1, 2, \ldots, K\}$. After that, $\{\alpha_k\}_{k=1}^{K}$ and $\{\ell_k\}_{k=1}^{K}$ can be incorporated into the local training loss as a regularization term. The local loss is defined in Equation (5) and can be calculated using the equation below.
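For example, the following combination of the supervised loss and the weighted layer-wise losses is consistent with this description:

$$\ell = \ell_{sup} + \mu \sum_{k=1}^{K} \alpha_k \, \ell_k \tag{5}$$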
Here, $\ell_{sup}$ represents the cross-entropy loss, and the second term is the regularization term scaled by the balancing parameter $\mu$.
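To make the local update concrete, the following PyTorch-style sketch shows one way Equations (2), (4), and (5) could be computed; the helper names (e.g., intermediate_outputs, fedintr_local_loss, mu, tau) and the assumption that the model exposes its intermediate-layer outputs via an nn.Sequential-like structure are illustrative, not the exact implementation disclosed herein:

```python
import torch
import torch.nn.functional as F

def intermediate_outputs(model, x):
    """Collect the output of each block of a sequential-style model, flattened per sample."""
    outputs, h = [], x
    for layer in model:                      # assumes an nn.Sequential-like model
        h = layer(h)
        outputs.append(h.flatten(start_dim=1))
    return outputs

def fedintr_local_loss(local_model, prev_model, global_model, x, y, mu=1.0, tau=0.5):
    """Cross-entropy plus the layer-wise representation regularization (sketch of Eq. 5)."""
    z_local = intermediate_outputs(local_model, x)
    with torch.no_grad():                    # previous and global models are frozen
        z_prev = intermediate_outputs(prev_model, x)
        z_glob = intermediate_outputs(global_model, x)

    sims_g, losses = [], []
    for zk, zpk, zgk in zip(z_local, z_prev, z_glob):
        sim_g = F.cosine_similarity(zk, zgk, dim=1).mean()   # sim(z_k, z_g^k)
        sim_p = F.cosine_similarity(zk, zpk, dim=1).mean()   # sim(z_k, z_p^k)
        # Contrastive layer-wise loss (Eq. 2): pull toward global, push away from previous
        lk = -torch.log(
            torch.exp(sim_g / tau) / (torch.exp(sim_g / tau) + torch.exp(sim_p / tau))
        )
        sims_g.append(sim_g)
        losses.append(lk)

    # Layer-wise weights via softmax over global-local similarities (Eq. 4)
    alphas = torch.softmax(torch.stack(sims_g), dim=0)
    reg = torch.sum(alphas * torch.stack(losses))

    logits = local_model(x)                  # final output for the supervised loss
    return F.cross_entropy(logits, y) + mu * reg
```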
For example, the processor of the terminal may operate in order for the terminal to perform the following model learning method.
More specifically, the representation may be obtained for each intermediate layer constituting the obtained model. For example, the representation can be determined by performing a computation that calculates a predetermined value from an intermediate-layer output of the obtained model.
In addition, a previous model, a global model and a current model may be determined, a representation may be obtained for the previous model, the global model and the current model, and the current model may be updated based on the representation obtained for the previous model, the global model, and the current model.
The current model may be updated based on a representation loss, and the representation loss may be determined based on at least one of a similarity between the representation obtained from the previous model and the representation obtained from the current model and a similarity between the representation obtained from the current model and the representation obtained from the global model. The representation loss may be determined for each layer constituting the current model.
The representation loss may be determined as a value that lowers the similarity between the representation obtained from the previous model and the representation obtained from the current model, and increases the similarity between the representation obtained from the current model and the representation obtained from the global model.
The current model may be updated by applying a weight to the representation loss. The weight may be determined for each layer constituting the current model. The weight may be determined based on the similarity between the representation obtained from the current model and the representation obtained from the global model.
In addition, the server according to an embodiment includes a communicator and a processor, and each component of the server can operate in order for the server to perform the following model learning method.
As described above, the federated learning method using FedIntR can be implemented by the server and the terminal performing model learning that integrates regularization into the local training process. Self-supervised learning with an additional loss across intermediate layers can improve the performance of the model on downstream tasks. By incorporating intermediate representations into the FL process, more effective regularization for the data heterogeneity problem can be implemented.
As described above, in federated learning, the representations of the intermediate layers of the entire model structure can be used to regularize the local training process. With this, the need to manually determine which intermediate layers to include in the regularization process may be eliminated. In addition, more information can be incorporated into the regularization process to effectively guide the local training process.
As described above, weights may be assigned to the different intermediate layers of the model in order to determine their contribution to the regularization term. The contribution weight for each intermediate layer may be calculated as a different value for each layer, as described above. With this, it is possible to avoid the performance degradation of the global model that would occur if the same contribution weight were assigned to every layer (i.e., if the average of all intermediate representation losses were taken as the regularization term).
In addition, the contribution of each layer to the regularization term need not be determined manually, in that the contribution of each intermediate layer can be calculated using the similarity between the local intermediate representation and the global intermediate representation.
In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention and software for driving the hardware. For example, the “module” may mean a predetermined code and a logical unit of a hardware resource for executing the predetermined code, and does not necessarily mean a physically connected code or a single type of hardware.
The illustrated computing environment 10 includes a computing device 12. In an embodiment, the computing device 12 may be the terminal or the server described above. The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiment described above. For example, the processor 14 may execute one or more programs stored on the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor 14, may be configured so that the computing device 12 performs operations according to the exemplary embodiment.
The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be a memory (volatile memory such as random access memory, non-volatile memory, or any suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, other types of storage media that are accessible by the computing device 12 and capable of storing desired information, or any suitable combination thereof.
The communication bus 18 interconnects various other components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.
The computing device 12 may also include one or more input/output interfaces 22 that provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. The exemplary input/output device 24 may include a pointing device (such as a mouse or trackpad), a keyboard, a touch input device (such as a touch pad or touch screen), a speech or sound input device, input devices such as various types of sensor devices and/or photographing devices, and/or output devices such as a display device, a printer, a speaker, and/or a network card. The exemplary input/output device 24 may be included inside the computing device 12 as a component configuring the computing device 12, or may be connected to the computing device 12 as a separate device distinct from the computing device 12.
Although representative embodiments of the present disclosure have been described in detail, a person skilled in the art to which the present disclosure pertains will understand that various modifications may be made thereto within the limits that do not depart from the scope of the present disclosure. Therefore, the scope of rights of the present disclosure should not be limited to the described embodiments, but should be defined not only by claims set forth below but also by equivalents to the claims.
Foreign Application Priority Data:

| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 10-2022-0152754 | Nov 2022 | KR | national |
| 10-2023-0029872 | Mar 2023 | KR | national |