This specification discloses a neural network model learning method.
Real-world data essential for enhancing intelligent services is distributed across numerous edge devices (e.g., IoT devices, personal smartphones, or data storage spaces in different organizations). Deep learning may benefit from large data sets gathered by bulk collection, but as security concerns and privacy regulations increase, a server may be prohibited from acquiring data from edge devices. This places limitations on centralized training of deep neural network models.
Federated learning (FL), which allows edge devices to collaboratively train models without sharing data with a central server, has emerged as a viable option to meet these requirements. In particular, the Federated Averaging (FedAvg) algorithm has emerged as an approach for model training in distributed environments with data privacy concerns. In FedAvg, each edge device trains a local model with its own data and then sends the trained parameters to the server. The server aggregates the received parameters into a single global model that inherits the functions learned by the local models.
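As an illustration only, a minimal sketch of this server-side aggregation step might look as follows in Python; the function name fedavg_aggregate and the parameter-dictionary layout are assumptions for illustration, with each client weighted by its local data set size:

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Weighted average of client parameter dictionaries (FedAvg server step).

    client_params: list of dicts mapping parameter name -> np.ndarray
    client_sizes:  list of local dataset sizes, used as aggregation weights
    """
    total = float(sum(client_sizes))
    global_params = {}
    for name in client_params[0]:
        global_params[name] = sum(
            (size / total) * params[name]
            for params, size in zip(client_params, client_sizes)
        )
    return global_params

# Example: two clients with different amounts of local data
clients = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
]
print(fedavg_aggregate(clients, client_sizes=[100, 300]))
```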
In an FL system, substantial data heterogeneity can occur because the local data of each device varies depending on the characteristics and operation of the device. Heterogeneous data pose a major problem in federated learning, causing slow convergence and suboptimal model performance.
In order to solve the problems described above, embodiments of the present disclosure propose measures for preventing model performance degradation due to heterogeneous data in federated learning.
In order to solve the problem described above, this specification discloses a model learning method performed by a terminal according to an embodiment. The model learning method according to an embodiment includes obtaining at least one model of a previous model and a global model, obtaining a representation of the obtained model, and updating a current model using the representation. The terminal may be configured to include a memory and a processor.
According to the embodiment, the representation may be obtained for each intermediate layer constituting the obtained model.
According to the embodiment, a previous model, a global model and a current model may be determined, a representation may be obtained for the previous model, the global model and the current model, and the current model may be updated based on the representation obtained for the previous model, the global model, and the current model.
According to the embodiment, the current model may be updated based on a representation loss, and the representation loss may be determined based on at least one of a similarity between the representation obtained from the previous model and the representation obtained from the current model and a similarity between the representation obtained from the current model and the representation obtained from the global model.
According to the embodiment, the representation loss may be determined for each layer constituting the current model.
According to the embodiment, the current model may be updated by applying a weight to the representation loss, and the weight may be determined for each layer constituting the current model.
According to the embodiment, the representation of the obtained model may be determined by performing a computation that calculates a predetermined value from an intermediate-layer output of the obtained model.
According to the embodiment, the representation loss may be determined as a value that lowers the similarity between the representation obtained from the previous model and the representation obtained from the current model, and increases the similarity between the representation obtained from the current model and the representation obtained from the global model.
According to the embodiment, the weight may be determined based on the similarity between the representation obtained from the current model and the representation obtained from the global model.
Further, in order to solve the above problem, this specification discloses a terminal including a memory and a processor according to another embodiment. In the terminal, the processor may obtain at least one model of a previous model and a global model, obtain a representation of the obtained model, and update a current model using the representation.
Further, in order to solve the above problem, this specification discloses a model learning method performed by a server according to still another embodiment. The model learning method performed by the server includes transmitting a global model to a terminal, and receiving a local parameter from the terminal. The local parameter may be determined by the terminal obtaining at least one model of a previous model and a global model, obtaining a representation of the obtained model, and updating a current model using the representation. The server may be configured to include a communicator and a processor.
The method performed by the terminal and/or server described above may be provided by being recorded on a computer-readable recording medium in the form of a computer program for performing the method.
This specification discloses measures for introducing a regularization term into a local training process of federated learning as a simple and effective method for preventing model performance degradation due to heterogeneous data in federated learning. The regularization term can be calculated based on a representation extracted from an intermediate layer of a deployed model. For example, utilizing representations of all intermediate layers and assigning an appropriate weight to respective contributions thereof can provide more granular regularization to the training process.
FedIntR disclosed herein can be implemented by incorporating regularization into the local training process. Self-supervised learning with additional loss across the intermediate layers can improve the performance of the model on downstream tasks. By incorporating the intermediate representation into the FL process, more effective regularization for the data heterogeneity problem can be implemented.
Further, FedIntR can be considered a general approach that does not require manual selection of layers to be included in regularization, because FedIntR automatically determines the contribution of different intermediate layers to regularization based on the similarity between local and global representations.
With this, distributed edge devices (e.g., IoT devices, smartphones, data storage space, etc.) can jointly train models that can provide intelligent services using a federated learning mechanism having greater tolerance to data heterogeneity problems.
Hereinafter, a specific embodiment of the present disclosure will be described with reference to the drawings. The following detailed description is provided to aid in a comprehensive understanding of the methods, apparatus and/or systems described herein. However, this is illustrative only, and the present disclosure is not limited thereto.
In describing the embodiments of the present disclosure, when it is determined that a detailed description of related known technologies may unnecessarily obscure the subject matter of the present disclosure, the detailed description thereof will be omitted. In addition, the terms described below are defined in consideration of their functions in the present disclosure, and may vary according to the intention or custom of users or operators. Therefore, their definitions should be made based on the contents throughout this specification. The terms used in the detailed description are only for describing embodiments of the present disclosure and are not intended to be limiting. Unless explicitly used otherwise, expressions in the singular form include the meaning of the plural form. In this description, expressions such as “comprising” or “including” are intended to refer to certain features, numbers, steps, actions, elements, or some or a combination thereof, and are not to be construed as excluding the presence or possibility of one or more other features, numbers, steps, actions, elements, or some or combinations thereof, other than those described.
In the following description, “transfer,” “communication,” “transmission,” “reception,” of a signal or information and other terms having similar meaning include not only direct transmission of a signal or information from one component to another component, but also transmission of the signal or information through another component. In particular, “transferring” or “transmitting” a signal or information to a component indicates a final destination of the signal or information and does not mean a direct destination. This is the same for “receiving” a signal or information. In addition, in this specification, the fact that two or more pieces of data or information are “related” means that if one data (or information) is acquired, at least part of the other data (or information) can be obtained based on it.
In addition, terms such as first, second, etc. may be used to describe various components, but the components should not be limited by the terms. Terms may be used for the purpose of distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component without departing from the scope of the present disclosure.
Utilizing the representation of at least one intermediate layer and assigning an appropriate weight to each contribution can provide more fine-grained regularization to the training process. Hereinafter, the idea of using representations extracted from intermediate layers to overcome the performance degradation caused by non-independent and identically distributed (non-IID) data in FL will be described. In this specification, a method of enhancing the similarity between the intermediate layer representations of the local model and the global model is presented. This is referred to as the Federated learning Intermediate Representations (FedIntR) algorithm (hereinafter referred to as “FedIntR”). In one embodiment, FedIntR can calculate a regularization term based on a contrastive loss using the local and global intermediate representations. FedIntR can also automatically calculate layer-wise weights that determine the degree to which each intermediate layer contributes to the regularization term, assigning a greater contribution weight to a layer having a higher similarity between its global and local representations.
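In FedAvg, the server seeks a global model $w$ that minimizes a weighted sum of the clients' local losses; a standard formulation consistent with the notation below is, for example:

$$\min_{w} \; \sum_{i=1}^{N} \frac{|D_i|}{|D|} \, \ell_i(w) \tag{1}$$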
Here, $\ell_i$ is the local loss of the $i$-th client, $D_i$ is the data set of the $i$-th client, $N$ is the total number of clients, and $|D| = \sum_{i=1}^{N} |D_i|$. Besides the cross-entropy loss, FedIntR adds a regularization term calculated with the help of the intermediate representations of the global model $w_g^t$ and the local model $w_i^{t-1}$ from the previous round.
FedIntR according to an embodiment may be implemented by incorporating regularization into the local training stage of FedAvg. For example, FedIntR can be implemented by incorporating regularization into the second stage (local training) of FedAvg, that is, by modifying the local training process of vanilla FedAvg. Vanilla FedAvg is a known algorithm, and thus a detailed description thereof is omitted.
Referring to the accompanying drawings, the FedIntR algorithm is described below.
FedIntR can calculate the representation loss for the k-th layer using the equation below.
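For example, the following contrastive formulation is consistent with the description that follows, where $z_k$, $z_p^k$, and $z_g^k$ denote the local, previous-round local, and global representations of the $k$-th layer:

$$\ell_k = -\log \frac{\exp\!\big(\mathrm{sim}(z_k, z_g^k)/\tau\big)}{\exp\!\big(\mathrm{sim}(z_k, z_g^k)/\tau\big) + \exp\!\big(\mathrm{sim}(z_k, z_p^k)/\tau\big)} \tag{2}$$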
Here, $\tau$ is a temperature parameter and $\mathrm{sim}(\cdot)$ is a similarity function. The cosine similarity function below can be used as the similarity function.
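For two vectors $u$ and $v$, the cosine similarity is given by:

$$\mathrm{sim}(u, v) = \frac{u \cdot v}{\|u\|\,\|v\|} \tag{3}$$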
Equation (2) will be described in more detail. The layer-wise representation loss $\ell_k$ encourages the local representation $z_k$ to move further away from the previous local representation $z_p^k$ and closer to the global representation $z_g^k$. Through this, the regularization term can maximize the distance between the local representation $z_k$ and the previous local representation $z_p^k$, and minimize the distance between the local representation $z_k$ and the global representation $z_g^k$.
The layer-wise representation loss $\ell_k$ is calculated for the $k$-th layer of the model, and may have a different importance for each layer. Therefore, a different weight $\alpha$ may be assigned to each layer-wise representation loss $\ell_k$. A layer-wise weight $\alpha_k$, which represents the contribution of $\ell_k$ to the regularization term, is determined based on the similarity of $z_k$ and $z_g^k$ using the softmax function. For example, if the $k$-th layer has a higher similarity between $z_k$ and $z_g^k$ than other layers, the representation loss $\ell_k$ of that layer may be assigned a higher weight. Specifically, $\alpha_k$ can be calculated as follows.
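For example, the following softmax over the layer-wise similarities is consistent with this description:

$$\alpha_k = \frac{\exp\!\big(\mathrm{sim}(z_k, z_g^k)\big)}{\sum_{j=1}^{K} \exp\!\big(\mathrm{sim}(z_j, z_g^j)\big)} \tag{4}$$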
Here, $\sum_{k=1}^{K} \alpha_k = 1$.
FedIntR calculates $\alpha_k$ and $\ell_k$ for all layers $k \in \{1, 2, \ldots, K\}$. After that, $\{\alpha_k\}_{k=1}^{K}$ and $\{\ell_k\}_{k=1}^{K}$ can be incorporated into the local training loss as a regularization term. The local loss is defined in Equation (5) and can be calculated using the equation below.
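For example, the following combination of the supervised loss and the weighted layer-wise losses is consistent with this description:

$$\ell = \ell_{sup} + \mu \sum_{k=1}^{K} \alpha_k \, \ell_k \tag{5}$$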
Here, $\ell_{sup}$ represents the cross-entropy loss, and the second term is the regularization term scaled by the balancing parameter $\mu$.
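To make the local update concrete, the following PyTorch-style sketch shows one way Equations (2), (4), and (5) could be computed; the helper names (e.g., intermediate_outputs, fedintr_local_loss, mu, tau) and the assumption that the model exposes its intermediate-layer outputs via an nn.Sequential-like structure are illustrative, not the exact implementation disclosed herein:

```python
import torch
import torch.nn.functional as F

def intermediate_outputs(model, x):
    """Collect the output of each block of a sequential-style model, flattened per sample."""
    outputs, h = [], x
    for layer in model:                      # assumes an nn.Sequential-like model
        h = layer(h)
        outputs.append(h.flatten(start_dim=1))
    return outputs

def fedintr_local_loss(local_model, prev_model, global_model, x, y, mu=1.0, tau=0.5):
    """Cross-entropy plus the layer-wise representation regularization (sketch of Eq. 5)."""
    z_local = intermediate_outputs(local_model, x)
    with torch.no_grad():                    # previous and global models are frozen
        z_prev = intermediate_outputs(prev_model, x)
        z_glob = intermediate_outputs(global_model, x)

    sims_g, losses = [], []
    for zk, zpk, zgk in zip(z_local, z_prev, z_glob):
        sim_g = F.cosine_similarity(zk, zgk, dim=1).mean()   # sim(z_k, z_g^k)
        sim_p = F.cosine_similarity(zk, zpk, dim=1).mean()   # sim(z_k, z_p^k)
        # Contrastive layer-wise loss (Eq. 2): pull toward global, push away from previous
        lk = -torch.log(
            torch.exp(sim_g / tau) / (torch.exp(sim_g / tau) + torch.exp(sim_p / tau))
        )
        sims_g.append(sim_g)
        losses.append(lk)

    # Layer-wise weights via softmax over global-local similarities (Eq. 4)
    alphas = torch.softmax(torch.stack(sims_g), dim=0)
    reg = torch.sum(alphas * torch.stack(losses))

    logits = local_model(x)                  # final output for the supervised loss
    return F.cross_entropy(logits, y) + mu * reg
```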
For example, the processor of the terminal may operate in order for the terminal to perform the following model learning method.
More specifically, the representation may be obtained for each intermediate layer constituting the obtained model. For example, the representation can be determined by performing a computation that calculates a predetermined value from an intermediate-layer output of the obtained model.
In addition, a previous model, a global model and a current model may be determined, a representation may be obtained for the previous model, the global model and the current model, and the current model may be updated based on the representation obtained for the previous model, the global model, and the current model.
The current model may be updated based on a representation loss, and the representation loss may be determined based on at least one of a similarity between the representation obtained from the previous model and the representation obtained from the current model and a similarity between the representation obtained from the current model and the representation obtained from the global model. The representation loss may be determined for each layer constituting the current model.
The representation loss may be determined as a value that lowers the similarity between the representation obtained from the previous model and the representation obtained from the current model, and increases the similarity between the representation obtained from the current model and the representation obtained from the global model.
The current model may be updated by applying a weight to the representation loss. The weight may be determined for each layer constituting the current model. The weight may be determined based on the similarity between the representation obtained from the current model and the representation obtained from the global model.
In addition, the server according to an embodiment includes a communicator and a processor, and each component of the server can operate in order for the server to perform the following model learning method.
As described above, the federated learning method using FedIntR can be implemented by the server and the terminal performing model learning that integrates regularization into the local training process. Self-supervised learning with an additional loss across intermediate layers can improve the performance of the model on downstream tasks. By incorporating intermediate representations into the FL process, more effective regularization for the data heterogeneity problem can be implemented.
As described above, in federated learning, the representations of the intermediate layers of the entire model structure can be used to regularize the local training process. With this, the need to manually determine which intermediate layers to include in the regularization process may be eliminated. In addition, more information can be incorporated into the regularization process to effectively guide the local training process.
As described above, weights may be assigned to the different intermediate layers of the model in order to determine their contribution to the regularization term. The contribution weight for each intermediate layer may be calculated as a different value for each layer, as described above. With this, it is possible to avoid the performance degradation of the global model that would occur if the same contribution weight were assigned to every layer (i.e., if the average of all intermediate representation losses were taken as the regularization term).
In addition, the contribution of each layer to the regularization term need not be determined manually, in that the contribution of each intermediate layer can be calculated using the similarity between the local intermediate representation and the global intermediate representation.
In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention and software for driving the hardware. For example, the “module” may mean a predetermined code and a logical unit of a hardware resource for executing the predetermined code, and does not necessarily mean a physically connected code or a single type of hardware.
The illustrated computing environment 10 includes a computing device 12. In an embodiment, the computing device 12 may be the terminal or the server described above. The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiment described above. For example, the processor 14 may execute one or more programs stored on the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor 14, may be configured so that the computing device 12 performs operations according to the exemplary embodiment.
The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be a memory (volatile memory such as random access memory, non-volatile memory, or any suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, other types of storage media that are accessible by the computing device 12 and capable of storing desired information, or any suitable combination thereof.
The communication bus 18 interconnects various other components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.
The computing device 12 may also include one or more input/output interfaces 22 that provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. The exemplary input/output device 24 may include a pointing device (such as a mouse or trackpad), a keyboard, a touch input device (such as a touch pad or touch screen), a speech or sound input device, input devices such as various types of sensor devices and/or photographing devices, and/or output devices such as a display device, a printer, a speaker, and/or a network card. The exemplary input/output device 24 may be included inside the computing device 12 as a component configuring the computing device 12, or may be connected to the computing device 12 as a separate device distinct from the computing device 12.
Although representative embodiments of the present disclosure have been described in detail, a person skilled in the art to which the present disclosure pertains will understand that various modifications may be made thereto within the limits that do not depart from the scope of the present disclosure. Therefore, the scope of rights of the present disclosure should not be limited to the described embodiments, but should be defined not only by claims set forth below but also by equivalents to the claims.
Foreign Application Priority Data:

| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 10-2022-0152754 | Nov 2022 | KR | national |
| 10-2023-0029872 | Mar 2023 | KR | national |