The present invention belongs to the technical field of Federated Graph Learning, and in particular relates to edge-client collaborative federated graph learning with adaptive neighbor generation.
With powerful expressive capabilities, graphs have been widely used to depict real-world application scenarios such as social networks and knowledge graphs. In the area of graph learning, the emerging Graph Neural Networks (GNNs) have gained significant attention due to their exceptional performance in dealing with graph-related tasks. GNNs efficiently utilize feature propagation by employing multiple graph convolutional layers for node classification tasks, where structural knowledge is distilled into discriminative representations from complex graph-oriented data in diverse domains such as prediction modeling, malware detection, and resource allocation. Commonly, the training performance of GNNs depends on substantial graph data distributed among clients. However, due to privacy and overhead concerns, it is impractical to assemble graph data from all clients for GNN training.
Following a distributed training mode, Federated Graph Learning (FGL) aims to deal with the problem of graph data islands by promoting cooperative training among multiple clients. To protect privacy, FGL offers generalized graph mining models over distributed subgraphs without sharing raw data. Many studies have verified the feasibility of FGL in various domains such as transportation, computer vision, and edge intelligence. Recently, some studies also adopted FGL-based frameworks for semi-supervised classification tasks. These approaches typically join an edge server with multiple clients to train a globally-shared classifier for downstream tasks, where the clients and the edge server undertake local updating and global aggregation, respectively.
In real-world FGL application scenarios, there are potential links between the subgraphs of a client and others, since these subgraphs contain significant information about neighbor clients. However, previous FGL-related studies overlooked such important links among clients, as shown in the accompanying drawings.
The purpose of the present invention is to provide an edge-client collaborative federated graph learning method with adaptive neighbor generation: consider the typical FGL scenario with distributed graph datasets; based on this setting, first propose an improved centralized FGL framework, named FedGL; next, extend the FedGL to the scenario of multi-edge collaboration and propose a novel distributed FGL framework, named SpreadFGL.
Consider an edge server to communicate with M clients; the FedGL leverages the edge server S_j as an intermediary to facilitate the information flow among clients, where S_j covers all clients, denoted by M_j = M; incorporate a graph imputation generator to construct learnable links, thereby generating the latent links between subgraphs; employ an L-layer GNN model with the local node classifier F_i^j.
For every edge-client communication in FedGL, each client trains its local node classifier F_i^j, parameterized by W^{(j,i)}, in parallel over the local training rounds.
After local training, S_j aggregates the local parameters {W^{(j,i)} | i ∈ [M_j]} to update the global parameters W_j, and then broadcasts W_j to all clients at each edge-client communication.
The clients upload the processed embeddings {H^{(j,i)} | i ∈ [M_j]} to the edge server at every interval of edge-client communication, where the originally linked nodes remain proximate in the low-dimensional space; next, the graph imputation generator performs fusion on the processed embeddings to obtain the globally-shared information H^j ∈ ℝ^{|V^j|×c}, where |V^j| is the total number of nodes across all clients covered by S_j.
The graph imputation generator utilizes the distance to evaluate the node similarity and construct the global topology graph, referred to as Ā^j = H^j(H^j)ᵀ.
The assessor adopts a fully-connected neural network to evaluate the quality of the reconstructed embeddings.
The training processes of the autoencoder and assessor are performed simultaneously, where the assessor guides the autoencoder to learn more discriminative reconstructed data and potential features through back-propagation.
Based on the proposed versatile assessor, we first set a threshold θ ∈ (0, 1) in every training iteration of the autoencoder and select the attributes in h_u^j that are less than θ; these attributes are deemed negative and their feedback from the assessor is 0; next, zero-regularization is used to process these negatives, and thus both the autoencoder and the assessor can spotlight the representations that are meaningful for downstream tasks; hence, the loss function of the assessor is updated and redefined accordingly.
The edge server S_j divides the learnable potential graph G̃^j into subgraphs, denoted by the set {G̃_i^j = (Ṽ_i^j, Ẽ_i^j) | i ∈ [M_j]}, where Ṽ_i^j is the node set of the i-th client and Ẽ_i^j = P_i^j(G̃_i^j) is the edge set fixed by the graphic patcher; by collaborating with the edge server, clients are expected to acquire diverse neighbor features from the globally-shared information, thereby fixing cross-subgraph missing links.
Propose a novel distributed FGL framework, named SpreadFGL, that extends the FedGL to a multi-edge environment; the SpreadFGL is able to facilitate more efficient FGL training and better load balancing in a multi-edge collaborative environment; consider that there are N edge servers, and an edge server S_j is equipped with a global node classifier F_j parameterized by W_j; besides, a client only communicates with its closest edge server; there exist neighbor relationships among the servers, denoted by the matrix A ∈ {0, 1}^{N×N}; if S_i and S_j are neighbors, a_ij = 1; otherwise, a_ij = 0; moreover, parameter transmission is permitted between neighbor servers.
In SpreadFGL, the clients adopt L-layer GNNs; the edge servers exchange information with the covered clients in each edge-client communication; at every K intervals of edge-client communication, the clients and their nearest edge servers collaboratively utilize the shared information to extract the potential links based on the proposed graph imputation generator and negative sampling mechanism.
To better explore the potential cross-subgraph links by using the information from other servers, adopt the topology structure at the edge layer to facilitate the parameter transmission between neighbor servers; this enables the information flow among clients via gradient propagation at every interval of edge-client communication; specifically, S_j first aggregates the model parameters of its neighbor servers; next, S_j averages the parameters and broadcasts them to the covered clients.
Compared with the prior art, the present invention has the following beneficial effects:
The technical solution of the present invention is described in detail in combination with the accompanying drawings.
Proposed in the present invention is an edge-client collaborative federated graph learning method with adaptive neighbor generation. The framework is shown in the accompanying drawings.
The method specifically comprises the following design process:
To address these essential challenges, we propose FedGL, an improved centralized FGL framework, to explore potential cross-subgraph links by leveraging the global information flow, as illustrated in the accompanying drawings.
Graph Neural Networks [18] have drawn considerable attention in recent years due to their remarkable capabilities. As an emerging technique in semi-supervised learning, GNNs propagate and aggregate node features along the graph topology to learn discriminative node representations.
The GAT incorporates GCNs with attention mechanisms to adaptively assign the weights α_{uv}^{(l+1)} for the neighbors of the node u, and the inference vector is defined as follows.
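The referenced equation is not reproduced legibly in this text; a plausible reconstruction of the standard GAT layer, consistent with the notation above (inclusion of the self-loop term is an assumption), is:

```latex
h_u^{(l+1)} = \sigma\Big(\sum_{v \in \mathcal{N}(u) \cup \{u\}} \alpha_{uv}^{(l+1)}\, W^{(l+1)}\, h_v^{(l)}\Big)
```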
The GraphSAGE aggregates node features by sampling from neighbor nodes, and the inference vector is defined as follows.
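Likewise, a plausible reconstruction of the standard GraphSAGE update, where AGG denotes the chosen aggregator (e.g., mean) and CONCAT denotes concatenation, is:

```latex
h_{\mathcal{N}(u)}^{(l+1)} = \mathrm{AGG}\big(\{ h_v^{(l)},\ \forall v \in \mathcal{N}(u) \}\big), \qquad
h_u^{(l+1)} = \sigma\big(W^{(l+1)} \cdot \mathrm{CONCAT}(h_u^{(l)},\, h_{\mathcal{N}(u)}^{(l+1)})\big)
```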
There is an urgent need to study the restoration of missing cross-subgraph links to better handle semi-supervised node classification.
Federated Graph Learning (FGL) has emerged as a captivating topic in recent years. Different from the classic GNN that relies on centralized feature propagation across the entire graph, FGL enables distributed clients to collectively maintain a globally-shared model through gradient aggregation. Many efforts have contributed to this topic. For instance, He et al. proposed a graph-level scheme that distributed graph datasets across multiple clients, catering to various downstream tasks. Wu et al. designed an FGL framework for recommendation tasks, where subgraphs contain overlapped items. Xie et al. developed an FGL-based framework to mitigate the heterogeneity among features and graphs. They employed clustering techniques to aggregate clients based on the GNN gradients, aiming to enhance the collaboration efficiency of federated learning.
However, the above studies overlooked the pervasive missing links between clients that occur in real-world scenarios, which may cause undesired performance in downstream tasks.
To the best of our knowledge, few studies have adequately considered and tackled the problem of missing cross-subgraph links. Zhang et al. utilized a local linear predictor to explore the potential relationships between clients according to the local subgraph structure. However, the cross-subgraph relationships rely on important information from neighbor clients, which makes it hard to find the potential links using only local subgraphs, thereby leading to inefficient recovery of cross-client information. Moreover, prior studies commonly adopted the classic FedAvg for training, ignoring the overload of a single node (e.g., edge server), especially when the number of clients expands.
In this section, we consider the typical FGL scenario with distributed graph datasets. Based on this setting, we first propose an improved centralized FGL framework, named FedGL. Next, we extend the FedGL to the scenario of multi-edge collaboration and propose a novel distributed FGL framework, named SpreadFGL.
A graph dataset is denoted as D = (G, Y), where G = (V, ε, X) is a global graph. V is the node set with |V| = n. ε = {e_uv} is the edge set that stores the link relationship between nodes u and v, where ∀u, v ∈ V. X ∈ ℝ^{n×d} indicates the node feature matrix, where x_i ∈ ℝ^d is the feature vector of the i-th node. Y ∈ {0, 1}^{n×c} is the label matrix, where c is the number of classes. Consider that there are N edge servers and M clients. The edge server S_j covers M_j local clients {C_i^j | i ∈ [M_j]} to conduct the FGL training, where Σ_{j=1}^{N} M_j = M. The client C_i^j owns part of the samples of the graph dataset, denoted by D_i^j = {G_i^j, Y_i^j}, where G_i^j = (V_i^j, ε_i^j, X_i^j) is a local subgraph and Y_i^j is the sub-label matrix of the nodes V_i^j. To simulate the real-world scenario of missing links between clients, we consider that there are no shared nodes and no connected links among clients, formulated by V_i^j ∩ V_r^ĵ = ∅, where ∀i, r ∈ [M_j] and i ≠ r if j = ĵ, and ∀i ∈ [M_j], ∀r ∈ [M_ĵ] if j ≠ ĵ. The subgraphs of all clients form the complete graph, defined by Σ_{j=1}^{N} Σ_{i=1}^{M_j} |V_i^j| = n. Thus, there is no link between any two clients, and a client cannot directly retrieve the node features from another client. For clarity, Table I lists the main notations used in this application.
Based on the above scenario, the client C_i^j owns a local node classifier F_i^j and a graphic patcher P_i^j, and all clients can jointly learn graph representations for semi-supervised node classification. Generally, the proposed SpreadFGL aims to conduct collaborative learning on independent subgraphs across all clients, prioritizing the privacy of raw data. Therefore, the SpreadFGL obtains the global node classifiers {F_j | j ∈ [N]} parameterized by {W_j | j ∈ [N]} in the edge servers for downstream tasks. With this consideration, we formulate the optimization problem as minimizing the aggregated risks to find the optimal weights {W_j | j ∈ [N]}.
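A plausible reconstruction of this objective, consistent with the definitions that follow (the node-level loss ℓ, e.g., cross-entropy, and the prediction symbol ŷ_v^{ji} are assumptions), is:

```latex
\min_{\{W_j \mid j \in [N]\}} \; \sum_{j=1}^{N} \sum_{i=1}^{M_j} \sum_{v \in \mathcal{T}_i^{j}} \ell\big(\hat{y}_v^{\,ji},\, y_v^{\,ji}\big)
```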
where T_i^j ⊆ V_i^j is the labeled training set in the i-th client, and y_v^{ji} is the ground truth of node v in the i-th client.
Since clients cannot directly capture cross-subgraph links that contain important neighbor information, the feature propagation from higher-order neighbors becomes inadequate, resulting in degraded classification performance. Therefore, it is crucial to explore the potential topology links among clients. To achieve this goal, we propose an improved centralized FGL framework, named FedGL. In FedGL, we consider an edge server to communicate with M clients. The FedGL leverages the edge server S_j as an intermediary to facilitate the information flow among clients, where S_j covers all clients, denoted by M_j = M. Specifically, we incorporate a graph imputation generator to construct learnable links, thereby generating the latent links between subgraphs. To enhance feature propagation in local tasks and facilitate subsequent inference with the global model, we employ an L-layer GNN model with the local node classifier F_i^j.
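Eq. (3) is referenced throughout but is not reproduced legibly in this text; a plausible reconstruction of the L-layer propagation, where Â^{ji} denotes a normalized adjacency matrix of the local subgraph and σ a nonlinearity (both assumed symbols), is:

```latex
H^{(j,i,\,l+1)} = \sigma\big(\hat{A}^{ji}\, H^{(j,i,\,l)}\, W^{(j,i,\,l+1)}\big), \qquad
\hat{Y}_u^{\,ji} = \mathrm{softmax}\big(h_u^{(j,i,\,L)}\big)
```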
where Ŷ_u^{ji} is the inference vector of the node u produced by local training.
For every edge-client communication in FedGL, each client trains its local node classifier F_i^j, parameterized by W^{(j,i)}, in parallel over the local training rounds.
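As a sketch, a standard local gradient step consistent with this description (η, the learning rate, and L_i^j, the local loss, are assumed symbols) would be:

```latex
W^{(j,i)} \leftarrow W^{(j,i)} - \eta\, \nabla_{W^{(j,i)}}\, \mathcal{L}_i^{j}\big(W^{(j,i)};\, \mathcal{D}_i^{j}\big)
```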
After local training, S_j aggregates the local parameters {W^{(j,i)} | i ∈ [M_j]} to update the global parameters W_j, and then broadcasts W_j to all clients at each edge-client communication.
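A minimal sketch of this aggregate-and-broadcast step, assuming FedAvg-style weighting by local sample counts (the exact weighting used in this application is not specified here):

```python
def aggregate_and_broadcast(local_params, sample_counts):
    """Edge server S_j: weighted average of client parameter dicts,
    then every client is reset to the new global parameters W_j."""
    total = sum(sample_counts)
    keys = local_params[0].keys()
    W_global = {
        k: sum(p[k] * (n / total) for p, n in zip(local_params, sample_counts))
        for k in keys
    }
    return W_global  # broadcast W_global back to all M_j clients

# usage: two clients with 10 and 30 labeled samples
clients = [{"w": 1.0}, {"w": 3.0}]
print(aggregate_and_broadcast(clients, [10, 30]))  # {'w': 2.5}
```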
C. Graph Imputation Generator with Versatile Assessor
To capture the potential cross-subgraph links, we design a graph imputation generator and incorporate it with a versatile assessor to explore a learnable potential graph G̃^j = (Ṽ^j, Ẽ^j).
Graph Imputation Generator. To construct the globally-shared information without revealing raw data, the clients upload the processed embeddings {H^{(j,i)} | i ∈ [M_j]} to the edge server at every interval of edge-client communication, where the originally linked nodes remain proximate in the low-dimensional space. Next, the graph imputation generator performs fusion on the processed embeddings to obtain the globally-shared information H^j ∈ ℝ^{|V^j|×c}, where |V^j| is the total number of nodes across all clients covered by S_j. Based on this, H^j is obtained by fusing the embeddings of all covered clients.
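A plausible form of H^j, consistent with the stated dimension ℝ^{|V^j|×c}, is the row-wise concatenation (stacking) of the uploaded client embeddings; the concatenation order is an assumption:

```latex
H^{j} = \big[\, H^{(j,1)};\; H^{(j,2)};\; \dots;\; H^{(j,M_j)} \,\big] \in \mathbb{R}^{|\mathcal{V}^{j}| \times c}
```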
In real-world application scenarios of FGL, it is possible for each node in clients to own potential cross-subgraph links, and it may be insufficient for clients to propagate features among multi-hop neighbors if these cross-subgraph links are missing. In response to this problem, the graph imputation generator utilizes the distance to evaluate the node similarity and construct the global topology graph, referred to as Ā^j = H^j(H^j)ᵀ.
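As a minimal sketch of this topology construction: the fused embeddings yield an inner-product similarity matrix, which is then sparsified by keeping each node's k most similar peers. The k-nearest-neighbor sparsification is an assumption motivated by the hyperparameter k ∈ [3, 20] reported in the experiments; the exact rule in the original may differ.

```python
import numpy as np

def build_global_topology(H, k=5):
    """Inner-product similarity A_bar = H @ H.T, then keep the k most
    similar peers of every node as candidate cross-subgraph links."""
    A_bar = H @ H.T
    np.fill_diagonal(A_bar, -np.inf)          # exclude self-links
    A = np.zeros_like(A_bar)
    topk = np.argsort(-A_bar, axis=1)[:, :k]  # k largest entries per row
    rows = np.arange(H.shape[0])[:, None]
    A[rows, topk] = 1.0
    return np.maximum(A, A.T)                 # symmetrize the topology

H = np.random.randn(10, 4)                    # fused embeddings H^j
A_global = build_global_topology(H, k=3)
```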
where W_a^{(j,l+1)} ∈ ℝ^{d_l×d_{l+1}} denotes the learnable weights of the (l+1)-th layer of the autoencoder.
Versatile Assessor. Since the conditional distribution of the reconstructed features is difficult to evaluate directly, we design a versatile assessor that scores the quality of the reconstructed embeddings.
The training processes of the autoencoder and assessor are performed simultaneously, where the assessor guides the autoencoder to learn more discriminative reconstructed data and potential features through back-propagation.
Negative Sampling. To extract more refined potential features, we develop a negative sampling mechanism to concentrate on the pertinent information for node classification. Based on the proposed versatile assessor, we first set a threshold θ ∈ (0, 1) in every training iteration of the autoencoder and select the attributes in h_u^j that are less than θ. These attributes are deemed negative and their feedback from the assessor is 0. Next, zero-regularization is used to process these negatives, and thus both the autoencoder and the assessor can spotlight the representations that are meaningful for downstream tasks. Hence, the loss function of the assessor is updated and redefined as
where e_u is a c-dimensional vector that judges whether h_{ui}^j ∈ h_u^j is higher than θ (e_{ui} = 1) or not (e_{ui} = 0), and ⊙ denotes element-wise multiplication.
where h_u^j is defined as above and 1 is an indicator vector with all values equal to 1.
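A minimal sketch of the thresholding step described above, assuming h_u^j is a c-dimensional (e.g., softmax-normalized) attribute vector per node; the array layout and function name are illustrative:

```python
import numpy as np

def negative_sampling_mask(h, theta):
    """e_u[i] = 1 if attribute i of h_u exceeds theta, else 0.
    Sub-threshold attributes are deemed negative: they are zeroed out
    (zero-regularization), and their assessor feedback target is 0."""
    e = (h > theta).astype(h.dtype)   # indicator vector e_u per node
    h_masked = h * e                  # element-wise product  h ⊙ e
    return h_masked, e

c = 6
h = np.random.dirichlet(np.ones(c), size=4)      # 4 nodes, softmax-like rows
h_masked, e = negative_sampling_mask(h, theta=1.0 / c)  # theta = 1/c as in the experiments
```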
Through the above operations, the learnable potential graph G̃^j = (Ṽ^j, Ẽ^j) is constructed.
Graph Fixing. The edge server S_j divides G̃^j into subgraphs, denoted by the set {G̃_i^j = (Ṽ_i^j, Ẽ_i^j) | i ∈ [M_j]}, where Ṽ_i^j is the node set of the i-th client and Ẽ_i^j = P_i^j(G̃_i^j) is the edge set fixed by the graphic patcher. This process simulates the missing links, thereby promoting the feature propagation of local tasks in Eq. (3). By collaborating with the edge server, clients are expected to acquire diverse neighbor features from the globally-shared information, thereby fixing cross-subgraph missing links. Moreover, these cross-subgraph links contribute to training a global node classifier F_j, aligning with the overall optimization objective in Eq. (4).
In real-world application scenarios, a single edge server may encounter the problem of excessive costs and degraded performance as the number of clients expands, particularly when clients are geographically dispersed. To address this problem, we propose a novel distributed FGL framework, named SpreadFGL, that extends the FedGL to a multi-edge environment. The SpreadFGL is able to facilitate more efficient FGL training and better load balancing in a multi-edge collaborative environment. We consider that there are N edge servers, and an edge server S_j is equipped with a global node classifier F_j parameterized by W_j. Besides, a client only communicates with its closest edge server. There exist neighbor relationships among the servers, denoted by the matrix A ∈ {0, 1}^{N×N}. If S_i and S_j are neighbors, a_ij = 1; otherwise, a_ij = 0. Moreover, parameter transmission is permitted between neighbor servers.
In SpreadFGL, the clients adopt L-layer GNNs and conduct feature propagation via Eq. (3) during local training. The edge servers exchange information with the covered clients in each edge-client communication. At every K intervals of edge-client communication, the clients and their nearest edge servers collaboratively utilize the shared information to extract the potential links based on the proposed graph imputation generator and negative sampling mechanism.
However, the potential cross-subgraph links strictly depend on the information provided by all clients. This not only violates the core idea of the SpreadFGL but is also impractical if the information must be transmitted from clients under the coverage of other servers. In light of these concerns, we design a weight regularizer during local training. Based on trace normalization, the regularizer is used to enhance the network learning capability of the local node classifiers. Specifically, the loss function of the i-th client under the coverage of S_j augments the task loss with a trace-based penalty term.
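A minimal sketch of such a trace-normalization regularizer added to the local task loss; the squared-Frobenius form tr(WᵀW) and the coefficient lam are assumptions, since the exact loss is not reproduced here:

```python
import numpy as np

def local_loss_with_regularizer(task_loss, weights, lam=1e-3):
    """Local loss of client i under S_j: task term plus a trace-based
    penalty tr(W^T W) (equal to the squared Frobenius norm) over all
    weight matrices, constraining the local classifier without requiring
    information from clients covered by other edge servers."""
    reg = sum(np.trace(W.T @ W) for W in weights)
    return task_loss + lam * reg

# usage: two weight matrices of a 2-layer local classifier
weights = [np.random.randn(4, 8), np.random.randn(8, 3)]
loss = local_loss_with_regularizer(0.42, weights)
```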
To better explore the potential cross-subgraph links by using the information from other servers, we adopt the topology structure at the edge layer to facilitate the parameter transmission between neighbor servers. This enables the information flow among clients via gradient propagation at every interval of edge-client communication. Specifically, S_j first aggregates the model parameters of its neighbor servers. Next, S_j averages the parameters and broadcasts them to the covered clients. This process can be described as follows.
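A minimal sketch of this edge-layer exchange, assuming uniform averaging over S_j's own parameters and those of its neighbors given the adjacency matrix A (the exact weighting is an assumption):

```python
import numpy as np

def edge_layer_average(params, A, j):
    """S_j aggregates the parameter vectors of its neighbor servers
    (a_ij = 1) together with its own, averages them, and the result
    is broadcast to the clients covered by S_j."""
    neighbors = [i for i in range(len(params)) if A[j, i] == 1]
    stacked = np.stack([params[j]] + [params[i] for i in neighbors])
    return stacked.mean(axis=0)

A = np.array([[0, 1, 1],   # ring topology over N = 3 edge servers,
              [1, 0, 1],   # matching the experimental setup
              [1, 1, 0]])
params = [np.random.randn(8) for _ in range(3)]
W_j_new = edge_layer_average(params, A, j=0)
```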
The procedure of the proposed SpreadFGL is elaborated in Algorithm 1, whose core components have been described in detail before.
We conduct ablation experiments to further verify the superiority of the core components designed in the proposed frameworks.
Real-world Testbed. The experiments are conducted on a real-world testbed, as shown in the accompanying drawings.
Datasets. The following four benchmark graph datasets are used in our experiments, as shown in Table II, where c is the number of classes.
Comparison Algorithms. We compare our proposed FedGL and SpreadFGL with the following state-of-the-art algorithms.
It is worth noting that there are few studies handling the FGL scenario with completely missing cross-subgraph links between clients. FedSage+ is deemed the state-of-the-art algorithm for studying missing cross-subgraph links in the FGL field. However, it still suffers from performance bottlenecks, and the problem has not been well solved in real-world scenarios.
Parameter Settings. For the proposed SpreadFGL and FedGL, we adopt the GraphSAGE with two layers and use the GCN aggregator as local node classifiers. The autoencoder employs 4 fully-connected layers, where the numbers of neurons in the encoder and decoder are {c, 16, d} and {d, 16, c}, respectively. In the autoencoder, the Softmax is used as the activation function in the last layer. The assessor adopts a fully-connected neural network, where the numbers of neurons are {c, 128, 16, 1}. In the assessor, the Sigmoid is used as the activation function in the last layer while the ReLU is used in the remaining layers. The training iterations of the autoencoder and assessor are T_ae = 5 and T_as = 3, respectively, and the Adam optimizer is used to update their parameters with a learning rate of 0.001. The threshold θ is set to 1/c and k ranges in [3, 20]. Moreover, we select [20%, 60%] of samples as the training set and randomly choose 20% as the testing set. The Louvain algorithm is used to measure the subgraph similarity for clients. The FedGL uses one edge server and the SpreadFGL adopts three edge servers for collaborative training with a ring topology structure, where the number of clients ranges in [6, 15]. The Adam optimizer is used to update the parameters of local classifiers with a learning rate of lr = 0.01. Besides, we use the well-known accuracy (ACC) and macro F1-score (F1) as performance metrics.
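For reference, these settings can be gathered into a single configuration sketch; the key names below are illustrative assumptions, while the values are those stated above:

```python
config = {
    "gnn": {"model": "GraphSAGE", "layers": 2, "aggregator": "GCN"},
    "autoencoder": {"encoder_units": ["c", 16, "d"],
                    "decoder_units": ["d", 16, "c"],
                    "last_activation": "softmax", "iters_T_ae": 5},
    "assessor": {"units": ["c", 128, 16, 1],
                 "last_activation": "sigmoid", "hidden_activation": "relu",
                 "iters_T_as": 3},
    "optimizer": {"name": "Adam", "lr_generator": 0.001, "lr_classifier": 0.01},
    "theta": "1/c", "k_range": [3, 20],
    "train_ratio": [0.2, 0.6], "test_ratio": 0.2,
    "edge_servers": {"FedGL": 1, "SpreadFGL": 3, "topology": "ring"},
    "clients_range": [6, 15],
}
```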
Node Classification Accuracy. As shown in Table III, the proposed SpreadFGL and FedGL can both achieve higher classification accuracy than other state-of-the-art algorithms under different datasets, indicating the superiority of the proposed frameworks for node classification tasks. Specifically, the significant performance gap between the LocalFGL and SpreadFGL verifies the advantages of using the proposed edge-client collaboration mechanism. The FedGL and SpreadFGL outperform the FedSage+ by around 12.78% and 14.71% in terms of ACC and F1, respectively. This demonstrates that the FedGL and SpreadFGL gain more generalized potential cross-subgraph links through the global information flow, further validating the effectiveness of the proposed graph imputation generator. Moreover, compared to the FedGL, the SpreadFGL achieves better performance on most of the datasets under various scenarios with different numbers of clients. This indicates that the information flow between clients and edge servers utilized in the SpreadFGL effectively promotes the repair of missing links among clients even though the scenario becomes complex with more clients.
(Table III: node classification results on different datasets with M = 6, 9, 12, 15.)
Performance with Different Labeled Ratios.
Parameter Sensitivity. We analyze the parameter sensitivity of the proposed SpreadFGL on different datasets with respect to the hyperparameters K and T_l. As shown in the accompanying drawings, with a smaller K, the graph imputation generator can better repair the missing links in subgraphs to promote feature propagation in local models within fewer edge-client communications, thereby improving the training of the global node classifiers. In this regard, the suggested values of K range from 1 to 10.
Ablation Study. The results of the ablation study are shown in the accompanying drawings.
Convergence Validation.
In this application, we propose a novel FGL-based framework named FedGL and its extended framework SpreadFGL, addressing the challenges of generating cross-subgraph links and single-node overloading. First, we design the FedGL to repair the missing links between clients, where a new graph imputation generator is developed that incorporates a versatile assessor and negative sampling mechanism to explore a refined global information flow, extracting unbiased latent links and thus improving the training effect. Next, to alleviate the overloading issue at the edge layer, we extend the FedGL and propose the SpreadFGL with multi-edge collaboration to enhance the global information exchange. Extensive experiments are conducted on a real-world testbed and benchmark graph datasets to verify the superiority of the proposed FedGL and SpreadFGL. The results show that the FedGL and SpreadFGL outperform state-of-the-art algorithms in terms of model accuracy. Further, through ablation experiments and convergence analysis, we validate the effectiveness of the core components designed in the proposed frameworks and the advantage of the SpreadFGL in achieving faster convergence speed.
This application is the continuation application of International Application No. PCT/CN2023/132495, filed on Nov. 20, 2023, the entire contents of which are incorporated herein by reference.
Parent application: PCT/CN2023/132495, filed November 2023 (WO). Child application: U.S. Ser. No. 18/399,696.