The present invention belongs to the technical field of Mobile Edge Computing, and in particular relates to a Collaborative Caching Framework for Multi-edge Systems with Robust Federated Deep Learning.
With the tremendous development of 5G technology, massive numbers of intelligent applications are expanding across industrial manufacturing, the digital economy, vehicle networking, and smart cities. In cloud computing, the tasks and data generated by these applications are uploaded to the remote cloud for processing, causing serious network congestion and service delay. To relieve this problem, the emerging Mobile Edge Computing (MEC) paradigm deploys computing and storage resources at the network edge, close to end devices, offering robust support for real-time computing and data storage for intelligent end applications. Thus, MEC nodes can perform various management operations such as signal processing, distributed caching, and wireless resource collaboration. Among these operations, distributed caching stores user-interested content on MEC nodes, aiming to reduce access delay and duplicate data storage, thus enhancing user experience and saving system costs. However, cache performance is commonly limited by the size of the cache space and by overheads. Therefore, how to effectively utilize the MEC cache space and improve cache performance has attracted extensive attention from both academia and industry. Generally, cache performance is constrained by many factors, including cache size, content relevance, cache partitioning, and cache replacement. It is worth noting that the optimal configuration of cache resources can be sought by exploring the potential connection between user and content characteristics in the multi-dimensional space, which enhances the hit rate of user-accessed resources. Also, the multi-dimensional partitioning of the cache space assists MEC systems in providing more accurate recommendations of popular content to users. However, it is still highly challenging to effectively explore and partition the multi-dimensional cache space.
Multi-edge collaborative caching is a feasible mechanism to further optimize cache resource configuration and reduce service delay. If the MEC node a user is connected to does not hold the requested content, the user can obtain it from other MEC nodes that participate in collaborative caching. Nevertheless, most existing studies do not adequately address the problems of inefficient multi-edge collaboration and irrational cache resource configuration. As a distributed training framework, Federated Learning (FL) is regarded as a promising solution to these problems. Following the basic idea of FL, MEC nodes collaborate to train a global model by uploading model parameters without revealing raw data. However, in complex MEC environments, unintentional model corruption or malicious adversarial interference may cause training to fail and degrade the quality of the global model. Specifically, unintentional model corruption may happen due to noisy training labels, insufficient data samples, and unintentionally uploaded low-quality models. Malicious MEC nodes may deliberately launch adversarial attacks, such as Byzantine and backdoor attacks, to tamper with models. The key challenges of applying FL to deal with the problem of multi-edge collaborative caching are summarized below.
The purpose of the present invention is to provide a Collaborative Caching Framework for Multi-edge Systems with Robust Federated Deep Learning; wherein the Collaborative Caching Framework for Multi-edge Systems consists of M MEC nodes, each containing a MEC server and a base station, denoted by the set E={e1, e2, . . . , em, . . . , eM}, and N users, denoted by the set U={u1, u2, . . . , un, . . . , uN}; the caching space of the MEC nodes is denoted by the set C={C1, C2, . . . , Cm, . . . , CM}; each user is connected to a MEC node, and the two communicate via the wireless link provided by the associated base station; furthermore, the communications among MEC nodes and between MEC nodes and the cloud data center are conducted via the backhaul link; the caching space status of each MEC node is periodically broadcast to the other MEC nodes within the proposed system; moreover, the content library of the cloud data center is denoted by F={f1, f2, . . . , fi, . . . , fI}, where I indicates the number of accessible contents; it is noted that users are discretely distributed in the service zone of each edge node.
When the user un sends a request for the content fi to its connected MEC node, the workflow is given as follows;
The popularity of fi on the MEC node em is defined as
The proposed RoCoCache enables precise prediction of content popularity; to evaluate the prediction accuracy, the global loss function is defined as
Where r indicates the FL communication round, w(r) is the parameter of the global prediction model, req is the total number of requests received by all MEC nodes, wm(r) is the parameter of a local prediction model, and Mean-Square Error (MSE) is defined as
Where θm(fi) indicates whether em caches the content requested by users or not, and it is defined as
Proposing a user partitioning method based on multi-dimensional user features including gender, age, and occupation, which are mapped to coordinate axes, denoted by Θ={l1, l2, . . . , lt, . . . , lT}; at the initial stage (Grade=0), all users with different features are placed within the same user interval (h0); if the number of users in h0 exceeds the threshold ζ(Grade), it will be equally divided into 2^T user intervals along each dimension, where the length of each divided dimension will be halved (lt=lt/2); ζ(Grade) determines the number of users in a user interval; when ζ(Grade) is larger, there are more users in each user interval, so the interval cannot reflect the unique preferences of different users well; when ζ(Grade) is smaller, there are fewer users in each user interval, which may lead to inaccurate cache prediction; to achieve adaptive partitioning of user intervals and capture the potential relationships between interval users and their preferred contents, we set ζ(Grade)=α·2^Grade, where α is a hyper-parameter; the partitioned user intervals are denoted as H={h1, h2, . . . , hs, . . . , hS}, where S is the number of user intervals; the partitioning may continue and go to the following stages (e.g., Grade=1, 2, . . . ) according to performance requirements.
The user activity and memory access interval are defined as
Considering the above factors, the size of cache space allocated to hs is defined as
The user request matrix X contains historical information of user-requested contents on MEC nodes, which is defined as
The input of the decoder is defined as
To address the problem of gradient collapse caused by introducing the implicit embedded space, we replicate the gradient ∇zL from the decoder network to the encoder network during the back-propagation;
When training the VQ-VAE, the loss function is defined as
Next, the log-likelihood function is defined as
According to Jensen's Inequality, Eq. (13) is rewritten as
First, combine the model parameters from all MEC nodes into a matrix R∈ℝ^(M×θ), which is defined as
Next, we arrange the elements of each column in R in descending order, retain their sorted positions, and transform them into R̃; for example, R(5.3, 6.7, 0.7, 0.4)→R̃(2, 1, 3, 4); specifically, the mean and standard deviation (STD) of R̃ are defined as
Following the mean and STD, we can divide the normal and adversarial model updates into two clusters through the K-means, where the adversarial model updates can be easily identified by the proposed residual-based detection; thus, the MEC nodes that offer normal model updates can be filtered, denoted by E′={e1, e2, . . . , eM′}; to avoid the model destruction caused by adversarial model updates, we design a similarity-based federated aggregation method; specifically, we adopt the canonical correlation analysis (CCA) to measure the similarity between the model updates of each MEC node and the average one, which determines the weights of different model updates when performing federated aggregation; this process is described as
Based on the proposed RFDL, design a proactive cache replacement strategy with multi-edge collaboration; for each MEC node, initialize the cache space cachetemp and the set of user intervals H through the multi-dimensional cache space partitioning; while cachetemp≥0, the user-interest contents are placed into the temporary cache library Ctemp; to avoid the cache redundancy caused by overlapping user-interest contents in different intervals, replace Ctemp by Cs, which selects the cacheh most popular contents in the current user interval hs from Ctemp; next, remove the duplicates in the cache library Cm on each MEC node and update the available cache space; the above steps are iterated until the cache space is fully occupied.
Compared with the prior art, the present invention has the following beneficial effects:
Through the ablation experiments, we verify that the designs of multidimensional cache space partition and collaborative caching in RoCoCache can effectively improve the cache performance. Moreover, the RoCoCache exhibits both excellent training and cache efficiency under various scenarios with different numbers of MEC nodes and cache space sizes. Besides, the RoCoCache is able to accurately identify adversarial model updates in complex network environments, demonstrating its good robustness.
The technical solution of the present invention is described in detail in combination with the accompanying drawings.
Proposed in the present invention is a Collaborative Caching Framework for Multi-edge Systems with Robust Federated Deep Learning. Framework is as shown in
The method specifically comprises the following design process:
To address these important challenges, we propose RoCoCache, a novel collaborative caching framework for multi-edge systems with robust federated deep learning. The main contributions of this application are summarized as follows.
The proposed multi-edge collaborative caching system is shown in
In the scenario of multi-edge collaborative caching, users' content requests are dynamic and reveal spatio-temporal dependencies. Commonly, the cache hit rate can be greatly improved by accurately predicting content popularity and then caching the user-interest contents into the cache space of MEC nodes. Specifically, the popularity of fi on the MEC node em is defined as
The proposed RoCoCache enables precise prediction of content popularity. To evaluate the prediction accuracy, the global loss function is defined as
Where r indicates the FL communication round, w(r) is the parameter of the global prediction model, req is the total number of requests received by all MEC nodes, wm(r) is the parameter of a local prediction model, and Mean-Square Error (MSE) is defined as
Moreover, the cache hit rate is defined as
Where θm(fi) indicates whether em caches the content requested by users or not, and it is defined as
The cache performance might be affected by many factors including cache resource configuration, content popularity, model robustness, and cache replacement strategy. With the comprehensive consideration of these factors, the proposed RoCoCache is able to effectively improve the cache performance in a multi-edge collaborative caching system.
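The quantities named above can be illustrated with a minimal sketch. It is not the filing's own formulation: it assumes that popularity is the share of a node's requests that target a content, that the global loss weights each node's MSE by its share of all requests, and that a hit means the requested content is cached on the node receiving the request. All function names are hypothetical.

```python
from collections import Counter

def popularity(requests):
    """Assumed popularity: share of a node's requests that target each content."""
    counts = Counter(requests)
    total = len(requests)
    return {content: n / total for content, n in counts.items()}

def mse(predicted, actual):
    """Mean-square error between two equal-length popularity vectors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def global_loss(node_mse, node_requests):
    """Assumed global loss: per-node MSE weighted by each node's request share."""
    req = sum(node_requests)
    return sum(n / req * loss for loss, n in zip(node_mse, node_requests))

def hit_rate(requests, cached):
    """Fraction of requests whose content is in the node's cache (theta = 1)."""
    hits = sum(1 for content in requests if content in cached)
    return hits / len(requests)

reqs = ["f1", "f2", "f1", "f3"]
pop = popularity(reqs)            # f1 accounts for half of the requests
rate = hit_rate(reqs, {"f1"})     # 2 of the 4 requests are served from cache
```

Weighting by request share mirrors the usual federated averaging convention; a different weighting would only change `global_loss`.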
Based on the proposed system model and problem formulation, we propose RoCoCache, a novel collaborative caching framework for multi-edge systems with RFDL. First, we perceptually optimize the cache space of MEC nodes via a new multi-dimensional cache space partitioning and determine the proper size of cache space for interval users. Next, we design a new VQ-VAE to learn the implicit embedded space consisting of discrete vectors. In VQ-VAE, the decoder uses the nearest neighbor to find discrete hidden vectors, and then it generates the calibrated user request matrix, thereby improving the prediction accuracy of content popularity. Next, we design a novel training mode based on RFDL to improve model scalability and robustness. In this design, we use a residual-based detection method to capture adversarial model updates, and a similarity-based FL aggregation method to protect the globally-shared model from damage caused by adversarial updating. Finally, we design a proactive cache replacement strategy based on RFDL to better fit the optimized cache resource configuration and improve the performance of multi-edge collaborative caching.
Multi-dimensional cache space partitioning comprises two main components: multi-dimensional user partitioning and cache space partitioning. First, we classify and segment feature groups with various numbers of users, where user-interest contents are individually cached for different groups. Next, based on the established classification, the cache space is perceptually optimized based on user features, user activities, and memory access intervals.
To a certain extent, user features reflect the user preference for cache contents. As shown in
At the initial stage (Grade=0), all users with different features are placed within the same user interval (h0). If the number of users in h0 exceeds the threshold ζ(Grade), it will be equally divided into 2^T user intervals along each dimension, where the length of each divided dimension will be halved (lt=lt/2). ζ(Grade) determines the number of users in a user interval. When ζ(Grade) is larger, there are more users in each user interval, so the interval cannot reflect the unique preferences of different users well. When ζ(Grade) is smaller, there are fewer users in each user interval, which may lead to inaccurate cache prediction. To achieve adaptive partitioning of user intervals and capture the potential relationships between interval users and their preferred contents, we set ζ(Grade)=α·2^Grade, where α is a hyper-parameter. The partitioned user intervals are denoted as H={h1, h2, . . . , hs, . . . , hS}, where S is the number of user intervals. The partitioning may continue and go to the following stages (e.g., Grade=1, 2, . . . ) according to performance requirements.
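The adaptive splitting rule can be sketched as a short recursive routine. This is an illustrative sketch only: it assumes each user feature has been scaled to [0, 1), reads the threshold as α·2^Grade, and drops empty sub-intervals; the function name `partition` and the tuple layout of its result are hypothetical.

```python
from itertools import product

def partition(users, alpha, lower=None, length=1.0, grade=0):
    """Split a user interval while it holds more than alpha * 2**grade users.

    users: list of feature tuples, each feature scaled to [0, 1).
    Returns a list of (lower_corner, side_length, grade, users) leaves.
    """
    dims = len(users[0]) if users else 0
    if lower is None:
        lower = (0.0,) * dims
    if dims == 0 or len(users) <= alpha * 2 ** grade:
        return [(lower, length, grade, users)]
    half = length / 2                       # each dimension is halved
    leaves = []
    for offsets in product((0, 1), repeat=dims):   # 2**T sub-intervals
        corner = tuple(lo + o * half for lo, o in zip(lower, offsets))
        inside = [u for u in users
                  if all(c <= x < c + half for x, c in zip(u, corner))]
        if inside:                          # empty sub-intervals are dropped
            leaves.extend(partition(inside, alpha, corner, half, grade + 1))
    return leaves

users = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8), (0.9, 0.1)]
leaves = partition(users, alpha=2)          # 4 users > 2, so h0 is split once
```

Because the threshold doubles with each grade while a sub-interval can only hold fewer users than its parent, the recursion always terminates.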
Allocating proper cache space for user intervals is important to improve cache performance. In this regard, several factors need to be considered when allocating cache space, including the number of users, user activities, and memory access intervals. For example, younger users prefer richer types of contents and show higher activities, while older users may focus on limited contents and exhibit lower activities.
Specifically, in the partitioned user interval hs, the number of users is denoted as num(hs). The user activity and memory access interval are defined as
where reqs is the number of user requests and T(hs) is the memory access interval of hs.
Considering the above factors, the size of cache space allocated to hs is defined as
As a classic unsupervised learning method, the Variational Auto-Encoder (VAE) uses continuous variables in hidden layers to reconstruct the compressed input data, and the data are then clustered in the latent space. However, when facing continuous variables in hidden layers, the VAE is prone to the posterior collapse issue, which severely affects the learning and reconstruction of the original data distribution, leading to inaccurate popularity prediction. To address this issue, the VQ-VAE adopts learnable discrete vectors to form the implicit embedding space, replacing the hidden layers in the classic VAE. When predicting the content popularity, the VQ-VAE finds the vector in the implicit embedding space that is closest to the output encoding of the encoder network, and then it reconstructs the mapped vector via the decoder network.
Specifically, the VQ-VAE learns the implicit distribution in the user request matrix X, aiming to obtain future user requests in the reconstructed matrix output by the decoder. The user request matrix X contains historical information of user-requested contents on MEC nodes, which is defined as
In VQ-VAE, the implicit embedded space is defined as v∈ℝ^(K×D), where K is the space size and D is the dimension of the embedded vector. Thus, there are K embedded vectors vk∈ℝ^D (k∈{1, 2, 3, . . . , K}). As shown in
The input of the decoder is defined as
To address the problem of gradient collapse caused by introducing the implicit embedded space, we replicate the gradient ∇zL from the decoder network to the encoder network during the back-propagation.
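The nearest-neighbor lookup and the gradient replication can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the filing: squared Euclidean distance is assumed as the distance measure, and the helper names `quantize` and `straight_through_grad` are hypothetical.

```python
def quantize(z_e, codebook):
    """Return (index, vector) of the embedded vector closest to encoder output z_e."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    k = min(range(len(codebook)), key=lambda i: sq_dist(z_e, codebook[i]))
    return k, codebook[k]

def straight_through_grad(grad_at_decoder_input):
    """Copy the gradient at the decoder input to the encoder output unchanged."""
    return list(grad_at_decoder_input)

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]   # K = 3 vectors, D = 2
index, z_q = quantize([0.9, 1.2], codebook)       # closest entry is [1.0, 1.0]
encoder_grad = straight_through_grad([0.5, -0.2])
```

In an automatic-differentiation framework the same effect is commonly obtained by feeding the decoder `z_e + stop_gradient(z_q - z_e)`, so that the forward pass uses the quantized vector while the backward pass skips the non-differentiable lookup.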
When training the VQ-VAE, the loss function is defined as
Next, the log-likelihood function is defined as
According to Jensen's Inequality, Eq. (13) is rewritten as
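For orientation, the generic form of this step is given below. It is the textbook evidence lower bound, written with the embedded vectors vk as the discrete latent variable and an arbitrary posterior q with q(z=k|x)>0; it is offered as a sketch of the reasoning and is not claimed to match the notation of the filing's own equations.

```latex
\begin{aligned}
\log p(x)
  &= \log \sum_{k=1}^{K} q(z{=}k \mid x)\,
     \frac{p(x \mid v_k)\,p(v_k)}{q(z{=}k \mid x)} \\
  &\ge \sum_{k=1}^{K} q(z{=}k \mid x)\,
     \log \frac{p(x \mid v_k)\,p(v_k)}{q(z{=}k \mid x)}
     \qquad \text{(Jensen's inequality, $\log$ is concave)} \\
  &= \mathbb{E}_{q(z \mid x)}\bigl[\log p(x \mid v_z)\bigr]
     - \mathrm{KL}\bigl(q(z \mid x)\,\|\,p(z)\bigr).
\end{aligned}
```

When the posterior is deterministic (one-hot on the nearest embedded vector) and the prior over the K embedded vectors is uniform, the KL term reduces to the constant log K, which is why it can be dropped from the training objective.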
There are two key components in the proposed RFDL including the residual-based detection and the similarity-based federated aggregation. The residual-based detection is to detect adversarial model updates by parameter ranking. The similarity-based federated aggregation is to avoid the destruction of the globally-shared model by adversarial updating and generate a robust and accurate prediction model of content popularity in complex MEC environments.
In classic FL training, adversarial model updates may occur, severely affecting model robustness. To address this issue, we design a parameter ranking matrix R̃ to detect the adversarial updating. Typically, adversarial updates reveal distinctive features in the ranking domain, such as an unusual mean and standard deviation. First, we combine the model parameters from all MEC nodes into a matrix R∈ℝ^(M×θ), which is defined as
Next, we arrange the elements of each column in R in descending order, retain their sorted positions, and transform them into R̃. For example, R(5.3, 6.7, 0.7, 0.4)→R̃(2, 1, 3, 4). Specifically, the mean and standard deviation (STD) of R̃ are defined as
Following the mean and STD, we can divide the normal and adversarial model updates into two clusters through the K-means, where the adversarial model updates can be easily identified by the proposed residual-based detection. Thus, the MEC nodes that offer normal model updates can be filtered, denoted by E′={e1, e2, . . . , eM′}.
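The ranking step can be sketched as follows. The sketch assumes that rows of R correspond to MEC nodes and columns to model parameters, and that the mean and STD are taken per node (row-wise) over the ranks; the function names are hypothetical, and the clustering step itself is left out.

```python
from statistics import mean, pstdev

def rank_columns(R):
    """Replace each entry by its rank within its column (1 = largest value)."""
    rows, cols = len(R), len(R[0])
    ranks = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        order = sorted(range(rows), key=lambda i: R[i][j], reverse=True)
        for position, i in enumerate(order, start=1):
            ranks[i][j] = position
    return ranks

def rank_stats(ranks):
    """Per-node (row-wise) mean and population standard deviation of the ranks."""
    return [(mean(row), pstdev(row)) for row in ranks]

# One parameter column holding the values of the example in the text.
example = rank_columns([[5.3], [6.7], [0.7], [0.4]])

# Three nodes, three parameters; the third node uploads unusually large values.
R = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [9.0, 9.0, 9.0]]
stats = rank_stats(rank_columns(R))
```

The third node ranks first in every column, so its rank mean is 1 with zero spread, which is the kind of outlying (mean, STD) pair a two-cluster K-means would separate from the others.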
To avoid the model destruction caused by adversarial model updates, we design a similarity-based federated aggregation method. Specifically, we adopt the canonical correlation analysis (CCA) to measure the similarity between the model updates of each MEC node and the average one, which determines the weights of different model updates when performing federated aggregation. This process is described as
By integrating the residual-based detection with similarity-based federated aggregation, we propose a novel RFDL, whose key steps are given in Algorithm 1.
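Algorithm 1: RFDL training. The listing is partly illegible as filed; the recoverable steps are: (1) in each FL communication round, up to rmax, every MEC node em receives the global parameters w(r) and performs its local update, MEC node updates(w(r), m); (2) each local step applies w ← w(r) − η∇L(w(r); b) on a mini-batch b; (3) the updated local parameters are uploaded to the cloud data center.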
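The similarity-weighted aggregation described before Algorithm 1 can be illustrated with a small sketch. Note the substitution: cosine similarity is used here as a simple stand-in for CCA, and negative similarities are clipped to zero, which is an assumption of this sketch rather than something stated in the text. The function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; a simple stand-in for the CCA-based similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def aggregate(updates):
    """Weight each retained model update by its similarity to the average update."""
    dim = len(updates[0])
    average = [sum(u[j] for u in updates) / len(updates) for j in range(dim)]
    sims = [max(cosine(u, average), 0.0) for u in updates]   # clip negatives (assumption)
    total = sum(sims)
    if total == 0:
        return average                    # fall back to plain averaging
    weights = [s / total for s in sims]
    return [sum(w * u[j] for w, u in zip(weights, updates)) for j in range(dim)]

# Two agreeing updates and one pointing the opposite way.
global_update = aggregate([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
```

The opposing update receives zero weight, so the aggregate follows the two agreeing updates instead of being pulled toward their plain average.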
D. Proactive Cache Replacement with RFDL
Based on the proposed RFDL, we design a proactive cache replacement strategy with multi-edge collaboration. The key steps are given in Algorithm 2. For each MEC node, we initialize the cache space cachetemp and the set of user intervals H through the multi-dimensional cache space partitioning (Line 2). While cachetemp≥0, Algorithm 1 is called to predict and sort the content popularity, and the user-interest contents are placed into the temporary cache library Ctemp (Line 4). To avoid the cache redundancy caused by overlapping user-interest contents in different intervals, we replace Ctemp by Cs, which selects the cacheh most popular contents in the current user interval hs from Ctemp (Lines 5-7). Next, we remove the duplicates in the cache library Cm on each MEC node and update the available cache space (Lines 8-9). The above steps are iterated until the cache space is fully occupied.
In the following, we first introduce the real-world experiment setup. Then, we evaluate the proposed RoCoCache through extensive comparative experiments.
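The replacement loop can be sketched for a single MEC node as follows. This is a simplified single-pass illustration, not the full Algorithm 2: it assumes that the popularity predictions per user interval are already available, treats cacheh as a per-interval quota, and skips contents already cached instead of removing duplicates afterwards. The name `fill_cache` and the input layout are hypothetical.

```python
def fill_cache(capacity, intervals):
    """Fill one node's cache with the most popular contents of each user interval.

    intervals: list of (quota, popularity) pairs, where popularity maps a
    content id to its predicted popularity within that interval.
    """
    cache = []                                    # ordered, duplicate-free cache library
    for quota, popularity in intervals:
        ranked = sorted(popularity, key=popularity.get, reverse=True)
        fresh = [c for c in ranked if c not in cache]   # skip overlaps between intervals
        for content in fresh[:quota]:
            if len(cache) >= capacity:            # cache space fully occupied
                return cache
            cache.append(content)
    return cache

intervals = [
    (2, {"a": 0.5, "b": 0.3, "c": 0.2}),
    (2, {"a": 0.6, "d": 0.3, "e": 0.1}),          # "a" overlaps with the first interval
    (2, {"f": 0.9, "g": 0.1}),
]
cached = fill_cache(4, intervals)
```

Because the overlapping content "a" is skipped in the second interval, that interval's quota goes to "d" and "e", and the cache is full before the third interval is reached.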
Algorithm 2: Proactive cache replacement with RFDL. The listing is partly illegible as filed; the recoverable step is: Cs ← select the cacheh most popular contents from Ctemp.
Real-world Testbed. We construct a real-world testbed that consists of a workstation and a set of Jetson TX2, as shown in
Datasets. We adopt the real-world MovieLens datasets collected by GroupLens Research, which contain about 1 million ratings of 3883 movies by 6040 anonymous users. The datasets offer user serial numbers, movie indexes, movie ratings, timestamp labels, and user context information. Specifically, we select the user gender, age, and occupation as user features and regard the movie ratings as user requests. The datasets are split into training (70%), validation (10%), and testing (20%) sets.
Parameter Settings. Based on the above real-world testbed and datasets, we simulate the scenario of multi-edge collaborative caching that consists of one cloud data center, 5 to 20 MEC nodes, and 6040 users. The cloud data center stores the complete MovieLens datasets, each MEC node is equipped with a fixed size of cache space, and users are randomly distributed in the service zone of each edge node. We implement the RoCoCache based on Python 3.8 and TensorFlow 2.4.0. Specifically, the hyper-parameter α in the multi-dimensional cache space partitioning is 512, the size K of the VQ-VAE hidden embedded space is 128, the dimension D of the embedded vector is 16, the number of FL communication rounds rmax is 50, the batch size in VQ-VAE is 32, the number of training epochs cmax is 300, and the learning rate η is 0.001.
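For reproducibility, the settings listed above could be collected in one place, for example as follows. The values are taken from the text; the class name, the field names, and the helper `split_sizes` are hypothetical and not part of the described implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoCoCacheSettings:
    alpha: int = 512               # hyper-parameter of the partitioning threshold
    codebook_size: int = 128       # K, size of the embedded space
    embedding_dim: int = 16        # D, dimension of each embedded vector
    fl_rounds: int = 50            # r_max, FL communication rounds
    batch_size: int = 32           # VQ-VAE batch size
    epochs: int = 300              # c_max, training epochs
    learning_rate: float = 0.001   # eta
    split: tuple = (0.7, 0.1, 0.2) # training / validation / testing fractions

def split_sizes(n, fractions):
    """Turn the three split fractions into sample counts that sum to n."""
    train = round(n * fractions[0])
    valid = round(n * fractions[1])
    return train, valid, n - train - valid

settings = RoCoCacheSettings()
sizes = split_sizes(1000, settings.split)   # illustrative sample count only
```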
Comparison Approaches. We compare the RoCoCache with the optimum and the following benchmark methods. Moreover, we conduct ablation experiments to analyze the effectiveness of the multi-dimensional cache space partitioning and collaborative caching in RoCoCache. Meanwhile, we test the training and caching efficiency of the RoCoCache.
Attack Models. We evaluate the robustness of the RoCoCache by using the following two attack models.
Comparison with Benchmarks. We conduct comparison experiments under different sizes of MEC cache space in terms of cache hit rate. As shown in
Ablation Experiments. We conduct ablation experiments to test the impact of multi-dimensional cache space partitioning and collaborative caching on the VQ-VAE-based methods. As shown in
Convergence Analysis.
Training Efficiency. We test the training efficiency of the RoCoCache in the scenarios with different numbers of MEC nodes. As shown in
Caching Efficiency. We test the caching efficiency of different methods in the scenario with five MEC nodes in terms of the delay of content requests. The Uncollaborative method is the RoCoCache without collaborative caching, and the Distributed method caches only one copy of each content across the MEC nodes according to the content popularity. As shown in Table 1, the delay of content requests declines as the size of the MEC cache space increases. The RoCoCache shows the best caching efficiency because it can handle content requests in three ways and accurately predict the content popularity. The Uncollaborative method does not use collaborative caching, and thus it needs to forward the requests for missing contents from local devices to the remote cloud. Moreover, due to its low cache hit rate, the Distributed method needs to constantly send content requests to other MEC nodes and the remote cloud. Therefore, these two methods result in excessive delay.
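The effect of collaboration on delay can be illustrated with a toy request router. The sketch assumes the three ways of handling a request are the local MEC cache, a collaborating MEC node, and the remote cloud, tried in that order; the delay values are hypothetical placeholders and are not taken from Table 1.

```python
def serve_request(content, local_cache, neighbor_caches, delays):
    """Return (source, delay) for a request: local node, neighbor node, then cloud."""
    if content in local_cache:
        return "local", delays["local"]
    for cache in neighbor_caches:
        if content in cache:
            return "neighbor", delays["neighbor"]
    return "cloud", delays["cloud"]

delays = {"local": 1.0, "neighbor": 5.0, "cloud": 50.0}   # hypothetical values
local = {"f1"}
neighbors = [{"f2"}, {"f3"}]

collaborative = serve_request("f3", local, neighbors, delays)
uncollaborative = serve_request("f3", local, [], delays)   # no neighbor lookup
```

With an empty neighbor list the same request falls through to the cloud, which corresponds to the behavior described for the Uncollaborative method.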
Robustness Analysis. We evaluate the robustness of the RoCoCache from two aspects. On the one hand, we test the ability of the RoCoCache to detect adversarial model updates.
On the other hand,
In this application, we propose RoCoCache, a novel collaborative caching framework for multi-edge systems with RFDL. First, we design a multi-dimensional cache space partitioning mechanism to perceptually optimize the cache space of MEC nodes, offering accurate content recommendations in user classification intervals. Next, we develop a VQ-VAE-based content popularity prediction algorithm, addressing the posterior collapse and enhancing the prediction accuracy. Finally, we create a new training mode and proactive cache replacement strategy based on RFDL for better adaptability and robustness in complex network environments. Using a real-world testbed and the MovieLens datasets, extensive experiments verify the effectiveness of the proposed RoCoCache. The results show that the RoCoCache achieves a higher cache hit rate than the benchmark methods and approximates the optimum. Through the ablation experiments, we verify that the designs of multi-dimensional cache space partitioning and collaborative caching in RoCoCache can effectively improve the cache performance. Moreover, the RoCoCache exhibits both excellent training and cache efficiency under various scenarios with different numbers of MEC nodes and cache space sizes. Besides, the RoCoCache is able to accurately identify adversarial model updates in complex network environments, demonstrating its good robustness.
This application is a continuation application of International Application No. PCT/CN2023/132497, filed on Nov. 20, 2023, the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2023/132497 | Nov 2023 | WO |
| Child | 18408610 | | US |