The present invention relates to the field of wireless networks, in particular to a digital twin-based edge-end collaborative scheduling method for heterogeneous tasks and resources.
With the rapid development of 5G, more and more devices are connected to the Internet, so that people, machines and things are interconnected to realize the Internet of Everything. As a result, the transmission of large-scale heterogeneous tasks over 5G networks is growing explosively. Typical heterogeneous tasks include video/media tasks that require broadband communication, sensing/measurement tasks that require low-power communication, and industrial control tasks that require real-time computation and deterministic communication. Completing a complex task requires coordinating multiple heterogeneous tasks. When these heterogeneous tasks access the network with high concurrency, they compete for limited communication resources in the spatio-temporal-frequency domain, causing transmission conflicts and reducing quality of service (QoS).
To improve QoS, multi-access edge computing can be used to help process the tasks of end devices and thus reduce task processing delay. Typically, an edge server is deployed at a base station; the base station then performs some network management functions and provides computing resources that assist the end devices and reduce task processing delay. However, multi-access edge computing may further aggravate competition for communication resources. In particular, heterogeneous industrial tasks place different demands on computation and communication resources, resulting in resource fragmentation. Reasonably scheduling the computation and communication resources of the end devices and the edge servers according to the demands of the heterogeneous tasks is therefore the core challenge.
Existing methods address different multi-access edge computing scenarios, adopt different optimization algorithms or theories, and optimize different parameters to achieve objectives such as delay minimization, energy consumption minimization and throughput maximization. However, the existing methods do not address the matching of heterogeneous computation resource types, the mutual interference of devices during computation offloading, or errors in resource estimation.
The present invention is oriented to generalized scenarios with a single cloud server, multiple edge servers and multiple end devices, adopts digital twin technology to virtualize and model heterogeneous computation resources, and supports collaborative scheduling of heterogeneous tasks, heterogeneous computation resources and communication resources. The present invention fully considers the deadline requirements and required computation resource types of the heterogeneous tasks, the computation resource types and maximum computing capacity of the end devices and edge servers, the computation resource estimation deviations of the digital twin, and the maximum transmit power and peak interference power of the end devices. It proposes an edge-end collaborative scheduling method for heterogeneous tasks and network computation and communication resources based on multi-agent deep reinforcement learning, which overcomes the difficulty traditional scheduling methods have with state space explosion in a dynamic network environment, minimizes the total processing delay of the heterogeneous tasks, and supports real-time collaborative processing of heterogeneous high-concurrency tasks such as computation-intensive tasks and delay-sensitive tasks.
To achieve the above purpose, the present invention adopts the following technical solution: a digital twin-based edge-end collaborative scheduling method for heterogeneous tasks and computation and communication resources using multi-agent deep reinforcement learning, comprising the following steps:
The edge wireless network based on digital twin comprises: N base stations configured with edge servers and M end devices;
The base stations are configured with edge servers and used for providing computation resources for a plurality of end devices and supporting scheduling of the end devices within a coverage range;
The end devices are used for computing the heterogeneous tasks locally, and supporting offloading of the heterogeneous tasks to the edge servers through wireless channels for edge computing;
The digital twin is placed on a cloud server of the network, represented as a virtualization model established for the base stations and the end devices comprised in the network, and used for evaluating the operating states of the base stations, the edge servers and the end devices, the types of the computation resources, and the amount of the computation and communication resources, and for supporting the training of deep reinforcement learning methods to carry out the edge-end collaborative scheduling of the network.
For a single end device, the tasks can be non-offloaded, partially offloaded, or completely offloaded to one or more edge servers for computing;
The transmission rate of the end device during task offloading is
The edge-end collaborative scheduling problem of the heterogeneous tasks and resources is
is the target of the problem, which represents minimization of the total task processing delay; Tm represents the task processing delay of the end device m; U, V, P and F are the sets of variables to be optimized in the problem, and represent the matching decisions of the computation resource types, the task offloading ratios, the transmit power of the end devices and the computation resource allocation of the edge servers, respectively;
C1 is the constraint of the task offloading ratio; wherein vm,n∈[0,1] is the task offloading ratio of the end device m to the edge server n; vm,n=0 represents that the end device m offloads no task to the edge server n; vm,n=1 represents that the end device m offloads all of its tasks to the edge server n; vm,0=0 represents that the end device m performs no local computing; and vm,0=1 represents that the end device m computes all of its tasks locally;
C2 and C3 are the constraints of the transmit power of the end devices; wherein Pmax represents the maximum transmit power of the end device; Ip represents the peak interference power that the end device can tolerate; and gm,m* and gm′,m* represent the channel gains from the end device m and the end device m′ to the end device m*, respectively; wherein m* = arg maxm′ gm,m′ is the end device that suffers the largest interference from the end device m;
C4 is the matching decision constraint of the heterogeneous computation resource type; wherein Om and On represent the computation resource types of the end device m and the edge server n respectively; ⊗ represents XOR operation; um,n=1 represents that the computation resource types of the end device m and the edge server n are the same; um,n=0 represents that the computation resource types of the end device m and the edge server n are different;
C5 and C6 are computation resource constraints; wherein fm,n represents the edge computation resource estimated by the digital twin; Δfm,n represents the computation resource estimation deviation of the digital twin; Fmax,n represents the maximum computation rate of the edge server n;
C7 is a task deadline constraint; wherein Tmax,m represents the deadline of the task executed by the end device m, that is, the longest task processing delay that can be accepted by the end device m.
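For illustration, the feasibility constraints C1-C7 above can be sketched as a single check. All function and variable names below are illustrative assumptions (the text defines only the symbols, not an implementation), and the interference and type-matching conventions follow the descriptions of C3 and C4:

```python
def feasible(v, p, f, o_dev, o_edge, g, p_max, i_peak, f_max, t, t_max):
    """Sketch of constraints C1-C7; all names are illustrative assumptions.

    v[m]     : offloading ratios of device m; v[m][0] is the local share,
               v[m][n+1] is the share offloaded to edge server n
    p[m]     : transmit power of device m
    f[m][n]  : edge computation resource allocated to device m by server n
    o_dev[m], o_edge[n] : computation resource type codes
    g[m][m2] : channel gain from device m to device m2
    """
    M, N = len(v), len(o_edge)
    for m in range(M):
        # C1: every offloading ratio lies in [0, 1] and the shares sum to 1
        if any(not 0.0 <= v[m][n] <= 1.0 for n in range(N + 1)):
            return False
        if abs(sum(v[m]) - 1.0) > 1e-9:
            return False
        # C2: transmit power within the maximum transmit power
        if not 0.0 <= p[m] <= p_max:
            return False
        # C3: interference at the worst-affected device m* stays below Ip
        m_star = max((m2 for m2 in range(M) if m2 != m),
                     key=lambda m2: g[m][m2], default=None)
        if m_star is not None:
            interference = sum(p[m2] * g[m2][m_star]
                               for m2 in range(M) if m2 != m_star)
            if interference > i_peak:
                return False
        for n in range(N):
            # C4: u=1 iff resource types match (XOR of type codes is zero);
            # offloading to a server of a different type is forbidden
            u = 0 if (o_dev[m] ^ o_edge[n]) else 1
            if v[m][n + 1] > 0 and u == 0:
                return False
        # C7: task processing delay within the deadline
        if t[m] > t_max[m]:
            return False
    for n in range(N):
        # C5/C6: total allocation within the server's maximum computation rate
        if sum(f[m][n] for m in range(M)) > f_max[n]:
            return False
    return True
```

The check returns False at the first violated constraint, which is the behavior a scheduler would want when screening candidate decisions.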
The task processing delay of the end device is determined by the edge computing delay TmEdge and the local computing delay TmLocal, and the computation method is as follows:
The edge computing delay TmEdge is computed as
The communication delay Tm,nComm of task offloading is determined by the task offloading amount and the offloading rate of the end device, and computed as
The computing delay Tm,nComp of the task processing is determined by the task offloading amount of the end device m and the computation resources fm,n allocated by the edge server n for the end device m, and calculated as
The local computing delay TmLocal is calculated as
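The delay expressions referenced above are not reproduced in this text. The following sketch assumes a standard form (data size times cycles-per-bit divided by computation rate for computing delay, data size divided by transmission rate for communication delay), with the local and edge parts processed in parallel; all names and the parallel-combination rule are assumptions for illustration:

```python
def task_delay(d_m, c_m, v, r, f_edge, f_local):
    """Sketch of the task processing delay Tm (illustrative assumptions).

    d_m      : task data size (bits); c_m : CPU cycles required per bit
    v[0]     : local computing share; v[n+1] : share offloaded to server n
    r[n]     : offloading transmission rate to server n (bits/s)
    f_edge[n]: computation rate allocated by server n; f_local : local rate
    """
    # Local computing delay: local share of the workload / local rate
    t_local = v[0] * d_m * c_m / f_local
    # Edge computing delay: communication delay plus edge computing delay,
    # taking the slowest of the servers used
    t_edge = 0.0
    for n in range(len(f_edge)):
        share = v[n + 1]
        if share > 0:
            t_comm = share * d_m / r[n]               # communication delay
            t_comp = share * d_m * c_m / f_edge[n]    # computing delay
            t_edge = max(t_edge, t_comm + t_comp)
    # Local and edge parts proceed in parallel; the task finishes when the
    # slower of the two finishes
    return max(t_local, t_edge)
```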
Converting the optimization scheduling problem into a multi-agent Markov decision process problem comprises the following steps:
The agent set is the set M = {1, . . . , M} formed by the M end devices; the state space is the state of the agent m at time t, expressed as
The action space is an action performed by the agent m at time t, expressed as
The state transfer probability is the probability of transferring from sm(t) to sm(t+1) when the agent m executes an action am(t), that is, zm(sm(t+1); sm(t), am(t));
The reward function is the reward or punishment for the agent to take the action in a certain set state, expressed as rm(t); wherein the individual reward obtained by the agent m is rm(t)=rmLatency(t)+ρmrmDDL(t), and ρm represents a weight parameter set according to the deadline requirement of the heterogeneous tasks; the delay reward is rmLatency(t)=−Tm(t) and the deadline reward is rmDDL(t)=Tmax,m(t)−Tm(t);
max Rm(t)
s.t. C1,C2,C3,C4,C5,C6
In the case that the constraints C1-C6 are satisfied, the long-term cumulative reward is maximized to obtain the best state transfer probability, and then obtain the strategy of minimizing the total task processing delay.
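The per-agent reward defined above and the long-term cumulative reward Rm(t) can be sketched as follows; the discount factor is an illustrative assumption, as the source does not state its value:

```python
def reward(t_m, t_max_m, rho_m):
    """Individual reward r_m(t) = r_latency + rho_m * r_ddl, as defined above."""
    r_latency = -t_m          # delay reward: shorter delay, larger reward
    r_ddl = t_max_m - t_m     # deadline reward: positive slack when met
    return r_latency + rho_m * r_ddl

def cumulative_reward(rewards, gamma=0.9):
    """Long-term cumulative reward; the discount factor gamma is an
    illustrative assumption (the source does not state its value)."""
    total, weight = 0.0, 1.0
    for r in rewards:
        total += weight * r
        weight *= gamma
    return total
```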
The Actor-Critic neural network model constructed based on multi-agent deep reinforcement learning comprises an Actor network and a Critic network;
The Actor network adopts strategy-based deep neural networks, comprising an estimation Actor network used for training and a target Actor network used for executing actions, so as to generate agent actions;
The Critic network adopts value-based deep neural networks, comprising an estimation Critic network and a target Critic network, which evaluate the actions of the Actor network and guide it to produce better actions.
Performing offline centralized training of the neural network model by digital twin comprises the following steps:
and the parameter θQ is updated by performing stochastic gradient descent on the loss function L(θQ);
and the parameter θπ is updated by performing stochastic gradient descent on the loss function L(θπ);
Perceiving the environment state online by the end devices, and performing distributed execution of task offloading and computation and communication resource allocation according to the centrally trained Actor-Critic neural network model, comprises the following steps:
The present invention has the following beneficial effects and advantages:
1. With respect to the edge-end collaborative processing of heterogeneous high-concurrency tasks, the present invention fully considers the deadline requirements and computation resource types of the heterogeneous tasks, the maximum computing capacity of the end devices and edge servers, the computation resource estimation deviations of the digital twin, and the maximum transmit power and peak interference power of the end devices, and proposes a digital twin-based edge-end collaborative scheduling method for heterogeneous tasks and resources that satisfies the QoS requirements of the heterogeneous tasks and supports their edge-end collaborative processing.
2. With respect to the difficult modeling and algorithmic state space explosion caused by the complicated coupling of heterogeneous computation and communication multidimensional resources, the present invention adopts multi-agent deep reinforcement learning to propose the digital twin-based edge-end collaborative scheduling method for heterogeneous tasks and resources, which achieves centralized offline training and distributed online execution of the scheduling algorithm, minimizes the total processing delay of the tasks, and simultaneously satisfies the different deadline requirements of the heterogeneous tasks.
The present invention will be further described in detail below in combination with the drawings and the embodiments.
The present invention is oriented to the edge-end collaborative processing of large-scale heterogeneous tasks in scenarios with a single cloud server, multiple edge servers and multiple end devices, and proposes an edge-end collaborative scheduling method for heterogeneous tasks and network computation and communication resources based on multi-agent deep reinforcement learning. The method supports on-demand offloading of the heterogeneous tasks and realizes edge-end resource collaboration. On the premise of satisfying constraints such as the deadline requirements and computation resource types of the heterogeneous tasks, the maximum computing capacity of the end devices and edge servers, the computation resource estimation deviations of the digital twin, and the maximum transmit power and peak interference power of the end devices, the present invention minimizes the total processing delay of the heterogeneous tasks and supports their edge-end collaborative processing.
The digital twin-based edge-end collaborative scheduling method for heterogeneous tasks and resources proposed by the present invention comprises the following steps: 1) establishing an edge wireless network based on digital twin; 2) constructing an edge-end collaborative scheduling problem of heterogeneous tasks and resources according to the deadline requirements of the heterogeneous tasks and the constraints of the heterogeneous computation and communication resources; 3) converting the scheduling problem into a multi-agent Markov decision process problem; 4) constructing an Actor-Critic neural network model based on the multi-agent deep reinforcement learning to solve the multi-agent Markov decision process problem; 5) performing offline centralized training of the Actor-Critic neural network model by digital twin to obtain an experience pool and neural network parameters; and 6) performing online distributed execution of task offloading and computation and communication resource allocation by end devices to collaboratively process the heterogeneous tasks and minimize the total task processing delay. The overall flow of the present invention is shown in
1) The edge wireless network based on digital twin is established. As shown in
For a single end device, the tasks can be non-offloaded, partially offloaded, or completely offloaded to one or more edge servers for computing. The transmission rate of the end device during task offloading is
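The rate expression itself is not reproduced in this text. A common form for the uplink rate under co-channel interference is the Shannon capacity; the sketch below assumes this form with illustrative parameter names and should not be read as the patent's exact expression:

```python
import math

def offload_rate(bandwidth, p_m, g_mn, noise, interference):
    """Assumed Shannon-style offloading rate (bits/s); bandwidth in Hz,
    powers and gains in linear units. Illustrative only."""
    sinr = p_m * g_mn / (noise + interference)
    return bandwidth * math.log2(1.0 + sinr)
```

Higher transmit power raises the rate but also raises the interference seen by neighboring devices, which is exactly the tension that constraints C2 and C3 regulate.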
2) An edge-end collaborative scheduling problem of heterogeneous tasks and resources is constructed.
The task processing delay Tm of the end device is determined by the edge computing delay TmEdge and the local computing delay TmLocal, and a computation method is as follows:
The communication delay Tm,nComm of edge computing is determined by the task offloading amount and the offloading rate of the end device, and computed as
The computing delay TmComp of the edge computing is determined by the task offloading amount of the end device m and the computation resources fm,n allocated by the edge server n for the end device m, and calculated as
T̃mComp is the local computing delay estimated by the digital twin, and calculated as
Constructing a joint scheduling problem of heterogeneous tasks and network computation and communication resources with minimization of total task processing delay as the target according to the deadline requirements of the heterogeneous tasks and the constraints of the heterogeneous computation and communication resources is as follows:
is the target of the problem, i.e., minimization of the completion time for the total task;
The action space is an action performed by the agent m at time t, expressed as
The state transfer probability is the probability of transferring from sm(t) to sm(t+1) when the agent m executes an action am(t), that is, zm(sm(t+1); sm(t), am(t));
The reward function is the reward or punishment for the agent to take the action in a certain set state, expressed as rm(t); wherein the individual reward obtained by the agent m is rm(t)=rmLatency(t)+ρmrmDDL(t), and ρm represents a weight parameter set according to the deadline requirement of the heterogeneous tasks; the delay reward is rmLatency(t)=−Tm(t) and the deadline reward is rmDDL(t)=Tmax,m(t)−Tm(t);
max Rm(t)
s.t. C1,C2,C3,C4,C5,C6
In the case that the constraints C1-C6 are satisfied, the long-term cumulative reward is maximized to obtain the best state transfer probability, and then obtain the strategy of minimizing the total task processing delay.
4) An Actor-Critic neural network model is constructed based on multi-agent deep reinforcement learning.
The Actor network and the Critic network are shown as
wherein the Actor network adopts strategy-based deep neural networks, and the Critic network adopts value-based deep neural networks. The Actor network is composed of an input layer, three fully connected layers, a softmax layer and an output layer. For the first two hidden layers, the ReLU function is used as a nonlinear activation function. For the last hidden layer, Tanh is used as the activation function to constrain the actions. Through the softmax layer, the output probabilities over the actions sum to 1. Then, an action is selected as the final output action am(t). The Critic network is composed of an input layer, three fully connected layers and an output layer, wherein the activation function of the first two hidden layers is ReLU.
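The layer structure described above can be sketched as a forward pass in plain Python; the layer widths, the weight initialization and the state/action dimensions are illustrative assumptions:

```python
import math
import random

random.seed(0)

def linear(x, w, b):
    # Fully connected layer: y = W x + b
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    mx = max(x)
    e = [math.exp(v - mx) for v in x]
    s = sum(e)
    return [v / s for v in e]

def init(n_out, n_in):
    # Small random weights, zero biases (initialization scheme assumed)
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)], [0.0] * n_out)

def actor_forward(state, params):
    # Three fully connected layers: ReLU, ReLU, then Tanh to constrain the
    # actions, followed by softmax so the action probabilities sum to 1
    (w1, b1), (w2, b2), (w3, b3) = params
    h = relu(linear(state, w1, b1))
    h = relu(linear(h, w2, b2))
    h = [math.tanh(v) for v in linear(h, w3, b3)]
    return softmax(h)

def critic_forward(state_action, params):
    # Three fully connected layers, ReLU on the first two; scalar Q-value out
    (w1, b1), (w2, b2), (w3, b3) = params
    h = relu(linear(state_action, w1, b1))
    h = relu(linear(h, w2, b2))
    return linear(h, w3, b3)[0]

state_dim, hidden, action_dim = 4, 8, 3   # sizes are illustrative
actor = [init(hidden, state_dim), init(hidden, hidden),
         init(action_dim, hidden)]
critic = [init(hidden, state_dim + action_dim), init(hidden, hidden),
          init(1, hidden)]
state = [0.1, 0.2, 0.3, 0.4]
probs = actor_forward(state, actor)        # action probabilities
q = critic_forward(state + probs, critic)  # value of that state-action pair
```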
5) Offline centralized training is performed for the neural network model by digital twin.
To obtain the strategy for minimization of the total task processing delay, as shown in
and the parameter θQ is updated by performing stochastic gradient descent on the loss function L(θQ);
and the parameter θπ is updated by performing stochastic gradient descent on the loss function L(θπ);
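The training updates above can be sketched as follows: the estimation Critic minimizes a squared temporal-difference error against a target value produced with the target networks, and the target-network parameters slowly track the estimation networks through a soft update. The discount factor and soft-update rate are illustrative assumptions:

```python
def td_target(r, gamma, q_next):
    """Target value y = r + gamma * Q'(s', a') used in the critic loss
    L(theta_Q) = (y - Q(s, a))^2; gamma is an assumed discount factor."""
    return r + gamma * q_next

def critic_loss(y, q):
    # Squared error between target value and estimated Q-value
    return (y - q) ** 2

def soft_update(theta, theta_target, tau=0.01):
    """Target-network parameters slowly track the trained parameters:
    theta' <- tau * theta + (1 - tau) * theta' (tau value assumed)."""
    return [tau * a + (1.0 - tau) * b
            for a, b in zip(theta, theta_target)]
```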
6) Online distributed execution of task offloading and computation and communication resource allocation is performed by the end devices.
Performing online distributed execution of wireless communication and task offloading by end devices to collaboratively process the heterogeneous tasks according to the strategy for minimization of the total task delay comprises the following steps:
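The distributed execution phase can be sketched as below: each end device perceives only its local state online and runs only its own trained Actor network, with no central coordination at run time. All callables and the greedy action selection are illustrative assumptions:

```python
def execute_step(devices, actors, observe, apply_action):
    """One online scheduling step executed distributively by the end devices.
    observe(m) returns device m's perceived state; actors[m] is its trained
    Actor returning action probabilities; apply_action carries the decision
    (offloading ratio, transmit power, resource request) into the network.
    All names are illustrative assumptions."""
    actions = {}
    for m in devices:
        state = observe(m)                       # perceive local state online
        probs = actors[m](state)                 # local Actor inference only
        action = max(range(len(probs)), key=probs.__getitem__)
        apply_action(m, action)                  # enact offloading decision
        actions[m] = action
    return actions
```

Because each device needs only its own Actor parameters after centralized training, no per-step communication with the cloud-hosted digital twin is required at execution time.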
Number | Date | Country | Kind
---|---|---|---
202310046985.0 | Jan 2023 | CN | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2023/105898 | 7/5/2023 | WO |