The present disclosure relates to network and cloud control technology, and in particular, relates to control for allocating tasks.
With the development of communication technology, a variety of applications are emerging in various fields such as healthcare, smart cities, and manufacturing. These applications are offloaded to cloud servers for processing because end devices (EDs) such as personal computers, smartphones, IoT devices, and automobiles have limited computing resources.
This mechanism is called cloud computing (CC). Tasks of the offloaded applications include computing resource requests and communication requests with different characteristics, such as traffic-heavy characteristics, compute-heavy characteristics, and latency (delay time)-sensitive characteristics.
Here, the term “traffic-heavy task” refers to a task that requires a large amount of traffic. The term “latency-sensitive task” refers to a task that has strict requirements regarding communication delay. Because cloud servers are typically located far away from end devices, additional communication delays are inevitably incurred when end devices offload tasks to the cloud. Therefore, cloud computing poses a problem in that it degrades the performance of tasks that are sensitive to delays.
In order to address the above problem, edge computing (EC) has been proposed in which computing resources are placed on edge servers close to end devices. Combining cloud computing and edge computing creates a plurality of offloading options and increases the efficiency of task offloading. For example, clouds generally have sufficient computing resources, and therefore offloading compute-heavy tasks to the cloud can improve the efficiency of task offloading.
Conventionally, several studies have addressed the problem of task offloading in cloud computing and edge computing. In particular, methods using reinforcement learning (RL) are attracting attention (Non-Patent Literature 1 to Non-Patent Literature 4).
Reinforcement learning can immediately output efficient task offloading by learning in advance the relationship between input network patterns and output task offloading decisions.
Non-Patent Literature 1: Y. Zhan, S. Guo, P. Li, and J. Zhang, "A deep reinforcement learning based offloading game in edge computing," IEEE Trans. Comput., vol. 69, no. 6, pp. 883-893, 2020.
Non-Patent Literature 2: D. C. Nguyen, P. N. Pathirana, M. Ding, and A. Seneviratne, "Deep reinforcement learning for collaborative offloading in heterogeneous edge networks," in Proc. IEEE/ACM CCGrid, 2021, pp. 297-303.
Non-Patent Literature 3: W. Hou, H. Wen, H. Song, W. Lei, and W. Zhang, "Multi-agent deep reinforcement learning for task offloading and resource allocation in cybertwin based networks," IEEE Internet Things J., 2021.
Non-Patent Literature 4: Y. Zhang, B. Di, Z. Zheng, J. Lin, and L. Song, "Distributed multi-cloud multi-access edge computing by multi-agent reinforcement learning," IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2565-2578, 2020.
However, the following two problems arise with the conventional method.
The first problem is that existing studies do not consider cloud computing or only target networks with a single cloud server. As mentioned above, combining cloud computing and edge computing is essential to improve the efficiency of task offloading. Furthermore, in a general network, there are a plurality of cloud servers.
The second problem is that existing studies do not take into account bandwidth or the topology of the backbone network, which is the core communication network that connects business operators. Many previous studies try to minimize task delay by shortening the route through which offloaded tasks pass. However, with control that does not take bandwidth into account, congestion may occur due to concentration of task loads on a link.
Additionally, multi-agent reinforcement learning is an effective means for dealing with more complex problems by solving one problem with a plurality of agents. Each agent cooperates with other agents to maximize its reward. By allocating each agent to its own task, the learning cost for each agent can be reduced. However, when each agent learns independently, there is a problem in that each agent takes selfish actions. As a concrete example of this problem, when the agents learn independently and simultaneously and act independently, all tasks are concentrated on the cloud server with the lightest load, and as a result, that cloud server becomes overloaded.
The present invention has been made in view of the above-mentioned problems, and an object of the present invention is to improve the efficiency of task offloading in consideration of network usage statuses such as network topology and bandwidth.
In order to solve the above problem, the invention according to claim 1 provides a control apparatus that controls allocation of a task to a physical network constructed and modeled by respective nodes including respective edge nodes and respective cloud nodes, the control apparatus including: an observation unit that observes task information regarding the task requested by an end device and network usage information indicating a usage status of the physical network; a calculation unit that calculates an optimal specific node for offloading the task based on the observation result of the observation unit; and a transfer unit that transfers the task to the specific node.
According to the present invention, the efficiency of task offloading can be improved in consideration of network usage statuses such as network topology and bandwidth.
Hereinafter, an overview of a communication system that performs task offloading will be described with reference to the drawings.
As illustrated in the drawings, the control apparatus 50 acquires task information and network usage information from the modeled physical network 140, and controls task allocation to the modeled physical network 140.
Specifically, the control apparatus 50 formulates optimal task offload problems for multi-cloud and multi-edge networks in consideration of constraints on usage statuses of a physical network such as network topology and/or bandwidth. Here, optimal offloading is defined as a solution that maximizes the resource utilization efficiency of servers and links and minimizes task delay while satisfying the constraints of server capacity, link capacity, and task delay. The decision variables here are the allocation of computing resources for the task and the route between an end device and the allocated server. The control apparatus 50 also proposes a task offloading algorithm based on cooperative multi-agent deep reinforcement learning (Coop-MADRL).
The modeled physical network 140 is constructed by a plurality of end devices that request tasks, a plurality of edge nodes 121, 122, and 123, and a plurality of cloud nodes 131 and 132.
Further, the end device 11 can be connected to the plurality of edge servers 21 and 22 and the plurality of cloud servers 31 and 32 via an access network an1. Similarly, the end device 12 can be connected to the plurality of edge servers 21 and 22 and the plurality of cloud servers 31 and 32 via an access network an2. Further, a core network cn is constructed between the edge server 21 and the edge server 22. The modeled physical network 140 is obtained by modeling this physical network.
Hereinafter, the end devices 11 and 12 will be collectively referred to as an “end device 10.” The edge servers 21 and 22 are collectively referred to as an “edge server 20.” The cloud servers 31 and 32 are collectively referred to as a “cloud server 30.” The edge nodes 121, 122, and 123 are collectively referred to as an “edge node.” The cloud nodes 131 and 132 are collectively referred to as a “cloud node.” Edge nodes and cloud nodes are collectively referred to as “nodes.” Furthermore, the access networks an1 and an2 are collectively referred to as an “access network an.”
Further, the end device 10 is a personal computer, a smartphone, a smart watch, an IoT device, a home appliance, a communication device mounted on or installed on a mobile object, or the like. Mobile objects include vehicles, aircraft, ships, robots, and the like.
As illustrated in the drawings, the end device 10 is configured by a computer and generates various tasks with various applications. Each task is characterized by at least one of a required computing resource demand, a traffic demand, or a maximum allowable delay.
Each end device 10 can calculate its own tasks within the end device 10 or offload tasks to an adjacent edge or cloud.
As illustrated in the drawings, the control apparatus 50 includes a processor 101, a memory 102, an auxiliary storage device 103, a connection device 104, a communication device 105, and a drive device 106.
The processor 101 serves as a control unit that controls the entire control apparatus 50, and includes various arithmetic devices such as a central processing unit (CPU). The processor 101 reads various programs onto the memory 102 and executes the programs. Note that the processor 101 may include a general-purpose computing on graphics processing units (GPGPU).
The memory 102 has main storage devices such as a read only memory (ROM) and a random access memory (RAM). The processor 101 and the memory 102 form a so-called computer, and the processor 101 executes various programs read onto the memory 102, thereby implementing various functions of the computer.
The auxiliary storage device 103 stores various programs and various types of information used when the various programs are executed by the processor 101.
The connection device 104 connects external devices (for example, a display device 108 and an operation device 109) to the control apparatus 50.
The communication device 105 is a communication device for transmitting and receiving various types of information to and from another device.
The drive device 106 is a device in which a recording medium 106m is set. The recording medium 106m mentioned herein includes a medium that optically, electrically or magnetically records information, such as a compact disc read-only memory (CD-ROM), a flexible disk, or a magneto-optical disk. The recording medium 106m may also include a semiconductor memory that electrically records information, such as a read only memory (ROM) and a flash memory.
Various programs to be installed in the auxiliary storage device 103 are installed, for example, by setting the distributed recording medium 106m in the drive device 106 and reading the various programs recorded in the recording medium 106m by the drive device 106. Alternatively, various programs installed in the auxiliary storage device 103 may be installed by being downloaded from the network via the communication device 105.
Since the end device 10, the edge server 20, and the cloud server 30 have the same hardware configuration as the control apparatus, description thereof will be omitted.
Next, control of the task offload system will be described with reference to the drawings.
Here, a discrete time step t is considered. Assuming that each end device 10 has one or more tasks, K tasks are considered between time steps [0, T]. Specifically, the following processes are executed.
Step S11: At the start of each time step t, a task arrives at the edge server 20 closest to the end device 10.
Step S12: An observation unit 51 of each edge server 20 (control apparatus 50) acquires task information and network usage information to observe the task information and the network usage status. The task information includes at least one of a required computing resource demand, a traffic demand, or a maximum allowable delay time. The network usage information is, for example, information regarding the network topology and/or bandwidth as a network usage status.
Step S13: A calculation unit 55 of each edge server 20 (control apparatus 50) calculates an optimal specific node for offloading the task, using the proposed method deployed in the edge server 20, based on the observation results obtained in step S12 (see [Proposed Method] described later for details).
Step S14: When a plurality of tasks arrive at the edge server 20 at the same time (YES), the determination of the offload node in step S13 is repeated in first-in first-out (FIFO) order. When a plurality of tasks do not arrive at the same time (NO), the process proceeds to the next step.
Step S15: The calculation unit 55 of each edge server 20 (control apparatus 50) aggregates traffic demand information between nodes, and calculates and updates an optimal route between the nodes.
Step S16: A transfer unit 59 of each edge server 20 (control apparatus 50) transfers the task to the optimal node via the optimal route.
Step S17: The node to which the task has been transferred executes the task and returns the result to the requesting end device 10.
Step S18: When a predetermined termination condition is satisfied (YES), the control of the task offload system is terminated. The predetermined termination condition is, for example, that task requests from the end devices 10 have ended.
Step S19: When the predetermined termination condition is not satisfied in step S18 (NO) and a certain period of time has elapsed (YES), the process returns to step S11 and is repeated in the next time step t+1.
Note that it is assumed that the task being executed continues to consume the resources of the offloaded node and the link through which the task passes until the task returns the result to the end device 10. Therefore, in the present embodiment, a task for which a request is accepted at time step t does not need to be completed by time step t+1.
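The flow of steps S11 to S19 can be pictured with the following minimal Python sketch. All names (run_task_offload, task_source, the controller methods, and so on) are illustrative assumptions and not part of the embodiment; the sketch only mirrors the step ordering described above.

```python
from collections import deque

# Illustrative sketch of steps S11 to S19; every name below is hypothetical.
def run_task_offload(controllers, task_source, max_steps):
    for t in range(max_steps):                                # time steps [0, T]
        for ctrl in controllers:
            queue = deque(task_source.arrivals(ctrl.edge_id, t))  # S11: arrival at nearest edge
            while queue:                                      # S14: FIFO over simultaneous tasks
                task = queue.popleft()
                obs = ctrl.observe(task)                      # S12: task info + network usage
                node = ctrl.calculate_offload_node(obs)       # S13: optimal node via proposed method
                ctrl.assign(task, node)
            ctrl.update_routes()                              # S15: aggregate demands, update routes
            ctrl.transfer_assigned_tasks()                    # S16: transfer along optimal routes
            # S17: the destination node executes the task and returns the result
        if task_source.finished(t):                           # S18: termination condition
            break                                             # otherwise S19: proceed to t+1
```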
Next, Table 1 shows the definitions of variables in the network model.
A physical network graph G(N, L) composed of a physical node set N and a physical link set L is considered. Each node serves as either an edge or a cloud: each edge node satisfies $e \in E \subset N$, and each cloud node satisfies $c \in C \subset N$. The numbers of nodes, edge nodes, and cloud nodes are expressed as |N|, |E|, and |C|, respectively. The end device 10 connects to the nearest edge server 20 via the access network an, but in the present embodiment, it is assumed that the access network an is not included in G(N, L).
Also, the processing capability of the i-th node is denoted as $c^N_i$. This indicates the upper limit of the processing capability of the computing resource, such as the CPU capability per second ([G cycles/s]) of the i-th node.
Also, the node capacity of the i-th node is denoted as $C^N_i$. For example, when one CPU core is allocated to each task, $C^N_i$ is equal to the number of CPU cores of the i-th node.
The bandwidth capacity of the link (i, j) is denoted as $b^L_{ij}$.
Furthermore, all links have transmission delays depending on the distance between nodes. The delay time of each link is determined by the distance coefficient $\alpha^L_{ij}$ of the link (i, j).
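As a concrete picture of this network model, the sketch below encodes G(N, L) with the capacities just defined, using networkx for illustration. All attribute names, node labels, and numerical values are assumptions, and the linear delay model in the last line is likewise an assumed reading of the distance coefficient.

```python
import networkx as nx

# Hypothetical encoding of G(N, L); attribute names and values are assumptions.
G = nx.Graph()
G.add_node("e1", role="edge", capability=10.0, capacity=8)    # c_i^N [G cycles/s], C_i^N [cores]
G.add_node("e2", role="edge", capability=10.0, capacity=8)
G.add_node("c1", role="cloud", capability=100.0, capacity=64)
# Link (i, j): bandwidth capacity b_ij^L and distance coefficient alpha_ij^L
G.add_edge("e1", "e2", bandwidth=10.0, alpha=1.0)
G.add_edge("e2", "c1", bandwidth=40.0, alpha=5.0)             # farther away, larger alpha

# Assumed example: link delay proportional to the distance coefficient
delay = {(i, j): d["alpha"] * 1.0 for i, j, d in G.edges(data=True)}
```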
Next, Table 2 shows the definitions of the variables of the task model.
A task model for uniformly expressing the various tasks of the end device 10 will be described. The set of all tasks is denoted as $D := \{D_k\}$.
Also, the k-th task is defined as $D_k := (t_k, \beta_k, C_k, m^{up}_k, m^{down}_k, d^{max}_k)$. Here, $t_k \in T$ is the reception time of task k, $\beta_k$ is the type of task k that is uniquely given to each application, and $C_k$ is the required computing resource demand ([G cycles]). In addition, $m^{up}_k$ indicates the traffic demand for upload, $m^{down}_k$ indicates the traffic demand for download, and $d^{max}_k$ indicates the maximum allowable delay time ([ms]).
Tasks consume computing resources and network resources on G(N, L) according to $D_k$.
When a task is allocated to the edge node closest to the end device 10, the amount of network resources consumed by G(N, L) is assumed to be 0.
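The task tuple $D_k$ can be represented, for example, by the following Python data class; the field names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Hypothetical container for the task tuple D_k of Table 2."""
    t_k: int        # reception time step
    beta_k: int     # application-specific task type
    C_k: float      # required computing resource demand [G cycles]
    m_up: float     # upload traffic demand
    m_down: float   # download traffic demand
    d_max: float    # maximum allowable delay [ms]
```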
Next, a task offload problem is formulated to minimize (Formula 1) while satisfying the constraint conditions of (Formula 2) to (Formula 17) shown in the drawings.
First, Table 3 shows the definitions of variables for the task offload problem.
The decision variables of this problem are the task allocation variable $Y$ and the route allocation variable $X_t$.
Here, $y_{kn} \in Y$ is a variable that takes 1 if the computing demand of task k is allocated to node n, and 0 otherwise.
In addition, $x^{pq}_{t,ij} \in X_t$ represents the proportion of the traffic demand $m^{pq}_t$ from the start node p to the end node q that passes through the link (i, j) at time step t. Here, $m^{pq}_t$ represents the traffic demand matrix between the node p and the node q at time step t.
Furthermore, the position of the end device 10 is defined by $z_{ke}$. Here, $z_{ke}$ is a variable that takes 1 if the nearest edge node of task k requested by the end device 10 is e, and 0 otherwise.
Next, the objective function indicated in (Formula 1) is introduced.
Here, $U^N_t$ and $U^L_t$ represent the maximum utilization rates of nodes and links at time step t, respectively, and are defined as the maxima of the individual utilization rates: the utilization rate of the i-th node is denoted as $u^N_{t,i}$, and the utilization rate of the link (i, j) is denoted as $u^L_{t,ij}$.
In addition, $d^N_k$ and $d^L_k$ represent the node delay time and the link delay time of task k, respectively. Further, λ indicates a weighting parameter that determines the relative importance of each term of the objective function.
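(Formula 1) itself is shown in the drawings and is not reproduced here. From the definitions above, one plausible shape of the objective, offered only as an assumption-laden sketch, is:

```latex
\min_{Y,\,X_t}\; \lambda\,\bigl(U^N_t + U^L_t\bigr) \;+\; (1-\lambda)\sum_{k}\bigl(d^N_k + d^L_k\bigr)
```

The first term evaluates the resource utilization efficiency of servers and links, the second term evaluates task delay, and λ balances the two; the actual (Formula 1) may weight or normalize the terms differently.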
Next, three types of constraint conditions, that is, node capacity, link capacity, and task delay, are set.
First, a binary variable $\delta_{tk}$ is defined as indicated in (Formula 2). Here, $\delta_{tk}$ is a variable that returns 1 if task k is being executed at time step t, and 0 otherwise, where $t_k$ indicates the reception time of task k.
The task allocation variable $y_{kn}$ is determined so as to minimize the maximum node utilization rate $U^N_t$ while satisfying the node capacity constraints indicated in (Formula 3) to (Formula 6).
(Formula 3) indicates that the computing demand of each task needs to be allocated to one of the nodes. (Formula 4) represents the constraint on the capacity of each node; the product $\delta_{tk} y_{kn}$ in (Formula 4) indicates the allocation of the tasks being executed at time step t.
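Under the reconstructed notation, (Formula 3) and (Formula 4) plausibly take the following form; this is a sketch consistent with the prose above, not a verbatim reproduction of the drawings:

```latex
\sum_{n \in N} y_{kn} = 1 \quad \forall k
\qquad \text{(cf. Formula 3)}
\\
\sum_{k} \delta_{tk}\, y_{kn} \le C^N_n \quad \forall n \in N,\ \forall t
\qquad \text{(cf. Formula 4)}
```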
The route allocation variable $x^{pq}_{t,ij}$ is determined so as to minimize the maximum link utilization rate $U^L_t$ while satisfying the link capacity constraints indicated in (Formula 7) to (Formula 11).
The traffic demand $m^{pq}_t$ in (Formula 9) can be formulated as indicated in (Formula 12) and (Formula 13). (Formula 12) indicates the upload traffic demand from the transmission source node p to the transmission destination node q. Here, $z_{kp}$ and $y_{kq}$ determine the node p and the node q, and $\delta_{tk}$ extracts the tasks being executed. (Formula 13) indicates the download traffic demand from the node q to the node p, and is the converse of the upload formula.
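Reading (Formula 12) and (Formula 13) from this description, a plausible reconstruction, given only as an assumption, is:

```latex
m^{pq,\mathrm{up}}_t = \sum_{k} \delta_{tk}\, z_{kp}\, y_{kq}\, m^{\mathrm{up}}_k,
\qquad
m^{qp,\mathrm{down}}_t = \sum_{k} \delta_{tk}\, z_{kp}\, y_{kq}\, m^{\mathrm{down}}_k
```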
The node delay time $d^N_k$ and the link delay time $d^L_k$ of task k are formulated as indicated in (Formula 14) to (Formula 16).
Finally, the latency constraint is formulated as indicated in (Formula 17).
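Given the definitions of $d^N_k$, $d^L_k$, and $d^{max}_k$ above, the latency constraint of (Formula 17) plausibly reads:

```latex
d^N_k + d^L_k \le d^{\max}_k \quad \forall k
```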
First, variables representing subsets of tasks are defined as indicated in (Formula 18) to (Formula 21).
Here, $K_t$ indicates the subset of tasks executed at time step t, $K_e$ indicates the subset of tasks accepted by the edge node e, $D_t$ indicates the subset of tasks accepted at time step t, and $D_{e,t}$ indicates the subset of tasks accepted by the edge node e at time step t.
Here, Table 4 indicates the definitions of the variables of the proposed method.
$G := \{g_e\}$: set of agents
$O := \{o_e\}$: set of observations of the agents
$A := \{A_e\}$: set of action spaces, where $A_e$ is the action space of each agent
|E| agents, equal in number to the edge nodes, are introduced, and each agent is allocated to the task offload control of one edge node.
An agent $g_e \in G$ learns a method for optimizing the task offloading of the edge node e. The state and the observation $o_e$ of the agent $g_e$ are defined based on the observed task information and network usage information. The candidate set of actions $A_e$ is defined as the set of nodes to which tasks can be offloaded.
When the edge node e does not accept any task at time step t, the agent $g_e$ selects the action of "doing nothing." The reward is designed to return a negative value if the constraint conditions are not satisfied, and otherwise to return a positive value depending on the value of the objective function.
The proposed method (Coop-MADRL) performs centralized learning and distributed execution. First, the learning procedure (Algorithm 1) will be described. Line 1 indicates initialization of agent parameters. A series of procedures (lines 2 to 18) is repeatedly executed until learning is completed. Lines 3 to 4 indicate creation of tasks and initialization of environment parameters. A series of actions is called an episode, and each episode (lines 5 to 16) is repeatedly executed.
In each episode, the agents collect training samples that are combinations of $\langle o_t, a_t, r_t \rangle$. The time step of the network simulator is $t_{sim}$, which is reset at the beginning of each episode.
In line 7, when the edge node e receives a plurality of tasks at $t_{sim}$, the agent $g_e$ selects one task using a FIFO method.
In line 9, a random action is selected with probability ϵ; otherwise, an action that maximizes the action value $Q_e(o_e, a_e)$ is selected with probability 1−ϵ.
Each agent executes lines 7 to 9 in parallel.
In line 10, the task offload is updated according to $a_t$ by Algorithm 3.
Line 11 calculates the reward.
Lines 12 to 13 indicate the termination condition of agent learning.
In lines 14 and 15, if all the tasks accepted at $t_{sim}$ have been allocated, the process proceeds to the next step $t_{sim}+1$.
Line 17 indicates storage of the episode in the replay memory M.
In line 18, all the agents G are trained with episode histories randomly sampled from M.
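The training procedure just described can be compressed into the following Python sketch. The epsilon-greedy selection, FIFO task pick, and replay-based updates follow lines 7 to 18 of Algorithm 1, while all class and method names (env, fifo_task, best_action, and so on) are assumed interfaces, not the embodiment's actual implementation.

```python
import random

# Hypothetical sketch of Algorithm 1 (centralized learning); names are assumptions.
def train(agents, env, num_episodes, epsilon):
    memory = []                                      # replay memory M
    for episode in range(num_episodes):              # lines 2-18
        env.reset_with_new_tasks()                   # lines 3-4
        history, done = [], False
        while not done:                              # one episode, lines 5-16
            obs = env.observe_all()
            actions = {}
            for e, agent in agents.items():          # lines 7-9, run per agent
                task = env.fifo_task(e)              # line 7: pick one task FIFO
                if task is None:
                    actions[e] = agent.noop()        # "do nothing" when no task arrived
                elif random.random() < epsilon:      # line 9: explore with probability eps
                    actions[e] = agent.random_action()
                else:                                # exploit with probability 1 - eps
                    actions[e] = agent.best_action(obs[e])   # argmax_a Q_e(o_e, a)
            reward, done = env.step(actions)         # lines 10-13: Algorithm 3 update + reward
            history.append((obs, actions, reward))   # collect <o_t, a_t, r_t>
        memory.append(history)                       # line 17: store episode in M
        for agent in agents.values():                # line 18: train all agents from M
            agent.update(random.choice(memory))
```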
Next, the execution procedure (Algorithm 2) will be described. In line 1, G is trained in advance using Algorithm 1. Algorithm 2 then continuously repeats lines 2 to 9 as long as the system receives and accepts new tasks.
In line 6, each agent selects the action $a_e$ that maximizes $Q_e(o_e, a_e)$.
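The execution phase is greedy: no exploration, and each trained agent acts on its local observation. A minimal sketch under the same assumed interfaces as above:

```python
# Hypothetical sketch of Algorithm 2 (distributed execution); names are assumptions.
def execute(agents, env):
    while env.has_new_tasks():                       # lines 2-9
        obs = env.observe_all()
        actions = {e: agent.best_action(obs[e])      # line 6: argmax_a Q_e(o_e, a_e)
                   for e, agent in agents.items()}
        env.step(actions)                            # offload tasks according to the actions
```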
Next, Algorithm 3, which updates the environment, will be described. Line 1 indicates the calculation of the task allocation Y. Line 2 indicates the calculation of the node utilization. Line 3 indicates the calculation of the traffic demand matrix $M_t$. Line 4 indicates the calculation of the route allocation $X_t$. Line 5 indicates the calculation of the delay.
Finally, Algorithm 3 returns variables for reward calculation.
A reward function is designed based on the objective function (Formula 1).
The function of (Formula 22) is designed to return a smaller value as x becomes larger, reflecting that efficiency worsens as x increases; it returns a positive value if x < 0.8, and a negative value otherwise. Note that the average satisfaction level of latency is defined in (Formula 23).
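As a concrete illustration, the reward shape described above might be realized as follows. The exact forms of (Formula 22) and (Formula 23) are in the drawings; only the sign behavior around x = 0.8 and the penalty on constraint violation come from the text, so the linear form and the penalty magnitude below are assumptions.

```python
def utilization_reward(x):
    """Sketch of the (Formula 22) shape: decreasing in x, positive iff x < 0.8."""
    return 0.8 - x  # assumed linear form; only the sign behavior is from the text

def reward(u_node, u_link, latency_satisfaction, feasible):
    """Negative on constraint violation, otherwise positive depending on the objective."""
    if not feasible:          # constraint conditions not satisfied
        return -1.0           # assumed penalty magnitude
    return utilization_reward(max(u_node, u_link)) + latency_satisfaction
```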
This concludes the description of the proposed method.
According to the present embodiment, the efficiency of task offloading can be improved by introducing a cooperative multi-agent method. That is, an agent that has learned the optimal task offload is placed at each edge. Furthermore, by introducing a mechanism in which each agent learns cooperatively, the selfish action of each agent is prevented. Thereby, the efficiency of task offloading can be improved in consideration of constraints on network usage statuses such as network topology and/or bandwidth.
Furthermore, by using deep reinforcement learning to learn in advance the relationship between network demand patterns and optimal task offloading, efficient task offloading can be quickly obtained.
The present invention is not limited to the above-described embodiment, and configurations and processes (operations) such as those described below are also possible.
For example, the control apparatus 50 can be implemented by a computer and a program, and the program can be recorded in a (non-transitory) recording medium or provided via a communication network such as the Internet.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2022/017008 | 4/1/2022 | WO |