The present disclosure relates to network and cloud control technology, and in particular, relates to control for allocating tasks.
With the development of communication technology, a variety of applications are emerging in various fields such as healthcare, smart cities, and manufacturing. These applications are offloaded to cloud servers for processing because end devices (EDs) such as personal computers, smartphones, IoT devices, and automobiles have limited computing resources.
This mechanism is called cloud computing (CC). Tasks of the offloaded applications include computing resource requests and communication requests with different characteristics, such as traffic-heavy characteristics, compute-heavy characteristics, and latency (delay time)-sensitive characteristics.
Here, the term “traffic-heavy task” refers to a task that requires a large amount of traffic. The term “latency-sensitive task” refers to a task that has strict requirements regarding communication delay. Because cloud servers are typically located far away from end devices, additional communication delays are inevitably incurred when end devices offload tasks to the cloud. Therefore, cloud computing poses a problem in that it degrades the performance of tasks that are sensitive to delays.
In order to address the above problem, edge computing (EC) has been proposed in which computing resources are placed on edge servers close to end devices. Combining cloud computing and edge computing creates a plurality of offloading options and increases the efficiency of task offloading. For example, clouds generally have sufficient computing resources, and therefore offloading compute-heavy tasks to the cloud can improve the efficiency of task offloading.
Conventionally, several studies have addressed the problem of task offloading in cloud computing and edge computing. In particular, methods using reinforcement learning (RL) are attracting attention (Non-Patent Literature 1 to Non-Patent Literature 4).
Reinforcement learning can immediately output efficient task offloading by learning in advance the relationship between input network patterns and output task offloading decisions.
Non-Patent Literature 1: Y. Zhan, S. Guo, P. Li, and J. Zhang, "A deep reinforcement learning based offloading game in edge computing," IEEE Trans. Comput., vol. 69, no. 6, pp. 883-893, 2020.
Non-Patent Literature 2: D. C. Nguyen, P. N. Pathirana, M. Ding, and A. Seneviratne, "Deep reinforcement learning for collaborative offloading in heterogeneous edge networks," in Proc. IEEE/ACM CCGrid, 2021, pp. 297-303.
Non-Patent Literature 3: W. Hou, H. Wen, H. Song, W. Lei, and W. Zhang, "Multi-agent deep reinforcement learning for task offloading and resource allocation in cybertwin based networks," IEEE Internet Things J., 2021.
Non-Patent Literature 4: Y. Zhang, B. Di, Z. Zheng, J. Lin, and L. Song, "Distributed multi-cloud multi-access edge computing by multi-agent reinforcement learning," IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2565-2578, 2020.
However, the following two problems arise with the conventional method.
The first problem is that existing studies do not consider cloud computing or only target networks with a single cloud server. As mentioned above, combining cloud computing and edge computing is essential to improve the efficiency of task offloading. Furthermore, in a general network, there are a plurality of cloud servers.
The second problem is that existing studies do not take into account bandwidth or the topology of the backbone network, which is the core communication network that connects business operators. Many previous studies try to minimize task delay by shortening the route through which offloaded tasks pass. However, with control that does not take bandwidth into account, congestion may occur due to concentration of task loads on a link.
Additionally, multi-agent reinforcement learning is an effective means for dealing with more complex problems by solving one problem with a plurality of agents. Each agent cooperates with other agents to maximize its reward. By allocating each agent to its own task, the learning cost for each agent can be reduced. However, when each agent learns independently, there is a problem in that each agent takes selfish actions. As a concrete example of this problem, when the agents learn independently and simultaneously and act independently, all tasks are concentrated on the cloud server with the lightest load, and as a result, that cloud server becomes overloaded.
The present invention has been made in view of the above-mentioned problems, and an object of the present invention is to improve the efficiency of task offloading in consideration of network usage statuses such as network topology and bandwidth.
In order to solve the above problem, the invention according to claim 1 provides a control apparatus that controls allocation of a task to a physical network constructed and modeled by respective nodes including respective edge nodes and respective cloud nodes, the control apparatus including: an observation unit that observes task information regarding the task requested by an end device and network usage information indicating a usage status of the physical network; a calculation unit that calculates an optimal specific node for offloading the task based on the observation result of the observation unit; and a transfer unit that transfers the task to the specific node.
According to the present invention, the efficiency of task offloading can be improved in consideration of network usage statuses such as network topology and bandwidth.
Hereinafter, an overview of a communication system that performs task offloading will be described with reference to the drawings.
As illustrated in the drawings, the control apparatus 50 acquires task information and network usage information from the modeled physical network 140, and controls task allocation to the modeled physical network 140.
Specifically, the control apparatus 50 formulates optimal task offload problems for multi-cloud and multi-edge networks in consideration of constraints on usage statuses of a physical network such as network topology and/or bandwidth. Here, optimal offloading is defined as a solution that maximizes the resource utilization efficiency of servers and links and minimizes task delay while satisfying the constraints of server capacity, link capacity, and task delay. The decision variables here are the allocation of computing resources for the task and the route between an end device and the allocated server. The control apparatus 50 also proposes a task offloading algorithm based on cooperative multi-agent deep reinforcement learning (Coop-MADRL).
The modeled physical network 140 is constructed by a plurality of end devices that request tasks, a plurality of edge nodes 121, 122, and 123, and a plurality of cloud nodes 131 and 132.
Further, the end device 11 can be connected to the plurality of edge servers 21 and 22 and the plurality of cloud servers 31 and 32 via an access network an1. Similarly, the end device 12 can be connected to the plurality of edge servers 21 and 22 and the plurality of cloud servers 31 and 32 via an access network an2. Further, a core network cn is constructed between the edge server 21 and the edge server 22. The modeled physical network 140 is obtained by modeling this physical network.
Hereinafter, the end devices 11 and 12 will be collectively referred to as an “end device 10.” The edge servers 21 and 22 are collectively referred to as an “edge server 20.” The cloud servers 31 and 32 are collectively referred to as a “cloud server 30.” The edge nodes 121, 122, and 123 are collectively referred to as an “edge node.” The cloud nodes 131 and 132 are collectively referred to as a “cloud node.” Edge nodes and cloud nodes are collectively referred to as “nodes.” Furthermore, the access networks an1 and an2 are collectively referred to as an “access network an.”
Further, the end device 10 is a personal computer, a smartphone, a smart watch, an IoT device, a home appliance, a communication device mounted on or installed on a mobile object, or the like. Mobile objects include vehicles, aircraft, ships, robots, and the like.
As illustrated in the drawings, the end device 10 is configured by a computer and generates various tasks with various applications. Each task is characterized by at least one of a required computing resource demand, a traffic demand, or a maximum allowable delay.
Each end device 10 can calculate its own tasks within the end device 10 or offload tasks to an adjacent edge or cloud.
As illustrated in the drawings, the control apparatus 50 includes a processor 101, a memory 102, an auxiliary storage device 103, a connection device 104, a communication device 105, and a drive device 106.
The processor 101 serves as a control unit that controls the entire control apparatus 50, and includes various arithmetic devices such as a central processing unit (CPU). The processor 101 reads various programs onto the memory 102 and executes the programs. Note that the processor 101 may include a general-purpose computing on graphics processing units (GPGPU).
The memory 102 has main storage devices such as a read only memory (ROM) and a random access memory (RAM). The processor 101 and the memory 102 form a so-called computer, and the processor 101 executes various programs read onto the memory 102, thereby implementing various functions of the computer.
The auxiliary storage device 103 stores various programs and various types of information used when the various programs are executed by the processor 101.
The connection device 104 connects external devices (for example, a display device 108 and an operation device 109) to the control apparatus 50.
The communication device 105 is a communication device for transmitting and receiving various types of information to and from another device.
The drive device 106 is a device in which a recording medium 106m is set. The recording medium 106m mentioned herein includes a medium that optically, electrically or magnetically records information, such as a compact disc read-only memory (CD-ROM), a flexible disk, or a magneto-optical disk. The recording medium 106m may also include a semiconductor memory that electrically records information, such as a read only memory (ROM) and a flash memory.
Various programs to be installed in the auxiliary storage device 103 are installed, for example, by setting the distributed recording medium 106m in the drive device 106 and reading the various programs recorded in the recording medium 106m by the drive device 106. Alternatively, various programs installed in the auxiliary storage device 103 may be installed by being downloaded from the network via the communication device 105.
Since the end device 10, the edge server 20, and the cloud server 30 have the same hardware configuration as the control apparatus, description thereof will be omitted.
Next, control of the task offload system will be described with reference to the drawings.
Here, a discrete time step t is considered. Assuming that each end device 10 has one or more tasks, K tasks are considered between time steps [0, T]. Specifically, the following processes are executed.
Step S11: At the start of each time step t, a task arrives at the edge server 20 closest to the end device 10.
Step S12: An observation unit 51 of each edge server 20 (control apparatus 50) acquires task information and network usage information to observe the task information and the network usage status. The task information includes at least one of a required computing resource demand, a traffic demand, or a maximum allowable delay time. The network usage information is, for example, information regarding the network topology and/or bandwidth as a network usage status.
Step S13: A calculation unit 55 of each edge server 20 (control apparatus 50) calculates an optimal specific node for offloading the task, using the proposed method deployed in the edge server 20, based on the observation results obtained in step S12 (see [Proposed Method] described later for details).
Step S14: When a plurality of tasks arrive at the edge server 20 at the same time (YES), the determination of the offload node in step S13 is repeated in first-in first-out (FIFO) order. When a plurality of tasks do not arrive at the same time (NO), the process proceeds to the next step.
Step S15: The calculation unit 55 of each edge server 20 (control apparatus 50) aggregates traffic demand information between nodes, and calculates and updates an optimal route between the nodes.
Step S16: A transfer unit 59 of each edge server 20 (control apparatus 50) transfers the task to the optimal node via the optimal route.
Step S17: The node to which the task has been transferred executes the task and returns the result to the requesting end device 10.
Step S18: When a predetermined termination condition is satisfied (YES), the control of the task offload system is terminated. The predetermined termination condition is, for example, that task requests from the end devices 10 have ended.
Step S19: When the predetermined termination condition is not satisfied in step S18 (NO) and a certain period of time has elapsed (YES), the process returns to step S11 and is repeated in the next time step t+1.
Note that it is assumed that the task being executed continues to consume the resources of the offloaded node and the link through which the task passes until the task returns the result to the end device 10. Therefore, in the present embodiment, a task for which a request is accepted at time step t does not need to be completed by time step t+1.
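The flow of steps S11 to S19 can be pictured with the following minimal Python sketch. All names (run_task_offload, task_source, the controller methods, and so on) are illustrative assumptions and not part of the embodiment; the sketch only mirrors the step ordering described above.

```python
from collections import deque

# Illustrative sketch of steps S11 to S19; every name below is hypothetical.
def run_task_offload(controllers, task_source, max_steps):
    for t in range(max_steps):                                # time steps [0, T]
        for ctrl in controllers:
            queue = deque(task_source.arrivals(ctrl.edge_id, t))  # S11: arrival at nearest edge
            while queue:                                      # S14: FIFO over simultaneous tasks
                task = queue.popleft()
                obs = ctrl.observe(task)                      # S12: task info + network usage
                node = ctrl.calculate_offload_node(obs)       # S13: optimal node via proposed method
                ctrl.assign(task, node)
            ctrl.update_routes()                              # S15: aggregate demands, update routes
            ctrl.transfer_assigned_tasks()                    # S16: transfer along optimal routes
            # S17: the destination node executes the task and returns the result
        if task_source.finished(t):                           # S18: termination condition
            break                                             # otherwise S19: proceed to t+1
```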
Next, Table 1 shows the definitions of variables in the network model.
A physical network graph G(N, L) composed of a physical node set N and a physical link set L is considered. Each node serves as either an edge or a cloud: each edge node satisfies $e \in E \subset N$, and each cloud node satisfies $c \in C \subset N$. The numbers of nodes, edge nodes, and cloud nodes are expressed as |N|, |E|, and |C|, respectively. The end device 10 connects to the nearest edge server 20 via the access network an, but in the present embodiment, it is assumed that the access network an is not included in G(N, L).
Also, the processing capability of the i-th node is denoted as $c^N_i$. This indicates the upper limit of the processing capability of the computing resource, such as the CPU capability per second ([G cycles/s]) of the i-th node.
Also, the node capacity of the i-th node is denoted as $C^N_i$. For example, when one CPU core is allocated to each task, $C^N_i$ is equal to the number of CPU cores of the i-th node.
The bandwidth capacity of the link (i, j) is denoted as $b^L_{ij}$.
Furthermore, all links have transmission delays depending on the distance between nodes. The delay time of each link is determined by the distance coefficient $\alpha^L_{ij}$ of the link (i, j).
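As a concrete picture of this network model, the sketch below encodes G(N, L) with the capacities just defined, using networkx for illustration. All attribute names, node labels, and numerical values are assumptions, and the linear delay model in the last line is likewise an assumed reading of the distance coefficient.

```python
import networkx as nx

# Hypothetical encoding of G(N, L); attribute names and values are assumptions.
G = nx.Graph()
G.add_node("e1", role="edge", capability=10.0, capacity=8)    # c_i^N [G cycles/s], C_i^N [cores]
G.add_node("e2", role="edge", capability=10.0, capacity=8)
G.add_node("c1", role="cloud", capability=100.0, capacity=64)
# Link (i, j): bandwidth capacity b_ij^L and distance coefficient alpha_ij^L
G.add_edge("e1", "e2", bandwidth=10.0, alpha=1.0)
G.add_edge("e2", "c1", bandwidth=40.0, alpha=5.0)             # farther away, larger alpha

# Assumed example: link delay proportional to the distance coefficient
delay = {(i, j): d["alpha"] * 1.0 for i, j, d in G.edges(data=True)}
```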
Next, Table 2 shows the definitions of the variables of the task model.
A task model for uniformly expressing the various tasks of the end device 10 will be described. The set of all tasks is denoted as $D := \{D_k\}$.
Also, the k-th task is defined as $D_k := (t_k, \beta_k, C_k, m^{up}_k, m^{down}_k, d^{max}_k)$. Here, $t_k \in T$ is the reception time of task k, $\beta_k$ is the type of task k that is uniquely given to each application, and $C_k$ is the required computing resource demand ([G cycles]). In addition, $m^{up}_k$ indicates the traffic demand for upload, $m^{down}_k$ indicates the traffic demand for download, and $d^{max}_k$ indicates the maximum allowable delay time ([ms]).
Tasks consume computing resources and network resources on G(N, L) according to $D_k$.
When a task is allocated to the edge node closest to the end device 10, the amount of network resources consumed by G(N, L) is assumed to be 0.
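The task tuple $D_k$ can be represented, for example, by the following Python data class; the field names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Hypothetical container for the task tuple D_k of Table 2."""
    t_k: int        # reception time step
    beta_k: int     # application-specific task type
    C_k: float      # required computing resource demand [G cycles]
    m_up: float     # upload traffic demand
    m_down: float   # download traffic demand
    d_max: float    # maximum allowable delay [ms]
```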
Next, a task offload problem is formulated to minimize (Formula 1) while satisfying the constraint conditions of (Formula 2) to (Formula 17) shown in the drawings.
First, Table 3 shows the definitions of variables for the task offload problem.
The decision variables of this problem are the task allocation variable $Y$ and the route allocation variable $X_t$.
Here, $y_{kn} \in Y$ is a variable that takes 1 if the computing demand of task k is allocated to node n, and 0 otherwise.
In addition, $x^{pq}_{t,ij} \in X_t$ represents the proportion of the traffic demand $m^{pq}_t$ from the start node p to the end node q that passes through the link (i, j) at time step t. Here, $m^{pq}_t$ represents the traffic demand matrix between the node p and the node q at time step t.
Furthermore, the position of the end device 10 is defined by $z_{ke}$. Here, $z_{ke}$ is a variable that takes 1 if the nearest edge node of task k requested by the end device 10 is e, and 0 otherwise.
Next, the objective function indicated in (Formula 1) is introduced.
Here, $U^N_t$ and $U^L_t$ represent the maximum utilization rates of nodes and links at time step t, respectively, and are defined as the maxima of the individual utilization rates: the utilization rate of the i-th node is denoted as $u^N_{t,i}$, and the utilization rate of the link (i, j) is denoted as $u^L_{t,ij}$.
In addition, $d^N_k$ and $d^L_k$ represent the node delay time and the link delay time of task k, respectively. Further, λ indicates a weighting parameter that determines the relative importance of each term of the objective function.
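(Formula 1) itself is shown in the drawings and is not reproduced here. From the definitions above, one plausible shape of the objective, offered only as an assumption-laden sketch, is:

```latex
\min_{Y,\,X_t}\; \lambda\,\bigl(U^N_t + U^L_t\bigr) \;+\; (1-\lambda)\sum_{k}\bigl(d^N_k + d^L_k\bigr)
```

The first term evaluates the resource utilization efficiency of servers and links, the second term evaluates task delay, and λ balances the two; the actual (Formula 1) may weight or normalize the terms differently.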
Next, three types of constraint conditions, that is, node capacity, link capacity, and task delay, are set.
First, a binary variable $\delta_{tk}$ is defined as indicated in (Formula 2). Here, $\delta_{tk}$ is a variable that returns 1 if task k is being executed at time step t, and 0 otherwise, where $t_k$ indicates the reception time of task k.
The task allocation variable $y_{kn}$ is determined so as to minimize the maximum node utilization rate $U^N_t$ while satisfying the node capacity constraints indicated in (Formula 3) to (Formula 6).
(Formula 3) indicates that the computing demand of each task needs to be allocated to one of the nodes. (Formula 4) represents the constraint on the capacity of each node; the product $\delta_{tk} y_{kn}$ in (Formula 4) indicates the allocation of the tasks being executed at time step t.
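Under the reconstructed notation, (Formula 3) and (Formula 4) plausibly take the following form; this is a sketch consistent with the prose above, not a verbatim reproduction of the drawings:

```latex
\sum_{n \in N} y_{kn} = 1 \quad \forall k
\qquad \text{(cf. Formula 3)}
\\
\sum_{k} \delta_{tk}\, y_{kn} \le C^N_n \quad \forall n \in N,\ \forall t
\qquad \text{(cf. Formula 4)}
```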
The route allocation variable $x^{pq}_{t,ij}$ is determined so as to minimize the maximum link utilization rate $U^L_t$ while satisfying the link capacity constraints indicated in (Formula 7) to (Formula 11).
The traffic demand $m^{pq}_t$ in (Formula 9) can be formulated as indicated in (Formula 12) and (Formula 13). (Formula 12) indicates the upload traffic demand from the transmission source node p to the transmission destination node q. Here, $z_{kp}$ and $y_{kq}$ determine the node p and the node q, and $\delta_{tk}$ extracts the tasks being executed. (Formula 13) indicates the download traffic demand from the node q to the node p, and is the converse of the upload formula.
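Reading (Formula 12) and (Formula 13) from this description, a plausible reconstruction, given only as an assumption, is:

```latex
m^{pq,\mathrm{up}}_t = \sum_{k} \delta_{tk}\, z_{kp}\, y_{kq}\, m^{\mathrm{up}}_k,
\qquad
m^{qp,\mathrm{down}}_t = \sum_{k} \delta_{tk}\, z_{kp}\, y_{kq}\, m^{\mathrm{down}}_k
```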
The node delay time $d^N_k$ and the link delay time $d^L_k$ of task k are formulated as indicated in (Formula 14) to (Formula 16).
Finally, the latency constraint is formulated as indicated in (Formula 17).
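Given the definitions of $d^N_k$, $d^L_k$, and $d^{max}_k$ above, the latency constraint of (Formula 17) plausibly reads:

```latex
d^N_k + d^L_k \le d^{\max}_k \quad \forall k
```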
First, variables representing subsets of tasks are defined as indicated in (Formula 18) to (Formula 21).
Here, $K_t$ indicates the subset of tasks executed at time step t, $K_e$ indicates the subset of tasks accepted by the edge node e, $D_t$ indicates the subset of tasks accepted at time step t, and $D_{e,t}$ indicates the subset of tasks accepted by the edge node e at time step t.
Here, Table 4 indicates the definitions of the variables of the proposed method.
$G := \{g_e\}$: set of agents
$O := \{o_e\}$: set of observations of the agents
$A := \{A_e\}$: set of action spaces, where $A_e$ is the action space of each agent
|E| agents, equal in number to the edge nodes, are introduced, and each agent is allocated to the task offload control of one edge node.
An agent $g_e \in G$ learns a method for optimizing the task offloading of the edge node e. The state and the observation $o_e$ of the agent $g_e$ are defined based on the observed task information and network usage information. The candidate set of actions $A_e$ is defined as the set of nodes to which tasks can be offloaded.
When the edge node e does not accept any task at time step t, the agent $g_e$ selects the action of "doing nothing." The reward is designed to return a negative value if the constraint conditions are not satisfied, and otherwise to return a positive value depending on the value of the objective function.
The proposed method (Coop-MADRL) performs centralized learning and distributed execution. First, the learning procedure (Algorithm 1) will be described. Line 1 indicates initialization of agent parameters. A series of procedures (lines 2 to 18) is repeatedly executed until learning is completed. Lines 3 to 4 indicate creation of tasks and initialization of environment parameters. A series of actions is called an episode, and each episode (lines 5 to 16) is repeatedly executed.
In each episode, the agents collect training samples that are combinations of $\langle o_t, a_t, r_t \rangle$. The time step of the network simulator is $t_{sim}$, which is reset at the beginning of each episode.
In line 7, when the edge node e receives a plurality of tasks at $t_{sim}$, the agent $g_e$ selects one task using a FIFO method.
In line 9, a random action is selected with probability ϵ; otherwise, an action that maximizes the action value $Q_e(o_e, a_e)$ is selected with probability 1−ϵ.
Each agent executes lines 7 to 9 in parallel.
In line 10, the task offload is updated according to $a_t$ by Algorithm 3.
Line 11 calculates the reward.
Lines 12 to 13 indicate the termination condition of agent learning.
In lines 14 and 15, if all the tasks accepted at $t_{sim}$ have been allocated, the process proceeds to the next step $t_{sim}+1$.
Line 17 indicates storage of the episode in the replay memory M.
In line 18, all the agents G are trained with episode histories randomly sampled from M.
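The training procedure just described can be compressed into the following Python sketch. The epsilon-greedy selection, FIFO task pick, and replay-based updates follow lines 7 to 18 of Algorithm 1, while all class and method names (env, fifo_task, best_action, and so on) are assumed interfaces, not the embodiment's actual implementation.

```python
import random

# Hypothetical sketch of Algorithm 1 (centralized learning); names are assumptions.
def train(agents, env, num_episodes, epsilon):
    memory = []                                      # replay memory M
    for episode in range(num_episodes):              # lines 2-18
        env.reset_with_new_tasks()                   # lines 3-4
        history, done = [], False
        while not done:                              # one episode, lines 5-16
            obs = env.observe_all()
            actions = {}
            for e, agent in agents.items():          # lines 7-9, run per agent
                task = env.fifo_task(e)              # line 7: pick one task FIFO
                if task is None:
                    actions[e] = agent.noop()        # "do nothing" when no task arrived
                elif random.random() < epsilon:      # line 9: explore with probability eps
                    actions[e] = agent.random_action()
                else:                                # exploit with probability 1 - eps
                    actions[e] = agent.best_action(obs[e])   # argmax_a Q_e(o_e, a)
            reward, done = env.step(actions)         # lines 10-13: Algorithm 3 update + reward
            history.append((obs, actions, reward))   # collect <o_t, a_t, r_t>
        memory.append(history)                       # line 17: store episode in M
        for agent in agents.values():                # line 18: train all agents from M
            agent.update(random.choice(memory))
```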
Next, the execution procedure (Algorithm 2) will be described. In line 1, G is trained in advance using Algorithm 1. Algorithm 2 then continuously repeats lines 2 to 9 as long as the system receives and accepts new tasks.
In line 6, each agent selects the action $a_e$ that maximizes $Q_e(o_e, a_e)$.
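The execution phase is greedy: no exploration, and each trained agent acts on its local observation. A minimal sketch under the same assumed interfaces as above:

```python
# Hypothetical sketch of Algorithm 2 (distributed execution); names are assumptions.
def execute(agents, env):
    while env.has_new_tasks():                       # lines 2-9
        obs = env.observe_all()
        actions = {e: agent.best_action(obs[e])      # line 6: argmax_a Q_e(o_e, a_e)
                   for e, agent in agents.items()}
        env.step(actions)                            # offload tasks according to the actions
```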
Next, Algorithm 3, which updates the environment, will be described. Line 1 indicates the calculation of the task allocation Y. Line 2 indicates the calculation of the node utilization. Line 3 indicates the calculation of the traffic demand matrix $M_t$. Line 4 indicates the calculation of the route allocation $X_t$. Line 5 indicates the calculation of the delay.
Finally, Algorithm 3 returns variables for reward calculation.
A reward function is designed based on the objective function (Formula 1).
The function of (Formula 22) is designed to return a smaller value as x becomes larger, reflecting that efficiency worsens as x increases; it returns a positive value if x < 0.8, and a negative value otherwise. Note that the average satisfaction level of latency is defined in (Formula 23).
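As a concrete illustration, the reward shape described above might be realized as follows. The exact forms of (Formula 22) and (Formula 23) are in the drawings; only the sign behavior around x = 0.8 and the penalty on constraint violation come from the text, so the linear form and the penalty magnitude below are assumptions.

```python
def utilization_reward(x):
    """Sketch of the (Formula 22) shape: decreasing in x, positive iff x < 0.8."""
    return 0.8 - x  # assumed linear form; only the sign behavior is from the text

def reward(u_node, u_link, latency_satisfaction, feasible):
    """Negative on constraint violation, otherwise positive depending on the objective."""
    if not feasible:          # constraint conditions not satisfied
        return -1.0           # assumed penalty magnitude
    return utilization_reward(max(u_node, u_link)) + latency_satisfaction
```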
This concludes the description of the proposed method.
According to the present embodiment, the efficiency of task offloading can be improved by introducing a cooperative multi-agent method. That is, an agent that has learned the optimal task offload is placed at each edge. Furthermore, by introducing a mechanism in which each agent learns cooperatively, the selfish action of each agent is prevented. Thereby, the efficiency of task offloading can be improved in consideration of constraints on network usage statuses such as network topology and/or bandwidth.
Furthermore, by using deep reinforcement learning to learn in advance the relationship between network demand patterns and optimal task offloading, efficient task offloading can be quickly obtained.
The present invention is not limited to the above-described embodiment, and configurations and processes (operations) such as those described below are also possible.
For example, the control apparatus 50 can be implemented by a computer and a program, and the program can be recorded in a (non-transitory) recording medium or provided via a communication network such as the Internet.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2022/017008 | 4/1/2022 | WO |