The present application is based upon and claims priority to Chinese Patent Application No. 202210562305.6, filed on May 23, 2022, the entire content of which is hereby incorporated by reference.
The present invention relates to the technical field of service function chain mapping, and in particular to a method for self-adaptive service function chain mapping based on deep reinforcement learning.
In recent years, with the explosive growth of network users and increasingly diversified network service demands, the rigid deployment system of a conventional network architecture, in which network functions must be run on dedicated equipment, faces a great challenge. Especially in a data center, when a huge and complex network architecture faces the flexible business requirements of users, resources are allocated unevenly, which degrades the quality of service of the business. Network function virtualization (NFV) techniques provide a more flexible and efficient response mode for the service requests of users, thereby solving the problem of rigid deployment of network functions. In NFV, a virtualization technology is used to convert network function software into a virtual network function (VNF) which is deployed on a general hardware platform in the form of a virtual network function instance (VNFI), such that the flexibility and expandability of network function deployment are greatly improved, and the hardware investment cost and the operation and maintenance cost of network operators are reduced. In NFV, a user initiates a service request to a network service provider, and the network service data flow sequentially passes through a series of VNFs from a source node to a destination node in a specific order. Such a chained service request is referred to as a service function chain (SFC), and the network service request initiated by the user is referred to as a service function chain request (SFCR). The SFC technology promotes the construction of a highly extensible network business flow processing platform and improves the speed and flexibility of processing business requests from users. However, existing SFC mapping methods do not take into consideration that the deployed VNFs may become unsuitable for the current network business requirements of users over time, which causes problems such as waste of idle resources and a low user request mapping rate. Therefore, a mapping optimization method needs to be established. On the premise that the existing physical network topology structure is unchangeable, the VNFs are redeployed by improving a deep reinforcement learning framework and collecting historical mapping data, such that the VNFs can be deployed in the physical network topology more scientifically to meet the service requests of users in different time periods, improve the user request mapping rate and reduce the mapping cost.
To address technical problems in which, in a static network environment, the intensity of users' demands for services changes over time so that the demands for the deployed VNFs differ, leading to an untimely response to service requests at service nodes with higher demand, long idle periods at service nodes with lower demand, and unnecessary service cost, the present invention provides a method for self-adaptive service function chain mapping based on deep reinforcement learning. VNFs are redeployed in a basic physical network to improve the self-adaptivity of the VNFs to service requests, thus improving the effective service cost rate and the mapping rate.
In order to achieve the above purpose, the technical scheme of the present invention is implemented as follows: Provided is a method for self-adaptive service function chain mapping based on deep reinforcement learning, comprising:
step I: establishing a service function chain (SFC) mapping model, dividing an SFC mapping process into a three-layer structure comprising a network function service request set, a request mapping layer and a basic physical network, and representing the three-layer structure with abstract parameters;
step II: building a service function chain request (SFCR) mapping learning neural network, initializing parameters of the SFCR mapping learning neural network, and mapping the abstract parameters in the step I to a state, an action and a reward value involved in the SFCR mapping learning neural network;
step III: establishing an empirical playback pool and updating network parameters of the SFCR mapping learning neural network;
step IV: determining whether a current time slot t meets a redeployment requirement, and if not, proceeding to the step III, or otherwise, proceeding to step V;
step V: summarizing request rates and utilization rates of different virtual network functions (VNFs), a number of currently deployed VNFs and a number of unactivated VNFs based on historical SFCR mapping data stored in the empirical playback pool; and
step VI: designing a VNF redeployment strategy, and redeploying the VNFs in the basic physical network according to the data summarized in the step V.
In the step I, representing the network function service request set, the request mapping layer and the basic physical network topology with abstract parameters comprises:
In the virtual network function node set Vf, the order of the intermediate virtual network function nodes v2f . . . vl−1f is the order in which the SFC network flow or business flow passes through the network functions.
In the step II, initializing parameters of the SFCR mapping learning neural network comprises: initializing a mapping learning framework: setting the mapping topology graph GM to null, initializing the empirical playback pool to null, randomly initializing a current strategy network parameter θμ and a current value network parameter θQ, and respectively copying the current strategy network parameter and the current value network parameter to a target strategy network parameter θμ′ and a target value network parameter θQ′.
In the step II, mapping the abstract parameters to the SFCR mapping learning neural network comprises:
The instant return is r(st, at)=α1Ur(t)+α2avgM(t), wherein the weights α1, α2∈[0,1], and Ur(t) and avgM(t) are respectively the effective service cost rate and the average mapping rate within the current time slot t.
The effective service cost rate Ur(t) is the ratio of the total running cost of the activated VNFs to the total service cost within the time slot t, and the average mapping rate avgM(t) is the ratio of the number of successfully mapped SFCRs to the total number of SFCRs within the time slot t.
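By way of a non-limiting illustration, the following Python sketch computes the instant return from the two indexes defined above; the function and parameter names are hypothetical assumptions, and only the two ratio definitions are taken from the text.

```python
# Illustrative sketch only: instant return r(s_t, a_t) = a1*Ur(t) + a2*avgM(t).
def instant_return(running_cost_activated, total_service_cost,
                   num_mapped_sfcrs, num_total_sfcrs,
                   alpha1=0.5, alpha2=0.5):
    ur = running_cost_activated / total_service_cost   # effective service cost rate Ur(t)
    avg_m = num_mapped_sfcrs / num_total_sfcrs          # average mapping rate avgM(t)
    return alpha1 * ur + alpha2 * avg_m

# Example: instant_return(60.0, 100.0, 18, 20) returns 0.75.
```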
An implementation method of the step III comprises: storing an acquired state st, an action at, a reward value rt and a next state st+1 into the empirical playback pool in a form of quad <st, at, rt, st+1>; updating network parameters comprises: putting the current state si and action ai into a current value network to obtain an actual value Qi=Q(si, μ(si|θμ)|θQ), wherein θμ is a current strategy network parameter, θQ is a current value network parameter, and Q( ) represents an action value function; randomly sampling W time period vectors from the empirical playback pool, and sending the time period vectors into a target value network for training to obtain a target value Q′i+1=Q(si+1, μ′(si+1|θμ′)|θQ′), wherein θμ′ is a target strategy network parameter, and θQ′ is a target value network parameter; calculating a target return yi=ri+γQ′i+1; then updating the current strategy network parameter θμ and the current value network parameter θQ through the target return and the variance Loss=1/N×Σi(yi−Qi)2 between the target return and the actual value Qi; and updating the target strategy network parameter θμ′ of the target strategy network μ′ and the target value network parameter θQ′ of the target value network Q′ by setting a soft update coefficient τ and using a soft update algorithm: θ′=τθ+(1−τ)θ′.
The step V comprises: summarizing the SFCR mapping vectors sampled from the empirical playback pool in the previous W time periods to obtain quads {st, at, rt, st+1}, wherein st and st+1 respectively represent the states at the time slots t and t+1, at is an action, and rt is a reward value; and counting, for each type of VNF, the request rate, the utilization rate, the number of deployed VNFs and the number of unactivated VNFs.
The VNF redeployment strategy in the step VI comprises: (1) uninstalling: if a request rate is less than 70% of the average request rate AvgRes, a utilization rate is less than 70% of the average utilization rate AvgUses, a number of deployed VNFs is greater than 120% of the average number AvgVa of deployed VNFs, and a number of unactivated VNFs is greater than 110% of AvgSlp, uninstalling 10% of the unactivated VNFIs; (2) installing: if a request rate is greater than 130% of AvgRes, a utilization rate is greater than 130% of AvgUses, a number of deployed VNFs is less than 80% of AvgVa, and a number of unactivated VNFs is zero, performing an incremental deployment of VNFIs, wherein the number of newly deployed VNFIs is 10% of the existing number Va(x); (3) activating: if a request rate is greater than 110% of AvgRes, a utilization rate is greater than 110% of AvgUses, and there are unactivated VNFIs, activating 10% of the sleeping VNFIs; and (4) sleeping: if a request rate is less than 90% of AvgRes, a utilization rate is less than 90% of AvgUses, and there are activated VNFIs, putting 10% of the activated VNFIs to sleep.
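By way of a non-limiting illustration, the four rules may be expressed as the following Python sketch; the argument names, the rule-checking order and the rounding of the 10% quantities are assumptions and are not part of the claimed strategy beyond the thresholds stated above.

```python
# Illustrative sketch of the four redeployment rules for the x-th VNF type;
# conditions are checked in the order uninstall, install, activate, sleep.
def redeploy_decision(res_x, uses_x, va_x, slp_x,
                      avg_res, avg_uses, avg_va, avg_slp):
    if (res_x < avg_res * 0.7 and uses_x < avg_uses * 0.7
            and va_x > avg_va * 1.2 and slp_x > avg_slp * 1.1):
        return ("uninstall", 0.1 * slp_x)        # uninstall 10% of unactivated VNFIs
    if (res_x > avg_res * 1.3 and uses_x > avg_uses * 1.3
            and va_x < avg_va * 0.8 and slp_x == 0):
        return ("install", 0.1 * va_x)           # deploy 10% of the existing number Va(x)
    if res_x > avg_res * 1.1 and uses_x > avg_uses * 1.1 and slp_x > 0:
        return ("activate", 0.1 * slp_x)         # activate 10% of sleeping VNFIs
    if res_x < avg_res * 0.9 and uses_x < avg_uses * 0.9 and va_x - slp_x > 0:
        return ("sleep", 0.1 * (va_x - slp_x))   # put 10% of activated VNFIs to sleep
    return ("keep", 0)
```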
Compared with the prior art, the present invention has the beneficial effects:
(1) The present invention has advantages in maintaining the stability of a network environment and improving the quality of service for users, and can effectively solve the problems of a low effective service cost rate and low service mapping efficiency caused by dynamic changes over time in existing mapping methods.
(2) The method has good self-adaptivity; an improved deep deterministic policy gradient (DDPG) is used as the SFCR mapping learning framework; the effective service cost rate and the request mapping rate are used as optimization targets; historical mapping data is used as the basis; four redeployment strategies are designed to redeploy the VNFs, which can improve the effective service cost rate and the request mapping rate for processing user service requests in different time periods; compared with a DDPG algorithm and a deep Q network (DQN) method, the method improves the average effective service cost rate by up to 22.47% and the average mapping rate by up to 15.05%.
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required to be used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the description below are some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings according to the drawings provided herein without creative efforts.
The technical schemes in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present invention.
In most existing mapping methods, when the state of the network environment is known and fixed, the mapping rate and reliability of network services are effectively improved, but these methods do not take into account that, due to dynamic changes over time, the existing network environment may no longer suit the current users' demands for network business, causing a backlog of network business requests, an excessive network link load and reduced stability of the network environment. Furthermore, during the mapping of SFCs, continuously improving the service request mapping rate while ignoring the service running and maintenance cost results in a high SFC mapping cost. The present invention provides a method for self-adaptive service function chain mapping based on deep reinforcement learning, as shown in
Step I: establishing a service function chain (SFC) mapping model, dividing an SFC mapping process into a three-layer structure comprising a network function service request set, a request mapping layer and a basic physical network topology, and representing the three-layer structure with abstract parameters.
The SFC mapping model graph is as shown in
The network function service request set is abstracted into SRs={SFCR1, SFCR2, SFCR3 . . . }, wherein SFCR1, SFCR2 and SFCR3 respectively represent the first, the second and the third SFCRs in the set SRs. Due to the uncertainty of user demands, the number of SFCRs contained in the service request set SRs may differ. The fth service function chain request SFCRf is represented by a directed weighted graph SRf=(Vf, Ef, df), wherein Vf={v1f, v2f . . . vlf} represents a virtual network function node set, v1f represents a source node of the directed weighted graph SRf, vlf represents a destination node of the directed weighted graph SRf, l represents the number of network functions required by the service function chain request SFCRf, and the order of the intermediate nodes is the order in which the SFC network flow or business flow passes through the network functions; the virtual link set is Ef={ei,jf|i,j≤l}, wherein ei,jf represents the virtual link between virtual network function nodes vif and vjf; df represents the duration d for which the service function chain request SFCRf occupies service node resources and bandwidth resources; the central processing unit (CPU) resource vC(vif) and the internal memory resource vM(vif) required for the normal running of each virtual network function node vif (vif∈Vf) are represented by (vC(vif), vM(vif)), and the bandwidth resource required by each virtual link ei,jf (ei,jf∈Ef) is represented by vB(ei,jf). During subsequent mapping of SFCRs, nodes and links that satisfy the resources required for normal running are selected for mapping. If there are no nodes and links satisfying the resources required for normal running, the mapping fails.
The basic physical network topology is abstracted to be represented by a weighted undirected graph G={N, L}, wherein N represents a set of physical service nodes n1, n2 . . . nm, which can be represented by N={n1, n2, . . . nm}, m is the total number of physical service nodes, L={la,b|a,b≤m} is a physical link set, and la,b represents the physical link between two physical service nodes na and nb. Various virtual network function instances (VNFIs) can be deployed on each physical service node, and the VNFI set of the physical service node na is denoted by VNFIsa={VNFx, p|p=0,1}, wherein p=0 indicates that the xth VNF is not activated and the network function service cannot be provided, and p=1 indicates that the xth VNF is activated and the network function service can be provided; x is in a range of 0 to k, and k represents the total number of VNF types. The current remaining CPU resources and internal memory resources of the physical service node na are represented by (C(na), M(na)). The remaining bandwidth resource of the physical link la,b between the physical service nodes na and nb is represented by B(la,b). During subsequent mapping of SFCRs, the remaining resources of nodes and links are the conditions for determining whether the nodes and links can be mapped to. If the remaining resources of a physical node and link are more than the resources required for the normal running of an SFCR, the SFCR can be mapped to the physical node and link. If the physical node and link do not have resources satisfying the normal running of the SFCR, the SFCR cannot be mapped to the basic physical network, and the mapping fails.
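By way of a non-limiting illustration, the abstractions of the SFCR and the basic physical network described above may be represented as in the following Python sketch; the class and field names are hypothetical and carry only the parameters defined in the text.

```python
# Illustrative data structures only; field names are assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SFCR:                       # directed weighted graph SR_f = (V_f, E_f, d_f)
    nodes: List[str]              # v_1^f ... v_l^f in traversal order
    node_demand: Dict[str, Tuple[float, float]]  # (CPU, memory) demand per VNF node
    link_demand: Dict[Tuple[str, str], float]    # bandwidth demand per virtual link
    duration: int                 # d, time the occupied resources are held

@dataclass
class PhysicalNode:               # physical service node n_a
    cpu: float                    # remaining CPU resource C(n_a)
    mem: float                    # remaining memory resource M(n_a)
    vnfis: Dict[int, bool] = field(default_factory=dict)  # x -> True if activated (p = 1)

@dataclass
class PhysicalNetwork:            # weighted undirected graph G = {N, L}
    nodes: Dict[str, PhysicalNode]
    link_bw: Dict[Tuple[str, str], float]  # remaining bandwidth B(l_{a,b})
```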
The request mapping layer is abstracted into an undirected graph GM=(Vf, N, vE) representing the service mapping graph of an SFCR in the basic physical network, wherein Vf is the set of virtual network function nodes of the fth network function service request SFCRf, N is the physical network function service node set, and vE is the set of mapping edges between the virtual network function nodes in Vf and the physical service nodes in N.
Step II: building a service function chain request (SFCR) mapping learning neural network, initializing parameters of the SFCR mapping learning neural network, and mapping the abstract parameters in the step I to a state, an action and a reward value involved in the SFCR mapping learning neural network.
Specific procedures are as follows:
Initializing a mapping learning framework: the mapping topology graph GM is set to null, the empirical playback pool is initialized to null, a current strategy network parameter θμ and a current value network parameter θQ are randomly initialized, and the current strategy network parameter and the current value network parameter are respectively copied to a target strategy network parameter θμ′ and a target value network parameter θQ′.
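By way of a non-limiting illustration, the initialization may be sketched with PyTorch as follows; the network architectures, sizes and pool capacity are assumptions, and only the random initialization, the copying to the target parameters and the empty playback pool follow the procedure above.

```python
# Illustrative initialization sketch (assumed architectures and sizes).
import copy
from collections import deque
import torch.nn as nn

def init_mapping_framework(state_dim, action_dim, hidden=64, pool_size=100_000):
    actor = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, action_dim), nn.Tanh())      # theta_mu
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, 1))                         # theta_Q
    target_actor = copy.deepcopy(actor)      # theta_mu' <- theta_mu
    target_critic = copy.deepcopy(critic)    # theta_Q'  <- theta_Q
    replay_pool = deque(maxlen=pool_size)    # empirical playback pool, initially empty
    mapping_topology = None                  # mapping topology graph G_M set to null
    return actor, critic, target_actor, target_critic, replay_pool, mapping_topology
```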
In a state space S(t)={s1, s2 . . . st}, each state is st={G(t), SRs(t)}, including the network state G(t) of the physical service nodes in the basic physical network and the network function service request set SRs(t) at the current time slot t, wherein the state st is the input of the SFCR mapping learning neural network.
A mapping action at=μ(st|θμ) taken under each state can be obtained according to an action strategy function to form an action space A(t)={a1, a2 . . . at}, wherein μ( ) represents an action selection strategy. The mapping action is at={av, am, as}, wherein av is a mapping action between a VNF and a physical service node in the basic physical network, am is a mapping action between a virtual link and a physical link, and as is a VNF activation and dormancy action. The mapping topology graph GM←(GM, at) is updated based on the state of the mapping topology graph GM and the current mapping action at, and the network state G(t) of the physical service nodes and the network service function request set SRs(t) are updated according to the updated mapping topology graph GM to obtain a next state st+1.
Each action generates an instant return r(st, at), wherein the recorded reward values rt of the instant returns form a reward space R(t)={r1, r2 . . . rt}. The reward value of the present invention aims at the effective service cost rate and the mapping rate. The instant return is r(st, at)=α1Ur(t)+α2avgM(t), wherein α1, α2∈[0,1] and α1, α2 are weights; a larger weight indicates a greater emphasis on the influence of the corresponding item on the final mapping result. Ur(t) and avgM(t) are respectively the effective service cost rate and the average mapping rate within the current time slot t. The calculation processes are shown in formulas (1), (2) and (3):
Within the time slot t, Ur(t) is obtained by dividing the total running cost of the activated VNFs by the total service cost; avgM(t) is obtained by dividing the number of successfully mapped network function service requests SFCRs by the total number of SFCRs; the total service cost Co(t) is the sum of the total running cost Cr(t), the total activation cost Ca(t) and the total installation cost Cs(t); and the calculation formulas of the total running cost Cr(t), the total activation cost Ca(t) and the total installation cost Cs(t) are shown in formulas (4), (5) and (6):
In the formulas, m represents a number of physical service nodes, VNFIsi represents a VNFI set on the ith physical service node, VNFxi represents the xth VNF in the ith VNFI set, k represents a total number of VNF types, and r(VNFx) represents the running cost of the xth VNF; {VNFIsi(t), p|p=1} represents a VNF in an activated state at the time slot t, {VNFIsi(t−1), p|p=0} represents a VNF that is not activated at the time slot t−1, and a(VNFx) represents the activation cost of the xth VNF; {VNFIsi(t)|VNFx} represents a deployment condition of the xth VNF at the time slot t, {VNFIsi(t−1)|VNFx} represents a deployment condition of the xth VNF at the time slot t−1, and s(VNFx) represents an installation cost of the xth VNF.
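By way of a non-limiting illustration, the cost terms of formulas (4) to (6) may be accumulated as in the following Python sketch; the per-node data layout is an assumption, and only the counting of activated, newly activated and newly installed VNFs follows the definitions above.

```python
# Illustrative sketch of Cr(t), Ca(t), Cs(t) and Co(t); data layout assumed.
def total_costs(vnfis_t, vnfis_prev, run_cost, act_cost, inst_cost):
    # vnfis_t / vnfis_prev: one dict per physical node, mapping x -> activated (bool)
    cr = ca = cs = 0.0
    for node_t, node_prev in zip(vnfis_t, vnfis_prev):
        for x, activated in node_t.items():
            if activated:
                cr += run_cost[x]                    # running cost of activated VNFs
                if not node_prev.get(x, False):
                    ca += act_cost[x]                # activated at t but not at t-1
            if x not in node_prev:
                cs += inst_cost[x]                   # installed at t but not at t-1
    return cr, ca, cs, cr + ca + cs                  # Cr(t), Ca(t), Cs(t), Co(t)
```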
Step III: establishing an empirical playback pool and updating network parameters of the SFCR mapping learning neural network.
Specific procedures are as follows:
The state, action, reward value and next state acquired in the step II are stored into the empirical playback pool in a form of quad <st, at, rt, st+1>. When the current network is updated, the current state si and action ai are put into the current value network to obtain an actual value Qi=Q(si, μ(si|θμ)|θQ), wherein θμ is a current strategy network parameter, θQ is a current value network parameter, and Q( ) represents an action value function. W time period vectors are randomly sampled from the empirical playback pool Exp and sent into the target value network for training to obtain a target value Q′i+1=Q(si+1, μ′(si+1|θμ′)|θQ′). A target return yi is then calculated according to yi=ri+γQ′i+1. Finally, the mapping parameters θμ and θQ in the current network are updated according to the target return and the variance Loss=1/N×Σi(yi−Qi)2 between the target return and the actual value Qi. The mapping parameters θμ′ and θQ′ of the target strategy network and the target value network are updated by setting a soft update coefficient τ and using a soft update algorithm, which can be represented by θ′=τθ+(1−τ)θ′, wherein the soft update coefficient τ is usually 0.001.
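By way of a non-limiting illustration, one parameter update of the mapping learning neural network may be sketched with PyTorch as follows; the optimizers, the discount factor γ and the tensor shapes are assumptions, and only the target return, the mean-squared loss and the soft update correspond to the procedure above.

```python
# Illustrative DDPG-style update sketch; batch tensors are sampled from the pool.
import torch
import torch.nn.functional as F

def update_step(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.001):
    s, a, r, s_next = batch
    with torch.no_grad():
        a_next = target_actor(s_next)
        y = r + gamma * target_critic(torch.cat([s_next, a_next], dim=1))  # target return y_i
    q = critic(torch.cat([s, a], dim=1))                                   # actual value Q_i
    critic_loss = F.mse_loss(q, y)                    # Loss = 1/N * sum_i (y_i - Q_i)^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()           # policy update
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for target, online in ((target_actor, actor), (target_critic, critic)):
        for p_t, p in zip(target.parameters(), online.parameters()):
            p_t.data.copy_(tau * p.data + (1 - tau) * p_t.data)  # theta' = tau*theta + (1-tau)*theta'
```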
In addition, VNF redeployment is performed once every 50 time periods. The number of currently traversed periods is recorded using a parameter t, and whether the current time slot t is a multiple of 50 is determined. If yes, the step IV is executed; otherwise, the step III continues to be executed.
Step IV: request rates Res and utilization rates Uses of different virtual network functions (VNFs), a number of currently deployed VNFs Va and a number of unactivated VNFs Slp are summarized based on historical SFCR mapping data stored in the empirical playback pool to redeploy the VNFs.
Specific procedures are as follows:
The SFCR mapping vectors within the previous W time periods are sampled from the empirical playback pool to form a sample set Temp={st, at, rt, st+1}.
Parameter arrays to be summarized are initialized to null: Res=(0, 0, 0 . . . ), Uses=(0, 0, 0 . . . ), Va=(0, 0, 0 . . . ) and Slp=(0, 0, 0 . . . ). The sampled mapping vector groups of the previous W time periods are traversed, the currently traversed period number is recorded using a parameter t, the statistical data corresponding to various VNFs are summarized, and the currently traversed VNF type is recorded using a parameter x. The summarizing formulas are shown in formulas (8), (9), (10) and (11):
In the formulas, Sum(SRs(t), VNFx) represents the sum of request numbers for the xth VNF at the time slot t from the network service function request set SRs(t), and k is the total number of VNF types; Sum(SRs(t)) represents the sum of request numbers for all the VNFs in the network service function request set SRs(t) at the time slot t; Sum(GM(t), VNFx) represents the sum of mapping numbers of the xth VNF in the service mapping topology graph GM(t) at the time slot t; Sum(SRs(t)|VNFx, p=0) represents the number of the xth VNFs in the dormancy state at the time slot t, and Sum(SRs(t)|VNFx, p=1) represents the number of the xth VNFs in the activated state at the time slot t.
Average values of the data are recorded as an average request rate AvgRes, an average utilization rate AvgUses, an average number AvgVa of deployed VNFs and an average number AvgSlp of unactivated VNFs, so as to design the redeployment strategy. The calculation formulas are as follows: AvgRes=(1/k)Σx Res(x), AvgUses=(1/k)Σx Uses(x), AvgVa=(1/k)Σx Va(x), and AvgSlp=(1/k)Σx Slp(x).
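By way of a non-limiting illustration, the statistics and their averages over the k VNF types may be computed as in the following Python sketch; the per-type counters are assumed to have been tallied from the sampled quads, and the exact denominators of the request rate and utilization rate are assumptions standing in for formulas (8) to (11).

```python
# Illustrative summarization sketch; counter lists are indexed by VNF type x.
def summarize(requests_per_vnf, total_requests, mapped_per_vnf,
              deployed_per_vnf, sleeping_per_vnf, k):
    res  = [requests_per_vnf[x] / max(total_requests, 1) for x in range(k)]     # Res(x)
    uses = [mapped_per_vnf[x] / max(deployed_per_vnf[x], 1) for x in range(k)]  # Uses(x)
    va, slp = deployed_per_vnf, sleeping_per_vnf                                # Va(x), Slp(x)
    avg = lambda v: sum(v) / k
    return res, uses, va, slp, avg(res), avg(uses), avg(va), avg(slp)           # Avg values
```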
Step V: designing a VNF redeployment strategy, and redeploying the VNFs in the basic physical network according to the data summarized in the step IV. Specific procedures are as follows:
TimSort merge sorting is performed on Uses according to the ascending order of the utilization rates of different VNFs in the mapping process, and TimSort can complete the sorting quickly. According to the strategy of deploying after sorting, the VNFs with a low utilization rate can be uninstalled first to reserve installation space for the VNFs with a high utilization rate, so as to accelerate the VNF redeployment progress. Four redeployment strategies are designed to redeploy different VNFs. For the xth VNF, the four strategies are as follows: (1) uninstalling: if the request rate is less than 70% of the average request rate AvgRes, the utilization rate is less than 70% of the average utilization rate AvgUses, the number of deployed VNFs is greater than 120% of the average number AvgVa of deployed VNFs, and the number of unactivated VNFs is greater than 110% of AvgSlp, as shown in formula (16), 10% of the unactivated VNFIs are uninstalled; (2) installing: if the request rate is greater than 130% of AvgRes, the utilization rate is greater than 130% of AvgUses, the number of deployed VNFs is less than 80% of AvgVa, and the number of unactivated VNFs is zero, as shown in formula (17), an incremental deployment of VNFIs is performed, wherein the number of newly deployed VNFIs is 10% of the existing number Va(x); (3) activating: if the request rate is greater than 110% of AvgRes, the utilization rate is greater than 110% of AvgUses, and there are unactivated VNFIs, as shown in formula (18), 10% of the sleeping VNFIs are activated; and (4) sleeping: if the request rate is less than 90% of AvgRes, the utilization rate is less than 90% of AvgUses, and there are activated VNFIs, as shown in formula (19), 10% of the activated VNFIs are put to sleep.
Res(x)≤AvgRes×0.7 AND Uses(x)≤AvgUses×0.7 AND Va(x)≥AvgVa×1.2 AND Slp(x)≥AvgSlp×1.1  (16)
Res(x)≥AvgRes×1.3 AND Uses(x)≥AvgUses×1.3 AND Va(x)≤AvgVa×0.8 AND Slp(x)==0  (17)
Res(x)≥AvgRes×1.1 AND Uses(x)≥AvgUses×1.1 AND Slp(x)≠0  (18)
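By way of a non-limiting illustration, the ascending-utilization processing order described above can be obtained as follows; Python's built-in sorted function uses TimSort, and the dictionary layout is an assumption.

```python
# Illustrative sketch: handle low-utilization VNF types first so that
# uninstalling them frees installation space for high-utilization types.
def redeployment_order(uses):            # uses: dict mapping VNF type x -> Uses(x)
    return sorted(uses, key=uses.get)    # type indices in ascending order of utilization
```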
After the mapping training and the continuous redeployment and convergence of the VNFs according to the historical data, a test is performed in terms of the effective service cost rate and the mapping rate.
The influence of the ratio weights of the two indexes, i.e., the effective service cost rate and the mapping rate, on the optimization effects of the method disclosed herein is considered in the design of the experiment. There are three groups of experimental environments in total: (1) weights α1=0.3, α2=0.7; (2) α1=0.5, α2=0.5; (3) α1=0.7, α2=0.3, and the average effective service cost rate and the average mapping rate are taken as the investigation targets. The weights α1 and α2 respectively represent the influence of the effective service cost rate and the mapping rate on the final optimization result, and a comparison experiment with the DDPG method and the DQN method is conducted. Results of the three methods in the three experimental environments within 500 time periods are selected, and the test results are as shown in
In the present invention, the problem of mapping of service function chains is decomposed into SFCR mapping and VNF redeployment. The improved DDPG is used as the SFCR mapping learning framework. Improving the effective service cost rate and the average mapping rate is taken as the optimization target to approximately solve an optimal mapping strategy of the current network. The historical mapping data is acquired from the empirical playback pool. The request rates and utilization rates of different VNFs, the number of deployed VNFs and the number of unactivated VNFs are calculated according to the historical mapping data. Four redeployment strategies are designed to redeploy the VNFs on the basic physical network. Therefore, the self-adaptivity of the VNFs to service requests is improved, thus increasing the effective service cost rate and the mapping rate. Furthermore, the effective service cost rate is the ratio of the mapping cost actually used in the mapping process to the total cost.
The above mentioned contents are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, etc., made within the spirit and principle of the present invention shall all fall within the scope of protection of the present invention.