This application claims priority to Chinese Patent Application Ser. No. CN202211352904.1 filed on 1 Nov. 2022.
The present invention belongs to the field of service function chain orchestration technology, and particularly relates to an efficient parallelization and deployment method for a multi-objective service function chain based on a CPU+DPU platform.
The traditional mode of completing data forwarding based on a central processing unit (CPU) has reached a bottleneck, and a computing architecture composed only of CPUs cannot meet diversified scenarios and business requirements. On one hand, as Moore's law slows down while network bandwidth and connection counts keep growing ever wider and denser, computing nodes at the end, edge, and cloud are directly exposed to the increased data volume, and the growth of CPU computing power and the growth of network bandwidth show a widening "scissors gap". On the other hand, in highly concurrent forwarding workloads, the serial computing mode of the CPU can hardly exert its maximum computing power.
The advent of network function virtualization (NFV) provides a novel way to design, coordinate, deploy, and standardize various mobile services to support increasingly complex and diverse service requests, thereby making service function chain (SFC) deployment more flexible and agile. However, current SFC deployment systems focus on optimizing network resource utilization, and do not consider the diversity of service requirements or the degradation of service forwarding performance.
A cloud architecture based on a data processing unit (DPU) offloads and accelerates networking and storage tasks on demand. Compared with a graphics processing unit (GPU), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC), it frees the more costly host CPU computing power and achieves high performance more economically and effectively. The architecture transformation brought by the DPU improves, to a certain extent, the energy-efficiency cost ratio of the whole data center resource pool and the profit-cost ratio of public cloud vendors, and the DPU can cover more and more demands and scenarios in the future. In this regard, the research content of the present invention is to use the DPU to solve some problems faced by the current traditional computing architecture and the traditional SFC deployment system.
An objective of the present invention is: to address the degraded forwarding performance and the increasingly diverse scenarios and requirements faced by the current traditional computing architecture and the traditional SFC deployment system, the present invention provides an efficient parallelization and deployment method for a multi-objective service function chain based on a CPU+DPU platform.
The technical solution is: an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform includes the following steps:
where C represents an additional resource consumption caused by copying and merging of the data packet, and Δd represents an additional delay caused by copying and merging of the data packet;
Further, the heterogeneous computing architecture in step (1) is specifically described as follows:
Further, the multi-objective SFC deployment problem in step (1) is specifically described as follows:
For f1, Dμ is a total response delay, that is,
Dμ = Lμ + Pμ + Tμ + …
where Pμ is a processing delay and Tμ is a transmission delay.
For f2, Σr
For f3, C(τ) represents a total deployment cost, that is,
C(τ)=SC(τ)+Cscale(τ)+CDPU(τ)
SC(τ) represents a total operation cost, that is, the sum of the cost of turning on the server and the cost of successfully placing the VNF:
SC(τ)=Σn
xf
Cscale(τ)=Σn
CDPU(τ) represents the total use cost of DPU and is defined as follows:
CDPU(τ)=Σn
where ζc
where sn
The bandwidth constraint is as follows:
C2:∀ej∈E,Σr
where ar,τ represents whether the request rμ∈R is still in the service, and Be
The delay constraint is as follows:
C3: ∀rμ∈R,Dμ≤Dμmax
where Dμmax represents the maximum end-to-end delay.
Further, the SFC parallel problem in step (2) is specifically described as follows:
in SFC, some VNFs may work independently without affecting other VNFs, so the serial SFCs can be converted into the parallel SFCs. However, not all VNFs in the SFC can work in parallel. In a case that two VNFs modify the content of the flow or violate a dependency constraint, the operations of the two VNFs are in conflict. Only in a case that the VNFs in the SFC are independent of each other, parallel processing can be performed among the VNFs; otherwise, the correctness of the network and service strategy may be destroyed.
According to the method of the present invention, the VNFs are divided into two types, monitors and shapers, where a monitor observes the traffic without modifying it, and a shaper processes and modifies the traffic. Since the VNFs in an SFC must be applied to each data packet flow in a specific order, the VNFs form a dependency relationship. It is stipulated in the present invention that in a case that one VNF fvμ is before another VNF fv+1μ, fv+1μ depends on fvμ, denoted as fvμ<fv+1μ.
To process the data packet in parallel, two functions are required, that is, a copying function and a merging function.
When one data packet enters, the copying function will copy the data packet and send the data packet to the VNFs capable of being processed in parallel. After the data packet is processed, the copied data packet is merged by the merging function. The copying and merging of the data packet will cause additional resource consumption and delay. Therefore, in the SFC parallel problem, an objective function of the SFC parallel problem is to minimize additional resource consumption and delay caused by copying and merging of the data packet:
min(αC+βΔd)
where α and β respectively represent weight coefficients of the additional resource consumption and delay.
C represents the additional resource consumption caused by the copying and merging of the data packet, with the following formula:
where B is one group of parallel branches, ΦB represents the parallelism degree of B, and U represents the size of the data packet.
The additional delay Δd caused by the copying and merging of the data packet may be represented as:
where Vμ is the data quantity of the μth SFC in the request R.
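The objective min(αC + βΔd) above can be illustrated with a minimal sketch. The cost model here, where each parallel group of parallelism degree Φ_B produces Φ_B − 1 extra copies of a packet of size U and each extra copy adds a fixed delay, is an assumption for illustration; the exact forms of C and Δd follow the (truncated) formulas of the original text.

```python
# Sketch of the SFC parallelization objective min(alpha*C + beta*delta_d).
# Assumption: copying a packet to phi branches creates (phi - 1) extra
# copies of size `packet_size`, and each extra copy adds `per_copy_delay`.

def parallel_overhead(branches, packet_size, alpha, beta, per_copy_delay):
    """branches: list of parallelism degrees (Phi_B) of each parallel group."""
    # C: extra resource consumption from copying/merging the packet.
    extra_resource = sum((phi - 1) * packet_size for phi in branches)
    # delta_d: extra delay from copying/merging (assumed linear model).
    extra_delay = sum((phi - 1) * per_copy_delay for phi in branches)
    # Weighted objective combining both overheads.
    return alpha * extra_resource + beta * extra_delay
```

A deployment algorithm would evaluate this objective for each candidate parallelization and keep the cheapest one.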
For the SFC parallel problem, the flow constraint is introduced. Oc is used for representing a set of copying nodes in rμ, and Om represents a set of merging nodes in rμ. Oc(ni) and Om(ni) respectively represent the number of the copying nodes and the merging nodes in rμ. For rμ, except the copying nodes, the merging nodes, the source node and the target node, all intermediate nodes meet the flow conservation, that is,
∀rμ∈R,∀nh∈N,nh∉{nsrc,ndst,Oc,Om}: Σe
In a case that the source node is one copying node, it is necessary to meet the constraint condition:
∀rμ∈R,∀nsrc∈Oc: Σe
In a case that the target node is one merging node, it is necessary to meet the following constraint condition:
∀rμ∈R,∀ndst∈Om: Σe
For the situation that other nodes are the copying nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Oc,Σf
For the situation that other nodes are the merging nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Om,Σf
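Since the summation terms of the constraints above are truncated in the text, the flow-conservation family can be sketched in a standard form. The binary variable y below is an assumption introduced for illustration, not a symbol from the original:

```latex
% y_e^\mu: hypothetical binary variable, 1 iff request r_\mu routes flow on edge e.
% Flow conservation at ordinary intermediate nodes:
\forall r_\mu \in R,\ \forall n_h \in N,\ n_h \notin \{n_{src}, n_{dst}, O_c, O_m\}:
\quad \sum_{e \in \mathrm{out}(n_h)} y_e^\mu - \sum_{e \in \mathrm{in}(n_h)} y_e^\mu = 0
% Copying nodes multiply the flow by the parallelism degree \Phi_B;
% merging nodes collapse it back:
\forall n_h \in O_c:\ \sum_{e \in \mathrm{out}(n_h)} y_e^\mu = \Phi_B \sum_{e \in \mathrm{in}(n_h)} y_e^\mu,
\qquad
\forall n_h \in O_m:\ \sum_{e \in \mathrm{in}(n_h)} y_e^\mu = \Phi_B \sum_{e \in \mathrm{out}(n_h)} y_e^\mu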
Further, an objective of the SFC parallel strategy in step (2) is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel. The specific algorithm process of the SFC parallel strategy is as follows:
21) initializing a branch chain set B, a main chain S and a monitor set M;
22) traversing rμ: in a case that fiμ∈rμ is a monitor, firstly initializing a branch chain b∈B, then adding fiμ into b and M; and in a case that fiμ∈rμ is a shaper, adding fiμ into the main chain S, and at this time searching M for the monitors on which fiμ depends; for each such monitor k∈M, which currently has a branch chain taking k as its end point, pointing k to fiμ so as to extend that branch chain to take fiμ as its end point, and removing k from M;
23) invoking a path search algorithm to find all path sets PATH executable in parallel; and
24) returning the branch chain set B, the main chain S and the path set PATH.
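Steps 21) to 24) can be sketched as a short routine. The dependency test used here, in which a shaper terminates every branch chain whose end-point monitor is still open, is a simplifying assumption; the VNF names and the tuple encoding are hypothetical.

```python
# Sketch of the SFC parallel strategy: monitors open branch chains,
# shapers extend the main chain and close dependent branches.

def parallelize(sfc):
    """sfc: ordered list of (name, kind), kind in {"monitor", "shaper"}."""
    branches, main_chain, open_monitors = [], [], {}
    for name, kind in sfc:
        if kind == "monitor":
            branch = [name]              # 21)-22): new branch chain ending in the monitor
            branches.append(branch)
            open_monitors[name] = branch
        else:
            main_chain.append(name)      # shapers stay on the main chain S
            for k, branch in list(open_monitors.items()):
                branch.append(name)      # extend the branch to end at this shaper
                del open_monitors[k]     # remove monitor k from M
    # 23): each branch chain plus the main chain is a path executable in parallel.
    paths = branches + [main_chain]
    return branches, main_chain, paths   # 24)
```

For example, a monitor followed by two shapers yields one branch chain running in parallel with the main chain.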
Further, the objective of the VNF topological order algorithm in step (3) is to find a VNF topological order about a plurality of SFCs, and the delay can be reduced by deploying the VNF according to the order. The specific algorithm process is as follows:
31) initializing f as a source node;
32) traversing a request set R arriving at the same time, invoking an algorithm 1 to obtain the branch chain set B, the main chain S and the path set PATH, evaluating a path with a maximum delay in all the paths according to the path set PATH, and adding the path into a set C;
33) traversing C, and creating a directed weighted graph graph=(F, ω), where F is a set of VNFs;
34) invoking a minimum feedback arc set algorithm to solve the minimum feedback arc set of graph, and verifying whether the solved topological order meets the dependency relationship of the VNFs among different chains; returning the topological order in a case that it does, and otherwise returning False (that is, the dependency condition is violated and the algorithm cannot be used).
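Steps 31) to 34) can be sketched as follows. Computing an exact minimum feedback arc set is NP-hard, so this sketch substitutes a greedy heuristic (repeatedly emit the node with the least remaining incoming weight); treating that as the "minimum feedback arc set algorithm" of step 34) is an assumption.

```python
from collections import defaultdict

# Sketch of the VNF topological order algorithm: build a directed weighted
# graph over the VNFs of several chains, order nodes with a greedy
# feedback-arc heuristic, then verify the order against every chain.

def vnf_topological_order(chains):
    """chains: list of VNF-name lists; consecutive VNFs form weighted edges."""
    weight = defaultdict(int)
    nodes = set()
    for chain in chains:
        nodes.update(chain)
        for a, b in zip(chain, chain[1:]):
            weight[(a, b)] += 1          # 33): weight = how often a precedes b
    order, remaining = [], set(nodes)
    while remaining:
        # 34): greedily emit the node with the least incoming weight
        best = min(remaining,
                   key=lambda n: sum(w for (a, b), w in weight.items()
                                     if b == n and a in remaining))
        order.append(best)
        remaining.remove(best)
    pos = {v: i for i, v in enumerate(order)}
    # Verify the order respects every chain's dependencies; else return False.
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            if pos[a] > pos[b]:
                return False
    return order
```

When the chains agree on an order, deploying VNFs in that order avoids backtracking across chains; mutually contradictory chains are rejected.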
Further, the DPU processing strategy in step (4) is specifically described as follows:
an objective of the DPU processing strategy is to take over the data processing tasks which CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, thereby saving the CPU computing resource and reducing the delay.
In a case that there is a high-priority request in the request set that arrives at a certain moment, it is determined whether there is a VNF responsible for the data processing tasks, such as network protocol processing, data encryption and decryption and data compression, in the request. DPU is used for rapid processing in a case that there is a VNF responsible for the data processing task in the request, and CPU is for processing in a case that there is no VNF responsible for the data processing task in the request.
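The decision rule of the DPU processing strategy can be sketched directly. The task-type labels and the request encoding below are assumptions introduced for illustration:

```python
# Sketch of the DPU processing strategy: for a high-priority request,
# VNFs performing tasks the CPU is not good at (network protocol
# processing, encryption/decryption, compression) are routed to the DPU;
# all other VNFs stay on the CPU.

DPU_TASKS = {"protocol", "crypto", "compression"}  # assumed task labels

def assign_processor(request):
    """request: {"priority": "high"|"low", "vnfs": [(name, task_type), ...]}"""
    placement = {}
    for name, task in request["vnfs"]:
        offload = request["priority"] == "high" and task in DPU_TASKS
        placement[name] = "DPU" if offload else "CPU"
    return placement
```

A low-priority request, or a VNF outside the offloadable task types, is always processed by the CPU.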
Further, the objective of the SFC heterogeneous deployment algorithm based on deep reinforcement learning in step (5) is to process requests with different strategies, that is, the parallel strategy, the VNF topological order strategy and the DPU processing strategy, according to the number and situation of the requests arriving at different moments, so as to better achieve the deployment of the SFC. The specific algorithm process is as follows:
51) deleting overtime requests by the system first, then dividing the arrived requests R by a priority judgment apparatus according to their real-time requirements, assigning the requests with high real-time requirements to a high-priority set R_high, and assigning the requests with low real-time requirements to a low-priority set R_low;
52) initializing a time slot T;
53) determining, according to the numbers of R_high and R_low, which strategy to adopt to process the SFC;
54) constructing and training a neural network model, taking the status of the current physical network, the characteristics of the request being processed and the above strategy information as inputs, and outputting the deployment strategy of each VNF through the computation of the neural network; and
55) updating the network status.
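The outer loop of steps 51) to 55) can be sketched structurally. The field names and the stub policy below are assumptions; a trained deep reinforcement learning agent would replace the stub in step 54).

```python
# Structural sketch of the DRL-based deployment round: drop expired
# requests, split arrivals by real-time priority, and let a policy
# (stubbed here in place of the trained neural network) place each VNF.

def deploy_round(requests, now, policy):
    """requests: list of {"deadline", "realtime", "vnfs": [...]} dicts."""
    live = [r for r in requests if r["deadline"] > now]   # 51): drop overtime
    r_high = [r for r in live if r["realtime"]]           # high priority
    r_low = [r for r in live if not r["realtime"]]
    deployments = []
    for r in r_high + r_low:                              # 53): high first
        # 54): policy maps (network state, request) to a per-VNF placement;
        # a trained DRL agent would be invoked here.
        deployments.append([policy(vnf) for vnf in r["vnfs"]])
    return deployments                                    # 55): then update state
```

The round is repeated per time slot T, with the network status refreshed after each batch of placements.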
The beneficial effects are as follows: compared with the prior art, the method of the present invention adopts the DPU to address the limitations of the current traditional computing architecture and the traditional SFC deployment system. By combining the CPU and the DPU, it offloads the data processing tasks which the CPU is not good at, such as network protocol processing, data encryption and decryption, and data compression, thereby saving computing resources and optimizing processing efficiency. Based on this, the present invention provides a VNF topological order algorithm, a DPU processing strategy and an SFC parallel strategy; a novel way is provided to design, coordinate, deploy, and standardize various mobile services to support increasingly complex and diverse service requests, thereby making SFC deployment more flexible and agile; and the diversity of service requirements is considered, and the service forwarding performance is improved.
To describe the technical solution disclosed by the present invention in detail, the present invention is further described below with reference to the accompanying drawings and embodiments.
The present invention provides an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform, which is mainly used for solving the problems of reduced forwarding performance, diversified scenarios and requirements and the like faced by the current traditional computing architecture and the traditional SFC deployment system.
Infrastructures such as cloud computing, data centers and intelligent computing centers are rapidly expanding in capacity, and network bandwidth has developed from 10G to 25G, 40G, 100G, 200G and even 400G. The traditional mode of completing data forwarding based on the CPU has reached a bottleneck. To improve the data forwarding performance of the CPU, Intel launched the data plane development kit (DPDK) acceleration scheme, which improves I/O data processing performance by bypassing the kernel protocol stack and using user-space polling with dedicated core binding, thereby greatly increasing the packet forwarding rate. However, under the trend of large bandwidth, the CPU overhead of this scheme is hard to ignore: at a 25G bandwidth rate, many data centers require 25% or even more of their CPU capacity just to meet the data I/O requirements of the business. In addition, with the popularization of artificial intelligence scenarios, more and more AI computing tasks on the cloud place more extreme requirements on the delay performance of network and storage I/O; and high-performance network and storage protocols, such as remote direct memory access (RDMA) and the nonvolatile memory express (NVMe) specification, are hard to fit to the multi-tenant scenarios of cloud computing under the traditional network card architecture. Under this background, the DPU comes into being to solve many problems of the I/O performance bottleneck in the post-Moore era and the development limitations of the virtualization technology.
The DPU is a novel data processing unit integrating network acceleration in the 5G era. In essence, the DPU embodies offloaded computing: it offloads data processing and preprocessing from the CPU and, at the same time, distributes computing power closer to the place where data is generated, thereby reducing communication traffic. RDMA, network functions, storage functions, security functions and virtualization functions are fused in the DPU. The DPU may be used to take over the data processing tasks which the CPU is not good at, such as network protocol processing, data encryption and decryption, and data compression, while taking both transmission and computing requirements into account. In addition, the DPU may also play the role of a connection hub: one end is connected to local resources such as the CPU, the GPU, a solid-state drive (SSD) and an FPGA acceleration card, and the other end is connected to network resources such as switches and routers. In general, the DPU improves network transmission efficiency and releases CPU computing power resources, thereby driving the whole data center to reduce cost and enhance efficiency. Compared with a traditional network card, the DPU increases the cost of a single part, but its introduction liberates the more costly host CPU computing power and releases more saleable resources. Therefore, the architecture transformation caused by the DPU increases, to a certain extent, the energy-efficiency cost ratio of the whole data center resource pool and the revenue-cost ratio of public cloud vendors, and the DPU may cover more and more requirements and scenarios in the future. Meanwhile, the introduction of NFV causes SFC deployment to face problems such as low forwarding efficiency and diversified service requirements. Against this background, the present invention studies a heterogeneous computing architecture composed of a CPU and a DPU, and uses this architecture to solve the SFC deployment problem.
The implementation process of the technical solution provided by the present invention is described below in detail.
The method of the present invention deploys the SFC by taking CPU+DPU as a computing architecture. An orchestrator and a server are mainly included. The orchestrator is responsible for receiving an SFC request from a network operator and running an SFC deployment algorithm to determine which SFCs are accepted and how to place these accepted SFCs. The server composed of a heterogeneous computing structure is responsible for deploying the SFC respectively by CPU or DPU according to the deployment strategy conveyed by the orchestrator.
The main implementation process of the method of the present invention is shown in
The implementation process is described below in detail.
1. Construct a Heterogeneous Computing Architecture
As shown in
2. Propose a Multi-Objective SFC Deployment Problem by Combining with the Reality and the Proposed Computing Architecture.
An objective function of the SFC deployment problem is as follows:
For f1, Dμ is a total response delay, that is,
Dμ = Lμ + Pμ + Tμ + …
where Lμ = Σe…, Pμ is a processing delay, and Tμ is a transmission delay.
For f2, Σr
For f3, C(τ) represents a total deployment cost, that is,
C(τ)=SC(τ)+Cscale(τ)+CDPU(τ)
SC(τ) represents a total operation cost, that is, the sum of the cost of turning on the server and the cost of successfully placing the VNF:
SC(τ)=Σn
xf
Cscale(τ)=Σn
CDPU(τ) represents the total use cost of DPU and is defined as follows:
CDPU(τ)=Σn
where ζc
The resource constraint is as follows:
where sn
The bandwidth constraint is as follows:
C2: ∀ej∈E,Σr
where ar,τ represents whether the request rμ∈R is still in the service, and Be
The delay constraint is as follows:
C3:∀rμ∈R,Dμ≤Dμmax
where Dμmax represents the maximum end-to-end delay.
3. Propose an SFC Parallel Problem by Combining with Reality
In SFC, some VNFs may work independently without affecting other VNFs, so the serial SFCs can be converted into the parallel SFCs. However, not all VNFs in the SFC can work in parallel. In a case that two VNFs modify the content of the flow or violate a dependency constraint, the operations of the two VNFs are in conflict. Only in a case that the VNFs in the SFC are independent of each other, parallel processing can be performed among the VNFs; otherwise, the correctness of the network and service strategy may be destroyed.
According to the present invention, the VNFs are divided into two types, monitors and shapers, where a monitor observes the traffic without modifying it, and a shaper processes and modifies the traffic. Since the VNFs in an SFC must be applied to each data packet flow in a specific order, the VNFs form a dependency relationship. It is stipulated in the present invention that in a case that one VNF fvμ is before another VNF fv+1μ, fv+1μ depends on fvμ, denoted as fvμ<fv+1μ. A dependency relationship among different VNFs is shown in
To process the data packet in parallel, two functions are required: 1) a copying function, and 2) a merging function. As shown in
min(αC+βΔd)
where α and β respectively represent weight coefficients of the additional resource consumption and delay.
C represents the additional resource consumption caused by the copying and merging of the data packet, with the following formula:
where B is one group of parallel branches, ΦB represents the parallelism degree of B, and U represents the size of the data packet.
The additional delay Δd caused by the copying and merging of the data packet may be represented as:
where Vμ is the data quantity of the μth SFC in the request R.
For the SFC parallel problem, the flow constraint is introduced. Oc is used for representing a set of copying nodes in rμ, and Om represents a set of merging nodes in rμ. Oc(ni) and Om(ni) respectively represent the number of the copying nodes and the merging nodes in rμ. For rμ, except the copying nodes, the merging nodes, the source node and the target node, all intermediate nodes meet the flow conservation, that is,
∀rμ∈R,∀nh∈N,nh∉{nsrc,ndst,Oc,Om}: Σe
In a case that the source node is one copying node, it is necessary to meet the constraint condition:
∀rμ∈R,∀nsrc∈Oc: Σe
In a case that the target node is one merging node, it is necessary to meet the following constraint condition:
∀rμ∈R,∀ndst∈Om: Σe
For the situation that other nodes are the copying nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Oc,Σf
For the situation that other nodes are the merging nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Om,Σf
4. Design an SFC Parallel Strategy
An objective of the SFC parallel strategy is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel. The specific algorithm process of the SFC parallel strategy is as follows:
5. Design a VNF Topological Order Strategy
An objective of the VNF topological order algorithm is to find a VNF topological order about a plurality of SFCs, and the delay can be reduced by deploying the VNF according to the order. The specific algorithm process is as follows:
6. Design a DPU Processing Strategy
An objective of the DPU processing strategy is to take over the data processing tasks which CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, thereby saving the CPU computing resource and reducing the delay.
In a case that there is a high-priority request in the request set that arrives at a certain moment, it is determined whether there is a VNF responsible for the data processing tasks, such as network protocol processing, data encryption and decryption and data compression, in the request. DPU is used for rapid processing in a case that there is a VNF responsible for the data processing task in the request, and CPU is for processing in a case that there is no VNF responsible for the data processing task in the request.
7. Design an SFC Heterogeneous Deployment Algorithm Based on Deep Reinforcement Learning According to the Above Strategy and Architecture
An objective of the SFC heterogeneous deployment algorithm based on deep reinforcement learning is to respectively adopt different strategies for processing according to the number and situation of requests arriving at different moments, that is, a parallel strategy, a VNF topological order strategy and a DPU processing strategy, thereby better deploying the SFC. The specific algorithm process is as follows:
8. Achieve Deployment of SFC According to a Deployment Strategy
The orchestrator in the heterogeneous computing architecture calls the driver module to transfer the deployment scheme to the server for placement, and the server respectively uses CPU or DPU to complete the best deployment of the SFC according to the deployment scheme, thereby reducing the delay and cost.
In this embodiment, to verify the actual effect of the present invention (PSHD), a simulated comparison experiment is performed with two other algorithms (BASE and FCPM) by taking the request number as a control variable. Since BASE and PSHD have the same objective, one group of experiments taking the request arrival rate as the control variable is also performed in this embodiment, thereby proving the effectiveness of the present invention.
In this embodiment, compared with the most advanced method, the request acceptance rate of the method of the present invention is discussed.
This embodiment compares the average delay of each SFC of three algorithms in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300. As shown in
Finally, in this embodiment, the DPU usage is compared in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300, and in a case that the number of the requests is 100 and the arrival rate of the requests changes from 0.5 to 3. As shown in
Number | Date | Country | Kind |
---|---|---|---|
202211352904.1 | Nov 2022 | CN | national |
Number | Name | Date | Kind |
---|---|---|---|
20190104071 | Kobayashi | Apr 2019 | A1 |
20190199649 | Kobayashi | Jun 2019 | A1 |
20200026575 | Guim Bernat | Jan 2020 | A1 |
Number | Date | Country |
---|---|---|
110022230 | Jul 2019 | CN |
113411207 | Sep 2021 | CN |
113918277 | Jan 2022 | CN |
Entry |
---|
Meigen Huang, Tao Wang, Liang Liu, Ruiqin Pang, and Huan Du; Virtual Network Function Deployment Strategy Based on Software Defined Network Resource Optimization Computer Science, Issue S1, 404-408 Publication date: Jun. 15, 2020. |
Weilin Zhou; Yuan Yang; Weiming Xu; Review of Research on Network Function Virtualization Technology Computer Research and Development, Issue 04, p. 675-688 Publication date: Apr. 15, 2018. |