Efficient parallelization and deployment method of multi-objective service function chain based on CPU + DPU platform

Information

  • Patent Grant
  • 11936758
  • Patent Number
    11,936,758
  • Date Filed
    Wednesday, October 11, 2023
  • Date Issued
    Tuesday, March 19, 2024
Abstract
An efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform solves the problem of multi-objective deployment by constructing a heterogeneous computing architecture composed of an orchestrator and a server based on a CPU+DPU structure; the orchestrator is responsible for receiving an SFC request from a network operator; an SFC deployment algorithm based on deep reinforcement learning is operated, including a parallel strategy, a VNF topological order strategy and a DPU processing strategy to obtain an optimal deployment scheme of each request; then a resource management module is invoked to manage resources; and finally, a driver module is invoked to transmit the deployment scheme to a server for placement, and the server completes the deployment of SFC by using the CPU or the DPU respectively according to the deployment scheme.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application Ser. No. CN202211352904.1 filed on 1 Nov. 2022.


TECHNICAL FIELD

The present invention belongs to a service function chain orchestration technology, and particularly relates to an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform.


BACKGROUND

The traditional mode of forwarding data on a central processing unit (CPU) has hit a bottleneck, and a computing architecture composed only of CPUs cannot meet diversified scenarios and business requirements. On the one hand, as Moore's law slows down, network bandwidth and connection counts keep growing and data paths become wider and denser, computing nodes at the end, edge and cloud are directly exposed to the increased data volume, and the growth rate of CPU computing power and the growth rate of network bandwidth show a "scissors gap"; on the other hand, in highly concurrent forwarding work, the serial computing mode of the CPU makes it difficult to exert its maximum computing power.


The advent of network function virtualization (NFV) provides a novel way to design, coordinate, deploy, and standardize various mobile services to support increasingly complex and diverse service requests, thereby making service function chain (SFC) deployment more flexible and agile. Current SFC deployment systems focus on optimizing the network resource utilization ratio, and do not consider the diversity of service requirements or the degradation of service forwarding performance.


A cloud architecture based on a data processing unit (DPU) offloads and accelerates the network and storage systems as required. Compared with a graphics processing unit (GPU), a field programmable gate array (FPGA) and an application-specific integrated circuit (ASIC), it frees the more expensive host CPU computing power, so that high performance can be achieved more economically and effectively. The architecture transformation caused by the DPU improves, to a certain extent, the energy efficiency cost ratio of the whole data center resource pool and the profit cost ratio of public cloud vendors, and the DPU can cover more and more demands and scenarios in the future. In this regard, the research content of the present invention is to use the DPU to solve some problems faced by the current traditional computing architecture and the traditional SFC deployment system.


SUMMARY

An objective of the present invention is: to solve the problems of reduced forwarding performance, diversified scenarios and requirements and the like of the current traditional computing architecture and the traditional SFC deployment system, the present invention provides an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform.


The technical solution is: an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform includes the following steps:

    • (1) constructing a heterogeneous computing architecture for solving the multi-objective deployment problem, where the heterogeneous computing architecture comprises an orchestrator responsible for management and coordination, and a server based on a CPU+DPU structure;
    • (2) converting serial SFCs into parallel SFCs by an SFC parallel strategy according to the independence among virtual network functions (VNF), and accordingly solving the SFC parallel problem;
    • where an objective function of the SFC parallel problem is to minimize additional resource consumption and delay caused by copying and merging of the data packet:

      min(αC+βΔd)
    • where α and β respectively represent weight coefficients of the additional resource consumption and delay,

$$C = \sum_{\mu=1}^{|R|} \sum_{B \in r_\mu} \frac{64 \times (\Phi_B - 1)}{U}$$

represents an additional resource consumption caused by copying and merging of the data packet, and

$$\Delta d = \sum_{\mu=1}^{|R|} 30 \times \frac{v_\mu}{10^{9}}$$

represents an additional delay caused by copying and merging of the data packet;

    • an objective of the SFC parallel strategy is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel;
    • (3) for the situation that a plurality of SFCs arrive at the same time at a certain moment, adopting a VNF topological order algorithm, deploying the VNFs according to a topological order obtained by the algorithm, and reducing the delay by combining the sharing and scaling characteristics of the VNFs,
    • where the VNF topological order algorithm is used for constructing a VNF topological order for the plurality of SFCs and deploying the VNFs according to the order to reduce the delay;
    • (4) adopting a DPU processing strategy for a service request with a high real-time requirement, where the DPU processing strategy is used for processing computing tasks including network protocol processing, data encryption and decryption and data compression to save a CPU computing resource and reduce the delay;
    • in a case that there is a high-priority request in the request set that arrives at a certain moment, determining whether there is a VNF responsible for the data processing task in the request, using DPU for rapid processing in a case that there is a VNF responsible for the data processing task in the request, and using CPU for processing in a case that there is no VNF responsible for the data processing task in the request; and
    • (5) proposing an SFC heterogeneous deployment algorithm based on deep reinforcement learning, where the heterogeneous algorithm is capable of deploying the SFC respectively according to the number and situation of the requests arriving at different moments,
    • where, according to the number and situation of the requests arriving at different moments, the SFC heterogeneous deployment algorithm based on deep reinforcement learning applies different strategies, namely the parallel strategy, the VNF topological order strategy and the DPU processing strategy, to achieve deployment of the SFC.


Further, the heterogeneous computing architecture in step (1) is specifically described as follows:

    • the architecture is composed of an orchestrator and a server, where the orchestrator includes an SFC deployment algorithm module, a resource management module and a driver module; and the server includes a CPU, a DPU and a bus for connecting the CPU and the DPU. The orchestrator is responsible for managing and deploying the arrived SFC, and the server composed of a heterogeneous computing structure is responsible for sequentially processing VNFs in different SFCs according to the deployment strategy conveyed by the orchestrator. The specific task of the orchestrator includes: receiving an SFC request from a network operator and operating the SFC deployment algorithm to determine which SFCs are accepted and how to place these SFCs. To consider different situations of different requests, the method respectively calls the parallel strategy, the VNF topological order strategy and the DPU processing strategy so as to obtain an optimal deployment scheme of each request, then a resource management module is invoked to manage resources, and finally, a driver module is invoked to transmit the deployment scheme to a server for placement, and the server completes the deployment of SFC by using the CPU or the DPU respectively according to the deployment scheme.


Further, the multi-objective SFC deployment problem in step (1) is specifically described as follows:

    • an objective function of the SFC deployment problem is as follows:

$$\min\,(f_1 + f_2 + f_3) \quad \text{s.t. } C1,\ C2,\ C3$$

$$\begin{cases} f_1 = \sum_{r_\mu \in R} D_\mu \\ f_2 = \sum_{r_\mu \in R} y_{r_\mu} B_\mu \tau_r \\ f_3 = C(\tau) \end{cases}$$


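For f1, Dμ is a total response delay, that is,

Dμ=Lμ+Pμ+Tμ+Wq

    • where

$$L_\mu = \sum_{e_h^\mu \in E_\mu} \sum_{e_j \in E} x_{e_h^\mu}^{e_j} D_{e_j}$$

is a communication delay,

$$P_\mu = \sum_{f_v^\mu \in F_\mu} \sum_{n_i \in N} x_{f_v^\mu}^{n_i} \cdot \frac{1}{\eta_{m_i}^{\mu} c_{m_i} w_{m_i}^{\mu} - \lambda_\mu + \varepsilon}$$

is a processing delay,

$$T_\mu = \sum_{f_v^\mu \in F_\mu} \frac{U}{v_\mu}$$

is a transmission delay, and Wq is an average queuing delay.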


For f2, Σrμ∈RyrμBμτr is a total throughput, the binary variable yrμ represents whether rμ is accepted, Bμ is a minimum bandwidth of the SFC, and τr=l*Δ represents the survival time of the SFC.


For f3, C(τ) represents a total deployment cost, that is,

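C(τ)=SC(τ)+Cscale(τ)+CDPU(τ)


SC(τ) represents a total operation cost, that is, the sum of the cost of turning on the server and the cost of successfully placing the VNF:

$$SC(\tau) = \sum_{n_i \in N} \sum_{f_v^\mu \in F_\mu} x_{f_v^\mu}^{n_i} \zeta_c C_{f_v^\mu} + \sum_{e_j \in E} \sum_{e_h^\mu \in E_\mu} x_{e_h^\mu}^{e_j} \zeta_B B_\mu + \sum_{n_i \in N} \zeta_O$$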


xfvμni represents whether VNF fvμ∈Fμ is deployed on a server node ni∈N in the request rμ∈R, xehμej represents whether a virtual link ehμ∈Eμ is mapped to a physical link ej∈E in the request rμ∈R, ζc and ζB respectively represent the unit costs of the resource and the bandwidth, Cfvμ represents the resource requirement of VNF fvμ∈Fμ, and ζO represents the cost of turning on the server.


Cscale(τ)=Σni∈NΣfvμ∈FμCh·xni,hfvμ represents a total scaling cost, Ch represents a horizontal scaling cost, and xni,hfvμ represents whether VNF fvμ∈Fμ is horizontally scaled.


CDPU(τ) represents the total use cost of DPU and is defined as follows:

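$$C_{DPU}(\tau) = \sum_{n_i \in N} \sum_{f_v^\mu \in F_\mu} \left( \zeta_c^D x_{n_i,D}^{f_v^\mu} C_{f_v^\mu} + \zeta_B^D x_{n_i,D}^{f_v^\mu} B_\mu \right)$$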

where ζcD and ζBD represent the unit costs of the resource and bandwidth during use of DPU, and xni,Dfvμ represents whether VNF fvμ∈Fμ is processed with DPU.


The resource constraint is as follows:

$$C1: \forall n_i \in N, \quad \sum_{f_v^\mu \in F_\mu} \sum_{l \in N_{f_v^\mu}} s_{n_i,\tau}^{f_v^\mu} \cdot C_{f_v^\mu} \le C_{n_i}$$

where snifvμ represents the number of service instances of VNF fvμ∈Fμ on a node ni∈N, and Cni represents the size of resources (CPU and memory) of the node ni∈N.


The bandwidth constraint is as follows:

C2:∀ej∈E,Σrμ∈RΣehμ∈Eμxehμej·ar,τ·Bμ≤Bej


where ar,τ represents whether the request rμ∈R is still in service, and Bej represents the bandwidth of the physical link ej∈E.


The delay constraint is as follows:

C3: ∀rμ∈R,Dμ≤Dμmax


where Dμmax represents the size of a maximum end-to-end delay.


Further, the SFC parallel problem in step (2) is specifically described as follows:


in SFC, some VNFs may work independently without affecting other VNFs, so the serial SFCs can be converted into the parallel SFCs. However, not all VNFs in the SFC can work in parallel. In a case that two VNFs modify the content of the flow or violate a dependency constraint, the operations of the two VNFs are in conflict. Only in a case that the VNFs in the SFC are independent of each other, parallel processing can be performed among the VNFs; otherwise, the correctness of the network and service strategy may be destroyed.


According to the method of the present invention, the VNFs are divided into two types, monitors and shapers, where the monitors are responsible for monitoring the flow without any modification, and the shapers are used for processing and modifying the flow. Since the VNFs in the SFCs must be applied to each data packet flow in a specific order, the VNFs form a dependency relationship. It is stipulated in the present invention that in a case that one VNF fvμ is before another VNF fv+1μ, fv+1μ depends on fvμ, denoted as fvμ<fv+1μ.


To process the data packet in parallel, two functions are required, that is, a copying function and a merging function.


When one data packet enters, the copying function will copy the data packet and send the data packet to the VNFs capable of being processed in parallel. After the data packet is processed, the copied data packet is merged by the merging function. The copying and merging of the data packet will cause additional resource consumption and delay. Therefore, in the SFC parallel problem, an objective function of the SFC parallel problem is to minimize additional resource consumption and delay caused by copying and merging of the data packet:

min(αC+βΔd)


where α and β respectively represent weight coefficients of the additional resource consumption and delay.


C represents the additional resource consumption caused by the copying and merging of the data packet, with the following formula:

$$C = \sum_{\mu=1}^{|R|} \sum_{B \in r_\mu} \frac{64 \times (\Phi_B - 1)}{U}$$

where B is one group of parallel branches, ΦB represents the parallelism degree of B, and U represents the size of the data packet.


The additional delay Δd caused by the copying and merging of the data packet may be represented as:

$$\forall B \in r_\mu \in R, \quad \Delta d = \sum_{\mu=1}^{|R|} 30 \times \frac{v_\mu}{10^{9}}$$

where vμ is the data quantity of the μth SFC in the request set R.
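
As a rough numerical illustration of the parallel objective, the following Python sketch evaluates αC+βΔd for a batch of requests. It assumes the two sums are read as C = Σ 64×(ΦB−1)/U over all branch groups and Δd = Σ 30×vμ/10^9 over all requests; the class name, function names and sample values are hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelSFC:
    """One request r_mu after parallelization (hypothetical container)."""
    branch_degrees: List[int]  # parallelism degree Phi_B of each branch group B
    data_volume: float         # data quantity v_mu of the SFC


def extra_resource(requests: List[ParallelSFC], packet_size: float) -> float:
    """C: overhead of copying/merging, assumed to be 64 * (Phi_B - 1) / U per group."""
    return sum(64 * (phi - 1) / packet_size
               for r in requests for phi in r.branch_degrees)


def extra_delay(requests: List[ParallelSFC]) -> float:
    """Delta d: assumed to be 30 * v_mu / 1e9 per request."""
    return sum(30 * r.data_volume / 1e9 for r in requests)


def parallel_objective(requests: List[ParallelSFC], packet_size: float,
                       alpha: float, beta: float) -> float:
    """Weighted sum alpha * C + beta * Delta d that the parallel strategy minimizes."""
    return alpha * extra_resource(requests, packet_size) + beta * extra_delay(requests)


if __name__ == "__main__":
    batch = [ParallelSFC(branch_degrees=[2, 3], data_volume=1e9)]
    print(parallel_objective(batch, packet_size=64, alpha=0.5, beta=0.5))
```

A branch group with parallelism degree 1 contributes nothing to C, which matches the intuition that a purely serial chain needs no copying or merging.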


For the SFC parallel problem, the flow constraint is introduced. Oc is used for representing a set of copying nodes in rμ, and Om represents a set of merging nodes in rμ. Oc(ni) and Om(ni) respectively represent the number of the copying nodes and the merging nodes in rμ. For rμ, except the copying nodes, the merging nodes, the source node and the target node, all intermediate nodes meet the flow conservation, that is,

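$$\forall r_\mu \in R,\ \forall n_h \in N,\ n_h \notin \{n_{src}, n_{dst}, O_c, O_m\}: \quad \sum_{e_h^\mu \in E_\mu} \sum_{n_g \in N} x_{e_h^\mu}^{n_g,n_h} - \sum_{e_h^\mu \in E_\mu} \sum_{n_k \in N} x_{e_h^\mu}^{n_h,n_k} = 0$$


In a case that the source node is one copying node, it is necessary to meet the constraint condition:

$$\forall r_\mu \in R,\ \forall n_{src} \in O_c: \quad \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_{src},n_u} - \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_{src}} = O_c(n_{src})$$


In a case that the target node is one merging node, it is necessary to meet the following constraint condition:

$$\forall r_\mu \in R,\ \forall n_{dst} \in O_m: \quad \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_{dst}} - \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_{dst},n_u} = O_m(n_{dst})$$


In a case that other nodes are copying nodes, the following formula should be met:

$$\forall r_\mu \in R,\ \forall n_h \in O_c,\ \sum_{f_v^\mu \in F_\mu} x_{f_v^\mu}^{n_h} = 1: \quad \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_h,n_u} - \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_h} = O_c(n_h) - O_m(n_h)$$


In a case that other nodes are merging nodes, the following formula should be met:

$$\forall r_\mu \in R,\ \forall n_h \in O_m,\ \sum_{f_v^\mu \in F_\mu} x_{f_v^\mu}^{n_h} = 1: \quad \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_h} - \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_h,n_u} = O_m(n_h) - O_c(n_h)$$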


Further, an objective of the SFC parallel strategy in step (2) is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel. The specific algorithm process of the SFC parallel strategy is as follows:


21) initializing a branch chain set B, a main chain S and a monitor set M;


22) traversing rμ: in a case that fiμ∈rμ is a monitor, first initializing a branch chain b∈B, then adding fiμ into b and M; and in a case that fiμ∈rμ is a shaper, adding fiμ into the main chain S and searching M for the monitors on which fiμ depends; for each such monitor k∈M, there is currently a branch chain that takes k as its end point, so pointing k to fiμ to extend that branch chain to take fiμ as its end point, and removing k from M;


23) invoking a path search algorithm to find all path sets PATH executable in parallel; and


24) returning the branch chain set B, the main chain S and the path set PATH.
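
Steps 21) to 24) can be sketched in Python as follows. This is only an illustrative reading of the procedure: the `kinds` and `depends_on` inputs are hypothetical data structures, and the path search of step 23), which the text does not specify, is replaced here by simply listing the main chain and every branch chain.

```python
from typing import Dict, List, Set, Tuple


def parallelize(chain: List[str],
                kinds: Dict[str, str],
                depends_on: Dict[str, Set[str]]
                ) -> Tuple[List[List[str]], List[str], List[List[str]]]:
    """Split a serial SFC into branch chains B, a main chain S and a path set PATH.

    chain      -- VNF names in serial order (the request r_mu)
    kinds      -- maps each VNF to "monitor" or "shaper"
    depends_on -- maps a VNF to the set of earlier VNFs it depends on (assumed input)
    """
    branches: List[List[str]] = []      # B
    main: List[str] = []                # S
    open_monitors: Dict[str, int] = {}  # M: monitor -> index of the branch it ends

    for f in chain:
        if kinds[f] == "monitor":
            # Step 22, monitor case: start a new branch chain ending at f.
            branches.append([f])
            open_monitors[f] = len(branches) - 1
        else:
            # Step 22, shaper case: extend the main chain, then close every
            # branch whose end-point monitor f depends on.
            main.append(f)
            needed = depends_on.get(f, set())
            for k in [k for k in open_monitors if k in needed]:
                branches[open_monitors[k]].append(f)
                del open_monitors[k]

    # Step 23 (simplified assumption): every non-empty chain is one candidate path.
    paths = [list(p) for p in [main] + branches if p]
    return branches, main, paths


if __name__ == "__main__":
    chain = ["fw", "ids", "nat", "mon", "lb"]
    kinds = {"fw": "shaper", "ids": "monitor", "nat": "shaper",
             "mon": "monitor", "lb": "shaper"}
    deps = {"nat": {"fw"}, "lb": {"nat", "mon"}}
    print(parallelize(chain, kinds, deps))
```

In this example the monitor `ids` is never depended on, so its branch stays a single-node chain that can run alongside the main chain.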


Further, the objective of the VNF topological order algorithm in step (3) is to find a VNF topological order about a plurality of SFCs, and the delay can be reduced by deploying the VNF according to the order. The specific algorithm process is as follows:


31) initializing f as a source node;


32) traversing a request set R arriving at the same time, invoking the SFC parallel strategy (algorithm 1) to obtain the branch chain set B, the main chain S and the path set PATH, finding the path with the maximum delay among all the paths in the path set PATH, and adding the path into a set C;


33) traversing C, and creating a directed weighted graph graph=(F, ω), where F is a set of VNFs;


34) invoking a minimum feedback arc set algorithm to solve a minimum feedback arc set of the graph, and verifying whether the resulting topological order meets the dependency relationship of the VNFs among different chains; returning the topological order in a case that it does, and otherwise returning False (that is, the dependency condition is violated, and the algorithm cannot be used).
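
A minimal Python sketch of steps 33) and 34) is given below. It is not the solver used by the invention: the edge weight (how many chains place one VNF directly before another) is an assumption, and the minimum feedback arc set is found by brute force over all orderings, which is only practical for a handful of VNF types.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, List, Sequence, Tuple, Union


def build_graph(critical_paths: Sequence[Sequence[str]]) -> Counter:
    """Directed weighted graph (F, omega): weight = number of chains with a -> b."""
    weights: Counter = Counter()
    for path in critical_paths:
        for a, b in zip(path, path[1:]):
            weights[(a, b)] += 1
    return weights


def topological_order(critical_paths: Sequence[Sequence[str]],
                      hard_deps: Iterable[Tuple[str, str]] = ()
                      ) -> Union[List[str], bool]:
    """Return a VNF order with minimum total weight of backward arcs, or False.

    hard_deps lists pairs (a, b) meaning a must come before b; every VNF named
    there is assumed to appear in at least one critical path.
    """
    weights = build_graph(critical_paths)
    vnfs = sorted({v for p in critical_paths for v in p})

    best_order: List[str] = []
    best_cost = None
    for order in permutations(vnfs):
        pos = {v: i for i, v in enumerate(order)}
        # Arcs pointing backward in this ordering form a feedback arc set.
        cost = sum(w for (a, b), w in weights.items() if pos[a] > pos[b])
        if best_cost is None or cost < best_cost:
            best_order, best_cost = list(order), cost

    pos = {v: i for i, v in enumerate(best_order)}
    if any(pos[a] > pos[b] for a, b in hard_deps):
        return False  # dependency condition violated, order cannot be used
    return best_order


if __name__ == "__main__":
    paths = [["fw", "nat", "lb"], ["fw", "lb"], ["nat", "lb"]]
    print(topological_order(paths))
    print(topological_order(paths, hard_deps=[("lb", "fw")]))
```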


Further, the DPU processing strategy in step (4) is specifically described as follows:


an objective of the DPU processing strategy is to take over the data processing tasks which CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, thereby saving the CPU computing resource and reducing the delay.


In a case that there is a high-priority request in the request set that arrives at a certain moment, it is determined whether there is a VNF responsible for the data processing tasks, such as network protocol processing, data encryption and decryption and data compression, in the request. DPU is used for rapid processing in a case that there is a VNF responsible for the data processing task in the request, and CPU is used for processing in a case that there is no VNF responsible for the data processing task in the request.
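
The decision rule of the DPU processing strategy can be illustrated with a short Python sketch. The task labels, the per-VNF granularity of the decision, and the choice to keep low-priority requests entirely on the CPU are assumptions made for the example.

```python
from typing import Dict

# Task categories the text names as suitable for the DPU (labels are hypothetical).
DPU_TASKS = frozenset({"protocol_processing", "encryption", "decryption", "compression"})


def assign_processors(priority: str, vnf_tasks: Dict[str, str]) -> Dict[str, str]:
    """Map each VNF of one request to "DPU" or "CPU".

    priority  -- "high" or "low" (real-time class of the request)
    vnf_tasks -- VNF name -> task category it performs
    """
    if priority != "high":
        # Assumption: only high-priority requests trigger the DPU strategy.
        return {vnf: "CPU" for vnf in vnf_tasks}
    return {vnf: ("DPU" if task in DPU_TASKS else "CPU")
            for vnf, task in vnf_tasks.items()}


if __name__ == "__main__":
    request = {"ipsec": "encryption", "fw": "filtering", "gzip": "compression"}
    print(assign_processors("high", request))
    print(assign_processors("low", request))
```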


Further, the objective of the SFC heterogeneous deployment algorithm based on deep reinforcement learning in step (5) is to apply different strategies, namely the parallel strategy, the VNF topological order strategy and the DPU processing strategy, according to the number and situation of the requests arriving at different moments, so as to better achieve deployment of the SFC. The specific algorithm process is as follows:


51) deleting timed-out requests by the system first, and dividing the arrived requests R by a priority judgment apparatus according to their real-time requirements, where the requests with high real-time requirements are placed in a high-priority set R_high, and the requests with low real-time requirements are placed in a low-priority set R_low;


52) initializing a time slot T;


53) according to the numbers of R_high and R_low, determining which strategy to adopt to process the SFC;


54) constructing and training a neural network model, taking the status of the current physical network, the characteristic of the request being processed and the above information as an input, and outputting the deployment strategy of each VNF through the computation of the neural network; and


55) updating the network status.
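
Steps 51) and 53) can be sketched in Python as below; the neural network of step 54) is deliberately left out. The `Request` fields, the real-time threshold and the exact strategy-selection rule are assumptions for illustration, since the text only states that the choice depends on the numbers of R_high and R_low.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    name: str
    deadline: float   # absolute time after which the request has timed out
    max_delay: float  # tolerated end-to-end delay; smaller means more real-time


def classify(requests: List[Request], now: float,
             realtime_threshold: float) -> Tuple[List[Request], List[Request]]:
    """Step 51: drop timed-out requests, then split into R_high and R_low."""
    alive = [r for r in requests if r.deadline > now]
    r_high = [r for r in alive if r.max_delay <= realtime_threshold]
    r_low = [r for r in alive if r.max_delay > realtime_threshold]
    return r_high, r_low


def select_strategies(r_high: List[Request], r_low: List[Request]) -> List[str]:
    """Step 53 (assumed rule): pick strategies from the sizes of R_high and R_low."""
    total = len(r_high) + len(r_low)
    if total == 0:
        return []
    strategies = ["parallel"]                    # every SFC is parallelized first
    if total > 1:
        strategies.append("topological_order")   # several SFCs arrived together
    if r_high:
        strategies.append("dpu")                 # real-time requests may use the DPU
    return strategies


if __name__ == "__main__":
    batch = [Request("a", 10.0, 5.0), Request("b", 10.0, 50.0), Request("c", 1.0, 5.0)]
    high, low = classify(batch, now=2.0, realtime_threshold=10.0)
    print([r.name for r in high], [r.name for r in low], select_strategies(high, low))
```

The selected strategy names would then be part of the input handed to the neural network in step 54).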


The beneficial effects are as follows: compared with the prior art, the method of the present invention adopts the DPU to solve the problems of the current traditional computing architecture and the traditional SFC deployment system; by combining the CPU and the DPU, it offloads from the CPU the data processing tasks which the CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, saves computing resources and optimizes the processing efficiency. Based on the implementation of the method, the present invention provides a VNF topological order algorithm, a DPU processing strategy and an SFC parallel strategy; a novel way is provided to design, coordinate, deploy, and standardize various mobile services to support increasingly complex and diverse service requests, thereby making SFC deployment more flexible and agile; and the diversity of the service requirements is considered, and the service forwarding performance is improved.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic flowchart of a method according to the present invention;



FIG. 2 is a system architecture diagram of a method according to the present invention;



FIG. 3 is a dependency relationship diagram of VNF according to the present invention;



FIG. 4 is a diagram of copy and merging function according to the present invention;



FIG. 5 is a flowchart of an SFC deployment algorithm according to the present invention;



FIG. 6 is a comparison diagram of an average delay of each batch under different requests;



FIG. 7 is a comparison diagram of an average delay of each batch under different arrival rates;



FIG. 8 is a comparison diagram of a request acceptance rate under different requests;



FIG. 9 is a comparison diagram of a request acceptance rate under different arrival rates;



FIG. 10 is a comparison diagram of an average cost of each SFC under different requests;



FIG. 11 is a comparison diagram of an average cost of each SFC under different arrival rates;



FIG. 12 is a comparison diagram of an average reward of each SFC under different requests;



FIG. 13 is a comparison diagram of an average reward of each SFC under different arrival rates;



FIG. 14 is a diagram of a DPU usage under different requests; and



FIG. 15 is a diagram of a DPU usage under different arrival rates.





DETAILED DESCRIPTION OF THE EMBODIMENTS

To describe the technical solution disclosed by the present invention in detail, the present invention is further described below with reference to the accompanying drawings and embodiments.


The present invention provides an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform, which is mainly used for solving the problems of reduced forwarding performance, diversified scenarios and requirements and the like faced by the current traditional computing architecture and the traditional SFC deployment system.


Infrastructures such as cloud computing, data centers and intelligent computing centers are rapidly expanding their capacity, and network bandwidth has grown from 10G to 25G, 40G, 100G, 200G and even 400G. The traditional mode of forwarding data on the CPU has hit a bottleneck. To improve the data forwarding performance of the CPU, Intel launched the data plane development kit (DPDK) acceleration scheme, which improves I/O data processing performance by bypassing the kernel protocol stack and polling in user space with core binding, thereby greatly increasing the packet forwarding rate. However, under the trend toward large bandwidth, the CPU overhead of this scheme is hard to ignore. At a 25G bandwidth rate, most data centers need 25% or even more of their CPU capacity just to meet the data I/O requirements of the business. In addition, with the popularization of artificial intelligence scenarios, more and more AI computing tasks on the cloud place more extreme requirements on the delay performance of network and storage I/O; high-performance network and storage protocols, such as remote direct memory access (RDMA) and the nonvolatile memory host controller interface specification (NVMe), can hardly meet the multi-tenant requirement scenarios of cloud computing under the traditional network card architecture. Against this background, the DPU came into being to solve many problems of the I/O performance bottleneck in the post-Moore era and the development limitations of virtualization technology.


DPU is a novel data processing unit integrating network acceleration in the 5G era. In essence, DPU is classified computing, which offloads data processing/preprocessing from the CPU, and at the same time distributes computing power closer to the place where data occurs, thus reducing communication traffic. RDMA, network functions, storage functions, security functions and virtualization functions are fused in the DPU. The DPU may be used for taking over the data processing tasks which the CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, while taking both the transmission and computing requirements into account. In addition, the DPU may also play the role of a connection hub: one end is connected to local resources such as a CPU, a GPU, a solid-state drive (SSD) and an FPGA acceleration card, and the other end is connected to network resources such as a switch/router. In general, the DPU improves the network transmission efficiency and releases CPU computing power, thereby driving the whole data center to reduce cost and enhance efficiency. Compared with the traditional network card, the DPU increases the cost of a single part, but the introduction of the DPU frees the more expensive host CPU computing power and releases more saleable resources. Therefore, the architecture transformation caused by the DPU increases, to a certain extent, the energy efficiency cost ratio of the whole data center resource pool and the revenue cost ratio of public cloud vendors. The DPU may cover more and more requirements and scenarios in the future. Meanwhile, the introduction of NFV causes SFC deployment to face the problems of low forwarding efficiency, diversified service requirements and the like. Against this background, the present invention studies the heterogeneous computing architecture composed of CPU and DPU, and the architecture is used for solving the SFC deployment problem.


The implementation process of the technical solution provided by the present invention is described below in detail.


The method of the present invention deploys the SFC by taking CPU+DPU as a computing architecture. An orchestrator and a server are mainly included. The orchestrator is responsible for receiving an SFC request from a network operator and running an SFC deployment algorithm to determine which SFCs are accepted and how to place these accepted SFCs. The server composed of a heterogeneous computing structure is responsible for deploying the SFC respectively by CPU or DPU according to the deployment strategy conveyed by the orchestrator.


The main implementation process of the method of the present invention is shown in FIG. 1. Based on the above technical solution, further detailed description is performed in this embodiment, specifically including the following steps:

    • (1) construct a heterogeneous computing architecture for solving the multi-objective deployment problem, where the architecture includes an orchestrator responsible for management and coordination, and a server based on a CPU+DPU structure;
    • (2) propose an SFC parallel problem according to the independence among VNFs, and propose an SFC parallel strategy and convert the serial SFCs into parallel SFCs;
    • (3) for the situation that a plurality of SFCs arrive at the same time at a certain moment, propose a VNF topological order algorithm, deploy the VNFs according to a topological order obtained by the algorithm, and significantly reduce the delay by combining the sharing and scaling characteristics of the VNFs;
    • (4) for a service request with a high real-time requirement, propose a DPU processing strategy, that is, using DPU for rapid processing according to the arrival situation and number of the requests; and
    • (5) propose an SFC heterogeneous deployment algorithm based on deep reinforcement learning, where the heterogeneous algorithm can deploy the SFC according to the number and situation of the requests arriving at different moments.


The implementation process is described below in detail.


1. Construct a Heterogeneous Computing Architecture


As shown in FIG. 2, the heterogeneous computing architecture is composed of an orchestrator and a server, where the orchestrator includes an SFC deployment algorithm module, a resource management module and a driver module; and the server includes a CPU, a DPU and a bus for connecting the CPU and the DPU. The orchestrator is responsible for managing and deploying the arrived SFC, and the server composed of a heterogeneous computing structure is responsible for sequentially processing VNFs in different SFCs according to the deployment strategy conveyed by the orchestrator. The specific task of the orchestrator includes: receiving an SFC request from a network operator and running the SFC deployment algorithm to determine which SFCs are accepted and how to place these SFCs. To consider different situations of different requests, the method respectively calls the parallel strategy, the VNF topological order strategy and the DPU processing strategy so as to obtain an optimal deployment scheme of each request, then a resource management module is invoked to manage resources, and finally, a driver module is invoked to transmit the deployment scheme to a server for placement, and the server completes the deployment of SFC by using the CPU or the DPU respectively according to the deployment scheme.
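
The division of labor among the three orchestrator modules can be sketched in Python as follows. All class names, method names and the shape of the deployment scheme are hypothetical; the sketch only shows the order of calls described above (deployment algorithm, then resource management, then driver), not the invention's actual interfaces.

```python
from typing import Callable, Dict, List, Tuple

# A deployment scheme maps each VNF to (node, processor, resource demand).
Scheme = Dict[str, Tuple[str, str, float]]


class ResourceManager:
    """Tracks the remaining capacity of each server node."""

    def __init__(self, capacity: Dict[str, float]):
        self.free = dict(capacity)

    def reserve(self, node: str, amount: float) -> bool:
        if self.free.get(node, 0.0) < amount:
            return False
        self.free[node] -= amount
        return True

    def release(self, node: str, amount: float) -> None:
        self.free[node] += amount


class Driver:
    """Stands in for the driver module; it only records the schemes it would send."""

    def __init__(self) -> None:
        self.sent: List[Scheme] = []

    def push(self, scheme: Scheme) -> None:
        self.sent.append(scheme)


class Orchestrator:
    def __init__(self, deploy: Callable[[dict], Scheme],
                 resources: ResourceManager, driver: Driver):
        self.deploy = deploy        # SFC deployment algorithm module
        self.resources = resources  # resource management module
        self.driver = driver        # driver module

    def handle(self, request: dict) -> bool:
        """Accept the request only if every VNF in its scheme can be reserved."""
        scheme = self.deploy(request)
        reserved: List[Tuple[str, float]] = []
        for node, _processor, demand in scheme.values():
            if not self.resources.reserve(node, demand):
                for n, d in reserved:  # roll back partial reservations
                    self.resources.release(n, d)
                return False
            reserved.append((node, demand))
        self.driver.push(scheme)  # the server then places VNFs on CPU or DPU
        return True
```

A caller would supply its own `deploy` function, for example one that always places each VNF on a fixed node, and check the return value of `handle` to see whether the request was accepted.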


2. Propose a Multi-Objective SFC Deployment Problem by Combining with the Reality and the Proposed Computing Architecture.


An objective function of the SFC deployment problem is as follows:

$$\min\,(f_1 + f_2 + f_3) \quad \text{s.t. } C1,\ C2,\ C3$$

$$\begin{cases} f_1 = \sum_{r_\mu \in R} D_\mu \\ f_2 = \sum_{r_\mu \in R} y_{r_\mu} B_\mu \tau_r \\ f_3 = C(\tau) \end{cases}$$

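For f1, Dμ is a total response delay, that is,

Dμ=Lμ+Pμ+Tμ+Wq


where

$$L_\mu = \sum_{e_h^\mu \in E_\mu} \sum_{e_j \in E} x_{e_h^\mu}^{e_j} D_{e_j}$$

is a communication delay,

$$P_\mu = \sum_{f_v^\mu \in F_\mu} \sum_{n_i \in N} x_{f_v^\mu}^{n_i} \cdot \frac{1}{\eta_{m_i}^{\mu} c_{m_i} w_{m_i}^{\mu} - \lambda_\mu + \varepsilon}$$

is a processing delay,

$$T_\mu = \sum_{f_v^\mu \in F_\mu} \frac{U}{v_\mu}$$

is a transmission delay, and Wq is an average queuing delay.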
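
To make the decomposition Dμ=Lμ+Pμ+Tμ+Wq concrete, the following Python sketch computes the communication delay from a link mapping and then adds the other three components, which are simply passed in as numbers. The dictionary layout and link names are hypothetical.

```python
from typing import Dict, List


def communication_delay(link_mapping: Dict[str, List[str]],
                        link_delay: Dict[str, float]) -> float:
    """L_mu: sum of the delays of all physical links that virtual links are mapped to.

    link_mapping -- virtual link -> physical links carrying it (those with x = 1)
    link_delay   -- physical link -> its delay D_ej
    """
    return sum(link_delay[e] for physical in link_mapping.values() for e in physical)


def total_response_delay(link_mapping: Dict[str, List[str]],
                         link_delay: Dict[str, float],
                         processing: float,
                         transmission: float,
                         queuing: float) -> float:
    """D_mu = L_mu + P_mu + T_mu + W_q."""
    return (communication_delay(link_mapping, link_delay)
            + processing + transmission + queuing)


if __name__ == "__main__":
    mapping = {"e1": ["a-b", "b-c"], "e2": ["c-d"]}
    delays = {"a-b": 1.0, "b-c": 2.0, "c-d": 0.5}
    print(total_response_delay(mapping, delays,
                               processing=2.0, transmission=1.0, queuing=0.5))
```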


For f2, Σrμ∈RyrμBμτr is a total throughput, the binary variable yrμ represents whether rμ is accepted, Bμ is a minimum bandwidth of the SFC, and τr=l*Δ represents the survival time of the SFC.


For f3, C(τ) represents a total deployment cost, that is,

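C(τ)=SC(τ)+Cscale(τ)+CDPU(τ)


SC(τ) represents a total operation cost, that is, the sum of the cost of turning on the server and the cost of successfully placing the VNF:

$$SC(\tau) = \sum_{n_i \in N} \sum_{f_v^\mu \in F_\mu} x_{f_v^\mu}^{n_i} \zeta_c C_{f_v^\mu} + \sum_{e_j \in E} \sum_{e_h^\mu \in E_\mu} x_{e_h^\mu}^{e_j} \zeta_B B_\mu + \sum_{n_i \in N} \zeta_O$$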


xfvμni represents whether VNF fvμ∈Fμ is deployed on a server node ni∈N in the request rμ∈R, xehμej represents whether a virtual link ehμ∈Eμ is mapped to a physical link ej∈E in the request rμ∈R, ζc and ζB respectively represent the unit costs of the resource and the bandwidth, Cfvμ represents the resource requirement of VNF fvμ∈Fμ, and ζO represents the cost of turning on the server.


Cscale(τ)=Σni∈NΣfvμ∈FμCh·xni,hfvμ represents a total scaling cost, Ch represents a horizontal scaling cost, and xni,hfvμ represents whether VNF fvμ∈Fμ is horizontally scaled.


CDPU(τ) represents the total use cost of DPU and is defined as follows:

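$$C_{DPU}(\tau) = \sum_{n_i \in N} \sum_{f_v^\mu \in F_\mu} \left( \zeta_c^D x_{n_i,D}^{f_v^\mu} C_{f_v^\mu} + \zeta_B^D x_{n_i,D}^{f_v^\mu} B_\mu \right)$$


where ζcD and ζBD represent the unit costs of the resource and bandwidth during use of DPU, and xni,Dfvμ represents whether VNF fvμ∈Fμ is processed with DPU.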
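
The cost terms can be illustrated with the following Python sketch for a single request. It assumes the total cost is the sum of the operation, scaling and DPU components, and that the server turn-on cost is charged once per node that hosts at least one VNF; the `Placement` class and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Placement:
    vnf: str
    node: str
    demand: float         # resource requirement of the VNF
    on_dpu: bool = False  # whether the VNF is processed with the DPU
    scaled: bool = False  # whether the VNF was horizontally scaled


def deployment_cost(placements: List[Placement], mapped_links: int, bandwidth: float,
                    zeta_c: float, zeta_b: float, zeta_o: float, c_h: float,
                    zeta_c_dpu: float, zeta_b_dpu: float) -> Dict[str, float]:
    """Return the three cost components and their sum for one request.

    mapped_links -- number of physical links used by the request's virtual links
    bandwidth    -- minimum bandwidth B_mu of the SFC
    """
    active_nodes = {p.node for p in placements}  # assumption: only these are turned on
    sc = (zeta_c * sum(p.demand for p in placements)
          + zeta_b * bandwidth * mapped_links
          + zeta_o * len(active_nodes))
    scale = c_h * sum(1 for p in placements if p.scaled)
    dpu = sum(zeta_c_dpu * p.demand + zeta_b_dpu * bandwidth
              for p in placements if p.on_dpu)
    return {"SC": sc, "scale": scale, "DPU": dpu, "total": sc + scale + dpu}


if __name__ == "__main__":
    plan = [Placement("fw", "n1", 2.0),
            Placement("ipsec", "n1", 4.0, on_dpu=True),
            Placement("lb", "n2", 1.0, scaled=True)]
    print(deployment_cost(plan, mapped_links=3, bandwidth=10.0,
                          zeta_c=1.0, zeta_b=0.5, zeta_o=5.0, c_h=2.0,
                          zeta_c_dpu=2.0, zeta_b_dpu=0.25))
```

Returning the components separately makes it easy to see how much of the total is attributable to DPU usage for a given placement.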


The resource constraint is as follows:








$$C1:\ \forall n_i \in N,\quad \sum_{f_v^\mu \in F_\mu} \sum_{l \in N_{f_v^\mu}} s_{n_i,\tau}^{f_v^\mu} \cdot C_{f_v^\mu} \le C_{n_i}$$







where $s_{n_i,\tau}^{f_v^\mu}$ represents the number of service instances of VNF $f_v^\mu \in F_\mu$ on a node $n_i \in N$, and $C_{n_i}$ represents the amount of resources (CPU and memory) of the node $n_i \in N$.


The bandwidth constraint is as follows:

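$$C2:\ \forall e_j \in E,\quad \sum_{r_\mu \in R} \sum_{e_h^\mu \in E_\mu} x_{e_h^\mu}^{e_j} \cdot a_{r,\tau} \cdot B_\mu \le B_{e_j}$$


where $a_{r,\tau}$ represents whether the request $r_\mu \in R$ is still in service, and $B_{e_j}$ represents the bandwidth capacity of the physical link $e_j \in E$.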


The delay constraint is as follows:

$$C3:\ \forall r_\mu \in R,\quad D_\mu \le D_\mu^{max}$$


where $D_\mu^{max}$ represents the maximum end-to-end delay.
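The bandwidth constraint C2 and the delay constraint C3 can be checked mechanically for a candidate deployment. The sketch below is only an illustration under stated assumptions: the input layout (a list of `(link, active, bandwidth)` tuples and two dictionaries keyed by request name) and both function names are hypothetical.

```python
def bandwidth_ok(link_capacity, mappings):
    """C2: on every physical link, the bandwidth of the active requests
    mapped onto it must not exceed the link capacity.

    link_capacity: {link: B_ej}
    mappings: iterable of (link, active, bandwidth), one per mapped virtual link
    """
    load = {}
    for link, active, bandwidth in mappings:
        if active:  # a_{r,tau} = 1: the request is still in service
            load[link] = load.get(link, 0.0) + bandwidth
    return all(load.get(link, 0.0) <= cap for link, cap in link_capacity.items())


def delay_ok(delays, max_delays):
    """C3: every request must meet its maximum end-to-end delay."""
    return all(delays[r] <= max_delays[r] for r in delays)


capacity = {"e1": 20.0}
mapped = [("e1", True, 10.0), ("e1", True, 8.0), ("e1", False, 50.0)]
print(bandwidth_ok(capacity, mapped))                         # True
print(bandwidth_ok(capacity, mapped + [("e1", True, 5.0)]))   # False
print(delay_ok({"r1": 4.0}, {"r1": 5.0}))                     # True
```

The expired request (the tuple with `active=False`) does not count toward the link load, which is the role of $a_{r,\tau}$ in C2.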


3. Propose an SFC Parallel Problem by Combining with Reality


In SFC, some VNFs may work independently without affecting other VNFs, so the serial SFCs can be converted into the parallel SFCs. However, not all VNFs in the SFC can work in parallel. In a case that two VNFs modify the content of the flow or violate a dependency constraint, the operations of the two VNFs are in conflict. Only in a case that the VNFs in the SFC are independent of each other, parallel processing can be performed among the VNFs; otherwise, the correctness of the network and service strategy may be destroyed.


According to the present invention, the VNFs are divided into two types, monitors and shapers: the monitors only observe the traffic without modifying it, and the shapers process and modify the traffic. Since the VNFs in an SFC must be applied to each data packet flow in a specific order, the VNFs form a dependency relationship. It is stipulated in the present invention that in a case that one VNF $f_v^\mu$ is before another VNF $f_{v+1}^\mu$, $f_{v+1}^\mu$ depends on $f_v^\mu$, denoted as $f_v^\mu < f_{v+1}^\mu$. The dependency relationships among different VNFs are shown in FIG. 3.


To process the data packet in parallel, two functions are required: 1) a copying function, and 2) a merging function. As shown in FIG. 4, when a data packet enters, the copying function copies the data packet and sends the copies to the VNFs that can process it in parallel. After the copies are processed, they are merged by the merging function. The copying and merging of the data packet cause additional resource consumption and delay. Therefore, the objective function of the SFC parallel problem is to minimize the additional resource consumption and delay caused by the copying and merging of the data packet:

min(αC+βΔd)


where α and β respectively represent weight coefficients of the additional resource consumption and delay.


C represents the additional resource consumption caused by the copying and merging of the data packet, with the following formula:






$$C = \sum_{\mu=1}^{|R|} \sum_{B \in r_\mu} \frac{64 \times (\Phi_B - 1)}{U}$$






where $B$ is one group of parallel branches, $\Phi_B$ represents the parallelism degree of $B$, and $U$ represents the size of the data packet.


The additional delay Δd caused by the copying and merging of the data packet may be represented as:









$$\forall B \in r_\mu \in R,\quad \Delta d = \sum_{\mu=1}^{|R|} 30 \times \frac{V_\mu}{10^9}$$









where $V_\mu$ is the data quantity of the μth SFC in the request set $R$.
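The parallelization objective can be evaluated numerically once the parallelism degrees and data quantities are known. The sketch below assumes the two formulas read $C = \sum_\mu \sum_{B \in r_\mu} 64(\Phi_B - 1)/U$ and $\Delta d = \sum_\mu 30 V_\mu / 10^9$; the function names, the input layout and the sample values are hypothetical.

```python
COPY_CONSTANT = 64    # the constant 64 in the resource formula
DELAY_CONSTANT = 30   # the constant 30 in the delay formula


def extra_resource(parallelism_degrees, packet_size):
    """Additional resource consumption C.

    parallelism_degrees: one list per request r_mu, holding the
    parallelism degree Phi_B of each group of parallel branches B.
    """
    return sum(COPY_CONSTANT * (phi - 1) / packet_size
               for request in parallelism_degrees
               for phi in request)


def extra_delay(data_quantities):
    """Additional delay: sum of 30 * V_mu / 10^9 over all requests."""
    return sum(DELAY_CONSTANT * v / 1e9 for v in data_quantities)


def parallel_objective(parallelism_degrees, packet_size, data_quantities,
                       alpha, beta):
    """Weighted objective alpha * C + beta * delta_d to be minimized."""
    return (alpha * extra_resource(parallelism_degrees, packet_size)
            + beta * extra_delay(data_quantities))


degrees = [[2, 3], [1]]          # request 1 has two branch groups, request 2 one
print(extra_resource(degrees, packet_size=128))          # 1.5
print(extra_delay([1e9, 2e9]))                           # 90.0
print(parallel_objective(degrees, 128, [1e9, 2e9], alpha=0.5, beta=0.5))  # 45.75
```

A branch group with parallelism degree 1 contributes nothing to the resource term, since no copy of the packet is needed.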


For the SFC parallel problem, a flow constraint is introduced. $O_c$ represents the set of copying nodes in $r_\mu$, and $O_m$ represents the set of merging nodes in $r_\mu$. $O_c(n_i)$ and $O_m(n_i)$ respectively represent the number of the copying nodes and the merging nodes in $r_\mu$. Except for the copying nodes, the merging nodes, the source node and the target node, all intermediate nodes meet flow conservation, that is,

$$\forall r_\mu \in R,\ \forall n_h \in N,\ n_h \notin \{n_{src}, n_{dst}, O_c, O_m\}:\quad \sum_{e_h^\mu \in E_\mu} \sum_{n_g \in N} x_{e_h^\mu}^{n_g,n_h} - \sum_{e_h^\mu \in E_\mu} \sum_{n_k \in N} x_{e_h^\mu}^{n_h,n_k} = 0$$


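In a case that the source node is a copying node, the following constraint must be met:

$$\forall r_\mu \in R,\ \forall n_{src} \in O_c:\quad \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_{src},n_u} - \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_{src}} = O_c(n_{src})$$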


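In a case that the target node is a merging node, the following constraint must be met:

$$\forall r_\mu \in R,\ \forall n_{dst} \in O_m:\quad \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_{dst}} - \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_{dst},n_u} = O_m(n_{dst})$$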


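In a case that any other node is a copying node, the following constraint must be met:

$$\forall r_\mu \in R,\ \forall n_h \in O_c,\ \sum_{f_v^\mu \in F_\mu} x_{f_v^\mu}^{n_h} = 1:\quad \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_h,n_u} - \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_h} = O_c(n_h) - O_m(n_h)$$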


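In a case that any other node is a merging node, the following constraint must be met:

$$\forall r_\mu \in R,\ \forall n_h \in O_m,\ \sum_{f_v^\mu \in F_\mu} x_{f_v^\mu}^{n_h} = 1:\quad \sum_{e_h^\mu \in E_\mu} \sum_{n_v \in N} x_{e_h^\mu}^{n_v,n_h} - \sum_{e_h^\mu \in E_\mu} \sum_{n_u \in N} x_{e_h^\mu}^{n_h,n_u} = O_m(n_h) - O_c(n_h)$$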
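The flow constraints above compare, for each node, the number of mapped links leaving it with the number entering it. The following sketch illustrates that bookkeeping on a small parallelized chain; the edge-list representation, the node names and both function names are assumptions made for the example.

```python
def net_outflow(edges, node):
    """Mapped links leaving `node` minus mapped links entering it."""
    outgoing = sum(1 for u, _ in edges if u == node)
    incoming = sum(1 for _, v in edges if v == node)
    return outgoing - incoming


def conserves_flow(edges, nodes, src, dst, copy_nodes, merge_nodes):
    """Check flow conservation on every ordinary intermediate node.

    The source, the target, the copying nodes and the merging nodes are
    exempt, since they are covered by their own constraints.
    """
    exempt = {src, dst} | set(copy_nodes) | set(merge_nodes)
    return all(net_outflow(edges, n) == 0 for n in nodes if n not in exempt)


# s -> c, c copies the packet to a and b, m merges the copies, m -> t
edges = [("s", "c"), ("c", "a"), ("c", "b"), ("a", "m"), ("b", "m"), ("m", "t")]
nodes = ["s", "c", "a", "b", "m", "t"]

print(conserves_flow(edges, nodes, "s", "t", {"c"}, {"m"}))  # True
print(net_outflow(edges, "c"))   # 1: one extra copy leaves the copying node
print(net_outflow(edges, "m"))   # -1: one extra copy is absorbed by the merging node
```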


4. Design an SFC Parallel Strategy


An objective of the SFC parallel strategy is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel. The specific algorithm process of the SFC parallel strategy is as follows:

    • 1) initialize a branch chain set B, a main chain S and a monitor set M;
    • 2) traverse $r_\mu$: in a case that $f_i^\mu \in r_\mu$ is a monitor, first initialize a branch chain $b \in B$, then add $f_i^\mu$ to $b$ and $M$; in a case that $f_i^\mu \in r_\mu$ is a shaper, add $f_i^\mu$ to the main chain $S$, then search $M$ for the monitors on which $f_i^\mu$ depends; each such monitor $k \in M$ is currently the end point of a branch chain, so point $k$ to $f_i^\mu$ to extend that branch chain with $f_i^\mu$ as its new end point, and remove $k$ from $M$;
    • 3) invoke a path search algorithm to find all path sets PATH executable in parallel; and
    • 4) return the branch chain set B, the main chain S and the path set PATH.
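The four steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `kind` and `deps` inputs are hypothetical ways of supplying the VNF types and dependencies, and because the path search algorithm of step 3 is not specified here, the sketch simply treats the main chain and each branch chain as one path.

```python
def parallelize(chain, kind, deps):
    """Split one serial SFC into a main chain and branch chains.

    chain: VNF names in their serial order
    kind:  {vnf: "monitor" | "shaper"}
    deps:  {shaper: set of monitors it depends on}
    """
    branches, main_chain, monitors = [], [], []   # step 1: B, S, M
    branch_ending_at = {}                         # monitor -> its branch chain

    for vnf in chain:                             # step 2
        if kind[vnf] == "monitor":
            branch = [vnf]                        # new branch chain b in B
            branches.append(branch)
            monitors.append(vnf)
            branch_ending_at[vnf] = branch
        else:
            main_chain.append(vnf)
            needed = [k for k in monitors if k in deps.get(vnf, set())]
            for k in needed:
                # extend the branch that ends at k so that it ends at vnf
                branch_ending_at.pop(k).append(vnf)
                monitors.remove(k)

    # step 3 (assumed stand-in for the unspecified path search):
    paths = [main_chain] + branches
    return branches, main_chain, paths            # step 4


chain = ["firewall", "ids", "traffic_monitor", "nat"]
kind = {"firewall": "shaper", "ids": "monitor",
        "traffic_monitor": "monitor", "nat": "shaper"}
deps = {"nat": {"ids"}}

branches, main_chain, paths = parallelize(chain, kind, deps)
print(branches)    # [['ids', 'nat'], ['traffic_monitor']]
print(main_chain)  # ['firewall', 'nat']
```

In the example, `traffic_monitor` remains in its own branch chain because no shaper depends on it, so it can run in parallel with the main chain.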


5. Design a VNF Topological Order Strategy


An objective of the VNF topological order algorithm is to find a VNF topological order about a plurality of SFCs, and the delay can be reduced by deploying the VNF according to the order. The specific algorithm process is as follows:

    • 1) initialize f as a source node;
    • 2) traverse the request set R arriving at the same time, invoke Algorithm 1 (the SFC parallel strategy above) to obtain the branch chain set B, the main chain S and the path set PATH, find the path with the maximum delay among all the paths in PATH, and add this path to a set C;
    • 3) traverse C, and create a directed weighted graph graph=(F, ω), where F is a set of VNFs;
    • 4) invoke a minimum feedback arc set algorithm to solve the minimum feedback arc set of graph, and verify whether the resulting topological order meets the dependency relationships of the VNFs among different chains; return the topological order in a case that it does; otherwise, return False (that is, the dependency condition is violated, and the algorithm cannot be used).
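Steps 3 and 4 can be illustrated with the sketch below. Two assumptions are made loudly: the edge weight ω is taken to be the number of maximum-delay paths that contain the edge, and the unspecified minimum feedback arc set algorithm is replaced by a simple greedy sink/source removal heuristic (in the style of Eades, Lin and Smyth), which is not guaranteed to be minimal. All function names are hypothetical.

```python
from collections import defaultdict


def build_graph(paths):
    """Directed weighted graph over VNFs; weight = number of paths using the edge."""
    weights, nodes = defaultdict(int), set()
    for path in paths:
        nodes.update(path)
        for u, v in zip(path, path[1:]):
            weights[(u, v)] += 1
    return nodes, dict(weights)


def greedy_order(nodes, weights):
    """Greedy linear ordering: peel off sinks to the back and sources to the front."""
    remaining, head, tail = set(nodes), [], []

    def out_w(u):
        return sum(w for (a, b), w in weights.items() if a == u and b in remaining)

    def in_w(u):
        return sum(w for (a, b), w in weights.items() if b == u and a in remaining)

    while remaining:
        candidates = sorted(remaining)
        sink = next((u for u in candidates if out_w(u) == 0), None)
        source = next((u for u in candidates if in_w(u) == 0), None)
        if sink is not None:
            tail.insert(0, sink)
            remaining.discard(sink)
        elif source is not None:
            head.append(source)
            remaining.discard(source)
        else:
            # inside a cycle: take the node with the largest out-minus-in weight
            pick = max(candidates, key=lambda n: out_w(n) - in_w(n))
            head.append(pick)
            remaining.discard(pick)
    return head + tail


def topological_order(paths):
    """Return a VNF order respecting every path, or False if none was found."""
    nodes, weights = build_graph(paths)
    order = greedy_order(nodes, weights)
    position = {vnf: i for i, vnf in enumerate(order)}
    respected = all(position[u] < position[v]
                    for path in paths for u, v in zip(path, path[1:]))
    return order if respected else False


print(topological_order([["a", "b", "c"], ["a", "c", "d"]]))  # ['a', 'b', 'c', 'd']
print(topological_order([["a", "b"], ["b", "a"]]))            # False
```

The second call returns False because the two chains require opposite orders of `a` and `b`, which is the "dependency condition is violated" outcome of step 4.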


6. Design a DPU Processing Strategy


An objective of the DPU processing strategy is to have the DPU take over the data processing tasks that the CPU handles poorly, such as network protocol processing, data encryption and decryption, and data compression, thereby saving CPU computing resources and reducing the delay.


In a case that there is a high-priority request in the request set that arrives at a certain moment, it is determined whether the request contains a VNF responsible for such data processing tasks (network protocol processing, data encryption and decryption, or data compression). In a case that it does, the DPU is used for rapid processing; otherwise, the CPU is used for processing.
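The decision rule can be sketched as a small function. This is one possible reading, in which the choice is made per VNF of a high-priority request; the task labels, the set `DPU_FRIENDLY_TASKS` and the function name are hypothetical.

```python
# Task categories that the description names as suitable for the DPU.
DPU_FRIENDLY_TASKS = {
    "protocol_processing", "encryption", "decryption", "compression",
}


def choose_processors(high_priority, vnf_tasks):
    """Map each VNF of one request to "DPU" or "CPU".

    high_priority: whether the request is a high-priority request
    vnf_tasks:     {vnf name: task category}
    """
    return {
        vnf: "DPU" if high_priority and task in DPU_FRIENDLY_TASKS else "CPU"
        for vnf, task in vnf_tasks.items()
    }


request = {"firewall": "filtering", "vpn": "encryption"}
print(choose_processors(True, request))   # {'firewall': 'CPU', 'vpn': 'DPU'}
print(choose_processors(False, request))  # {'firewall': 'CPU', 'vpn': 'CPU'}
```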


7. Design an SFC Heterogeneous Deployment Algorithm Based on Deep Reinforcement Learning According to the Above Strategy and Architecture


An objective of the SFC heterogeneous deployment algorithm based on deep reinforcement learning is to respectively adopt different strategies for processing according to the number and situation of requests arriving at different moments, that is, a parallel strategy, a VNF topological order strategy and a DPU processing strategy, thereby better deploying the SFC. The specific algorithm process is as follows:

    • 1) the system first deletes the timed-out requests; a priority judgment apparatus then divides the arrived requests R according to their real-time requirements, placing the requests with high real-time requirements into a high-priority set R_high and the requests with low real-time requirements into a low-priority set R_low;
    • 2) initialize a time slot T;
    • 3) according to the numbers of R_high and R_low, determine which strategy to adopt to process the SFC, as shown in FIG. 5 for details;
    • 4) construct and train a neural network model, take the status of the current physical network, the characteristic of the request being processed and the above information as an input, and output the deployment strategy of each VNF through the computation of the neural network; and
    • 5) update the network status.
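Step 1 of the process above can be sketched as follows. The strategy selection of step 3 (detailed in FIG. 5) and the neural network of step 4 are not reproduced here. The `Request` fields, including the boolean `realtime` flag standing in for the priority judgment, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One SFC request; all field names are illustrative."""
    name: str
    arrival: float    # arrival time
    timeout: float    # longest time the request may wait before it is dropped
    realtime: bool    # True if the request has a high real-time requirement


def classify(requests, now):
    """Drop timed-out requests, then split the rest into R_high and R_low."""
    alive = [r for r in requests if now - r.arrival <= r.timeout]
    r_high = [r for r in alive if r.realtime]
    r_low = [r for r in alive if not r.realtime]
    return r_high, r_low


pending = [
    Request("video_call", arrival=0.0, timeout=5.0, realtime=True),
    Request("backup", arrival=1.0, timeout=10.0, realtime=False),
    Request("stale", arrival=0.0, timeout=1.0, realtime=True),
]
r_high, r_low = classify(pending, now=3.0)
print([r.name for r in r_high])  # ['video_call']
print([r.name for r in r_low])   # ['backup']
```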


8. Achieve Deployment of SFC According to a Deployment Strategy


The orchestrator in the heterogeneous computing architecture calls the driver module to transfer the deployment scheme to the server for placement, and the server respectively uses CPU or DPU to complete the best deployment of the SFC according to the deployment scheme, thereby reducing the delay and cost.


In this embodiment, to verify the actual effect of the present invention (PSHD), a simulated comparison experiment is performed against two other algorithms (BASE and FCPM), taking the number of requests as the control variable. Since BASE and PSHD have the same objective, one further group of experiments taking the request arrival rate as the control variable is performed in this embodiment, thereby proving the effectiveness of the present invention.



FIG. 6 compares the average delay per batch of the three algorithms in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300. As can be seen from the figure, the delay of the method of the present invention is always the lowest, and the delay gap between the method of the present invention and the other algorithms widens as the number of the requests increases, indicating that the more requests there are, the better the delay reduction performance of the method of the present invention; its delay is 37.73% and 34.26% lower than the delays of BASE and FCPM, respectively. FIG. 7 also shows that the delay of the method of the present invention is always the lowest in a case that the number of the requests is 100 and the arrival rate of the requests changes from 0.5 to 3. As the arrival rate of the requests increases, the delay gap between the method of the present invention and BASE grows larger and larger, indicating that the more requests arrive at the same moment, the better the delay reduction performance of the method of the present invention; its delay is 47.04% lower than the delay of BASE. In addition, PD in FIG. 6 and FIG. 7 shows the delay trend in a case that the VNF topological order strategy is not adopted. It can be seen that the delay gap between PD and the method of the present invention gradually increases, which proves the effectiveness of the VNF topological order strategy in reducing the delay.


In this embodiment, the request acceptance rate of the method of the present invention is also compared with that of the most advanced methods. FIG. 8 describes the result in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300. FIG. 9 describes the result in a case that the number of the requests is 100 and the arrival rate of the requests changes from 0.5 to 3. As shown in FIG. 8, the request acceptance rate of the method of the present invention is the highest, with an average of 0.728, followed by the BASE algorithm with an average of 0.668, and the acceptance rate of FCPM is the lowest, at 0.517. FIG. 9 also shows that the request acceptance rate of the method of the present invention is the highest, with an average of 0.719. It can be seen from the figure that as the arrival rate increases, the request acceptance rate of BASE falls noticeably faster than that of the method of the present invention, which shows that the more requests arrive at a certain moment, the worse the deployment ability of BASE becomes relative to that of the method of the present invention.


This embodiment compares the average cost of each SFC of the three algorithms in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300. As shown in FIG. 10, the cost of the method of the present invention is the highest and increases with the number of the requests, because the method of the present invention introduces the heterogeneous architecture of CPU+DPU, that is, the DPU is used for rapid processing according to the situation, and the use cost of the DPU is higher. FIG. 11 also shows that in a case that the number of the requests is 100 and the arrival rate of the requests changes from 0.5 to 3, the cost of the method of the present invention is always the highest, with an average of 8.61, while the average cost of BASE is 4.55. The higher cost of the method of the present invention brings a higher request acceptance rate and a lower delay, as shown in FIG. 6 to FIG. 9, and as shown in FIG. 12 to FIG. 13, the reward of the method of the present invention is always the highest. The above proves that sacrificing part of the cost can bring better benefits.


Finally, in this embodiment, the DPU usage is compared in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300, and in a case that the number of the requests is 100 and the arrival rate of the requests changes from 0.5 to 3. As shown in FIG. 14 and FIG. 15, the DPU usage increases with the number of the requests and with the arrival rate of the requests. The increase of the DPU usage brings a higher cost, as shown in FIG. 10 to FIG. 11, but also a higher request acceptance rate than the other algorithms, as shown in FIG. 8 to FIG. 9, and a lower delay, as shown in FIG. 6 to FIG. 7. As can be seen from the reward comparison of the three algorithms in FIG. 12, the performance of the method of the present invention is much higher than that of the other algorithms.

Claims
  • 1. An efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform, comprising the following steps: (1) constructing a heterogeneous computing architecture for solving a multi-objective deployment problem, wherein the heterogeneous computing architecture comprises an orchestrator responsible for management and coordination, and a server based on a CPU+DPU structure; (2) converting serial SFCs into parallel SFCs by an SFC parallel strategy according to the independence among VNFs, and accordingly solving an SFC parallel problem; wherein an objective function of the SFC parallel problem is to minimize additional resource consumption and delay caused by copying and merging of the data packet: min(αC+βΔd)
  • 2. The efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform according to claim 1, wherein the heterogeneous computing architecture in step (1) is composed of an orchestrator and a server, the orchestrator comprises an SFC deployment algorithm module, a resource management module and a driver module; the server comprises a CPU, a DPU and a bus for connecting the CPU and the DPU; the orchestrator is responsible for managing and deploying the arrived SFC, and the server composed of a heterogeneous computing structure is responsible for sequentially processing VNFs in different SFCs according to the deployment strategy conveyed by the orchestrator; wherein a specific task of the orchestrator comprises: receiving an SFC request from a network operator, and running the SFC deployment algorithm to determine which SFCs are accepted and how to place these SFCs; and for different situations of different requests, the method respectively invokes the parallel strategy, the VNF topological order strategy and the DPU processing strategy so as to obtain an optimal deployment scheme of each request, then a resource management module is invoked to manage resources, and finally, a driver module is invoked to transmit the deployment scheme to a server for placement, and the server completes the deployment of the SFC by using the CPU or the DPU respectively according to the deployment scheme.
  • 3. The efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform according to claim 1, wherein the multi-objective deployment problem in step (1) is specifically described as follows: an objective function of the SFC deployment problem is as follows:
  • 4. The efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform according to claim 2, wherein the multi-objective deployment problem in step (1) is specifically described as follows: an objective function of the SFC deployment problem is as follows:
  • 5. The efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform according to claim 1, wherein the SFC parallel problem in step (2) is specifically described as follows: the serial SFCs which work independently without affecting other VNFs are converted into parallel SFCs; since the VNFs in the SFCs are applied to each data packet flow necessarily according to a specific order, the VNFs form a dependency relationship; the method stipulates that in a case that one VNF $f_v^\mu$ is before another VNF $f_{v+1}^\mu$, $f_{v+1}^\mu$ depends on $f_v^\mu$, denoted as $f_v^\mu < f_{v+1}^\mu$; to process the data packets in parallel, two functions of copying and merging are required; when one data packet enters, the copying function will copy and send the data packet to the VNFs capable of being processed in parallel; and after the data packet is processed, the copied data packet is merged by the merging function, and the copying and merging of the data packet will cause additional resource consumption and delay.
  • 6. The efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform according to claim 1, wherein for the SFC parallel problem, a flow constraint is introduced, $O_c$ is used for representing a set of copying nodes in $r_\mu$, $O_m$ represents a set of merging nodes in $r_\mu$, $O_c(n_i)$ and $O_m(n_i)$ respectively represent the number of the copying nodes and the merging nodes in $r_\mu$; and for SFC $r_\mu$, all intermediate nodes meet flow conservation except the copying nodes, the merging nodes, the source node and the target node.
Priority Claims (1)
Number Date Country Kind
202211352904.1 Nov 2022 CN national
US Referenced Citations (3)
Number Name Date Kind
20190104071 Kobayashi Apr 2019 A1
20190199649 Kobayashi Jun 2019 A1
20200026575 Guim Bernat Jan 2020 A1
Foreign Referenced Citations (3)
Number Date Country
110022230 Jul 2019 CN
113411207 Sep 2021 CN
113918277 Jan 2022 CN
Non-Patent Literature Citations (2)
Entry
Meigen Huang, Tao Wang, Liang Liu, Ruiqin Pang, and Huan Du; "Virtual Network Function Deployment Strategy Based on Software Defined Network Resource Optimization"; Computer Science, Issue S1, pp. 404-408; publication date: Jun. 15, 2020.
Weilin Zhou, Yuan Yang, and Weiming Xu; "Review of Research on Network Function Virtualization Technology"; Computer Research and Development, Issue 04, pp. 675-688; publication date: Apr. 15, 2018.