This application claims priority to Chinese Patent Application Ser. No. CN202211352904.1 filed on 1 Nov. 2022.
The present invention belongs to the field of service function chain orchestration technology, and particularly relates to an efficient parallelization and deployment method for a multi-objective service function chain based on a CPU+DPU platform.
The traditional mode of completing data forwarding based on a central processing unit (CPU) has reached a bottleneck, and a computing architecture composed only of CPUs cannot meet diversified scenarios and business requirements. On one hand, as Moore's law slows down while network bandwidth and connection counts keep growing ever wider and denser, computing nodes at the end, edge, and cloud are directly exposed to the increased data volume, and the growth of CPU computing power and the growth of network bandwidth show a widening "scissors gap". On the other hand, in highly concurrent forwarding workloads, the serial computing mode of the CPU can hardly exert its maximum computing power.
The advent of network function virtualization (NFV) provides a novel way to design, coordinate, deploy, and standardize various mobile services to support increasingly complex and diverse service requests, thereby making service function chain (SFC) deployment more flexible and agile. However, current SFC deployment systems focus on optimizing network resource utilization, and do not consider the diversity of service requirements or the degradation of service forwarding performance.
A cloud architecture based on a data processing unit (DPU) offloads and accelerates networking and storage tasks on demand. Compared with a graphics processing unit (GPU), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC), it frees the more costly host CPU computing power and achieves high performance more economically and effectively. The architecture transformation brought by the DPU improves, to a certain extent, the energy-efficiency cost ratio of the whole data center resource pool and the profit-cost ratio of public cloud vendors, and the DPU can cover more and more demands and scenarios in the future. In this regard, the research content of the present invention is to use the DPU to solve some problems faced by the current traditional computing architecture and the traditional SFC deployment system.
An objective of the present invention is: to address the degraded forwarding performance and the increasingly diverse scenarios and requirements faced by the current traditional computing architecture and the traditional SFC deployment system, the present invention provides an efficient parallelization and deployment method for a multi-objective service function chain based on a CPU+DPU platform.
The technical solution is: an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform includes the following steps:
where C represents an additional resource consumption caused by copying and merging of the data packet, and Δd represents an additional delay caused by copying and merging of the data packet;
Further, the heterogeneous computing architecture in step (1) is specifically described as follows:
Further, the multi-objective SFC deployment problem in step (1) is specifically described as follows:
For f1, Dμ is a total response delay, that is,
Dμ = Lμ + Pμ + Tμ + …
where Pμ is a processing delay and Tμ is a transmission delay.
For f2, Σr
For f3, C(τ) represents a total deployment cost, that is,
C(τ)=SC(τ)+Cscale(τ)+CDPU(τ)
SC(τ) represents a total operation cost, that is, the sum of the cost of turning on the server and the cost of successfully placing the VNF:
SC(τ)=Σn
xf
Cscale(τ)=Σn
CDPU(τ) represents the total use cost of DPU and is defined as follows:
CDPU(τ)=Σn
where ζc
where sn
The bandwidth constraint is as follows:
C2:∀ej∈E,Σr
where ar,τ represents whether the request rμ∈R is still in the service, and Be
The delay constraint is as follows:
C3: ∀rμ∈R,Dμ≤Dμmax
where Dμmax represents the maximum end-to-end delay.
Further, the SFC parallel problem in step (2) is specifically described as follows:
in SFC, some VNFs may work independently without affecting other VNFs, so the serial SFCs can be converted into the parallel SFCs. However, not all VNFs in the SFC can work in parallel. In a case that two VNFs modify the content of the flow or violate a dependency constraint, the operations of the two VNFs are in conflict. Only in a case that the VNFs in the SFC are independent of each other, parallel processing can be performed among the VNFs; otherwise, the correctness of the network and service strategy may be destroyed.
According to the method of the present invention, the VNFs are divided into two types, monitors and shapers, where a monitor observes the traffic without modifying it, and a shaper processes and modifies the traffic. Since the VNFs in an SFC must be applied to each data packet flow in a specific order, the VNFs form a dependency relationship. It is stipulated in the present invention that in a case that one VNF fvμ is before another VNF fv+1μ, fv+1μ depends on fvμ, denoted as fvμ<fv+1μ.
To process the data packet in parallel, two functions are required, that is, a copying function and a merging function.
When one data packet enters, the copying function will copy the data packet and send the data packet to the VNFs capable of being processed in parallel. After the data packet is processed, the copied data packet is merged by the merging function. The copying and merging of the data packet will cause additional resource consumption and delay. Therefore, in the SFC parallel problem, an objective function of the SFC parallel problem is to minimize additional resource consumption and delay caused by copying and merging of the data packet:
min(αC+βΔd)
where α and β respectively represent weight coefficients of the additional resource consumption and delay.
C represents the additional resource consumption caused by the copying and merging of the data packet, with the following formula:
where B is one group of parallel branches, ΦB represents the parallelism degree of B, and U represents the size of the data packet.
The additional delay Δd caused by the copying and merging of the data packet may be represented as:
where Vμ is the data quantity of the μth SFC in the request R.
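The objective min(αC + βΔd) above can be illustrated with a minimal sketch. The cost model here, where each parallel group of parallelism degree Φ_B produces Φ_B − 1 extra copies of a packet of size U and each extra copy adds a fixed delay, is an assumption for illustration; the exact forms of C and Δd follow the (truncated) formulas of the original text.

```python
# Sketch of the SFC parallelization objective min(alpha*C + beta*delta_d).
# Assumption: copying a packet to phi branches creates (phi - 1) extra
# copies of size `packet_size`, and each extra copy adds `per_copy_delay`.

def parallel_overhead(branches, packet_size, alpha, beta, per_copy_delay):
    """branches: list of parallelism degrees (Phi_B) of each parallel group."""
    # C: extra resource consumption from copying/merging the packet.
    extra_resource = sum((phi - 1) * packet_size for phi in branches)
    # delta_d: extra delay from copying/merging (assumed linear model).
    extra_delay = sum((phi - 1) * per_copy_delay for phi in branches)
    # Weighted objective combining both overheads.
    return alpha * extra_resource + beta * extra_delay
```

A deployment algorithm would evaluate this objective for each candidate parallelization and keep the cheapest one.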
For the SFC parallel problem, the flow constraint is introduced. Oc is used for representing a set of copying nodes in rμ, and Om represents a set of merging nodes in rμ. Oc(ni) and Om(ni) respectively represent the number of the copying nodes and the merging nodes in rμ. For rμ, except the copying nodes, the merging nodes, the source node and the target node, all intermediate nodes meet the flow conservation, that is,
∀rμ∈R,∀nh∈N,nh∉{nsrc,ndst,Oc,Om}: Σe
In a case that the source node is one copying node, it is necessary to meet the constraint condition:
∀rμ∈R,∀nsrc∈Oc: Σe
In a case that the target node is one merging node, it is necessary to meet the following constraint condition:
∀rμ∈R,∀ndst∈Om: Σe
For the situation that other nodes are the copying nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Oc,Σf
For the situation that other nodes are the merging nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Om,Σf
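Since the summation terms of the constraints above are truncated in the text, the flow-conservation family can be sketched in a standard form. The binary variable y below is an assumption introduced for illustration, not a symbol from the original:

```latex
% y_e^\mu: hypothetical binary variable, 1 iff request r_\mu routes flow on edge e.
% Flow conservation at ordinary intermediate nodes:
\forall r_\mu \in R,\ \forall n_h \in N,\ n_h \notin \{n_{src}, n_{dst}, O_c, O_m\}:
\quad \sum_{e \in \mathrm{out}(n_h)} y_e^\mu - \sum_{e \in \mathrm{in}(n_h)} y_e^\mu = 0
% Copying nodes multiply the flow by the parallelism degree \Phi_B;
% merging nodes collapse it back:
\forall n_h \in O_c:\ \sum_{e \in \mathrm{out}(n_h)} y_e^\mu = \Phi_B \sum_{e \in \mathrm{in}(n_h)} y_e^\mu,
\qquad
\forall n_h \in O_m:\ \sum_{e \in \mathrm{in}(n_h)} y_e^\mu = \Phi_B \sum_{e \in \mathrm{out}(n_h)} y_e^\mu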
Further, an objective of the SFC parallel strategy in step (2) is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel. The specific algorithm process of the SFC parallel strategy is as follows:
21) initializing a branch chain set B, a main chain S and a monitor set M;
22) traversing rμ: in a case that fiμ∈rμ is a monitor, firstly initializing a branch chain b∈B, then adding fiμ into b and M; and in a case that fiμ∈rμ is a shaper, adding fiμ into the main chain S, and at this time searching M for the monitors on which fiμ depends; for each such monitor k∈M, which currently has a branch chain taking k as its end point, pointing k to fiμ so as to extend that branch chain to take fiμ as its end point, and removing k from M;
23) invoking a path search algorithm to find all path sets PATH executable in parallel; and
24) returning the branch chain set B, the main chain S and the path set PATH.
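Steps 21) to 24) can be sketched as a short routine. The dependency test used here, in which a shaper terminates every branch chain whose end-point monitor is still open, is a simplifying assumption; the VNF names and the tuple encoding are hypothetical.

```python
# Sketch of the SFC parallel strategy: monitors open branch chains,
# shapers extend the main chain and close dependent branches.

def parallelize(sfc):
    """sfc: ordered list of (name, kind), kind in {"monitor", "shaper"}."""
    branches, main_chain, open_monitors = [], [], {}
    for name, kind in sfc:
        if kind == "monitor":
            branch = [name]              # 21)-22): new branch chain ending in the monitor
            branches.append(branch)
            open_monitors[name] = branch
        else:
            main_chain.append(name)      # shapers stay on the main chain S
            for k, branch in list(open_monitors.items()):
                branch.append(name)      # extend the branch to end at this shaper
                del open_monitors[k]     # remove monitor k from M
    # 23): each branch chain plus the main chain is a path executable in parallel.
    paths = branches + [main_chain]
    return branches, main_chain, paths   # 24)
```

For example, a monitor followed by two shapers yields one branch chain running in parallel with the main chain.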
Further, the objective of the VNF topological order algorithm in step (3) is to find a VNF topological order about a plurality of SFCs, and the delay can be reduced by deploying the VNF according to the order. The specific algorithm process is as follows:
31) initializing f as a source node;
32) traversing a request set R arriving at the same time, invoking an algorithm 1 to obtain the branch chain set B, the main chain S and the path set PATH, evaluating a path with a maximum delay in all the paths according to the path set PATH, and adding the path into a set C;
33) traversing C, and creating a directed weighted graph graph=(F, ω), where F is a set of VNFs;
34) invoking a minimum feedback arc set algorithm to solve the minimum feedback arc set of graph, and verifying whether the solved topological order meets the dependency relationship of the VNFs among different chains; returning the topological order in a case that it does, and otherwise returning False (that is, the dependency condition is violated and the algorithm cannot be used).
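Steps 31) to 34) can be sketched as follows. Computing an exact minimum feedback arc set is NP-hard, so this sketch substitutes a greedy heuristic (repeatedly emit the node with the least remaining incoming weight); treating that as the "minimum feedback arc set algorithm" of step 34) is an assumption.

```python
from collections import defaultdict

# Sketch of the VNF topological order algorithm: build a directed weighted
# graph over the VNFs of several chains, order nodes with a greedy
# feedback-arc heuristic, then verify the order against every chain.

def vnf_topological_order(chains):
    """chains: list of VNF-name lists; consecutive VNFs form weighted edges."""
    weight = defaultdict(int)
    nodes = set()
    for chain in chains:
        nodes.update(chain)
        for a, b in zip(chain, chain[1:]):
            weight[(a, b)] += 1          # 33): weight = how often a precedes b
    order, remaining = [], set(nodes)
    while remaining:
        # 34): greedily emit the node with the least incoming weight
        best = min(remaining,
                   key=lambda n: sum(w for (a, b), w in weight.items()
                                     if b == n and a in remaining))
        order.append(best)
        remaining.remove(best)
    pos = {v: i for i, v in enumerate(order)}
    # Verify the order respects every chain's dependencies; else return False.
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            if pos[a] > pos[b]:
                return False
    return order
```

When the chains agree on an order, deploying VNFs in that order avoids backtracking across chains; mutually contradictory chains are rejected.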
Further, the DPU processing strategy in step (4) is specifically described as follows:
an objective of the DPU processing strategy is to take over the data processing tasks which CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, thereby saving the CPU computing resource and reducing the delay.
In a case that there is a high-priority request in the request set that arrives at a certain moment, it is determined whether there is a VNF responsible for the data processing tasks, such as network protocol processing, data encryption and decryption and data compression, in the request. DPU is used for rapid processing in a case that there is a VNF responsible for the data processing task in the request, and CPU is for processing in a case that there is no VNF responsible for the data processing task in the request.
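The decision rule of the DPU processing strategy can be sketched directly. The task-type labels and the request encoding below are assumptions introduced for illustration:

```python
# Sketch of the DPU processing strategy: for a high-priority request,
# VNFs performing tasks the CPU is not good at (network protocol
# processing, encryption/decryption, compression) are routed to the DPU;
# all other VNFs stay on the CPU.

DPU_TASKS = {"protocol", "crypto", "compression"}  # assumed task labels

def assign_processor(request):
    """request: {"priority": "high"|"low", "vnfs": [(name, task_type), ...]}"""
    placement = {}
    for name, task in request["vnfs"]:
        offload = request["priority"] == "high" and task in DPU_TASKS
        placement[name] = "DPU" if offload else "CPU"
    return placement
```

A low-priority request, or a VNF outside the offloadable task types, is always processed by the CPU.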
Further, the objective of the SFC heterogeneous deployment algorithm based on deep reinforcement learning in step (5) is to process requests with different strategies, that is, the parallel strategy, the VNF topological order strategy and the DPU processing strategy, according to the number and situation of the requests arriving at different moments, so as to better achieve the deployment of the SFC. The specific algorithm process is as follows:
51) deleting overtime requests by the system first, then dividing the arrived requests R by a priority judgment apparatus according to their real-time requirements, assigning the requests with high real-time requirements to a high-priority set R_high, and assigning the requests with low real-time requirements to a low-priority set R_low;
52) initializing a time slot T;
53) determining, according to the numbers of R_high and R_low, which strategy to adopt to process the SFC;
54) constructing and training a neural network model, taking the status of the current physical network, the characteristics of the request being processed and the above strategy information as inputs, and outputting the deployment strategy of each VNF through the computation of the neural network; and
55) updating the network status.
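The outer loop of steps 51) to 55) can be sketched structurally. The field names and the stub policy below are assumptions; a trained deep reinforcement learning agent would replace the stub in step 54).

```python
# Structural sketch of the DRL-based deployment round: drop expired
# requests, split arrivals by real-time priority, and let a policy
# (stubbed here in place of the trained neural network) place each VNF.

def deploy_round(requests, now, policy):
    """requests: list of {"deadline", "realtime", "vnfs": [...]} dicts."""
    live = [r for r in requests if r["deadline"] > now]   # 51): drop overtime
    r_high = [r for r in live if r["realtime"]]           # high priority
    r_low = [r for r in live if not r["realtime"]]
    deployments = []
    for r in r_high + r_low:                              # 53): high first
        # 54): policy maps (network state, request) to a per-VNF placement;
        # a trained DRL agent would be invoked here.
        deployments.append([policy(vnf) for vnf in r["vnfs"]])
    return deployments                                    # 55): then update state
```

The round is repeated per time slot T, with the network status refreshed after each batch of placements.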
The beneficial effects are as follows: compared with the prior art, the method of the present invention adopts the DPU to address the limitations of the current traditional computing architecture and the traditional SFC deployment system. By combining the CPU and the DPU, it offloads the data processing tasks which the CPU is not good at, such as network protocol processing, data encryption and decryption, and data compression, thereby saving computing resources and optimizing processing efficiency. Based on this, the present invention provides a VNF topological order algorithm, a DPU processing strategy and an SFC parallel strategy; a novel way is provided to design, coordinate, deploy, and standardize various mobile services to support increasingly complex and diverse service requests, thereby making SFC deployment more flexible and agile; and the diversity of service requirements is considered, and the service forwarding performance is improved.
To describe the technical solution disclosed by the present invention in detail, the present invention is further described below with reference to the accompanying drawings and embodiments.
The present invention provides an efficient parallelization and deployment method of a multi-objective service function chain based on a CPU+DPU platform, which is mainly used for solving the problems of reduced forwarding performance, diversified scenarios and requirements and the like faced by the current traditional computing architecture and the traditional SFC deployment system.
Infrastructures such as cloud computing, data centers and intelligent computing centers are rapidly expanding in capacity, and network bandwidth has developed from 10G to 25G, 40G, 100G, 200G and even 400G. The traditional mode of completing data forwarding based on the CPU has reached a bottleneck. To improve the data forwarding performance of the CPU, Intel launched the data plane development kit (DPDK) acceleration scheme, which improves I/O data processing performance by bypassing the kernel protocol stack and using user-space polling with dedicated core binding, thereby greatly increasing the packet forwarding rate. However, under the trend of large bandwidth, the CPU overhead of this scheme is hard to ignore: at a 25G bandwidth rate, many data centers require 25% or even more of their CPU capacity just to meet the data I/O requirements of the business. In addition, with the popularization of artificial intelligence scenarios, more and more AI computing tasks on the cloud place more extreme requirements on the delay performance of network and storage I/O; and high-performance network and storage protocols, such as remote direct memory access (RDMA) and the nonvolatile memory express (NVMe) specification, are hard to fit to the multi-tenant scenarios of cloud computing under the traditional network card architecture. Under this background, the DPU comes into being to solve many problems of the I/O performance bottleneck in the post-Moore era and the development limitations of the virtualization technology.
The DPU is a novel data processing unit integrating network acceleration in the 5G era. In essence, the DPU embodies offloaded computing: it offloads data processing and preprocessing from the CPU and, at the same time, distributes computing power closer to the place where data is generated, thereby reducing communication traffic. RDMA, network functions, storage functions, security functions and virtualization functions are fused in the DPU. The DPU may be used to take over the data processing tasks which the CPU is not good at, such as network protocol processing, data encryption and decryption, and data compression, while taking both transmission and computing requirements into account. In addition, the DPU may also play the role of a connection hub: one end is connected to local resources such as the CPU, the GPU, a solid-state drive (SSD) and an FPGA acceleration card, and the other end is connected to network resources such as switches and routers. In general, the DPU improves network transmission efficiency and releases CPU computing power resources, thereby driving the whole data center to reduce cost and enhance efficiency. Compared with a traditional network card, the DPU increases the cost of a single part, but its introduction liberates the more costly host CPU computing power and releases more saleable resources. Therefore, the architecture transformation caused by the DPU increases, to a certain extent, the energy-efficiency cost ratio of the whole data center resource pool and the revenue-cost ratio of public cloud vendors, and the DPU may cover more and more requirements and scenarios in the future. Meanwhile, the introduction of NFV causes SFC deployment to face problems such as low forwarding efficiency and diversified service requirements. Against this background, the present invention studies a heterogeneous computing architecture composed of a CPU and a DPU, and uses this architecture to solve the SFC deployment problem.
The implementation process of the technical solution provided by the present invention is described below in detail.
The method of the present invention deploys the SFC by taking CPU+DPU as a computing architecture. An orchestrator and a server are mainly included. The orchestrator is responsible for receiving an SFC request from a network operator and running an SFC deployment algorithm to determine which SFCs are accepted and how to place these accepted SFCs. The server composed of a heterogeneous computing structure is responsible for deploying the SFC respectively by CPU or DPU according to the deployment strategy conveyed by the orchestrator.
The main implementation process of the method of the present invention is shown in
The implementation process is described below in detail.
1. Construct a Heterogeneous Computing Architecture
As shown in
2. Propose a Multi-Objective SFC Deployment Problem by Combining with the Reality and the Proposed Computing Architecture.
An objective function of the SFC deployment problem is as follows:
For f1, Dμ is a total response delay, that is,
Dμ = Lμ + Pμ + Tμ + …
where Lμ = Σe…, Pμ is a processing delay, and Tμ is a transmission delay.
For f2, Σr
For f3, C(τ) represents a total deployment cost, that is,
C(τ)=SC(τ)+Cscale(τ)+CDPU(τ)
SC(τ) represents a total operation cost, that is, the sum of the cost of turning on the server and the cost of successfully placing the VNF:
SC(τ)=Σn
xf
Cscale(τ)=Σn
CDPU(τ) represents the total use cost of DPU and is defined as follows:
CDPU(τ)=Σn
where ζc
The resource constraint is as follows:
where sn
The bandwidth constraint is as follows:
C2: ∀ej∈E,Σr
where ar,τ represents whether the request rμ∈R is still in the service, and Be
The delay constraint is as follows:
C3:∀rμ∈R,Dμ≤Dμmax
where Dμmax represents the maximum end-to-end delay.
3. Propose an SFC Parallel Problem by Combining with Reality
In SFC, some VNFs may work independently without affecting other VNFs, so the serial SFCs can be converted into the parallel SFCs. However, not all VNFs in the SFC can work in parallel. In a case that two VNFs modify the content of the flow or violate a dependency constraint, the operations of the two VNFs are in conflict. Only in a case that the VNFs in the SFC are independent of each other, parallel processing can be performed among the VNFs; otherwise, the correctness of the network and service strategy may be destroyed.
According to the present invention, the VNFs are divided into two types, monitors and shapers, where a monitor observes the traffic without modifying it, and a shaper processes and modifies the traffic. Since the VNFs in an SFC must be applied to each data packet flow in a specific order, the VNFs form a dependency relationship. It is stipulated in the present invention that in a case that one VNF fvμ is before another VNF fv+1μ, fv+1μ depends on fvμ, denoted as fvμ<fv+1μ. A dependency relationship among different VNFs is shown in
To process the data packet in parallel, two functions are required: 1) a copying function, and 2) a merging function. As shown in
min(αC+βΔd)
where α and β respectively represent weight coefficients of the additional resource consumption and delay.
C represents the additional resource consumption caused by the copying and merging of the data packet, with the following formula:
where B is one group of parallel branches, ΦB represents the parallelism degree of B, and U represents the size of the data packet.
The additional delay Δd caused by the copying and merging of the data packet may be represented as:
where Vμ is the data quantity of the μth SFC in the request R.
For the SFC parallel problem, the flow constraint is introduced. Oc is used for representing a set of copying nodes in rμ, and Om represents a set of merging nodes in rμ. Oc(ni) and Om(ni) respectively represent the number of the copying nodes and the merging nodes in rμ. For rμ, except the copying nodes, the merging nodes, the source node and the target node, all intermediate nodes meet the flow conservation, that is,
∀rμ∈R,∀nh∈N,nh∉{nsrc,ndst,Oc,Om}: Σe
In a case that the source node is one copying node, it is necessary to meet the constraint condition:
∀rμ∈R,∀nsrc∈Oc: Σe
In a case that the target node is one merging node, it is necessary to meet the following constraint condition:
∀rμ∈R,∀ndst∈Om: Σe
For the situation that other nodes are the copying nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Oc,Σf
For the situation that other nodes are the merging nodes, it should meet the following formula:
∀rμ∈R,∀nh∈Om,Σf
4. Design an SFC Parallel Strategy
An objective of the SFC parallel strategy is to identify a VNF in the chain according to a dependency relationship among the VNFs to find all chains executable in parallel. The specific algorithm process of the SFC parallel strategy is as follows:
5. Design a VNF Topological Order Strategy
An objective of the VNF topological order algorithm is to find a VNF topological order about a plurality of SFCs, and the delay can be reduced by deploying the VNF according to the order. The specific algorithm process is as follows:
6. Design a DPU Processing Strategy
An objective of the DPU processing strategy is to take over the data processing tasks which CPU is not good at, such as network protocol processing, data encryption and decryption and data compression, thereby saving the CPU computing resource and reducing the delay.
In a case that there is a high-priority request in the request set that arrives at a certain moment, it is determined whether there is a VNF responsible for the data processing tasks, such as network protocol processing, data encryption and decryption and data compression, in the request. DPU is used for rapid processing in a case that there is a VNF responsible for the data processing task in the request, and CPU is for processing in a case that there is no VNF responsible for the data processing task in the request.
7. Design an SFC Heterogeneous Deployment Algorithm Based on Deep Reinforcement Learning According to the Above Strategy and Architecture
An objective of the SFC heterogeneous deployment algorithm based on deep reinforcement learning is to respectively adopt different strategies for processing according to the number and situation of requests arriving at different moments, that is, a parallel strategy, a VNF topological order strategy and a DPU processing strategy, thereby better deploying the SFC. The specific algorithm process is as follows:
8. Achieve Deployment of SFC According to a Deployment Strategy
The orchestrator in the heterogeneous computing architecture calls the driver module to transfer the deployment scheme to the server for placement, and the server respectively uses CPU or DPU to complete the best deployment of the SFC according to the deployment scheme, thereby reducing the delay and cost.
In this embodiment, to verify the actual effect of the present invention (PSHD), a simulated comparison experiment is performed with two other algorithms (BASE and FCPM) by taking the request number as a control variable. Since BASE and PSHD have the same objective, one group of experiments taking the request arrival rate as the control variable is also performed in this embodiment, thereby proving the effectiveness of the present invention.
In this embodiment, compared with the most advanced method, the request acceptance rate of the method of the present invention is discussed.
This embodiment compares the average delay of each SFC of three algorithms in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300. As shown in
Finally, in this embodiment, the DPU usage is compared in a case that the number of the service nodes is 12 and the number of the requests changes from 50 to 300, and in a case that the number of the requests is 100 and the arrival rate of the requests changes from 0.5 to 3. As shown in
Number | Date | Country | Kind |
---|---|---|---|
202211352904.1 | Nov 2022 | CN | national |
Number | Name | Date | Kind |
---|---|---|---|
20190104071 | Kobayashi | Apr 2019 | A1 |
20190199649 | Kobayashi | Jun 2019 | A1 |
20200026575 | Guim Bernat | Jan 2020 | A1 |
Number | Date | Country |
---|---|---|
110022230 | Jul 2019 | CN |
113411207 | Sep 2021 | CN |
113918277 | Jan 2022 | CN |
Entry |
---|
Meigen Huang, Tao Wang, Liang Liu, Ruiqin Pang, and Huan Du; Virtual Network Function Deployment Strategy Based on Software Defined Network Resource Optimization Computer Science, Issue S1, 404-408 Publication date: Jun. 15, 2020. |
Weilin Zhou; Yuan Yang; Weiming Xu; Review of Research on Network Function Virtualization Technology Computer Research and Development, Issue 04, p. 675-688 Publication date: Apr. 15, 2018. |