This application claims the benefit of India patent application No. 202341060929, filed 11 Sep. 2023, which is incorporated by reference herein in its entirety.
This disclosure relates to computer networks and, more specifically, to computer networks having at least a portion of energy requirements met by renewable energy sources.
In a typical cloud data center environment, there is a large collection of interconnected servers that provide computing and/or storage capacity to run various applications. For example, a data center may comprise a facility that hosts applications and services for subscribers, e.g., customers of the data center. The data center may, for example, host all of the infrastructure equipment, such as networking and storage systems, redundant power supplies, and environmental controls. In a typical data center, clusters of storage servers and application servers (compute nodes) are interconnected via high-speed switch fabric provided by one or more tiers of physical network switches and routers. More sophisticated data centers provide infrastructure spread throughout the world with subscriber support equipment located in various physical hosting facilities.
As data centers become larger, energy usage by the data centers increases. Some large data centers require a significant amount of power (e.g., around 100 megawatts), which is enough to power a large number of homes (e.g., around 80,000). Data centers may also run application workloads that are compute and data intensive, such as crypto mining and machine learning applications, that consume a significant amount of energy. As energy use has risen, customers of data centers and data center providers themselves have become more concerned about meeting energy requirements through the use of renewable (e.g., green) energy sources, as opposed to non-renewable, carbon emitting, fossil fuel-based (e.g., non-green) energy sources. As such, some service level agreements (SLAs) associated with data center services include green energy goals or requirements or limitations on the use of non-green energy sources.
In general, techniques are described for power management to address concerns regarding the use of non-green energy sources. In a cluster, every compute node may run a network function. Such network functions may include switching, routing, or security functions (e.g., firewall functions). Such network functions may use direct polling of network interfaces to determine when a packet is available for application of the network function. Direct polling is achieved by allocating a dedicated number of processor (e.g., CPU) cores to the polling action. Direct polling provides high performance, but requires the dedicated processor cores to run at full capacity (e.g., 100% busy) irrespective of traffic load. For example, the dedicated processor cores continue to operate even when there are no packets on the network interfaces for application of the network function. A direct poll when there is no available packet may be referred to as an empty poll. Empty polls waste processor cycles and, therefore, waste power.
As such, it may be desirable to have an adaptive power optimizer to reduce processor cycles depending on traffic behavior and traffic criticality. For example, a power optimizer may predict when packets are expected to arrive and may adjust a processor core frequency, pause or resume a processor core, adjust a processor core's internal power saver state, or the like, such that usage is reduced when packets are not expected. Reduction of power usage by processor cores may improve a data center, cluster, or service's ability to meet power related SLA requirements.
In one example, this disclosure describes a computing system including one or more memories and one or more processors communicatively coupled to the one or more memories, the one or more processors being configured to: obtain workload metrics from a plurality of nodes of a cluster; obtain network function metrics from the plurality of nodes of the cluster; and for each node of the plurality of nodes: execute at least one machine learning model to predict a measure of criticality of traffic of the node; determine, based on the measure of criticality of traffic of the node, a power mode for at least one processing core of the node; and recommend or apply the power mode to the at least one processing core of the node.
In another example, this disclosure describes a method including: obtaining, by one or more processors, workload metrics from a plurality of nodes of a cluster; obtaining, by the one or more processors, network function metrics from the plurality of nodes of the cluster; and for each node of the plurality of nodes: executing, by the one or more processors, at least one machine learning model to predict a measure of criticality of traffic of the node; determining, by the one or more processors and based on the measure of criticality of traffic of the node, a power mode for at least one processing core of the node; and recommending or applying, by the one or more processors, the power mode to the at least one processing core of the node.
In another example, this disclosure describes computer-readable media storing instructions which, when executed, cause one or more processors to: obtain workload metrics from a plurality of nodes of a cluster; obtain network function metrics from the plurality of nodes of the cluster; and for each node of the plurality of nodes: execute at least one machine learning model to predict a measure of criticality of traffic of the node; determine, based on the measure of criticality of traffic of the node, a power mode for at least one processing core of the node; and recommend or apply the power mode to the at least one processing core of the node.
The details of one or more examples of this disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Like reference characters denote like elements throughout the description and figures.
Although customer sites 11 and public network 4 are illustrated and described primarily as edge networks of service provider network 7, in some examples, one or more of customer sites 11 and public network 4 may be tenant networks within data center 10 or another data center. For example, data center 10 may host multiple tenants (customers) each associated with one or more virtual private networks (VPNs), each of which may implement one of customer sites 11.
Service provider network 7 offers packet-based connectivity to attached customer sites 11, data center 10, and public network 4. Service provider network 7 may represent a network that is owned and operated by a service provider to interconnect a plurality of networks. Service provider network 7 may implement Multi-Protocol Label Switching (MPLS) forwarding and in such instances may be referred to as an MPLS network or MPLS backbone. In some instances, service provider network 7 represents a plurality of interconnected autonomous systems, such as the Internet, that offers services from one or more service providers.
In some examples, data center 10 may represent one of many geographically distributed network data centers. As illustrated in the example of
In this example, data center 10 includes storage and/or compute servers interconnected via switch fabric 14 provided by one or more tiers of physical network switches and routers, with servers 12A-12X (herein, “servers 12”) depicted as coupled to top-of-rack (TOR) switches 16A-16N. Servers 12 may also be referred to herein as “hosts” or “host devices.” Data center 10 may include many additional servers coupled to other TOR switches 16 of the data center 10.
Switch fabric 14 in the illustrated example includes interconnected top-of-rack (or other “leaf”) switches 16A-16N (collectively, “TOR switches 16”) coupled to a distribution layer of chassis (or “spine” or “core”) switches 18A-18M (collectively, “chassis switches 18”). Although not shown, data center 10 may also include, for example, one or more non-edge switches, routers, hubs, gateways, security devices such as firewalls, intrusion detection, and/or intrusion prevention devices, servers, computer terminals, laptops, printers, databases, wireless mobile devices such as cellular phones or personal digital assistants, wireless access points, bridges, cable modems, application accelerators, or other network devices.
In this example, TOR switches 16 and chassis switches 18 provide servers 12 with redundant (multi-homed) connectivity to IP fabric 20 and service provider network 7. Chassis switches 18 aggregate traffic flows and provide connectivity between TOR switches 16. TOR switches 16 may be network devices that provide layer 2 (MAC) and/or layer 3 (e.g., IP) routing and/or switching functionality. TOR switches 16 and chassis switches 18 may each include one or more processors and a memory and can execute one or more software processes. Chassis switches 18 are coupled to IP fabric 20, which may perform layer 3 routing to route network traffic between data center 10 and customer sites 11 by service provider network 7. The switching architecture of data center 10 is merely an example. Other switching architectures may have more or fewer switching layers, for instance.
Each of servers 12 may be a compute node, an application server, a storage server, or other type of server. For example, each of servers 12 may represent a computing device, such as an x86 processor-based server, configured to operate according to techniques described herein. Servers 12 may provide Network Function Virtualization Infrastructure (NFVI) for a Network Function Virtualization (NFV) architecture.
Servers 12 host endpoints for one or more virtual networks that operate over the physical network represented here by IP fabric 20 and switch fabric 14. Although described primarily with respect to a data center-based switching network, other physical networks, such as service provider network 7, may underlay the one or more virtual networks.
In some examples, servers 12 each may include at least one network interface card (NIC) of NICs 13A-13X (collectively, “NICs 13”), which each include at least one port with which to send and receive packets over a communication link. For example, server 12A includes NIC 13A. NICs 13 provide connectivity between the server and the switch fabric. In some examples, each of NICs 13 includes an additional processing unit in the NIC itself to offload at least some of the processing from the host CPU (e.g., the CPU of the server that includes the NIC) to the NIC, such as for performing policing and other advanced functionality, known as the “datapath.”
In some examples, each of NICs 13 provides one or more virtual hardware components for virtualized input/output (I/O). A virtual hardware component for I/O may be a virtualization of a physical NIC 13 (the “physical function”). For example, in Single Root I/O Virtualization (SR-IOV), which is described in the Peripheral Component Interconnect Special Interest Group SR-IOV specification, the PCIe Physical Function of the network interface card (or “network adapter”) is virtualized to present one or more virtual network interface cards as “virtual functions” for use by respective endpoints executing on the server 12. In this way, the virtual network endpoints may share the same PCIe physical hardware resources and the virtual functions are examples of virtual hardware components. As another example, one or more servers 12 may implement Virtio, a para-virtualization framework available, e.g., for the Linux Operating System, that provides emulated NIC functionality as a type of virtual hardware component. As another example, one or more servers 12 may implement Open vSwitch to perform distributed virtual multilayer switching between one or more virtual NICs (vNICs) for hosted virtual machines, where such vNICs may also represent a type of virtual hardware component. In some instances, the virtual hardware components are virtual I/O (e.g., NIC) components. In some instances, the virtual hardware components are SR-IOV virtual functions and may provide SR-IOV with Data Plane Development Kit (DPDK)-based direct process user space access.
In some examples, including the illustrated example of
In some examples, NICs 13 each include a processing unit to offload aspects of the datapath. The processing unit in the NIC may be, e.g., a multi-core ARM processor with hardware acceleration provided by a Data Processing Unit (DPU), Field Programmable Gate Array (FPGA), and/or an ASIC. NICs 13 may alternatively be referred to as SmartNICs or GeniusNICs.
Edge services controller 28 may manage the operations of the edge services platform within NICs 13 in part by orchestrating services (e.g., services 233 as shown in
Edge services controller 28 may communicate information describing services available on NICs 13, a topology of NIC fabric 13, or other information about the edge services platform to an orchestration system (not shown) or network controller 24. Example orchestration systems include OpenStack, vCenter by VMWARE, or System Center by MICROSOFT. Example network controllers 24 include a controller for Contrail by JUNIPER NETWORKS or Tungsten Fabric. Additional information regarding a controller 24 operating in conjunction with other devices of data center 10 or other software-defined network is found in International Application Number PCT/US2013/044378, filed Jun. 5, 2013, and entitled “PHYSICAL PATH DETERMINATION FOR VIRTUAL NETWORK PACKET FLOWS;” and in U.S. patent application Ser. No. 14/226,509, filed Mar. 26, 2014, and entitled “Tunneled Packet Aggregation for Virtual Networks,” each of which is incorporated by reference as if fully set forth herein.
In some examples, network controller 24 or edge services controller 28 may determine the energy efficiency and/or usage of data center 10 and/or the energy efficiency and/or usage of data center 10 when deploying an application workload, and may invoke one or more actions to improve energy efficiency (e.g., save energy) of data center 10. In some examples, network controller 24 or edge services controller 28 determines the energy efficiency and/or usage of data center 10 for workloads running on servers 12 and/or NICs 13. In some examples, network controller 24, edge services controller 28, and/or other device(s) of
In some examples, network controller 24 or edge services controller 28 may implement an application and traffic aware machine learning-based power manager according to the techniques of this disclosure. As further described below, power manager 32 may obtain workload metrics and network function metrics from a plurality of nodes of the cluster. Power manager 32 may execute at least one machine learning model to predict a corresponding measure of criticality of traffic of each node of the plurality of nodes. Power manager 32 may determine, based on the corresponding measure of criticality of traffic of each node, a corresponding power mode for at least one processing core of each node. Power manager 32 may recommend or apply the corresponding power mode to the at least one processing core of each node. For example, any of servers 12 or NICs 13 may implement a node in the cluster.
For example, network controller 24 may obtain workload metrics from a plurality of nodes of a cluster, such as server 12A and NIC 13D which may execute workloads of a cluster. Network controller 24 may obtain network function metrics from the plurality of nodes of the cluster. For each node of the plurality of nodes (e.g., for server 12A and for NIC 13D), network controller 24 may execute at least one machine learning model to predict a measure of criticality of traffic of the node. For example, network controller 24 may predict a measure of criticality for traffic of server 12A and predict a measure of criticality for traffic of NIC 13D. Network controller 24 may determine, based on the measure of criticality of traffic of the node, a power mode for at least one processing core of the node. For example, network controller 24 may determine a first power mode for at least one processing core of server 12A and determine a second power mode for at least one processing core of NIC 13D. Network controller 24 may recommend or apply the power mode to the at least one processing core of the node. For example, network controller 24 may recommend or apply the first power mode to the at least one processing core of server 12A and recommend or apply the second power mode to the at least one processing core of NIC 13D.
Microprocessor 210 may include one or more processors each including an independent execution unit (“processing core”) to perform instructions that conform to an instruction set architecture. Execution units may be implemented as separate integrated circuits (ICs) or may be combined within one or more multi-core processors (or “many-core” processors) that are each implemented using a single IC (i.e., a chip multiprocessor).
Disk 246 represents computer readable storage media that includes volatile and/or non-volatile, removable and/or non-removable media implemented in any method or technology for storage of information such as processor-readable instructions, data structures, program modules, or other data. Computer readable storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), EEPROM, flash memory, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by microprocessor 210.
Memory device 244 includes one or more computer-readable storage media, which may include random-access memory (RAM) such as various forms of dynamic RAM (DRAM), e.g., DDR2/DDR3 SDRAM, or static RAM (SRAM), flash memory, or any other form of fixed or removable storage medium that can be used to carry or store desired program code and program data in the form of instructions or data structures and that can be accessed by a computer. Memory device 244 provides a physical address space composed of addressable memory locations.
Network interface card (NIC) 230 includes one or more interfaces 232 configured to exchange packets using links of an underlying physical network. Interfaces 232 may include a port interface card having one or more network ports. NIC 230 also includes an on-card memory 227 to, e.g., store packet data. Direct memory access transfers between the NIC 230 and other devices coupled to bus 242 may read/write from/to the memory 227.
Memory device 244, NIC 230, storage disk 246, and microprocessor 210 provide an operating environment for a software stack that executes a hypervisor 214 and one or more virtual machines 228 managed by hypervisor 214.
In general, a virtual machine provides a virtualized/guest operating system for executing applications in an isolated virtual environment. Because a virtual machine is virtualized from physical hardware of the host server, executing applications are isolated from both the hardware of the host and other virtual machines.
An alternative to virtual machines is the virtualized container, such as those provided by the open-source DOCKER Container application. Like a virtual machine, each container is virtualized and may remain isolated from the host machine and other containers. However, unlike a virtual machine, each container may omit an individual operating system and provide only an application suite and application-specific libraries. A container is executed by the host machine as an isolated user-space instance and may share an operating system and common libraries with other containers executing on the host machine. Thus, containers may require less processing power, storage, and network resources than virtual machines. As used herein, containers may also be referred to as virtualization engines, virtual private servers, silos, or jails. In some instances, the techniques described herein may be employed with respect to containers and virtual machines or other virtualization components.
While virtual network endpoints in
Computing device 200 executes a hypervisor 214 to manage virtual machines 228 of user space 245. Example hypervisors include Kernel-based Virtual Machine (KVM) for the Linux kernel, Xen, ESXi available from VMWARE, Windows Hyper-V available from MICROSOFT, and other open-source and proprietary hypervisors. Hypervisor 214 may represent a virtual machine manager (VMM).
Virtual machines 228 may host one or more applications, such as virtual network function instances. In some examples, a virtual machine 228 may host one or more VNF instances, where each of the VNF instances is configured to apply a network function to packets.
Hypervisor 214 includes a physical driver 225 to use the physical function provided by network interface card 230. In some cases, network interface card 230 may also implement SR-IOV to enable sharing the physical network function (I/O) among the virtual machines. Each port of NIC 230 may be associated with a different physical function. The shared virtual devices, also known as virtual functions, provide dedicated resources such that each of virtual machines 228 (and corresponding guest operating systems) may access dedicated resources of NIC 230, which therefore appears to each of the virtual machines as a dedicated NIC. Virtual functions may represent lightweight PCIe functions that share physical resources with the physical function and with other virtual functions. NIC 230 may have thousands of available virtual functions according to the SR-IOV standard, but for I/O-intensive applications the number of configured virtual functions is typically much smaller.
Virtual machines 228 include respective virtual NICs 229 presented directly into the virtual machine 228 guest operating system, thereby offering direct communication between NIC 230 and the virtual machine 228 via bus 242, using the virtual function assigned for the virtual machine. This may reduce hypervisor 214 overhead involved with software-based, VIRTIO and/or vSwitch implementations, in which a hypervisor 214 memory address space of memory device 244 stores packet data, and in which copying packet data from NIC 230 to the hypervisor 214 memory address space and from the hypervisor 214 memory address space to the virtual machine 228 memory address space consumes cycles of microprocessor 210.
NIC 230 may further include a hardware-based Ethernet bridge 234, which may be an embedded switch. Ethernet bridge 234 may perform layer 2 forwarding between virtual functions and physical functions of NIC 230. Ethernet bridge 234 thus in some cases provides hardware acceleration, via bus 242, of inter-virtual machine packet forwarding and of packet forwarding between hypervisor 214, which accesses the physical function via physical driver 225, and any of the virtual machines. Ethernet bridge 234 may be physically separate from processing unit 25.
Computing device 200 may be coupled to a physical network switch fabric that includes an overlay network that extends switch fabric from physical switches to software or “virtual” routers of physical servers coupled to the switch fabric, including virtual router 220. Virtual routers may be processes or threads, or a component thereof, executed by the physical servers, e.g., servers 12 of
In the example computing device 200 of
In general, each virtual machine 228 may be assigned a virtual address for use within a corresponding virtual network, where each of the virtual networks may be associated with a different virtual subnet provided by virtual router 220. A virtual machine 228 may be assigned its own virtual layer three (L3) IP address, for example, for sending and receiving communications but may be unaware of an IP address of the computing device 200 on which the virtual machine is executing. In this way, a “virtual address” is an address for an application that differs from the logical address for the underlying, physical computer system, e.g., computing device 200.
In one implementation, computing device 200 includes a virtual network (VN) agent (not shown) that controls the overlay of virtual networks for computing device 200 and that coordinates the routing of data packets within computing device 200. In general, a VN agent communicates with a virtual network controller for the multiple virtual networks, which generates commands to control routing of packets. A VN agent may operate as a proxy for control plane messages between virtual machines 228 and virtual network controller, such as controller 24. For example, a virtual machine may request to send a message using its virtual address via the VN agent, and VN agent may in turn send the message and request that a response to the message be received for the virtual address of the virtual machine that originated the first message. In some cases, a virtual machine 228 may invoke a procedure or function call presented by an application programming interface of VN agent, and the VN agent may handle encapsulation of the message as well, including addressing.
In one example, network packets, e.g., layer three (L3) IP packets or layer two (L2) Ethernet packets generated or consumed by the instances of applications executed by virtual machine 228 within the virtual network domain may be encapsulated in another packet (e.g., another IP or Ethernet packet) that is transported by the physical network. The packet transported in a virtual network may be referred to herein as an “inner packet” while the physical network packet may be referred to herein as an “outer packet” or a “tunnel packet.” Encapsulation and/or de-capsulation of virtual network packets within physical network packets may be performed by virtual router 220. This functionality is referred to herein as tunneling and may be used to create one or more overlay networks. Besides IPinIP, other example tunneling protocols that may be used include IP over Generic Route Encapsulation (GRE), VxLAN, Multiprotocol Label Switching (MPLS) over GRE, MPLS over User Datagram Protocol (UDP), etc.
As noted above, a virtual network controller may provide a logically centralized controller for facilitating operation of one or more virtual networks. The virtual network controller may, for example, maintain a routing information base, e.g., one or more routing tables that store routing information for the physical network as well as one or more overlay networks. Virtual router 220 of hypervisor 214 implements a network forwarding table (NFT) 222A-222N for N virtual networks for which virtual router 220 operates as a tunnel endpoint. In general, each NFT 222 stores forwarding information for the corresponding virtual network and identifies where data packets are to be forwarded and whether the packets are to be encapsulated in a tunneling protocol, such as with a tunnel header that may include one or more headers for different layers of the virtual network protocol stack. Each of NFTs 222 may be an NFT for a different routing instance (not shown) implemented by virtual router 220.
An edge services platform leverages processing unit 25 of NIC 230 to augment the processing and networking functionality of computing device 200. Processing unit 25 includes processing circuitry 231 to execute services orchestrated by edge services controller 28. Processing circuitry 231 may represent any combination of processing cores, ASICs, FPGAs, or other integrated circuits and programmable hardware. In an example, processing circuitry 231 may include a System-on-Chip (SoC) having, e.g., one or more cores, a network interface for high-speed packet processing, one or more acceleration engines for specialized functions (e.g., security/cryptography, machine learning, storage), programmable logic, integrated circuits, and so forth. Such SoCs may be referred to as data processing units (DPUs). DPUs may be examples of processing unit 25.
In the example NIC 230, processing unit 25 executes an operating system kernel 237 and a user space 241 for services. Kernel 237 may be a Linux kernel, a Unix or BSD kernel, a real-time operating system (OS) kernel, or other kernel for managing hardware resources of processing unit 25 and managing user space 241.
Services 233 may include network, security, storage, data processing, co-processing, machine learning or other services, such as energy efficiency services, in accordance with techniques described in this disclosure. Processing unit 25 may execute services 233 and edge service platform (ESP) agent 236 as processes and/or within virtual execution elements such as containers or virtual machines. As described elsewhere herein, services 233 may augment the processing power of the host processors (e.g., microprocessor 210) by, e.g., enabling the computing device 200 to offload packet processing, security, or other operations that would otherwise be executed by the host processors.
Processing unit 25 executes edge service platform (ESP) agent 236 to exchange data and control data with an edge services controller for the edge service platform. While shown in user space 241, ESP agent 236 may be a kernel module of kernel 237 in some instances.
As an example, ESP agent 236 may collect and send, to the ESP controller, telemetry data generated by services 233, the telemetry data describing traffic in the network, computing device 200 or network resource availability, resource availability of resources of processing unit 25 (such as memory or core utilization), and/or resource energy usage. As another example, ESP agent 236 may receive, from the ESP controller, service code to execute any of services 233, service configuration to configure any of services 233, packets or other data for injection into the network.
Edge services controller 28 manages the operations of processing unit 25 by, e.g., orchestrating and configuring services 233 that are executed by processing unit 25; deploying services 233; managing NIC 230 addition, deletion, and replacement within the edge services platform; monitoring services 233 and other resources on NIC 230; and managing connectivity between various services 233 running on NIC 230. Example resources on NIC 230 include memory 227 and processing circuitry 231. In some examples, edge services controller 28 may invoke one or more actions to improve energy usage of data center 10 via managing the operations of processing unit 25. In some examples, edge services controller 28 may set a target green quotient for processing unit 25 that causes processing unit 25 to select or adjust a particular routing or tunnel protocol, particular algorithm, maximum transmission unit (MTU) size, interface, and/or any of services 233.
In some examples, virtual machine(s) 228 may execute a number of different workloads, for example, workloads of a plurality of services. Power manager 32 may obtain telemetry data, including workload metrics and network function metrics, of computing device 200 to determine power mode(s) which power manager 32 may recommend or apply to one or more processing cores of processing circuitry 231.
An application and traffic aware machine learning based power manager is now described. As discussed above, direct polling of network interfaces provides high performance for a network function, but requires the dedicated processor cores to run at full capacity (e.g., 100% busy) irrespective of traffic load. Existing solutions that attempt to address the possible waste of power by such dedicated processor cores attempt to provide optimization based on an empty poll count. However, such solutions introduce latency and may result in dropped packets. Such solutions may not have an awareness of the cluster network itself and may lack awareness about service criticality and hence, traffic criticality at every compute node. Knowledge of service criticality may be important, as the same power reduction policy may not be optimal for every node at all times. For example, the criticality of the traffic passing through a network function (e.g., a switching, routing, or security function) varies based on the services running (or scheduled to run) on the compute node.
According to the techniques of this disclosure, a power manager may selectively apply power optimizations to network functions running at every compute node. Such power optimizations may include lowering processor core frequencies, pausing processor cores, moving processor cores to internal inactive states, etc. Reduction of power usage by processor cores may improve a data center, cluster, or service's ability to meet power related SLA requirements or meet service provider power related goals.
Cloud native network functions like routers and/or firewalls provide high performance and advanced routing functionalities in a cluster fabric. Currently, it is common to use DPDK (Data Plane Development Kit) for a data plane to achieve high network performance using continuous direct polling of Ethernet devices, bypassing the kernel network stack. The direct polling of network devices by DPDK based user space applications may be performed using Poll Mode Drivers (PMDs). For example, a dedicated number of processor cores may be assigned to poll the network devices. The processor cores assigned to listen for incoming packets may poll the network packet receiving queues at 100% processor utilization. Due to this polling nature, DPDK-based network functions consume constant power irrespective of the incoming traffic rate they are receiving. From a sustainability perspective, in highly scaled scenarios, this kind of redundant power consumption may lead to a high carbon footprint when data centers are powered by nonrenewable (non-green) energy sources and may lead to a failure to achieve sustainability goals of service providers or SLA requirements. In cases of data centers powered using limited available green or renewable energy sources, the cloud native network functions may not be able to meet scaling SLA requirements. So, it may be desirable to have an adaptive power manager that may be configured to reduce redundant power usage by DPDK-based cloud native network functions, thereby helping service providers to accomplish their sustainability goals without compromising the network traffic SLA requirements.
A power manager, such as cloud native router (CNR) virtual router (vrouter) power manager 302 may manage power used by node 300. In some examples, CNR vrouter power manager 302 may implement the techniques of this disclosure, such as managing power in an application and traffic aware manner, and may be an example of power manager 32.
Forwarding core 406 may poll 422 receive queue 410 for packets. Similarly, forwarding core 408 may poll 424 receive queue 412 for packets. Receive queues 410 and 412 may be receive queues coupled to or part of NIC 414. For example, traffic received by NIC 414 may be sent to receive queue 410 and/or receive queue 412.
For example, in the example that receive queue 410 has traffic for forwarding, forwarding core 406 may send a poll status callback 426 to DPDK power library 404, which may indicate N packets are to be forwarded. In the example that receive queue 412 does not have traffic for forwarding, forwarding core 408 may send a poll status callback 428 which may indicate that 0 packets are to be forwarded. In other words, there is an empty poll.
DPDK power library 404 may send poll statistics 430 to CNR power manager 402. In some examples, poll statistics 430 may indicate that there are N packets on receive queue 410 and 0 packets on receive queue 412. CNR power manager 402 may send a power command 432. Power command 432 may be indicative of an instruction to set a power savings scheme for forwarding core 408. For example, such a power savings scheme may include pausing forwarding core 408 or lowering a frequency of forwarding core 408. DPDK power library 404 may implement the power savings scheme 434, such as pausing forwarding core 408 for a time “T”.
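The interaction above can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical model of how a power manager such as CNR power manager 402 might translate per-queue poll statistics into a power command; the class and function names, thresholds, and pause duration are illustrative assumptions rather than the actual DPDK power library API.

```python
# Illustrative sketch (not the DPDK API): models how a power manager such as
# CNR power manager 402 might turn per-queue poll statistics into a power
# command for a forwarding core. Names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class PollStats:
    core_id: int          # forwarding core that polled the queue
    packets_polled: int   # packets returned by the last poll
    empty_polls: int      # consecutive polls that returned 0 packets

def power_command(stats: PollStats, pause_us: int = 500) -> dict:
    """Return a power command for the given forwarding core."""
    if stats.packets_polled == 0 and stats.empty_polls > 100:
        # Queue has been idle: pause the core for a short time "T".
        return {"core": stats.core_id, "action": "pause", "duration_us": pause_us}
    if stats.packets_polled == 0:
        # Lightly loaded: lower the core frequency instead of pausing.
        return {"core": stats.core_id, "action": "scale_freq", "level": "min"}
    # Traffic present: run the core at full frequency.
    return {"core": stats.core_id, "action": "scale_freq", "level": "max"}

# Example: one receive queue reported N packets, the other reported 0.
print(power_command(PollStats(core_id=6, packets_polled=32, empty_polls=0)))
print(power_command(PollStats(core_id=8, packets_polled=0, empty_polls=250)))
```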
Each of NF 512A-NF 512C may include a network interface for interfacing with a power controller or manager (not shown in
Workload metrics collector 614 may collect metrics from each of nodes 602A-602C regarding workloads operating on such nodes. For example, workload metrics collector 614 may collect metrics regarding which workloads are operating at which times. For example, workload metrics collector 614 may collect metrics indicating that S11, S12, S21, and S33 are operating on node 602A. Workload metrics collector 614 may collect similar metrics from nodes 602B and 602C regarding workloads operating on nodes 602B and 602C.
Network function metrics collector 616 may collect metrics regarding NFs 612A-612C. For example, network function metrics collector 616 may collect traffic metrics (e.g., speed, latency, etc.), poll statistics of poll mode drivers (not shown), and/or the like. Workload metrics collector 614 and network function metrics collector 616 may store collected metrics in database 618.
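As a rough illustration of the kind of records workload metrics collector 614 and network function metrics collector 616 might write to database 618, the following Python sketch defines example record types; the field names and the in-memory stand-in for database 618 are assumptions for illustration, as the disclosure does not prescribe a particular schema.

```python
# Illustrative sketch of metric records; field names are hypothetical.
from dataclasses import dataclass, field
import time

@dataclass
class WorkloadMetric:
    node: str                 # e.g., "node-602A"
    workload: str             # e.g., "S11"
    replicas: int             # running replicas of the workload
    standby_replicas: int     # standby replicas kept for availability
    timestamp: float = field(default_factory=time.time)

@dataclass
class NetworkFunctionMetric:
    node: str                 # node hosting the network function
    traffic_mbps: float       # observed traffic speed
    latency_ms: float         # observed traffic latency
    empty_poll_count: int     # empty polls reported by the poll mode driver
    timestamp: float = field(default_factory=time.time)

# A time-series "database" stand-in: an append-only list per metric type.
database_618 = {"workload": [], "network_function": []}
database_618["workload"].append(
    WorkloadMetric("node-602A", "S11", replicas=2, standby_replicas=1))
database_618["network_function"].append(
    NetworkFunctionMetric("node-602A", 940.0, 1.2, 57))
```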
Cluster power manager 620 may include workload criticality calculator 622, network traffic criticality calculator 624, and power mode recommender 626. Workload criticality calculator 622 may determine criticality of the workloads operating on nodes 602A-602C (e.g., S11, S12, S21, S22, and S31-S35). Network traffic criticality calculator 624 may determine the criticality of traffic of a node of the 3 node cluster. Representative examples of how workload criticality calculator 622 may determine criticality of the workloads and how network traffic criticality calculator 624 may determine the criticality of traffic are set forth below.
Power mode recommender 626 may recommend power savings measures which may be implemented in the 3 node cluster based on the workload criticality determined by workload criticality calculator 622 and the network traffic criticality determined by network traffic criticality calculator 624. Node network function power mode resource 630 may poll for power mode information from NFs 612A-612C and may selectively apply power savings measures to any of the processors of NFs 612A-612C.
Referring back to
In some examples, power saving actions may be taken, based on the predicted traffic and service workload criticality levels, for a certain duration. The criticality levels may help in determining the CPU frequency and CPU C-states to apply to save power. A CPU C-state is a core power state that defines a degree to which a processor may be idle or functions of the processor may be put to sleep. The criticality of traffic and service workloads may be predicted for a user configured duration, via machine learning techniques, using traffic and service workload telemetry data. For example, a machine learning pipeline may predict criticality of traffic of cluster node-1 as 3 on a scale of 1 to 10 for the next duration window of 1 hour.
For example, network traffic criticality calculator 624 may analyze the collected metrics in database 618 and predict the service workload and traffic criticality for each of a plurality of workloads. Network traffic criticality calculator 624 may learn that a service workload, for example, workload S3 will be more critical under one or more certain conditions, such as when the service workload has a higher number of standby replicas in the next duration ‘t’ to maintain high availability and/or is invoked by high number of other services in next duration ‘t.’ Workload criticality may be determined as:
Workload Criticality (t)=Linear_Function (WAF, WDF)
Where: WAF is a workload availability factor for the next t time (e.g., hours, days, minutes, etc.) and WDF is a workload dependency factor for the next t time (e.g., hours, days, minutes, etc.).
For example, for each workload in the cluster (e.g., workloads S11, S12, S21, S22, and S31-S35), cluster power manager 620 may determine a workload dependency factor (WDF) and a workload availability factor (WAF). In some examples, cluster power manager 620 may determine the WDF for a workload as WDF=Number of workloads depending on the workload/Total number of workloads in the cluster. In some examples, cluster power manager 620 may determine the WAF for a workload as WAF=Number of workloads depending on the workload/Average number of workloads depending on any workload.
In some examples, the WAF and WDF are based on predictions for a next time period t, rather than current dependency and availability. For example, workload criticality calculator 622 may execute one or more machine learning models to predict a number of workloads depending on the workload, a total number of workloads in the cluster, a number of workloads depending on the workload, and/or an average number of workloads depending on any workload. In such examples, workload criticality calculator 622 may determine WAF as WAF=predicted number of workloads depending on the workload/predicted total number of workloads in the cluster. In such examples, workload criticality calculator 622 may determine WDF as WDF=predicted number of workloads depending on the workload/predicted average number of workloads depending on any workload.
When a workload is critical, generally a standby copy or replica is kept to meet 99.99% availability. Not all workloads will have standby replicas.
Workload criticality calculator 622 may calculate the criticality of each workload. For example, workload criticality calculator 622 may determine workload criticality (WC) of a particular workload as WC=dWC+B1*WDF+B2*WAF
where: dWC is a default workload criticality, B1 is a dependency factor coefficient, and B2 is an availability factor coefficient (constant, usually configured value).
B1 and B2 may be constants and may be user configurable. In some examples, B1 and B2 are configured to a value between 0 and 1, inclusive.
For example, if B1 is configured as 0.5 and B2 is configured as 0.2, this means that: a 2 unit increase in WDF will increment the workload's criticality by 1 unit and a 5 unit increase in WAF will increment the workload's criticality by 1 unit.
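The following Python sketch puts the WDF, WAF, and WC calculations above together, following the current-metrics form of the definitions (the predicted-metrics variant is analogous, with forecasts substituted for counts); the input values are made-up examples rather than values from the disclosure.

```python
# Minimal sketch of the workload criticality calculation described above.

def workload_dependency_factor(dependents: int, total_workloads: int) -> float:
    # WDF = number of workloads depending on the workload / total workloads in the cluster
    return dependents / total_workloads

def workload_availability_factor(dependents: int, avg_dependents: float) -> float:
    # WAF = number of workloads depending on the workload / average dependents per workload
    return dependents / avg_dependents

def workload_criticality(dwc: float, wdf: float, waf: float,
                         b1: float = 0.5, b2: float = 0.2) -> float:
    # WC = dWC + B1*WDF + B2*WAF, with B1 and B2 user-configurable in [0, 1]
    return dwc + b1 * wdf + b2 * waf

# Example: with B1 = 0.5, a 2-unit increase in WDF raises WC by 1 unit,
# and with B2 = 0.2, a 5-unit increase in WAF raises WC by 1 unit.
wdf = workload_dependency_factor(dependents=3, total_workloads=9)
waf = workload_availability_factor(dependents=3, avg_dependents=1.5)
print(workload_criticality(dwc=1.0, wdf=wdf, waf=waf))
```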
Network traffic criticality calculator 624 may learn that traffic passing through a router instance will be more critical for the configured duration of t time, such as when an average criticality of service workloads participating in the traffic is high, an average traffic speed is high, an average traffic latency is low, and/or when an average empty poll count is low.
Node traffic criticality may be determined as a function of the following factors:

Node Traffic Criticality (t)=Function (Cw, Ts, Tl, Ec)
Where: Cw is an average workload criticality for next duration t, Ts is a predicted traffic speed/bandwidth requirement for next duration t, Tl is a predicted traffic latency requirement for next duration t, and Ec is an average empty poll count.
Network traffic criticality calculator 624 may determine the traffic criticality of a node. The traffic criticality of the network function of a node may be determined primarily by factoring in the criticality of all workloads running on the node. For example, network traffic criticality calculator 624 may employ an aggregation, such as an average, to find the workload criticality of all workloads running on a particular node. For example, in every geographic zone, there may be a node which will have stand-by replicas scheduled to provide better availability for a workload.
When more critical workloads are scheduled to run on a particular node, the network function of that node typically carries more critical data. In such a case, any power optimization techniques applied to processors of that network function should be milder than may be applied to processors of network functions handling less critical traffic.
Network traffic criticality calculator 624 may also consider forecasted bandwidth and latency of the traffic passing through the particular network function. For example, network traffic criticality calculator 624 may determine node traffic criticality (NTC) as:

NTC=Default TC+B1*Cw+B2*Ts+B3*Tl
Where Default TC is a default traffic criticality value and B1-B3 are coefficient constants that may be user configurable.
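A minimal Python sketch of this calculation, assuming a linear combination of the factors named above, is shown below; the coefficient values (including a negative weight for latency, so that tighter latency requirements yield higher criticality than looser ones) are configuration assumptions, not values taken from the disclosure.

```python
# Sketch of the node traffic criticality (NTC) calculation under the assumed
# linear form. Coefficients are user-configurable; the specific values here
# are illustrative only.

def node_traffic_criticality(default_tc: float, cw: float, ts_mbps: float,
                             tl_ms: float, b1: float, b2: float, b3: float) -> float:
    # NTC = Default TC + B1*Cw + B2*Ts + B3*Tl
    return default_tc + b1 * cw + b2 * ts_mbps + b3 * tl_ms

# Example: high average workload criticality and bandwidth, tight latency.
ntc = node_traffic_criticality(default_tc=1.0, cw=4.0, ts_mbps=800.0,
                               tl_ms=2.0, b1=0.5, b2=0.002, b3=-0.1)
print(round(ntc, 2))
```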
As discussed above, network function metrics collector 616 may collect traffic bandwidth and latency metrics of each of NFs 612A-612C in a time series manner. Network traffic criticality calculator 624 may include one or more machine learning models which may be configured to predict or forecast traffic bandwidth and/or traffic latency. The forecasted traffic bandwidth and the forecasted traffic latency may be used to determine the node traffic criticality. In some examples, the one or more machine learning models may be trained using historical traffic bandwidth and/or traffic latency data.
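As one example of such forecasting, the following Python sketch uses simple exponential smoothing as a stand-in for whatever machine learning model an implementation actually trains on the historical data; the history values and smoothing factor are illustrative assumptions.

```python
# Sketch of a one-step-ahead forecast of traffic bandwidth or latency from
# the time series collected by network function metrics collector 616.
# Exponential smoothing is used here only for illustration.

def forecast_next(series: list[float], alpha: float = 0.3) -> float:
    """Exponentially smoothed one-step-ahead forecast of a metric."""
    if not series:
        raise ValueError("need at least one observation")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

bandwidth_mbps = [620.0, 640.0, 900.0, 880.0, 910.0]   # example history
latency_ms = [3.1, 2.9, 1.4, 1.3, 1.2]
print(forecast_next(bandwidth_mbps), forecast_next(latency_ms))
```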
Network traffic criticality calculator 624 may send the predicted criticality values to a power mode recommender 626. Power mode recommender 626 may determine any processor optimization techniques to recommend for any given processor of NFs 612A-612C. Such power optimization techniques may have different optimization levels. For example, one power optimization technique may be to lower the processor core frequency range. With this technique, the lower the frequency is made, the lower the power consumption by the processor core. Another power optimization technique is to pause a processor core. A processor core may be paused for a specific duration. The longer the pause duration is, the lower the power consumption by the processor core. Another power optimization technique is to utilize processor core idle or sleep states, such as states C0-Cn. In general, the higher the state number in which the processor core is placed, the lower the power consumption by the processor core.
In some examples, power mode recommender 626 may look up, e.g., in memory, a user configured map or table of traffic criticality to power optimization levels. For example, a user may configure a traffic criticality to optimization level mapping. In some examples, the mapping may depend on a processor type and/or vendor support. In other words, in some examples, different power optimization techniques (or levels) may be used for different processor types, even if the different processor types have a same associated traffic criticality. In some examples, power mode recommender 626 may analyze the node configurations of nodes 602A-602C and capabilities of the cluster and determine the power saving actions to be taken. For example, based on the criticality values of traffic and service workloads, power mode recommender 626 may suggest a change to a processor core frequency, a processor pause duration, and/or a change in a processor power savings state, such as a “C-state” for any one or more of the processors of NFs 612A-612C.
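The following Python sketch shows one hypothetical form such a criticality-to-power-mode map might take and how power mode recommender 626 might consult it; the criticality bands, frequencies, pause durations, and C-states are assumptions and would in practice depend on processor type and vendor support.

```python
# Sketch of a user-configured traffic-criticality-to-power-mode map.
# All numeric values are illustrative assumptions.

POWER_MODE_MAP = [
    # (upper bound of criticality band, power mode recommendation)
    (3,  {"freq_khz": 1_200_000, "pause_us": 1000, "c_state": "C6"}),
    (6,  {"freq_khz": 2_000_000, "pause_us": 200,  "c_state": "C1"}),
    (10, {"freq_khz": 3_400_000, "pause_us": 0,    "c_state": "C0"}),
]

def recommend_power_mode(traffic_criticality: float) -> dict:
    for upper_bound, mode in POWER_MODE_MAP:
        if traffic_criticality <= upper_bound:
            return mode
    return POWER_MODE_MAP[-1][1]   # out-of-range criticality treated as maximally critical

# Low-criticality traffic gets a deeper power-saving mode.
print(recommend_power_mode(2.5))
print(recommend_power_mode(8.0))
```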
For example, for a network function having higher TC (Traffic Criticality), power mode recommender 626 may recommend a relatively higher frequency, a relatively lower pause duration, and/or a relatively lower order CPU core inactivity state. For a network function having lower TC (Traffic Criticality), power mode recommender 626 may recommend a relatively lower frequency, a relatively higher pause duration, and/or a relatively higher order CPU core inactivity state.
Power mode recommender 626 may recommend an optimization level for a network function and node network function power mode resource (NNFPMR) 630 may update the network function's power optimization level to implement the recommendation.
In some examples, during low activity durations, workloads may be scaled down and/or availability replicas may be reduced. In low activity durations, cluster power manager 620 may perform a periodic calculation of workload criticality and traffic criticality to ensure that processor cores of low critical network functions start saving processor cycles and therefore, power. The periodic calculation of workload criticality and traffic criticality, and the updating of power optimization techniques, may be automatic and may not require any administrative work or static policy.
Network controller 24 may start cluster power manager (802). For example, network controller 24 may start cluster power manager 620. Network controller 24 may obtain a workload dependency table (804). For example, an administrator may enter a workload dependency table which may be indicative of any dependencies any workload may have.
Network controller 24 may determine workload dependency factors (806). For example, network controller 24 may determine workload dependency factors based on the workload dependency table. Network controller 24 may obtain workload replica and schedule metrics (808). For example, network controller 24 may obtain workload replica and schedule metrics from database 618, such as a number of workload replicas that exist at what times, when replicas are created, when replicas are destroyed, and/or the like. In some examples, an administrator may enter such information.
Network controller 24 may determine workload availability factors (810). Network controller 24 may determine workload criticalities (812).
Network controller 24 may obtain traffic metrics of node network functions (814). For example, network controller 24 may obtain traffic metrics from database 618, such as traffic speed, throughput, latency, and/or the like. In some examples, an administrator may enter such information.
Network controller 24 may forecast traffic bandwidth and latency (816). For example, network controller 24 may execute a machine learning model to forecast traffic bandwidth and latency. Network controller 24 may obtain average workload criticalities for each node (818).
Network controller 24 may determine traffic criticalities for each node network function (820). Network controller 24 may obtain a traffic criticality to power mode map (822). For example, network controller 24 may load a user generated traffic criticality to power optimization map from memory (e.g., database 618).
Network controller 24 may find a power mode for a network function (824). For example, network controller 24 may determine a power mode based on the traffic criticality via looking up the traffic criticality in the traffic criticality to power mode map. Network controller 24 may apply the power mode to the network function (826). For example, network controller 24 may implement the power mode in one or more processor cores of a network function.
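The following Python sketch strings operations 802-826 together at a very high level; each step is reduced to a stub, and the function names and placeholder return values are assumptions standing in for the components described above (e.g., cluster power manager 620, database 618, power mode recommender 626), not an actual API.

```python
# Compact sketch of the overall flow of operations 802-826 with stubbed steps.

def determine_workload_criticalities(dependency_table, replica_metrics):
    # Steps 804-812: WDF, WAF, and WC per workload (see earlier sketch).
    return {"S11": 2.0, "S12": 1.5}            # placeholder values

def determine_node_traffic_criticality(node, workload_criticalities, traffic_metrics):
    # Steps 814-820: forecast bandwidth/latency, average WC, compute NTC.
    return 3.0                                  # placeholder value

def apply_power_mode(node, power_mode):
    # Step 826: recommend or apply the mode to the node's forwarding cores.
    print(f"{node}: applying {power_mode}")

def power_management_cycle(nodes, criticality_to_mode):
    wc = determine_workload_criticalities(dependency_table={}, replica_metrics={})
    for node in nodes:                          # repeat per node of the cluster
        ntc = determine_node_traffic_criticality(node, wc, traffic_metrics={})
        mode = criticality_to_mode(ntc)         # steps 822-824: map lookup
        apply_power_mode(node, mode)

# The cycle may repeat automatically on a configured period, with no static policy.
if __name__ == "__main__":
    power_management_cycle(
        ["node-602A", "node-602B", "node-602C"],
        criticality_to_mode=lambda c: "pause-1000us" if c < 5 else "full-speed")
```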
Network controller 24 may obtain workload metrics from a plurality of nodes of a cluster (902). For example, network controller 24 may obtain workload metrics from nodes 602A-602C. Workload metrics may include at least one of a workload dependency count (e.g., how many workloads depend on a given workload) or a workload service availability configuration (e.g., how many stand-by replicas for a workload are on a node, how many replicas for a workload are on a node, etc.).
Network controller 24 may obtain network function metrics from the plurality of nodes of the cluster (904). For example, network controller 24 may obtain network function metrics from nodes 602A-602C. Network function metrics may include traffic speed of the node, traffic bandwidth of the node, traffic latency of the node, or poll statistics of the node.
For each node of the plurality of nodes (905) (network controller 24 may complete the following steps for every node of the cluster), network controller 24 may execute at least one machine learning model to predict a measure of criticality of traffic of the node (906). For example, network controller 24 may execute a machine learning model trained to predict criticality of traffic of the node to predict the measure of criticality of the traffic of the node.
For each node of the plurality of nodes, network controller 24 may determine, based on the measure of criticality of traffic of the node, a power mode for at least one processing core of the node (908). For example, network controller 24 may determine a power mode, based on the measure of criticality of traffic of the node, to be applied to at least one processing core of the node.
Network controller 24 may recommend or apply the power mode to the at least one processing core of the node (910). For example, network controller 24 may recommend the power mode to another device or a user to determine whether to apply the power mode or to apply the power mode to the at least one processing core. In another example, network controller 24 may apply the power mode to the at least one processing core.
In some examples, network controller 24 may automatically and periodically perform the techniques of this disclosure. For example, network controller 24 may automatically, on a periodic basis, and for each node of the plurality of nodes: execute the at least one machine learning model to predict an updated measure of criticality of traffic of the node; determine, based on the updated measure of criticality of traffic of the node, an updated power mode for the at least one processing core of the node; and recommend or apply the updated power mode to the at least one processing core of the node.
In some examples, applying the power mode includes at least one of pausing operation of the at least one processor core, changing a frequency of the at least one processor core, or moving the at least one processor core into a different processor power state. In some examples, the power mode is limited to a predetermined amount of time. For example, the frequency of a processor core is changed for a time t, a processor core is paused for a time t, or a processor core is moved into a different processor power state for a time t. In some examples, after the time t, operation of the processor core may revert to a state prior to the application of the power mode.
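As one example of applying and then reverting a frequency-based power mode for a bounded time t, the following Python sketch uses the Linux cpufreq sysfs interface; this mechanism is an assumption for illustration (a DPDK power library could be used instead), and it requires root privileges and a cpufreq-enabled kernel.

```python
# Sketch: cap a core's frequency for a duration, then revert to the prior state.
import time

def scaling_max_freq_path(core: int) -> str:
    return f"/sys/devices/system/cpu/cpu{core}/cpufreq/scaling_max_freq"

def apply_frequency_for(core: int, freq_khz: int, duration_s: float) -> None:
    path = scaling_max_freq_path(core)
    with open(path) as f:
        previous = f.read().strip()            # remember the prior setting
    with open(path, "w") as f:
        f.write(str(freq_khz))                 # apply the power mode
    try:
        time.sleep(duration_s)                 # hold it for time t
    finally:
        with open(path, "w") as f:
            f.write(previous)                  # revert to the state prior to the power mode

# Example (requires root): cap core 8 at 1.2 GHz for 60 seconds, then revert.
# apply_frequency_for(core=8, freq_khz=1_200_000, duration_s=60)
```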
In some examples, the measure of criticality of traffic of the node is based on at least one of: a determined workload criticality of one or more workloads of the node, a predicted bandwidth of the node, or a predicted traffic latency of the node. In some examples, the determined workload criticality is based at least in part on a workload availability factor and a workload dependency factor. In some examples, network controller 24 may determine the workload criticality. In some examples, network controller 24 may determine the workload availability factor and the workload dependency factor by executing the at least one machine learning model. In some examples, network controller 24 may determine the workload availability factor and the workload dependency factor based on the workload metrics. For example, rather than determining the workload availability factor and the workload dependency factor based on predicted workload metrics, network controller 24 may determine the workload availability factor and the workload dependency factor based on current workload metrics.
In some examples, to determine the power mode, network controller 24 may determine a power mode based on a mapping of the measure of criticality of traffic of the node to the power mode.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. Various features described as modules, units or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices or other hardware devices. In some cases, various features of electronic circuitry may be implemented as one or more integrated circuit devices, such as an integrated circuit chip or chipset.
If implemented in hardware, this disclosure may be directed to an apparatus such as a processor or an integrated circuit device, such as an integrated circuit chip or chipset. Alternatively, or additionally, if implemented in software or firmware, the techniques may be realized at least in part by a computer-readable data storage medium comprising instructions that, when executed, cause a processor to perform one or more of the methods described above. For example, the computer-readable data storage medium may store such instructions for execution by a processor.
A computer-readable medium may form part of a computer program product, which may include packaging materials. A computer-readable medium may comprise a computer data storage medium such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), Flash memory, magnetic or optical data storage media, and the like. In some examples, an article of manufacture may comprise one or more computer-readable storage media.
In some examples, the computer-readable storage media may comprise non-transitory media. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in RAM or cache).
The code or instructions may be software and/or firmware executed by processing circuitry including one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, functionality described in this disclosure may be provided within software modules or hardware modules.