Embodiments described herein generally relate to data processing, memory usage, network communication, and communication system implementations of distributed computing, including the implementations with the use of networked processing units such as infrastructure processing units (IPUs) or data processing units (DPUs).
System architectures are moving to highly distributed multi-edge and multi-tenant deployments. Deployments may have different limitations in terms of power and space, and may use different types of compute, acceleration, and storage technologies to overcome these power and space limitations. Deployments are also typically interconnected in a tiered and/or peer-to-peer fashion, in an attempt to create a network of connected devices and edge appliances that work together.
Edge computing, at a general level, has been described as systems that provide the transition of compute and storage resources closer to endpoint devices at the edge of a network (e.g., consumer computing devices, user equipment, etc.). As compute and storage resources are moved closer to endpoint devices, a variety of advantages have been promised such as reduced application latency, improved service capabilities, improved compliance with security or data privacy requirements, improved backhaul bandwidth, improved energy consumption, and reduced cost. However, many deployments of edge computing technologies—especially complex deployments for use by multiple tenants—have not been fully adopted.
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. Some embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings in which:
Various approaches for memory pooling in an edge computing setting are discussed herein. Existing approaches for memory pooling are not able to dynamically and effectively allocate (and re-allocate) resources and memory regions in edge computing settings. The following applies the concept of interleaving to allow the distributed storage and retrieval of data among disaggregated memory locations in an edge layer. The following also applies estimation and prediction techniques to ensure that memory requests in the edge layer can be properly fulfilled from the distributed memory pool.
These approaches introduce a number of interfaces and logic to enable memory resources to be discovered and allocated into a pool in a highly distributed/disaggregated environment. The interfaces and logic, in various examples, perform estimation and prediction based on telemetry and network conditions. Telemetry can be continuously collected and analyzed to obtain accurate information on network conditions at any given time. Further, the memory resources can connect to accelerator or other compute capabilities, using compute express link (CXL) and other high-speed interconnect technologies. This provides a particular benefit for settings such as when operating base stations that use hot-pluggable accelerators or compute equipment.
The disclosed mechanisms provide a simplified abstraction for memory pooling, including determining how and when to use interleaving and how to improve the overall latency of memory storage and retrieval. As a result, memory operations can be parallelized and significantly sped up. Further, the following approaches are adaptable to a variety of use cases, including the use of “tiers” to service memory requests and workloads associated with a particular service requirement. As will be understood, in an edge computing setting, different workloads will have different requirements, in terms of latency, with or without affecting the service level agreements (SLAs). Here, by organizing memory scheduling and pooling, orchestration and satisfying service agreement requirements can be more effectively accomplished.
In various examples, the logic that is used to configure the memory pooling and interleaving is managed by a network switch or other network-based component. For instance, a network switch can evaluate telemetry information to ensure that data across memory resources can be accessed with less latency—even during variations in use cases. The network switch can select different memory elements (and portions of different memory elements) in a pool, based on a memory bandwidth required, or by the use of tiers (e.g., high/medium/low). The network switch may also select and reserve fabric resources (e.g., memory bandwidth) for connecting to servers, and directly perform reading/writing to memory ranges.
Accordingly, the following describes coordinated, intelligent components to configure the right combination of memory and compute resources for servicing client workloads and increasing speed. While many of the techniques may be implemented by a switch, orchestrator, or controller, the techniques are also suited for use by networked processing units such as infrastructure processing units (IPUs, such as respective IPUs operating as a memory owner and remote memory consumer).
Additional implementation details of the memory pool interleaving techniques in an edge computing network, effected via a network switch or IPUs, are provided in the following sections.
Distributed Edge Computing and Networked Processing Units
The edge cloud 110 is generally defined as involving compute that is located closer to endpoints 160 (e.g., consumer and producer data sources) than the cloud 130, such as autonomous vehicles 161, user equipment 162, business and industrial equipment 163, video capture devices 164, drones 165, smart cities and building devices 166, sensors and IoT devices 167, etc. Compute, memory, network, and storage resources that are offered at the entities in the edge cloud 110 can provide ultra-low or improved latency response times for services and functions used by the endpoint data sources, as well as reduce network backhaul traffic from the edge cloud 110 toward cloud 130, thus improving energy consumption and overall network usage, among other benefits.
Compute, memory, and storage are scarce resources, and generally decrease depending on the edge location (e.g., fewer processing resources being available at consumer end point devices than at a base station or a central office data center). As a general design principle, edge computing attempts to minimize the number of resources needed for network services, through the distribution of more resources that are located closer both geographically and in terms of in-network access time.
Further in the network, a network edge tier 230 operates servers including form factors optimized for extreme conditions (e.g., outdoors). A data center edge tier 240 operates additional types of edge nodes such as servers, and includes increasingly powerful or capable hardware and storage technologies. Still further in the network, a core data center tier 250 and a public cloud tier 260 operate compute equipment with the highest power consumption and largest configuration of processors, acceleration, storage/memory devices, and highest throughput network.
In each of these tiers, various forms of Intel® processor lines are depicted for purposes of illustration; it will be understood that other brands and manufacturers of hardware will be used in real-world deployments. Additionally, it will be understood that additional features or functions may exist among multiple tiers. One such example is connectivity and infrastructure management that enables a distributed IPU architecture, which can potentially extend across all of tiers 210, 220, 230, 240, 250, 260. Other relevant functions that may extend across multiple tiers may relate to security features, domain or group functions, and the like.
With these variations and service features in mind, edge computing within the edge cloud 110 may provide the ability to serve and respond to multiple applications of the use cases in real-time or near real-time and meet ultra-low latency requirements. As systems have become highly distributed, networking has become one of the fundamental pieces of the architecture that allows achieving scale with resiliency, security, and reliability. Networking technologies have evolved to provide more capabilities beyond pure network routing capabilities, including the coordination of quality of service, security, multi-tenancy, and the like. This has also been accelerated by the development of new smart network adapter cards and other types of network derivatives that incorporate capabilities such as ASICs (application-specific integrated circuits) or FPGAs (field programmable gate arrays) to accelerate some of those functionalities (e.g., remote attestation).
In these contexts, networked processing units have begun to be deployed at network cards (e.g., smart NICs), gateways, and the like, which allow direct processing of network workloads and operations. One example of a networked processing unit is an infrastructure processing unit (IPU), which is a programmable network device that can be extended to provide compute capabilities with far richer functionalities beyond pure networking functions. Another example of a network processing unit is a data processing unit (DPU), which offers programmable hardware for performing infrastructure and network processing operations. The following discussion refers to functionality applicable to an IPU configuration, such as that provided by an Intel® line of IPU processors. However, it will be understood that functionality will be equally applicable to DPUs and other types of networked processing units provided by ARM®, Nvidia®, and other hardware OEMs.
The main compute platform 420 is composed of typical elements that are included with a computing node, such as one or more CPUs 424 that may or may not be connected via a coherent domain (e.g., via Ultra Path Interconnect (UPI) or another processor interconnect); one or more memory units 425; one or more additional discrete devices 426 such as storage devices, discrete acceleration cards (e.g., a field-programmable gate array (FPGA), a visual processing unit (VPU), etc.); a baseboard management controller 421; and the like. The compute platform 420 may operate one or more containers 422 (e.g., with one or more microservices), within a container runtime 423 (e.g., Docker containerd). The IPU 410 operates as a networking interface and is connected to the compute platform 420 using an interconnect (e.g., using either PCIe or CXL). The IPU 410, in this context, can be observed as another small compute device that has its own: (1) processing cores (e.g., provided by low-power cores 417); (2) operating system (OS) and cloud native platform 414 to operate one or more containers 415 and a container runtime 416; (3) acceleration functions provided by an ASIC 411 or FPGA 412; (4) memory 418; (5) network functions provided by network circuitry 413; etc.
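For purposes of illustration, and not limitation, the following Python sketch summarizes the arrangement described above as simple data records; the class and field names are hypothetical and merely restate the enumerated elements of the compute platform 420 and the IPU 410.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputePlatform:
    """Hypothetical summary of the elements of compute platform 420."""
    cpus: List[str]              # e.g., CPUs 424, optionally in a coherent domain (UPI)
    memory_units: int            # e.g., memory units 425
    discrete_devices: List[str]  # e.g., storage, FPGA/VPU acceleration cards 426
    bmc_present: bool = True     # baseboard management controller 421
    containers: List[str] = field(default_factory=list)  # containers 422 in runtime 423

@dataclass
class IPU:
    """Hypothetical summary of the elements of IPU 410."""
    low_power_cores: int         # processing cores 417
    has_asic: bool               # acceleration functions 411
    has_fpga: bool               # acceleration functions 412
    memory_gb: int               # memory 418
    network_circuitry: str       # network functions 413
    containers: List[str] = field(default_factory=list)  # containers 415 in runtime 416
    host_interconnect: str = "PCIe"  # interconnect to the compute platform (PCIe or CXL)

# Example (illustrative values only):
platform = ComputePlatform(cpus=["CPU-0", "CPU-1"], memory_units=2,
                           discrete_devices=["FPGA", "SSD"])
ipu = IPU(low_power_cores=8, has_asic=True, has_fpga=True,
          memory_gb=16, network_circuitry="200GbE", host_interconnect="CXL")
```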
From a system design perspective, this arrangement provides important functionality. The IPU 410 is seen as a discrete device from the local host (e.g., the OS running in the compute platform CPUs 424) that is available to provide certain functionalities (networking, acceleration, etc.). Those functionalities are typically provided via physical or virtual PCIe functions. Additionally, the IPU 410 is seen as a host (with its own IP address, etc.) that can be accessed by the infrastructure to set up an OS, run services, and the like. The IPU 410 sees all the traffic going to the compute platform 420 and can perform actions—such as intercepting the data or performing some transformation—as long as the correct security credentials are hosted to decrypt the traffic. Traffic going through the IPU passes through all the layers of the Open Systems Interconnection model (OSI model) stack (e.g., from the physical to the application layer). Depending on the features that the IPU has, processing may be performed at the transport layer only. However, if the IPU has capabilities to perform traffic intercept, then the IPU also may be able to intercept traffic at higher layers (e.g., intercept CDN traffic and process it locally).
Some of the use cases being proposed for IPUs and similar networked processing units include: to accelerate network processing; to manage hosts (e.g., in a data center); or to implement quality of service policies. However, most functionalities today are focused on using the IPU at the local appliance level and within a single system. These approaches do not address how the IPUs could work together in a distributed fashion or how system functionalities can be divided among the IPUs on other parts of the system. Accordingly, the following introduces enhanced approaches for enabling and controlling distributed functionality among multiple networked processing units. This enables the extension of current IPU functionalities to work as a distributed set of IPUs that can work together to achieve stronger features such as resiliency, reliability, etc.
Distributed Architectures of IPUs
With the first deployment model, the IPU 514 directly receives data from use cases 502A. The IPU 514 operates one or more containers with microservices to perform processing of the data. As an example, a small gateway (e.g., a NUC type of appliance) may connect multiple cameras to an edge system that is managed or connected by the IPU 514. The IPU 514 may process data as a small aggregator of sensors that runs on the far edge, or may perform some level of inline processing or preprocessing and send the payload to be further processed by the IPU or the system to which the IPU connects.
With the second deployment model, the intermediate processing device 512 provided by the gateway or NUC receives data from use cases 502B. The intermediate processing device 512 includes various processing elements (e.g., CPU cores, GPUs), and may operate one or more microservices for servicing workloads from the use cases 502B. However, the intermediate processing device 512 invokes the IPU 514 to complete processing of the data.
In either the first or the second deployment model, the IPU 514 may connect with a local compute platform, such as that provided by a CPU 516 (e.g., Intel® Xeon CPU) operating multiple microservices. The IPU may also connect with a remote compute platform, such as that provided at a data center by CPU 540 at a remote server. As an example, consider a microservice that performs some analytical processing (e.g., face detection on image data), where the CPU 516 and the CPU 540 provide access to this same microservice. The IPU 514, depending on the current load of the CPU 516 and the CPU 540, may decide to forward the images or payload to one of the two CPUs. Data forwarding or processing can also depend on other factors such as SLA for latency or performance metrics (e.g., perf/watt) in the two systems. As a result, the distributed IPU architecture may accomplish features of load balancing.
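As a non-limiting illustration of the forwarding decision described above, the following Python sketch selects between a local and a remote compute platform based on reported load, an SLA latency bound, and a performance-per-watt metric; all names, thresholds, and weights are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PlatformStatus:
    name: str
    load: float                 # utilization in [0.0, 1.0]
    expected_latency_ms: float
    perf_per_watt: float        # higher is better

def select_target(local: PlatformStatus, remote: PlatformStatus,
                  sla_latency_ms: float) -> PlatformStatus:
    """Pick the platform to which the IPU forwards the payload.

    Candidates that would violate the SLA latency bound are excluded; among
    the remaining candidates, the one with the better combination of load
    headroom and efficiency is chosen (weights are illustrative only).
    """
    candidates = [p for p in (local, remote) if p.expected_latency_ms <= sla_latency_ms]
    if not candidates:
        # No candidate meets the SLA; fall back to the lower-latency platform.
        return min((local, remote), key=lambda p: p.expected_latency_ms)
    return max(candidates, key=lambda p: (1.0 - p.load) * 0.7 + p.perf_per_watt * 0.3)

# Example (illustrative values only):
local = PlatformStatus("CPU 516", load=0.85, expected_latency_ms=12.0, perf_per_watt=0.6)
remote = PlatformStatus("CPU 540", load=0.30, expected_latency_ms=18.0, perf_per_watt=0.9)
print(select_target(local, remote, sla_latency_ms=25.0).name)  # -> "CPU 540"
```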
The IPU in the computing environment 510 may be coordinated with other network-connected IPUs. In an example, a Service and Infrastructure orchestration manager 530 may use multiple IPUs as a mechanism to implement advanced service processing schemes for the user stacks. This may also enable implementing of system functionalities such as failover, load balancing etc.
In a distributed architecture example, IPUs can be arranged in the following non-limiting configurations. As a first configuration, a particular IPU (e.g., IPU 514) can work with other IPUs (e.g., IPU 520) to implement failover mechanisms. For example, an IPU can be configured to forward traffic to service replicas that run on other systems when a local host does not respond.
As a second configuration, a particular IPU (e.g., IPU 514) can work with other IPUs (e.g., IPU 520) to perform load balancing across other systems. For example, consider a scenario where CDN traffic targeted to the local host is forwarded to another host when I/O or compute in the local host is scarce at a given moment.
As a third configuration, a particular IPU (e.g., IPU 514) can work as a power management entity to implement advanced system policies. For example, consider a scenario where the whole system (e.g., including CPU 516) is placed in a C6 state (a low-power/power-down state available to a processor) while forwarding traffic to other systems (e.g., IPU 520) and consolidating it.
As will be understood, fully coordinating a distributed IPU architecture requires numerous aspects of coordination and orchestration. The following examples of system architecture deployments provide discussion of how edge computing systems may be adapted to include coordinated IPUs, and how such deployments can be orchestrated to use IPUs at multiple locations to expand to the new envisioned functionality.
Distributed IPU Functionality
An arrangement of distributed IPUs offers a set of new functionalities to enable IPUs to be service focused.
In the block diagram of
Peer Discovery. In an example, each IPU is provided with Peer Discovery logic to discover other IPUs in the distributed system that can work together with it. Peer Discovery logic may use mechanisms such as broadcasting to discover other IPUs that are available on a network. The Peer Discovery logic is also responsible for working with the Peer Attestation and Authentication logic to validate and authenticate a peer IPU's identity, determine whether the peer IPU is trustworthy, and determine whether the current system tenant allows the current IPU to work with it. To accomplish this, an IPU may perform operations such as: retrieve a proof of identity and proof of attestation; connect to a trusted service running in a trusted server; or validate that the discovered system is trustworthy. Various technologies (including hardware components or standardized software implementations) that enable attestation, authentication, and security may be used with such operations.
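A non-limiting sketch of the Peer Discovery flow described above follows (in Python); the broadcast, attestation-service, and tenant-policy objects are stand-ins for whatever mechanisms a deployment actually uses, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PeerIPU:
    address: str
    proof_of_identity: str
    proof_of_attestation: str

def discover_peers(broadcast) -> List[PeerIPU]:
    """Peer Discovery logic: broadcast a probe and collect responding IPUs."""
    return broadcast("IPU_DISCOVERY_PROBE")

def admit_peer(peer: PeerIPU, attestation_service, tenant_policy) -> bool:
    """Peer Attestation and Authentication logic: validate identity and
    attestation against a trusted service, then apply the tenant policy."""
    if not attestation_service.verify(peer.proof_of_identity, peer.proof_of_attestation):
        return False
    return tenant_policy.allows(peer.address)

def build_trusted_peer_list(broadcast, attestation_service, tenant_policy) -> List[PeerIPU]:
    """Retain only the discovered peers that are attested and tenant-approved."""
    return [p for p in discover_peers(broadcast)
            if admit_peer(p, attestation_service, tenant_policy)]
```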
Peer Attestation. In an example, each IPU provides interfaces to other IPUs to enable attestation of the IPU itself. IPU Attestation logic is used to perform an attestation flow within a local IPU in order to create the proof of identity that will be shared with other IPUs. Attestation here may integrate previous approaches and technologies to attest a compute platform. This may also involve the use of trusted attestation service 640 to perform the attestation operations.
Functionality Discovery. In an example, a particular IPU includes capabilities to discover the functionalities that peer IPUs provide. Once the authentication is done, the IPU can determine what functionalities the peer IPUs provide (using the IPU Peer Discovery logic) and store a record of such functionality locally. Examples of properties to discover can include: (i) the type of IPU, the functionalities provided, and associated KPIs (e.g., performance/watt, cost, etc.); (ii) available functionalities as well as possible functionalities to execute under secure enclaves (e.g., enclaves provided by Intel® SGX or TDX technologies); (iii) current services that are running on the IPU and on the system that can potentially accept requests forwarded from this IPU; or (iv) other interfaces or hooks that are provided by an IPU, such as access to remote storage, access to a remote VPU, or access to certain functions. In a specific example, a service may be described by properties such as: a UUID; estimated performance KPIs in the host or IPU; average performance provided by the system during the N units of time (or any other type of indicator); and like properties.
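The properties listed above may be represented, purely as an illustration, by records such as the following Python sketch; the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid

@dataclass
class DiscoveredService:
    """Illustrative record of a service advertised by a peer IPU."""
    service_id: uuid.UUID
    estimated_kpis: Dict[str, float]  # e.g., {"perf_per_watt": 0.8, "cost": 0.2}
    average_performance: float        # rolling average over N units of time
    runs_in_enclave: bool = False     # e.g., executable under a secure enclave

@dataclass
class PeerCapabilityRecord:
    """Illustrative record stored locally after Functionality Discovery."""
    peer_address: str
    ipu_type: str
    services: List[DiscoveredService] = field(default_factory=list)
    remote_hooks: List[str] = field(default_factory=list)  # e.g., "remote-storage", "remote-VPU"
```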
Service Management. The IPU includes functionality to manage services that are running either on the host compute platform or in the IPU itself. Managing (orchestrating) services includes performing service and resource orchestration for the services that can run on the IPU or that the IPU can affect. Two types of usage models are envisioned:
External Orchestration Coordination. The IPU may enable external orchestrators to deploy services on the IPU compute capabilities. To do so, an IPU includes a component providing Kubernetes (K8s)-compatible APIs to manage the containers (services) that run on the IPU itself. For example, the IPU may run a service that is just providing content to storage connected to the platform. In this case, the orchestration entity running in the IPU may manage the services running in the IPU as happens in other systems (e.g., maintaining the service level objectives).
Further, external orchestrators can be allowed to register with the IPU that services running on the host may require the IPU to broker requests, implement failover mechanisms, and provide other functionalities. For example, an external orchestrator may register that a particular service running on the local compute platform is replicated in another edge node managed by another IPU where requests can be forwarded.
In this latter use case, external orchestrators may provide to the Service/Application Intercept logic the inputs that are needed to intercept traffic for these services (as such traffic is typically encrypted). This may include properties such as the source and destination of the traffic to be intercepted, or the key to use to decrypt the traffic. Likewise, such inputs may be needed to terminate TLS to understand the requests that arrive at the IPU and that the other logics may need to parse to take actions. For example, if there is a CDN read request, the IPU may need to decrypt the packet to understand that the network packet includes a read request and may redirect it to another host based on the content that is being intercepted. Examples of Service/Application Intercept information are depicted in table 620 in
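As a non-limiting illustration of the intercept properties described above, the following Python sketch shows one possible shape of such a registration and redirect decision; the field names and the request format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InterceptRule:
    """Illustrative registration supplied by an external orchestrator to the
    Service/Application Intercept logic."""
    service_name: str
    source: str                  # source of the traffic to be intercepted
    destination: str             # destination of the traffic to be intercepted
    tls_key_ref: Optional[str]   # reference to the key used to terminate TLS / decrypt
    redirect_target: Optional[str] = None  # e.g., peer host for CDN read requests

def should_redirect(rule: InterceptRule, decrypted_request: str) -> bool:
    """Decide whether an intercepted (and decrypted) request is redirected,
    e.g., a CDN read request forwarded to another host."""
    return rule.redirect_target is not None and decrypted_request.startswith("READ")
```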
External Orchestration Implementation. External orchestration can be implemented in multiple topologies. One supported topology includes having the orchestrator managing all the IPUs running on the backend public or private cloud. Another supported topology includes having the orchestrator managing all the IPUs running in a centralized edge appliance. Still another supported topology includes having the orchestrator running in another IPU that is working as the controller or having the orchestrator running distributed in multiple other IPUs that are working as controllers (master/primary node), or in a hierarchical arrangement.
Functionality for Brokering requests. The IPU may include Service Request Brokering logic and Load Balancing logic to perform brokering actions on arrival of requests for target services running in the local system. For instance, the IPU may decide to see if those requests can be executed by other peer systems (e.g., accessible through Service and Infrastructure Orchestration 630). This may occur, for example, because the load in the local system is high. The local IPU may negotiate with other peer IPUs for the possibility of forwarding the request. Negotiation may involve metrics such as cost. Based on such negotiation metrics, the IPU may decide to forward the request.
Functionality for Load Balancing requests. The Service Request Brokering and Load Balancing logic may distribute requests arriving at the local IPU to other peer IPUs. In this case, the other IPUs and the local IPU work together and do not necessarily need brokering. Such logic acts similarly to a cloud native sidecar proxy. For instance, requests arriving at the system may be sent to the service X running in the local system (either the IPU or the compute platform) or forwarded to a peer IPU that has another instance of service X running. The load balancing distribution can be based on existing algorithms, such as selecting the system that has the lowest load, using round robin, etc.
Functionality for failover, resiliency, and reliability. The IPU includes Reliability and Failover logic to monitor the status of the services running on the compute platform or the status of the compute platform itself. The Reliability and Failover logic may require the Load Balancing logic to transiently or permanently forward requests that target specific services in situations such as where: i) the compute platform is not responding; ii) the service running inside the compute node is not responding; or iii) the compute platform load prevents the targeted service from providing the right level of service level objectives (SLOs). Note that the logic must know the required SLOs for the services. Such functionality may be coordinated with service information 650 including SLO information.
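A non-limiting Python sketch of the Reliability and Failover checks enumerated above follows; the probe fields and the load threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ServiceHealth:
    platform_responding: bool
    service_responding: bool
    platform_load: float   # utilization in [0.0, 1.0]
    slo_headroom: float    # > 0 means the SLO is currently being met

def needs_failover(health: ServiceHealth, load_threshold: float = 0.9) -> bool:
    """Return True when requests for the service should be (transiently or
    permanently) forwarded to a replica on a peer IPU, per conditions i)-iii)."""
    if not health.platform_responding:    # i) compute platform not responding
        return True
    if not health.service_responding:     # ii) service not responding
        return True
    if health.platform_load > load_threshold or health.slo_headroom <= 0:
        return True                       # iii) load prevents meeting SLOs
    return False

# Example (illustrative values only):
print(needs_failover(ServiceHealth(True, True, platform_load=0.95, slo_headroom=0.1)))  # True
```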
Functionality for executing parts of the workloads. Use cases such as video analytics tend to be decomposed into different microservices that form a pipeline of actions that are used together. The IPU may include workload pipeline execution logic that understands how workloads are composed and manages their execution. Workloads can be defined as a graph that connects different microservices. The load balancing and brokering logic may be able to understand those graphs and decide what parts of the pipeline are executed where. Further, to perform these and other operations, the Intercept logic will also decode what requests are included as part of the intercepted traffic.
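The following non-limiting Python sketch illustrates one way the workload pipeline execution logic might represent a microservice pipeline as a graph and assign its stages to execution locations; the placement heuristic and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PipelineStage:
    name: str                    # e.g., "decode", "detect", "classify"
    depends_on: List[str] = field(default_factory=list)

def place_pipeline(stages: List[PipelineStage],
                   load_by_location: Dict[str, float]) -> Dict[str, str]:
    """Assign each stage of the workload graph to a location (local IPU, local
    compute platform, or a peer IPU), greedily preferring the least loaded
    location and co-locating dependent stages to limit data movement."""
    placement: Dict[str, str] = {}
    for stage in stages:  # assumes stages are listed in dependency order
        if stage.depends_on:
            # Co-locate with the first upstream stage.
            placement[stage.name] = placement[stage.depends_on[0]]
        else:
            placement[stage.name] = min(load_by_location, key=load_by_location.get)
    return placement

# Example (illustrative values only): a video-analytics style pipeline.
pipeline = [PipelineStage("decode"),
            PipelineStage("detect", depends_on=["decode"]),
            PipelineStage("classify", depends_on=["detect"])]
print(place_pipeline(pipeline, {"local-ipu": 0.2, "local-host": 0.7, "peer-ipu": 0.4}))
```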
Resource Management
A distributed network processing configuration may enable IPUs to perform an important role in managing resources of edge appliances. As further shown in
As a first example, an IPU can provide management or access to external resources that are hosted in other locations and expose them as local resources using constructs such as Compute Express Link (CXL). For example, the IPU could potentially provide access to a remote accelerator that is hosted in a remote system via CXL.mem/cache and I/O. Another example includes providing access to a remote storage device hosted in another system. In this latter case, the local IPU could work with another IPU in the storage system and expose the remote system as PCIe VF/PF (virtual functions/physical functions) to the local host.
As a second example, an IPU can provide access to IPU-specific resources. Those IPU resources may be physical (such as storage or memory) or virtual (such as a service that provides access to random number generation).
As a third example, an IPU can manage local resources that are hosted in the system where it belongs. For example, the IPU can manage power of the local compute platform.
As a fourth example, an IPU can provide access to other types of elements that relate to resources (such as telemetry or other types of data). In particular, telemetry provides useful data for deciding where to execute workloads or for identifying problems.
I/O Management. Because the IPU is acting as a connection proxy between external peer resources (compute systems, remote storage, etc.) and the local compute platform, the IPU can also include functionality to manage I/O from the system perspective.
Host Virtualization and XPU Pooling. The IPU includes Host Virtualization and XPU Pooling logic responsible for managing access to resources that are outside the system domain (or within the IPU) and that can be offered to the local compute system. Here, “XPU” refers to any type of processing unit, whether a CPU, GPU, VPU, an acceleration processing unit, etc. The IPU logic, after discovery and attestation, can agree with other systems to share external resources with the services running in the local system. IPUs may advertise available resources to other peers, or such resources can be discovered during the discovery phase introduced earlier. IPUs may request access to those resources from other IPUs. For example, an IPU on system A may request access to storage on system B managed by another IPU. Remote and local IPUs can work together to establish a connection between the target resources and the local system.
Once the connection and resource mapping is completed, resources can be exposed to the services running in the local compute node using the VF/PF PCIE and CXL Logic. Each of those resources can be offered as VF/PF. The IPU logic can expose to the local host resources that are hosted in the IPU. Examples of resources to expose may include local accelerators, access to services, and the like.
Power Management. Power management is one of the key features to achieve favorable system operational expenditures (OPEX). The IPU is very well positioned to optimize the power consumption of the local system. The distributed and local power management unit is responsible for metering the power that the system is consuming and the load that the system is receiving, and for tracking the service level agreements that the various services running in the system are achieving for the arriving requests. Likewise, when power efficiencies (e.g., power usage effectiveness (PUE)) are not achieving certain thresholds or the local compute demand is low, the IPU may decide to forward requests for local services to other IPUs that host replicas of the services. Such power management features may also coordinate with the Brokering and Load Balancing logic discussed above. As will be understood, IPUs can work together to decide where requests can be consolidated to establish higher power efficiency as a system. When traffic is redirected, the local power consumption can be reduced in different ways. Example operations that can be performed include: changing the system to a C6 state; changing the base frequencies; or performing other adaptations of the system or system components.
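A non-limiting Python sketch of the consolidation decision described above follows; the metrics, thresholds, and action names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PowerStatus:
    pue: float            # power usage effectiveness (lower is better)
    local_demand: float   # utilization in [0.0, 1.0]
    sla_met: bool         # whether arriving requests currently meet their SLAs

def power_action(status: PowerStatus, pue_threshold: float = 1.5,
                 low_demand: float = 0.2) -> str:
    """Decide whether to keep serving locally, or to forward requests to peer
    IPUs hosting service replicas and reduce local power consumption."""
    if not status.sla_met:
        return "serve-locally"  # do not trade SLA compliance for power savings
    if status.pue > pue_threshold or status.local_demand < low_demand:
        # Forward traffic to replicas, then lower local power, e.g., by
        # entering a C6 state or reducing base frequencies.
        return "consolidate-and-enter-low-power"
    return "serve-locally"

# Example (illustrative values only):
print(power_action(PowerStatus(pue=1.8, local_demand=0.1, sla_met=True)))
```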
Telemetry Metrics. The IPU can generate multiple types of metrics that can be of interest to services, orchestration, or tenants owning the system. In various examples, telemetry can be accessed: (i) out of band via side interfaces; (ii) in band by services running in the IPU; or (iii) out of band using PCIe or CXL from the host perspective. Relevant types of telemetry can include: platform telemetry; service telemetry; IPU telemetry; traffic telemetry; and the like.
System Configurations for Distributed Processing
Further to the examples noted above, the following configurations may be used for processing with distributed IPUs:
1) Local IPUs connected to a compute platform by an interconnect (e.g., as shown in the configuration of
2) Shared IPUs hosted within a rack/physical network—such as in a virtual slice or multi-tenant implementation of IPUs connected via CXL/PCI-E (local), or extension via Ethernet/Fiber for nodes within a cluster;
3) Remote IPUs accessed via an IP Network, such as within certain latency for data plane offload/storage offloads (or, connected for management/control plane operations); or
4) Distributed IPUs providing an interconnected network of IPUs, including as many as hundreds of nodes within a domain.
Configurations of distributed IPUs working together may also include fragmented distributed IPUs, where each IPU or pooled system provides part of the functionalities, and each IPU becomes a malleable system. Configurations of distributed IPUs may also include virtualized IPUs, such as provided by a gateway, switch, or an inline component (e.g., inline between the service acting as IPU), and in some examples, in scenarios where the system has no IPU.
Other deployment models for IPUs may include IPU-to-IPU in the same tier or a close tier; IPU-to-IPU in the cloud (data to compute versus compute to data); integration in small device form factors (e.g., gateway IPUs); gateway/NUC+IPU which connects to a data center; multiple GW/NUC (e.g. 16) which connect to one IPU (e.g. switch); gateway/NUC+IPU on the server; and GW/NUC and IPU that are connected to a server with an IPU.
The preceding distributed IPU functionality may be implemented among a variety of types of computing architectures, including one or more gateway nodes, one or more aggregation nodes, or edge or core data centers distributed across layers of the network (e.g., in the arrangements depicted in
The network processing unit 752 may provide a networked specialized processing unit such as an IPU, DPU, network processing unit (NPU), or other “xPU” outside of the central processing unit (CPU). The processing unit may be embodied as a standalone circuit or circuit package, integrated within an SoC, integrated with networking circuitry (e.g., in a SmartNIC), or integrated with acceleration circuitry, storage devices, or AI or specialized hardware, consistent with the examples above.
The compute processing unit 754 may provide a processor such as a central processing unit (CPU) microprocessor, a multi-core processor, a multithreaded processor, an ultra-low voltage processor, an embedded processor, or another form of special purpose or specialized processing unit for compute operations.
Either the network processing unit 752 or the compute processing unit 754 may be a part of a system on a chip (SoC) which includes components formed into a single integrated circuit or a single package. The network processing unit 752 or the compute processing unit 754 and accompanying circuitry may be provided in a single socket form factor, multiple socket form factor, or a variety of other formats.
The processing units 752, 754 may communicate with a system memory 756 (e.g., random access memory (RAM)) over an interconnect 755 (e.g., a bus). In an example, the system memory 756 may be embodied as volatile (e.g., dynamic random access memory (DRAM), etc.) memory. Any number of memory devices may be used to provide for a given amount of system memory. A storage 758 may also couple to the processor 752 via the interconnect 755 to provide for persistent storage of information such as data, applications, operating systems, and so forth. In an example, the storage 758 may be implemented as non-volatile storage such as a solid-state disk drive (SSD).
The components may communicate over the interconnect 755. The interconnect 755 may include any number of technologies, including industry-standard architecture (ISA), extended ISA (EISA), peripheral component interconnect (PCI), peripheral component interconnect extended (PCIx), PCI express (PCIe), Compute Express Link (CXL), or any number of other technologies. The interconnect 755 may couple the processing units 752, 754 to a transceiver 766, for communications with connected edge devices 762.
The transceiver 766 may use any number of frequencies and protocols. For example, a wireless local area network (WLAN) unit may implement Wi-Fi® communications in accordance with the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard, or a wireless wide area network (WWAN) unit may implement wireless wide area communications according to a cellular, mobile network, or other wireless wide area protocol. The wireless network transceiver 766 (or multiple transceivers) may communicate using multiple standards or radios for communications at a different range. A wireless network transceiver 766 (e.g., a radio transceiver) may be included to communicate with devices or services in the edge cloud 110 or the cloud 130 via local or wide area network protocols.
The communication circuitry (e.g., transceiver 766, network interface 768, external interface 770, etc.) may be configured to use any one or more communication technology (e.g., wired or wireless communications) and associated protocols (e.g., a cellular networking protocol such a 3GPP 4G or 5G standard, a wireless local area network protocol such as IEEE 802.11/Wi-Fi®, a wireless wide area network protocol, Ethernet, Bluetooth®, Bluetooth Low Energy, an IoT protocol such as IEEE 802.15.4 or ZigBee®, Matter®, low-power wide-area network (LPWAN) or low-power wide-area (LPWA) protocols, etc.) to effect such communication. Given the variety of types of applicable communications from the device to another component or network, applicable communications circuitry used by the device may include or be embodied by any one or more of components 766, 768, or 770. Accordingly, in various examples, applicable means for communicating (e.g., receiving, transmitting, etc.) may be embodied by such communications circuitry.
The computing device 750 may include or be coupled to acceleration circuitry 764, which may be embodied by one or more AI accelerators, a neural compute stick, neuromorphic hardware, an FPGA, an arrangement of GPUs, one or more SoCs, one or more CPUs, one or more digital signal processors, dedicated ASICs, or other forms of specialized processors or circuitry designed to accomplish one or more specialized tasks. These tasks may include AI processing (including machine learning, training, inferencing, and classification operations), visual data processing, network data processing, object detection, rule analysis, or the like. Accordingly, in various examples, applicable means for acceleration may be embodied by such acceleration circuitry.
The interconnect 755 may couple the processing units 752, 754 to a sensor hub or external interface 770 that is used to connect additional devices or subsystems. The devices may include sensors 772, such as accelerometers, level sensors, flow sensors, optical light sensors, camera sensors, temperature sensors, global navigation system (e.g., GPS) sensors, pressure sensors, and the like. The hub or interface 770 further may be used to connect the edge computing node 750 to actuators 774, such as power switches, valve actuators, an audible sound generator, a visual warning device, and the like.
In some optional examples, various input/output (I/O) devices may be present within or connected to, the edge computing node 750. For example, a display or other output device 784 may be included to show information, such as sensor readings or actuator position. An input device 786, such as a touch screen or keypad may be included to accept input. An output device 784 may include any number of forms of audio or visual display, including simple visual outputs such as LEDs or more complex outputs such as display screens (e.g., LCD screens), with the output of characters, graphics, multimedia objects, and the like being generated or produced from the operation of the edge computing node 750.
A battery 776 may power the edge computing node 750, although, in examples in which the edge computing node 750 is mounted in a fixed location, it may have a power supply coupled to an electrical grid, or the battery may be used as a backup or for temporary capabilities. A battery monitor/charger 778 may be included in the edge computing node 750 to track the state of charge (SoCh) of the battery 776. The battery monitor/charger 778 may be used to monitor other parameters of the battery 776 to provide failure predictions, such as the state of health (SoH) and the state of function (SoF) of the battery 776. A power block 780, or other power supply coupled to a grid, may be coupled with the battery monitor/charger 778 to charge the battery 776.
In an example, the instructions 782 on the processing units 752, 754 (separately, or in combination with the instructions 782 of the machine-readable medium 760) may configure execution or operation of a trusted execution environment (TEE) 790. In an example, the TEE 790 operates as a protected area accessible to the processing units 752, 754 for secure execution of instructions and secure access to data. Other aspects of security hardening, hardware roots-of-trust, and trusted or protected operations may be implemented in the edge computing node 750 through the TEE 790 and the processing units 752, 754.
The computing device 750 may be a server, an appliance computing device, and/or any other type of computing device having the various form factors discussed above. For example, the computing device 750 may be provided by an appliance computing device that is a self-contained electronic device including a housing, a chassis, a case, or a shell.
In an example, the instructions 782 provided via the memory 756, the storage 758, or the processing units 752, 754 may be embodied as a non-transitory, machine-readable medium 760 including code to direct the processor 752 to perform electronic operations in the edge computing node 750. The processing units 752, 754 may access the non-transitory, machine-readable medium 760 over the interconnect 755. For instance, the non-transitory, machine-readable medium 760 may be embodied by devices described for the storage 758 or may include specific storage units such as optical disks, flash drives, or any number of other hardware devices. The non-transitory, machine-readable medium 760 may include instructions to direct the processing units 752, 754 to perform a specific sequence or flow of actions, for example, as described with respect to the flowchart(s) and block diagram(s) of operations and functionality discussed herein. As used herein, the terms “machine-readable medium”, “machine-readable storage”, “computer-readable storage”, and “computer-readable medium” are interchangeable.
In further examples, a machine-readable medium also includes any tangible medium that is capable of storing, encoding, or carrying instructions for execution by a machine and that cause the machine to perform any one or more of the methodologies of the present disclosure or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions. A “machine-readable medium” thus may include but is not limited to, solid-state memories, and optical and magnetic media. The instructions embodied by a machine-readable medium may further be transmitted or received over a communications network using a transmission medium via a network interface device utilizing any one of a number of transfer protocols (e.g., HTTP).
A machine-readable medium may be provided by a storage device or other apparatus which is capable of hosting data in a non-transitory format. In an example, information stored or otherwise provided on a machine-readable medium may be representative of instructions, such as instructions themselves or a format from which the instructions may be derived. This format from which the instructions may be derived may include source code, encoded instructions (e.g., in compressed or encrypted form), packaged instructions (e.g., split into multiple packages), or the like. The information representative of the instructions in the machine-readable medium may be processed by processing circuitry into the instructions to implement any of the operations discussed herein. For example, deriving the instructions from the information (e.g., processing by the processing circuitry) may include: compiling (e.g., from source code, object code, etc.), interpreting, loading, organizing (e.g., dynamically or statically linking), encoding, decoding, encrypting, unencrypting, packaging, unpackaging, or otherwise manipulating the information into the instructions.
In an example, the derivation of the instructions may include assembly, compilation, or interpretation of the information (e.g., by the processing circuitry) to create the instructions from some intermediate or preprocessed format provided by the machine-readable medium. The information, when provided in multiple parts, may be combined, unpacked, and modified to create the instructions. For example, the information may be in multiple compressed source code packages (or object code, or binary executable code, etc.) on one or several remote servers.
In further examples, a software distribution platform (e.g., one or more servers and one or more storage devices) may be used to distribute software, such as the example instructions discussed above, to one or more devices, such as example processor platform(s) and/or example connected edge devices noted above. The example software distribution platform may be implemented by any computer server, data facility, cloud service, etc., capable of storing and transmitting software to other computing devices. In some examples, the providing entity is a developer, a seller, and/or a licensor of software, and the receiving entity may be consumers, users, retailers, OEMs, etc., that purchase and/or license the software for use and/or re-sale and/or sub-licensing.
In some examples, the instructions are stored on storage devices of the software distribution platform in a particular format. A format of computer readable instructions includes, but is not limited to, a particular code language (e.g., Java, JavaScript, Python, C, C#, SQL, HTML, etc.), and/or a particular code state (e.g., uncompiled code (e.g., ASCII), interpreted code, linked code, executable code (e.g., a binary), etc.). In some examples, the computer readable instructions stored in the software distribution platform are in a first format when transmitted to an example processor platform(s). In some examples, the first format is an executable binary that particular types of the processor platform(s) can execute. However, in some examples, the first format is uncompiled code that requires one or more preparation tasks to transform the first format to a second format to enable execution on the example processor platform(s). For instance, the receiving processor platform(s) may need to compile the computer readable instructions in the first format to generate executable code in a second format that is capable of being executed on the processor platform(s). In still other examples, the first format is interpreted code that, upon reaching the processor platform(s), is interpreted by an interpreter to facilitate execution of instructions.
Memory Interleaving on Edge Computing Systems
In a variety of edge computing settings, there are variations in load for network traffic and usage of computing resources. For example, consider an edge computing network that deploys compute resources at respective base stations to process workloads, where each of the respective base stations are connected to different and varying numbers of client devices at different times, to perform varying types and amounts of processing operations. Many edge computing deployments attempt to handle this variation in traffic and compute usage by the organization and use of disaggregated resources situated at base stations and central offices. For example, memory and compute resources may be pooled among multiple locations to effectively handle workload among shared resources.
The disaggregated resources available in the edge layer 820 include communication resources 822, computing resources 824, caching resources 826, and the like, provided among a variety of devices or nodes. The resources 822, 824, and 826 may be arranged into compute pools, memory pools, etc., as shown by memory pooling 840, which represents a virtual pool of memory comprised of portions of memory devices existing among multiple physical systems. In some examples, a first set of resources may be tunneled via an interconnect protocol (e.g., CXL) to a second set of resources, such as the connection of accelerators to memory pools to enable the execution of compute operations on memory regions at the software level.
Within existing systems in the edge layer 820, there is a lack of capability to dynamically carve out memory regions across memory resources that are distributed. As a result, pooled resources of existing edge compute systems are unable to dynamically adapt the pooled resources based on bandwidth and other requirements. Further, with existing approaches, the memory pooling 840 that is established between different systems (e.g., between base stations) is fixed and cannot be easily reconfigured. There is no capability with existing pooling approaches to dynamically attach or adapt resources, such as mapping acceleration capability to memory pooling on the fly.
Carving out memory regions for dynamic memory pooling and memory pooling adaptation results in a number of new capabilities. For instance, existing approaches for memory pooling do not consider an estimation of current and future edge traffic at each base station, the corresponding pressure on various memory pools, or the proximity to various accelerators and their load. The present techniques for dynamic memory pool interleaving can consider these requirements, in addition to other aspects of memory pool performance such as the granularity of interleaving, priority and redundancy/replication requirements, and the like. Further, the present techniques, especially when deployed at multiple base stations, can also consider sufficient proximity and network latency bottlenecks so that memory pool interleaving can be successfully performed without service degradation.
In the following, dynamic memory and resource pooling approaches are introduced that better adapt to real-world scenarios and service usage. In particular, a number of capabilities are introduced into a network switch to access and control a pooled architecture of memory resources (or any other resource pooling with similar characteristics). These capabilities include the introduction of transparently smart interleaving methods that are network aware. These capabilities also include pooling mechanisms that may be operated with bandwidth augmentation. This is accomplished with the use of estimation and prediction logic operating at the network switch, to evaluate resource telemetry and dynamically identify resource needs.
Various examples of memory pool interleaving are provided, but it will be understood that the present techniques for resource pooling can be combined with other types of interleaving and memory pool use management. As used herein, memory pool interleaving refers to the dispersing of memory storage and access among disaggregated, networked physical memory resources and locations (e.g., among network-connected different nodes or devices). The memory pool interleaving is an arrangement (e.g., configuration, scheme, approach) that is organized and performed in a coordinated fashion to reduce latency for overall use of the memory pool. For instance, in scenarios where available bandwidth presents a bottleneck for a particular memory location, then other memory resources of the pool are deployed for use in the pool.
It will be understood that memory pool interleaving, as used herein, is performed at a resource or system level, and is generally distinguishable from memory address interleaving that is commonly performed by a memory controller on individual memory banks within a memory module. Thus, individual systems in a distributed memory pool may use memory address interleaving within their memory modules. Further, memory pool interleaving may involve interleaving of data chunks, blocks, or sets that are much larger than conventional memory module interleaving.
The use of memory pool interleaving among multiple systems enables a unique capability to shape load for edge computing operations. Additionally, memory pool interleaving provides the capability to weave together memory hierarchies on the fly, attaching accelerators to perform operations on data in the memory. This also enables an ability to deploy accelerators to varying edge loads, especially for base stations.
In an example, a software stack on each platform (e.g., implemented in an operating system or network stack, or both) exposes an interface (e.g., application programming interface (API)) for identifying and configuring a pooled memory allocation. This interface can receive requests that identify a level of memory bandwidth, to enable a specification of the amount of bandwidth required for a memory chunk being allocated in the memory pool. For instance, multiple categories of memory bandwidth may be provided for use in the architecture, such as three types corresponding to High, Medium, Low (as a non-limiting example, High corresponding to over 1 Gbps, Medium corresponding to between 100-999 Mbps, Low corresponding to under 100 Mbps). Each memory type may be mapped to a different global address memory space. Additionally, the interface may enable an interleaving capability to be turned on and off for all or some of the pooled memory allocation. The interleaving capability can be configured with an explicit command invoked from the interface (e.g., to set interleaving on or off), or the interleaving capability can be implicitly configured based on a definition of the memory type. Various data may be used to save the state of the pooled memory allocation and which memory resources are available, and such data may be updated if the state of the memory resources changes.
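As a non-limiting illustration of the interface described above, the following Python sketch shows one possible shape of such an allocation request and the bandwidth categories; the names and enum values are hypothetical, and the High/Medium/Low ranges simply restate the non-limiting example given above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class BandwidthTier(Enum):
    HIGH = "high"      # e.g., over 1 Gbps (non-limiting example)
    MEDIUM = "medium"  # e.g., 100-999 Mbps
    LOW = "low"        # e.g., under 100 Mbps

@dataclass
class PoolAllocationRequest:
    """Illustrative request submitted through the platform's pooled-memory API."""
    size_bytes: int
    bandwidth_tier: BandwidthTier
    interleave: Optional[bool] = None  # None: derive implicitly from the memory type

def resolve_interleaving(req: PoolAllocationRequest) -> bool:
    """Interleaving may be set explicitly, or implied by the bandwidth tier
    (here, high-bandwidth chunks default to interleaved placement)."""
    if req.interleave is not None:
        return req.interleave
    return req.bandwidth_tier is BandwidthTier.HIGH

# Example (illustrative values only): 4 GiB high-bandwidth chunk, interleaving implied.
req = PoolAllocationRequest(size_bytes=4 << 30, bandwidth_tier=BandwidthTier.HIGH)
print(resolve_interleaving(req))  # True
```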
The present approaches thus enable use of memory pooling that can control which portions of a memory resource or memory pool should (or should not) be interleaved among different memory locations, and which types of memory uses can or cannot be interleaved among different memory locations. In contrast, existing memory interleaving implementations are designed to enable interleaving at all regions of a single memory location—or at best, only provide a very limited number of memory regions in the location with interleaving disabled.
At a network switch 920, control of interleaving for a memory pool may be implemented by use of interleave logic 928. In an example, the interleave logic 928 may be implemented by use of a global source address decoder. For instance, the interleave logic 928 can be used to dynamically allocate the requested memory in the pooled memory region 930 from one of the interleaved memory spaces (e.g., a pool constructed from memory areas with memory storage interleaved among multiple memory resources 914A, 914B, 914C, 914D), or from one of the non-interleaved memory spaces (e.g., a pool constructed by directly storing to only one of the memory resources (914A, 914B, 914C, 914D). Accordingly, the pooled memory region 930 may include interleaved and non-interleaved memory storage among the various platforms.
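A non-limiting Python sketch of the address-decoding behavior of the interleave logic 928 follows: a global pooled address is mapped either to a single memory resource (a non-interleaved space) or round-robin across several resources (an interleaved space). The chunk size, layout, and resource labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AddressSpace:
    base: int
    size: int
    resources: List[str]       # one entry -> non-interleaved; several -> interleaved
    chunk_size: int = 1 << 20  # interleaving granularity (illustrative: 1 MiB)

def decode(spaces: List[AddressSpace], addr: int) -> Tuple[str, int]:
    """Global source address decoder: map a pooled address to a target memory
    resource and an offset within that resource."""
    for space in spaces:
        if space.base <= addr < space.base + space.size:
            offset = addr - space.base
            chunk = offset // space.chunk_size
            target = space.resources[chunk % len(space.resources)]
            within_chunk = offset % space.chunk_size
            resource_offset = (chunk // len(space.resources)) * space.chunk_size + within_chunk
            return target, resource_offset
    raise ValueError("address not in pooled memory region")

# Example (illustrative layout only): one interleaved and one non-interleaved space.
spaces = [AddressSpace(base=0x0000_0000, size=4 << 30,
                       resources=["914A", "914B", "914C", "914D"]),
          AddressSpace(base=0x2_0000_0000, size=1 << 30, resources=["914A"])]
print(decode(spaces, 0x0030_0000))  # fourth 1 MiB chunk -> ('914D', 0)
```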
The switch 920 may also configure the pooled memory region 930 to support use cases with in-memory compute capabilities. For example, the use cases may provide hints to use a non-interleaved mode to enable compute on the entire data as opposed to only a chunk of the interleaved data. Non-interleaved allocation may also be necessary for specific devices (DRAM, NVM, or others) that may need to be managed for resiliency through hot plugging or hot unplugging in order to service infrastructures without interfering with execution of distributed workloads.
In a further example, the switch 920 also implements an interface and additional logic for dynamic configuration and re-configuration of interleaved memory pooling. First, the logic implemented in the switch 920 can include bandwidth estimator and predictor logic 922. The bandwidth estimator and predictor logic 922 is used to select the number of end pooled memory servers needed for achieving the required level of memory bandwidth. The bandwidth estimator and predictor logic 922 selects and potentially reserves the fabric resources (e.g., specific memory bandwidth of a virtual channel) for connectivity to those servers. In a similar manner, acceleration requirement estimator logic 924 can identify the usage of acceleration resources for in-memory processing; incoming load requirement predictor logic 926 can provide a prediction of workload usage and the resources needed for workload processing with use of the compute and memory resources. Each of these logics may also consider priority, tiering, and classification. Each of these logics may also discover or identify aspects of the disaggregated memory resources and update relevant data structures or databases about the disaggregated memory resources.
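The following is a non-limiting Python sketch of the selection performed by the bandwidth estimator and predictor logic 922: it chooses how many end pooled-memory servers are needed to satisfy a requested bandwidth given per-server telemetry. All figures are hypothetical.

```python
from typing import List

def servers_needed(required_gbps: float, per_server_gbps: List[float]) -> List[int]:
    """Select a minimal set of pooled-memory servers (by index) whose currently
    available bandwidth, summed, satisfies the requested level."""
    # Prefer the servers with the most available bandwidth first.
    ranked = sorted(range(len(per_server_gbps)),
                    key=lambda i: per_server_gbps[i], reverse=True)
    chosen, total = [], 0.0
    for i in ranked:
        if total >= required_gbps:
            break
        chosen.append(i)
        total += per_server_gbps[i]
    if total < required_gbps:
        raise RuntimeError("pooled memory servers cannot satisfy the requested bandwidth")
    return chosen

# Example (illustrative telemetry only): request 2.5 Gbps against four servers.
print(servers_needed(2.5, [1.2, 0.8, 1.5, 0.4]))  # -> [2, 0], i.e., 2.7 Gbps reserved
```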
Next, the switch 920 implements the interleave logic 928. The switch 920 negotiates with the end memory pools to allocate the required memory chunks that will be interleaved in the pooled memory region 930. As can be understood, memory pool interleaving also includes some latency considerations, and can use different interleaving sizes to achieve different pooling properties. Further, interleaving also includes resiliency and infrastructure service management considerations.
The switch 920 may also use other logic (not depicted) to process reads and writes for a particular memory range. The logic is responsible for creating the corresponding unicast or multicast messages to split or gather all the required data and respond with a single response to the originator. For example, the interleave logic 928 may consider scenarios where the fabric resources are scarce and not enough to satisfy a particular request without stealing temporary bandwidth from the best-effort address ranges. The interleave logic 928 may also determine how pools from different groups can be mapped into the same interleaving type to apply load balancing schemes.
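As a non-limiting illustration of the split/gather behavior described above, the following Python sketch breaks a read of a pooled address range into per-resource requests and reassembles a single response for the originator; the chunk size and message shapes are hypothetical.

```python
from typing import Callable, List, Tuple

CHUNK = 1 << 20  # interleaving granularity (illustrative: 1 MiB)

def split_read(addr: int, length: int, resources: List[str]) -> List[Tuple[str, int, int]]:
    """Split a read over an interleaved range into (resource, address, length) requests."""
    requests = []
    end = addr + length
    while addr < end:
        chunk = addr // CHUNK
        chunk_end = (chunk + 1) * CHUNK
        part = min(end, chunk_end) - addr
        requests.append((resources[chunk % len(resources)], addr, part))
        addr += part
    return requests

def gather_read(addr: int, length: int, resources: List[str],
                read_remote: Callable[[str, int, int], bytes]) -> bytes:
    """Issue the per-resource (unicast) reads and return one single response."""
    return b"".join(read_remote(res, off, ln)
                    for res, off, ln in split_read(addr, length, resources))

# Example (illustrative only): a stand-in remote reader returning zero-filled data.
fake_reader = lambda res, off, ln: bytes(ln)
data = gather_read(addr=(1 << 20) - 16, length=32,
                   resources=["914A", "914B"], read_remote=fake_reader)
print(len(data))  # 32 bytes gathered from two resources
```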
The interfaces in the respective platforms (e.g., 910A-910D) provide support for exposing (e.g., identifying, discovering) tiering within pools, where interleaving is combined with an upper tier of memory at each pool that is deeply interleaved, while the capacity exposed by a lower tier is not deeply interleaved or not interleaved at all. These interfaces enable the infrastructure to support caching of popular or streaming data that flows from the lower tiers. In some embodiments, when data cannot be split across servers (i.e., cannot be pooled), such interfaces provide an ability to obtain the aggregate bandwidth and low latency of the highly interleaved upper tier memory together with the high capacity of lower tier memory/storage across the same interfaces. Further, such interfaces may be extended to enable transparent use of processing-in-memory/processing-in-storage through per-node acceleration logic.
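The Python sketch below gives one simplified reading of such a tiered pool interface, in which a deeply interleaved upper tier acts as a cache in front of a high-capacity lower tier; the TieredPool class, its capacity limit, and the eviction policy are all assumptions for illustration.

# Hypothetical sketch of a tiered pool interface: a deeply interleaved upper
# tier caches popular data in front of a high-capacity, non-interleaved lower tier.

class TieredPool:
    def __init__(self, upper_capacity_blocks):
        self.upper = {}                      # block id -> data (interleaved tier)
        self.lower = {}                      # block id -> data (capacity tier)
        self.upper_capacity = upper_capacity_blocks

    def read(self, block_id):
        if block_id in self.upper:           # hit in the interleaved upper tier
            return self.upper[block_id]
        data = self.lower[block_id]          # miss: fetch from the lower tier
        self._promote(block_id, data)        # cache popular/streaming data
        return data

    def _promote(self, block_id, data):
        if len(self.upper) >= self.upper_capacity:
            self.upper.pop(next(iter(self.upper)))   # evict the oldest entry
        self.upper[block_id] = data

pool = TieredPool(upper_capacity_blocks=2)
pool.lower.update({1: b"cold", 2: b"warm", 3: b"hot"})
pool.read(3); pool.read(3)                   # second read is an upper-tier hit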
The use of memory regions and configurations may be based on the use of tiers as noted above. It will be understood that additional in-memory-computing or in-storage-computing can be supported in the outer (high capacity) tiers, with the computed results supplied from the inner tiers. At the same time, in-pool compute can be supported with low-power CPUs/XPUs for branch-based compute operations (e.g., sort, filter) at the upper tier, with the lower tier providing bulk in-pool compute operations (e.g., scan, encrypt, reduce, split, merge). Such operations may be enabled or coordinated through the use of acceleration capabilities or capabilities already built into memory technology devices.
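For illustration only, the short Python sketch below routes in-pool compute operations by tier using the operation sets named above; the dispatch table and the upper_exec/lower_exec hooks are hypothetical and do not correspond to any defined accelerator interface.

# Hypothetical sketch of routing in-pool compute operations by tier:
# branch-based operations run on low-power CPUs/XPUs at the upper tier,
# bulk operations run at the lower (high-capacity) tier.

UPPER_TIER_OPS = {"sort", "filter"}
LOWER_TIER_OPS = {"scan", "encrypt", "reduce", "split", "merge"}

def dispatch_op(op_name, data, upper_exec, lower_exec):
    """upper_exec/lower_exec are assumed hooks into per-tier compute resources."""
    if op_name in UPPER_TIER_OPS:
        return upper_exec(op_name, data)
    if op_name in LOWER_TIER_OPS:
        return lower_exec(op_name, data)
    raise ValueError(f"no in-pool implementation for {op_name!r}")

result = dispatch_op("filter", [3, 1, 2],
                     upper_exec=lambda op, d: [x for x in d if x > 1],
                     lower_exec=lambda op, d: d)
print(result)   # [3, 2]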
The implementation of the memory pooling and memory pool interleaving may be enabled with use of a variety of distributed network processing units. For instance, an implementation may include the use of a set of distributed IPUs as coordinated according to the architectures discussed with reference to
IPUs enable connectivity and memory pooling among multiple network topologies, because data is synchronized through network connections to individual IPUs. Likewise, IPUs may implement logic to enable another tier for pooling (e.g., a pool of pools). Accordingly, an IPU may operate at each platform/base station (e.g., IPU 1010A at base station 910A, IPU 1010B at base station 910B, IPU 1010C at base station 910C, IPU 1010D at base station 910D).
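One way such a pool of pools could be pictured is in the Python sketch below, where each platform IPU exposes its local pooled capacity and a coordinating layer spreads allocations across them; the class names, the capacity figures, and the largest-free-first policy are assumptions used only to illustrate the tiering idea.

# Hypothetical sketch of a "pool of pools": each platform IPU exposes its
# local pooled capacity, and a coordinating tier aggregates them.

class PlatformIPU:
    def __init__(self, name, free_gib):
        self.name, self.free_gib = name, free_gib

    def reserve(self, gib):
        grant = min(self.free_gib, gib)
        self.free_gib -= grant
        return grant

class PoolOfPools:
    def __init__(self, ipus):
        self.ipus = ipus

    def allocate(self, gib):
        """Spread an allocation across platform pools, largest free capacity first."""
        grants, remaining = {}, gib
        for ipu in sorted(self.ipus, key=lambda i: i.free_gib, reverse=True):
            if remaining <= 0:
                break
            got = ipu.reserve(remaining)
            if got:
                grants[ipu.name] = got
                remaining -= got
        return grants if remaining <= 0 else None    # None: cannot be satisfied

pools = PoolOfPools([PlatformIPU("910A", 32), PlatformIPU("910B", 16),
                     PlatformIPU("910C", 16), PlatformIPU("910D", 8)])
print(pools.allocate(40))   # {"910A": 32, "910B": 8}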
In the scenario of
In further examples, IPUs may coordinate to identify, discover, and map different memory pools or memory pool configurations into different interleaving types or resource groupings. A variety of discovery or data processing mechanisms may be used to perform this mapping and to retain or store data that maps the disaggregated memory resources at respective compute locations. Additionally, the IPUs may also be responsible for access or coordination of other resources, including but not limited to in-memory computing, accelerated processing, or low power operations (e.g., filter, sort, etc., or encryption/decryption operations).
At 1110, operations are performed to identify disaggregated memory resources at respective compute locations. In an example, the respective compute locations are connected to each another via at least one interconnect. For instance, the respective compute locations may correspond to processing hardware at respective base stations, as the client devices connect to the network via one or more of the respective base stations. Also for instance, the one or more of the respective compute locations may include acceleration resources, as the disaggregated memory resources are mapped to the acceleration resources (e.g., with disaggregated memory resources that are connected to the acceleration resources via a Compute Express Link (CXL) interconnect).
At 1120, operations are performed to identify workload requirements for use of the compute locations by respective workloads. In an example, the workloads are provided by client devices to the compute locations via a network. In various examples, the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.
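As a concrete but hypothetical illustration of combining the signals just listed into a requirements record, the Python sketch below derives a simple WorkloadRequirements structure from a telemetry dictionary; the field names, keys, and thresholds are invented for this example and are not prescribed by the flowchart.

# Hypothetical sketch of deriving workload requirements from the telemetry
# signals listed above; the field names and input keys are illustrative.

from dataclasses import dataclass

@dataclass
class WorkloadRequirements:
    max_latency_ms: float        # from compute and network latency measurements
    needs_acceleration: bool     # from current/predicted accelerator availability
    min_bandwidth_gbps: float    # from traffic estimates and load predictions

def identify_requirements(telemetry):
    """telemetry: dict of measured/estimated/predicted signals."""
    return WorkloadRequirements(
        max_latency_ms=telemetry["compute_latency_ms"] + telemetry["net_latency_ms"],
        needs_acceleration=telemetry["predicted_accel_free"] < telemetry["accel_demand"],
        min_bandwidth_gbps=max(telemetry["current_traffic_gbps"],
                               telemetry["predicted_load_gbps"]))

reqs = identify_requirements({
    "compute_latency_ms": 2.0, "net_latency_ms": 1.5,
    "predicted_accel_free": 3, "accel_demand": 4,
    "current_traffic_gbps": 12.0, "predicted_load_gbps": 18.0})
print(reqs)   # WorkloadRequirements(max_latency_ms=3.5, needs_acceleration=True, ...)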
At 1130, operations are performed to determine an interleaving arrangement (e.g., configuration, scheme, or overlay) for a distributed memory pool that fulfills the workload requirements. In an example, the interleaving arrangement is provided in a virtual memory storage pool that is to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations. In a further example, this determination is based on categorizing memory bandwidth available at the disaggregated memory resources into multiple categories, to enable the interleaving arrangement to be determined using the multiple categories.
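The following Python sketch shows one hypothetical way such a determination could proceed: the disaggregated resources are first bucketed into bandwidth categories, and the interleaving arrangement is then drawn from the highest category that satisfies the requirement. The category cutoffs, the 4096-byte granularity, and the location labels are assumptions for illustration.

# Hypothetical sketch of determining an interleaving arrangement by first
# bucketing the disaggregated resources into bandwidth categories.

def categorize_bandwidth(resources):
    """resources: dict of location -> available memory bandwidth (GB/s)."""
    categories = {"high": [], "medium": [], "low": []}
    for location, gbps in resources.items():
        if gbps >= 20:
            categories["high"].append(location)
        elif gbps >= 10:
            categories["medium"].append(location)
        else:
            categories["low"].append(location)
    return categories

def determine_arrangement(resources, requirements):
    """Interleave across the highest bandwidth category that still meets the need."""
    categories = categorize_bandwidth(resources)
    for name in ("high", "medium", "low"):
        members = categories[name]
        if members and sum(resources[m] for m in members) >= requirements["min_bandwidth_gbps"]:
            return {"members": members, "granularity": 4096, "category": name}
    return {"members": list(resources), "granularity": 4096, "category": "all"}

arrangement = determine_arrangement(
    {"910A": 25, "910B": 22, "910C": 12, "910D": 6},
    {"min_bandwidth_gbps": 40})
print(arrangement)   # interleaves across the "high" category: ["910A", "910B"]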
At 1140, operations are performed (e.g., via commands, requests, or other communications) to configure (i.e., enable) the memory pool for use by the client devices of the network. This configuration of the memory pool causes the disaggregated memory resources to host data based on the interleaving arrangement. In some examples, a portion of the disaggregated memory resources at one or more compute locations is established without interleaving (e.g., determined to not utilize interleaving) based on the workload requirements.
At 1150, operations are performed to conduct memory storage and retrieval operations from the disaggregated memory resources, using the memory pool. For instance, this may include storing data in the memory pool (e.g., to multiple compute locations) according to the interleaving arrangement, and retrieving data in the memory pool (e.g., from multiple compute locations) according to the interleaving arrangement.
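A minimal Python sketch of such object-level store and retrieve operations follows: data is chunked per the interleaving arrangement, placed round-robin across the member locations, and gathered back using the recorded layout. The store/retrieve helpers, the write_remote/read_remote transport hooks, and the per-chunk keys are hypothetical.

# Hypothetical sketch of object-level store/retrieve against the memory pool.

def store(data, arrangement, write_remote):
    """write_remote(location, key, chunk) is an assumed transport hook."""
    members, granularity = arrangement["members"], arrangement["granularity"]
    layout = []
    for index in range(0, len(data), granularity):
        chunk = data[index:index + granularity]
        location = members[(index // granularity) % len(members)]
        key = f"obj-{index}"                      # illustrative per-chunk key
        write_remote(location, key, chunk)
        layout.append((location, key))
    return layout

def retrieve(layout, read_remote):
    """Gather the chunks back in layout order into one buffer."""
    return b"".join(read_remote(location, key) for location, key in layout)

backing = {}
layout = store(b"x" * 10_000,
               {"members": ["910A", "910B"], "granularity": 4096},
               write_remote=lambda loc, key, chunk: backing.__setitem__((loc, key), chunk))
print(len(retrieve(layout, read_remote=lambda loc, key: backing[(loc, key)])))  # 10000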
At 1160, additional operations are performed to determine an updated interleaving arrangement for the memory pool, such as based on changed workload requirements or network conditions. At 1170, the memory pool is reconfigured to provide memory resources with use of the updated interleaving arrangement.
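One hypothetical shape for this reconfiguration loop is sketched below in Python: when telemetry indicates the current arrangement can no longer meet predicted load, an updated arrangement is computed and the pool is migrated. The maybe_reconfigure function, its telemetry keys, and the determine_arrangement/migrate hooks are assumptions for illustration.

# Hypothetical sketch of the reconfiguration loop: when telemetry shows the
# current arrangement no longer meets requirements, compute an updated one
# and migrate the pool; determine_arrangement/migrate are assumed hooks.

def maybe_reconfigure(current, telemetry, determine_arrangement, migrate):
    required = telemetry["predicted_load_gbps"]
    available = sum(telemetry["member_bandwidth_gbps"][m] for m in current["members"])
    if available >= required:
        return current                        # current arrangement still suffices
    updated = determine_arrangement(telemetry)
    migrate(current, updated)                 # move/rebalance the hosted data
    return updated

new_arrangement = maybe_reconfigure(
    current={"members": ["910A", "910B"], "granularity": 4096},
    telemetry={"predicted_load_gbps": 50,
               "member_bandwidth_gbps": {"910A": 25, "910B": 15,
                                         "910C": 12, "910D": 6}},
    determine_arrangement=lambda t: {"members": list(t["member_bandwidth_gbps"]),
                                     "granularity": 4096},
    migrate=lambda old, new: None)
print(new_arrangement["members"])   # all four locations after reconfiguration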
In further examples, the method of flowchart 1100 is performed by a network switch, and the method also includes (e.g., in connection with 1150) processing requests, at the network switch, for the use of the memory pool by the client devices of the network. Also in further examples, the method of flowchart 1100 is performed by a networked processing unit, and the method also includes implementing, at the networked processing unit, the interleaving arrangement among the disaggregated memory resources by causing the configuration of respective networked processing units at the respective compute locations.
Additional examples of the presently described method, system, and device embodiments include the following, non-limiting implementations. Each of the following non-limiting examples may stand on its own or may be combined in any permutation or combination with any one or more of the other examples provided below or throughout the present disclosure.
Example 1 is a method for configuring interleaving in a memory pool established in an edge computing arrangement, comprising: identifying (e.g., mapping, discovering, retrieving, or receiving information for) disaggregated memory resources at respective compute locations, the compute locations connected to one another via at least one interconnect; identifying (e.g., retrieving, generating, accessing) workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determining an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configuring the memory pool (or, causing the memory pool to be configured) for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.
In Example 2, the subject matter of Example 1 optionally includes subject matter where the method is performed by a network switch, and wherein the method further comprises: processing requests, at the network switch, for the use of the memory pool by the client devices of the network.
In Example 3, the subject matter of any one or more of Examples 1-2 optionally include subject matter where the method is performed by a networked processing unit, and wherein the method further comprises: implementing, at the networked processing unit, the interleaving arrangement among the disaggregated memory resources by configuration of respective networked processing units at the respective compute locations.
In Example 4, the subject matter of any one or more of Examples 1-3 optionally include subject matter where the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.
In Example 5, the subject matter of any one or more of Examples 1-4 optionally include subject matter where the respective compute locations correspond to processing hardware at respective base stations, and wherein the client devices connect to the network via one or more of the respective base stations.
In Example 6, the subject matter of any one or more of Examples 1-5 optionally include subject matter where one or more of the respective compute locations include acceleration resources, and wherein the disaggregated memory resources are mapped to the acceleration resources.
In Example 7, the subject matter of Example 6 optionally includes subject matter where the disaggregated memory resources are connected to the acceleration resources via a Compute Express Link (CXL) interconnect.
In Example 8, the subject matter of any one or more of Examples 1-7 optionally include categorizing memory bandwidth available at the disaggregated memory resources into multiple categories; wherein the interleaving arrangement is determined using the multiple categories.
In Example 9, the subject matter of any one or more of Examples 1-8 optionally include allocating a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the workload requirements.
In Example 10, the subject matter of any one or more of Examples 1-9 optionally include storing data in the memory pool according to the interleaving arrangement; and retrieving data in the memory pool according to the interleaving arrangement.
In Example 11, the subject matter of any one or more of Examples 1-10 optionally include determining an updated interleaving arrangement; and reconfiguring the memory pool for use by the client devices, based on the updated interleaving arrangement.
Example 12 is a device, comprising: a networked processing unit; and a storage medium including instructions embodied thereon, wherein the instructions, which when executed by the networked processing unit, configure the networked processing unit to: identify (e.g., map, discover, retrieve, or receive information for) disaggregated memory resources at respective compute locations, the compute locations connected to one another via at least one interconnect; identify (e.g., retrieve, generate, access) workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determine an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configure the memory pool (or, cause the memory pool to be configured) for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.
In Example 13, the subject matter of Example 12 optionally includes subject matter where the device is a network switch, and wherein the instructions further configure the networked processing unit to: process requests, at the network switch, for the use of the memory pool by the client devices of the network.
In Example 14, the subject matter of any one or more of Examples 12-13 optionally include subject matter where the instructions further configure the networked processing unit to: provide commands to respective networked processing units at the respective compute locations, to cause the respective networked processing units to implement the interleaving arrangement among the disaggregated memory resources.
In Example 15, the subject matter of any one or more of Examples 12-14 optionally include subject matter where the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.
In Example 16, the subject matter of any one or more of Examples 12-15 optionally include subject matter where the respective compute locations correspond to processing hardware at respective base stations, and wherein the client devices connect to the network via one or more of the respective base stations.
In Example 17, the subject matter of any one or more of Examples 12-16 optionally include subject matter where one or more of the respective compute locations include acceleration resources, and wherein the disaggregated memory resources are mapped to the acceleration resources.
In Example 18, the subject matter of Example 17 optionally includes subject matter where the disaggregated memory resources are connected to the acceleration resources via a Compute Express Link (CXL) interconnect.
In Example 19, the subject matter of any one or more of Examples 12-18 optionally include subject matter where the instructions further configure the networked processing unit to: categorize memory bandwidth available at the disaggregated memory resources into multiple categories; wherein the interleaving arrangement is determined using the multiple categories.
In Example 20, the subject matter of any one or more of Examples 12-19 optionally include subject matter where the instructions further configure the networked processing unit to: allocate a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the workload requirements.
In Example 21, the subject matter of any one or more of Examples 12-20 optionally include subject matter where the instructions further configure the networked processing unit to: store data in the memory pool according to the interleaving arrangement; and retrieve data in the memory pool according to the interleaving arrangement.
In Example 22, the subject matter of any one or more of Examples 12-21 optionally include subject matter where the instructions further configure the networked processing unit to: determine an updated interleaving arrangement; and reconfigure the memory pool for use by the client devices, based on the updated interleaving arrangement.
Example 23 is a machine-readable medium (e.g., a non-transitory storage medium) comprising information (e.g., data) representative of instructions, wherein the instructions, when executed by processing circuitry, cause the processing circuitry to perform, implement, or deploy any of Examples 1-22.
Example 24 is an apparatus of an edge computing system comprising means to implement any of Examples 1-23, or other subject matter described herein.
Example 25 is an apparatus of an edge computing system comprising logic, modules, circuitry, or other means to implement any of Examples 1-23, or other subject matter described herein.
Example 26 is a networked processing unit (e.g., an infrastructure processing unit as discussed here) or system including a networked processing unit, configured to implement any of Examples 1-23, or other subject matter described herein.
Example 27 is an edge computing system, including respective edge processing devices and nodes to invoke or perform any of the operations of Examples 1-23, or other subject matter described herein.
Example 28 is an edge computing system including aspects of network functions, acceleration functions, acceleration hardware, storage hardware, or computation hardware resources, operable to invoke or perform the use cases discussed herein, with use of any Examples 1-23, or other subject matter described herein.
Example 29 is a system to implement any of Examples 1-28.
Example 30 is a method to implement any of Examples 1-28.
Although these implementations have been described with reference to specific exemplary aspects, it will be evident that various modifications and changes may be made to these aspects without departing from the broader scope of the present disclosure. Many of the arrangements and processes described herein can be used in combination or in parallel implementations that involve terrestrial network connectivity (where available) to increase network bandwidth/throughput and to support additional edge services. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show, by way of illustration and not of limitation, specific aspects in which the subject matter may be practiced. The aspects illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other aspects may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various aspects is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
Such aspects of the inventive subject matter may be referred to herein, individually and/or collectively, merely for convenience and without intending to voluntarily limit the scope of this application to any single aspect or inventive concept if more than one is disclosed. Thus, although specific aspects have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific aspects shown. This disclosure is intended to cover any adaptations or variations of various aspects. Combinations of the above aspects and other aspects not specifically described herein will be apparent to those of skill in the art upon reviewing the above description.
This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/425,857, filed Nov. 16, 2022, and titled “COORDINATION OF DISTRIBUTED NETWORKED PROCESSING UNITS”, which is incorporated herein by reference in its entirety.