5G is the fifth-generation technology standard for broadband cellular networks, which is planned eventually to replace the fourth-generation (4G) standard of Long-Term Evolution (LTE). 5G technology will offer greatly increased bandwidth, thereby broadening the cellular market beyond smartphones to provide last-mile connectivity to desktops, set-top boxes, laptops, Internet of Things (IoT) devices, and so on. Some 5G cells may employ frequency spectrum similar to that of 4G, while other 5G cells may employ frequency spectrum in the millimeter wave band. Cells in the millimeter wave band will have a relatively small coverage area but will offer much higher throughput than 4G.
Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, with emphasis instead being placed upon clearly illustrating the principles of the disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
The present disclosure generally relates to the use of an artificial intelligence assistant in configuring and managing radio-based networks in an intent-based manner, and for providing observability into the operations and performance of the radio-based networks. For example, a radio-based network may include a cellular network such as a fourth-generation (4G) Long-Term Evolution (LTE) network, a fifth-generation (5G) network, a 4G-5G hybrid core with both 4G and 5G RANs, a sixth-generation (6G) network, or another network that provides wireless network access. Cellular network providers such as telecommunications companies (“telcos”) deploy network functions to provide communication services to their customers. Network functions refer to the various components and services that make up a telecommunications network. Network functions are increasingly deployed using virtualization technologies such as Network Functions Virtualization (NFV) and Software-Defined Networking (SDN). This allows network functions to run as software on commodity hardware, providing flexibility and scalability. Virtualized network functions (VNFs) can also be easier to deploy and manage. For example, network providers can use Network Service Descriptors (NSDs) to define and plan network services, and leverage templates such as TOSCA (Topology and Orchestration Specification for Cloud Applications) templates to automate the deployment, scaling, and management of network functions specified in NSDs.
These service templates require defining “nodes,” which refer to the fundamental building blocks or components representing various entities in the network application. These nodes are used to describe the elements of the application's topology, including software components, services, and infrastructure resources, as well as the structure and relationships of the components in the application, along with their properties and requirements. A network orchestrator program or service can interpret these templates and carry out the deployment and management of the specified nodes and their interconnections as defined in the templates.
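For illustration, the following is a minimal sketch of a TOSCA-style service template and of how an orchestrator-like program might enumerate its nodes and relationships. The node names and property values (e.g., the upf_function and upf_host nodes) are hypothetical, and the Python snippet assumes the PyYAML package:

```python
# Minimal sketch of a TOSCA-style service template, parsed with PyYAML
# (pip install pyyaml). The node names and property values are invented.
import yaml

TEMPLATE = """
tosca_definitions_version: tosca_simple_yaml_1_3
topology_template:
  node_templates:
    upf_host:                        # infrastructure resource node
      type: tosca.nodes.Compute
      capabilities:
        host:
          properties:
            num_cpus: 8
            mem_size: 16 GB
    upf_function:                    # software component node
      type: tosca.nodes.SoftwareComponent
      requirements:
        - host: upf_host             # relationship: hosted on upf_host
"""

doc = yaml.safe_load(TEMPLATE)
nodes = doc["topology_template"]["node_templates"]
for name, spec in nodes.items():
    print(f"node {name}: type={spec['type']}")
    for requirement in spec.get("requirements", []):
        for rel_name, target in requirement.items():
            print(f"  requires {rel_name} -> {target}")
```

An orchestrator would walk this same structure, provisioning the Compute node first and then deploying the software component that requires it.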
Service templates can define various network functions which run on the geographically distributed hardware of the network. Distributed units (DUs) are computing devices that are typically deployed at cell sites of radio access networks (RANs) in radio-based networks. DUs operate at the lower layers of the RAN protocol stack, such as the Radio Link Control (RLC) sublayer, the Medium Access Control (MAC) sublayer, and the physical layer, depending on the particular implementation. This is in contrast to centralized units (CUs), which may be deployed at centralized locations and provide support for higher layers of the protocol stack, such as the Service Data Adaptation Protocol (SDAP), the Packet Data Convergence Protocol (PDCP), and the Radio Resource Control (RRC) protocol. Together, the DU and CU may correspond to the next generation node B (gNB) in fifth-generation (5G) networks, which enables user equipment (UEs) to connect to the core network. The DUs interface with one or more radio units (RUs) in order to communicate wirelessly with the UEs.
Data traffic is often routed through a fiber transport network consisting of multiple hops of layer 3 routers (e.g., at aggregation sites) to the core network. The core network is typically housed in one or more data centers. The core network typically aggregates data traffic from end devices, authenticates subscribers and devices, applies personalized policies, and manages the mobility of the devices before routing the traffic to operator services or the Internet. A 5G Core, for example, can be decomposed into a number of microservice elements with control and user plane separation. Rather than physical network elements, a 5G Core can comprise virtualized, software-based network functions (deployed, for example, as microservices) and can therefore be instantiated within Multi-access Edge Computing (MEC) cloud infrastructures. The network functions of the core network can include a User Plane Function (UPF), an Access and Mobility Management Function (AMF), a Session Management Function (SMF), and other functions.
Radio-based network deployments include multiple network functions that often require tedious manual configuration during deployment, maintenance, and failover procedures. Any downtime is unsatisfactory, as it leads to service interruptions, service level agreement (SLA) violations, and a degraded user experience. Moreover, manual reconfiguration in the case of a network failure typically requires the physical presence of engineers and introduces significant overhead.
Accordingly, as will be appreciated, generating network service templates to correctly deploy and manage telecommunications networks can be challenging, especially for complex network services. A deep understanding of TOSCA concepts—including node types, relationship types, properties, and capabilities—is required for modeling the application correctly. TOSCA templates not only define initial deployments but also need to consider the entire service lifecycle, including scaling, healing, updates, and decommissioning. This adds complexity to template creation. Further, security is a critical aspect of cloud and network applications. Incorporating security best practices and policies into TOSCA templates can be challenging and requires a thorough understanding of security principles. While many examples presented herein focus on TOSCA templates, it will be appreciated that the disclosed techniques can be applied for other types of radio-based network templates as well.
Beneficially, the embodiments of the present disclosure address these challenges, among others, by enabling users to express their intent about the type and quality of network that they want to provide and, in return, generating a network service template for a network that matches that intent. The embodiments also enable users to make natural language queries about the state of their network, returning accurate information about network performance together with any recommended improvements. For example, a user may express respective intents to deploy a network function with high availability, with low latency, and/or with throughput in a nominal range. As will be described, the corresponding generated network configuration can be customized to support the intent (e.g., with infrastructure deployed across two or more infrastructure sites in order to provide high availability, or with infrastructure deployed at an edge location to provide low latency). Various embodiments of the present disclosure introduce the use of an intent-based virtual assistant, powered by generative artificial intelligence (AI), to automate and simplify the configuration and management of radio-based networks.
In a first set of embodiments, the AI assistant can automatically generate configuration templates for deployment of network functions on cloud provider network infrastructure based upon a natural language prompt. A large language model (LLM) trained (at least) on code (and possibly large amounts of other textual data) can be enhanced with the knowledge of a grammar used in configuration templates for network function deployment, as well as specific network configurations that align with expressed intents, via retrieval augmented generation (RAG) and/or fine-tuning. The configuration templates may then be used to automatically deploy various network functions using various computing elements available in a cloud provider network. In some embodiments, the AI assistant can modify existing configuration templates to provide some form of improvement in terms of connectivity issues, network reachability, security policy, cost, and computing resource types.
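As a non-authoritative sketch of the retrieval augmented generation step, the following Python outline retrieves intent-aligned reference snippets and prepends them to the LLM prompt. The corpus contents, the keyword-overlap scoring, and the llm_complete() stub are stand-ins for whichever knowledge store and model endpoint a particular deployment actually uses:

```python
# Sketch of retrieval augmented generation (RAG) for template creation.
# The corpus, scoring heuristic, and llm_complete() stub are hypothetical
# placeholders for a deployment's actual knowledge store and LLM endpoint.

CORPUS = [
    ("high availability",
     "Deploy network function replicas across >= 2 infrastructure sites."),
    ("low latency",
     "Place the UPF at an edge location close to the RAN."),
    ("tosca grammar",
     "node_templates maps node names to {type, properties, requirements}."),
]

def retrieve(intent: str, k: int = 2) -> list[str]:
    """Rank reference snippets by naive keyword overlap with the intent."""
    words = set(intent.lower().split())
    ranked = sorted(CORPUS,
                    key=lambda item: len(words & set(item[0].split())),
                    reverse=True)
    return [text for _, text in ranked[:k]]

def llm_complete(prompt: str) -> str:
    """Stub standing in for a call to a code-trained large language model."""
    raise NotImplementedError("wire up a model endpoint here")

def generate_template(intent: str) -> str:
    """Augment the user's intent with retrieved context, then generate."""
    context = "\n".join(retrieve(intent))
    prompt = ("Using the reference material below, emit a service template "
              f"matching this intent: {intent}\n\nReference:\n{context}")
    return llm_complete(prompt)
```

A production system would substitute embedding-based retrieval over a vector store and a fine-tuned model, but the prompt-assembly flow is the same.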
In a second set of embodiments, the AI assistant can respond to queries regarding radio-based network status and health to provide users with visibility into operational conditions. This avoids requiring administrative users to look at potentially multiple dashboards to manually assess network conditions. Various embodiments may integrate across multiple databases to provide network key performance indicators, generate structured query language (SQL) queries or API queries that return the desired data, and generate maps of what is happening in the radio-based network.
As a non-limiting example, a user may query the AI assistant as to the cells in the radio-based network that have the highest number of active users. In response, the AI assistant may generate a map of a geographic area depicting the location of the cells having the highest number of active users, and/or provide a table with identifying information for those cells. In addition, the AI assistant may be able to implement configuration changes for ongoing management of the radio-based network in response to natural language prompts.
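To make the foregoing query example concrete, the following sketch shows the kind of SQL the AI assistant might generate, executed here against a hypothetical cell_metrics table with invented sample rows:

```python
# Sketch: SQL an assistant might generate for "which cells have the highest
# number of active users?" The cell_metrics schema and rows are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cell_metrics "
             "(cell_id TEXT, latitude REAL, longitude REAL, "
             "active_users INTEGER)")
conn.executemany("INSERT INTO cell_metrics VALUES (?, ?, ?, ?)",
                 [("cell-a", 47.61, -122.33, 412),
                  ("cell-b", 47.62, -122.35, 980),
                  ("cell-c", 47.60, -122.30, 77)])

# Assistant-generated query: top cells by active users, with coordinates
# usable for rendering either a map or a table.
GENERATED_SQL = """
    SELECT cell_id, latitude, longitude, active_users
    FROM cell_metrics
    ORDER BY active_users DESC
    LIMIT 5
"""
for row in conn.execute(GENERATED_SQL):
    print(row)    # e.g., ('cell-b', 47.62, -122.35, 980)
```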
As one skilled in the art will appreciate in light of this disclosure, certain embodiments may be capable of achieving certain advantages, including some or all of the following: (1) improving the deployment of radio-based networks by reducing errors in deployment configurations; (2) improving the security of radio-based networks by generating configurations that meet security requirements; (3) improving the operation of radio-based networks by facilitating the deployment of high-availability network function configurations; (4) improving the operation of radio-based networks by giving greater visibility into network health and status across multiple databases without requiring a user to look at multiple dashboards; (5) improving the functioning of radio-based networks by implementing configuration changes in response to natural language prompts; and so forth.
The radio-based network may use a core network infrastructure that may be provisioned dynamically and used in conjunction with one or more radio access networks operated by a cloud provider network and/or a plurality of communication service providers. The radio-based networks may also be scaled up or down or terminated dynamically. In various scenarios, an organization may create either a private radio-based network for internal use only or a radio-based network open to third-party customers using embodiments of the present disclosure.
Previous deployments of radio-based networks have relied upon manual deployment and configuration at each step of the process. This proved to be extremely time-consuming and expensive. Further, in previous generations, software was inherently tied to vendor-specific hardware, thereby preventing customers from deploying alternative software. By contrast, with 5G, hardware is decoupled from the software stack, which allows more flexibility, and allows components of the radio-based network to be executed on cloud provider infrastructure. Using a cloud delivery model for a radio-based network, such as a 5G network, can facilitate handling network traffic from hundreds up to billions of connected devices and compute-intensive applications, while delivering faster speeds, lower latency, and more capacity than other types of networks.
Various embodiments of the present disclosure may also bring the concept of elasticity and utility computing from the cloud computing model to radio-based networks and associated core networks. For example, the disclosed techniques can run core and radio access network functions and associated control plane management functions on cloud provider infrastructure, creating a cloud native core network and/or a cloud native radio access network (RAN). Such core and RAN network functions can be based on the 3rd Generation Partnership Project (3GPP) specifications in some implementations. By providing a cloud-native radio-based network, a customer, such as a communication service provider, may dynamically scale its radio-based network based on utilization, latency requirements, and/or other factors.
Among the benefits of the present disclosure is the ability to deploy and chain network functions together to deliver an end-to-end service that meets specified constraints and requirements. According to the present disclosure, network functions organized into microservices work together to provide end-to-end connectivity. One set of network functions is part of a radio network, running in cell towers and converting wireless signals to IP traffic. Other network functions run in large data centers, performing subscriber-related business logic and routing IP traffic to the Internet and back. For applications to use the new capabilities of 5G, such as low latency communication and reserved bandwidth, both of these types of network functions need to work together to appropriately schedule and reserve wireless spectrum, and to perform real-time compute and data processing. The presently disclosed techniques may provide edge location hardware (as described further below) integrated with network functions that run across the entire network, from cell sites to Internet break-outs, and orchestrate the network functions to meet required Quality of Service (QoS) constraints. This enables an entirely new set of applications that have strict QoS requirements, from factory-based Internet of Things (IoT), to augmented reality (AR), to virtual reality (VR), to game streaming, to autonomous navigation support for connected vehicles, that previously could not run on a mobile network.
The present disclosure describes embodiments relating to the creation and management of a cloud native 5G core and/or a cloud native 5G RAN, and associated control plane components. Cloud native refers to an approach to building and running applications that exploits the advantages of the cloud computing delivery model, such as dynamic scalability, distributed computing, and high availability (including geographic distribution, redundancy, and failover), and to how these applications are created and deployed to be suitable for a cloud environment. While cloud native applications can be (and often are) run in the cloud, they also can be run in an on-premises data center. Some cloud native applications can be containerized, for example, having different parts, functions, or subunits of the application packaged in their own containers, which can be dynamically orchestrated so that each part is actively scheduled and managed to optimize resource utilization. These containerized applications can be architected using a microservices architecture to increase the overall agility and maintainability of the applications.
In a microservices architecture, an application is arranged as a collection of smaller subunits (“microservices”) that can be deployed and scaled independently from one another, and which can communicate with one another over a network. These microservices are typically fine-grained, in that they have specific technical and functional granularity, and often implement lightweight communications protocols. The microservices of an application can perform different functions from one another, can be independently deployable, and may use different programming languages, databases, and hardware/software environments from one another. Decomposing an application into smaller services beneficially improves modularity of the application, enables replacement of individual microservices as needed, and parallelizes development by enabling teams to develop, deploy, and maintain their microservices independently from one another. A microservice may be deployed using a virtual machine, container, or serverless function, in some examples. The disclosed core and RAN software may follow a microservices architecture such that the described radio-based networks are composed of independent subunits that can be deployed and scaled on demand.
Turning now to
Various deployments of the radio-based network 103 can include one or more of a core network and a RAN network, as well as a control plane for running the core and/or RAN network on cloud provider infrastructure. As described above, these components can be developed in a cloud native fashion, for example using a microservices architecture, such that centralized control and distributed processing are used to scale traffic and transactions efficiently. These components may be based on the 3GPP specifications by following an application architecture in which control plane and user plane processing is separated (CUPS Architecture).
The radio-based network 103 provides wireless network access to a plurality of wireless devices 106, which may be mobile devices or fixed location devices. In various examples, the wireless devices 106 may include smartphones, connected vehicles, IoT devices, sensors, machinery (such as in a manufacturing facility), hotspots, and other devices. The wireless devices 106 are sometimes referred to as user equipment (UE) or customer premises equipment (CPE).
The radio-based network 103 can include capacity provisioned on one or more RANs that provide the wireless network access to the plurality of wireless devices 106 through a plurality of cell sites 109. The RANs may be operated by a cloud network provider or different communication service providers. Each of the cell sites 109 may be equipped with one or more antennas and one or more radio units that send and receive wireless data signals to and from the wireless devices 106. As such, the RAN implements a radio access technology to enable radio connection with wireless devices 106 and provides connection with the radio-based network's core network. Components of the RAN include a base station and antennas that cover a given physical area, as well as the core network components required for managing connections to the RAN. Core network functions can include a UPF, an SMF, an AMF, and/or other functions.
The UPF provides an interconnect point between the mobile infrastructure and the Data Network (DN), i.e., encapsulation and decapsulation of General Packet Radio Service (GPRS) tunneling protocol for the user plane (GTP-U). The UPF can also provide a session anchor point for providing mobility within the RAN, including sending one or more end marker packets to the RAN base stations. The UPF can also handle packet routing and forwarding, including directing flows to specific data networks based on traffic matching filters. Another feature of the UPF includes per-flow or per-application QoS handling, including transport level packet marking for uplink (UL) and downlink (DL), and rate limiting. The UPF can be implemented as a cloud native network function using modern microservices methodologies, for example being deployable within a serverless framework (which abstracts away the underlying infrastructure that code runs on via a managed service).
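For illustration, the GTP-U encapsulation performed by the UPF can be sketched as follows; only the mandatory eight-byte header is modeled, and the TEID and payload bytes are invented examples:

```python
# Sketch of GTP-U encapsulation/decapsulation as performed by a UPF.
# Only the mandatory 8-byte header (3GPP TS 29.281) is modeled; the TEID
# and payload bytes are invented examples.
import struct

def gtpu_encapsulate(teid: int, inner_ip_packet: bytes) -> bytes:
    # Byte 0: version=1, protocol type=1, no optional fields -> 0x30
    # Byte 1: message type 0xFF (G-PDU, i.e., tunneled user data)
    # Bytes 2-3: length of the payload following the mandatory header
    # Bytes 4-7: tunnel endpoint identifier (TEID)
    header = struct.pack("!BBHI", 0x30, 0xFF, len(inner_ip_packet), teid)
    return header + inner_ip_packet

def gtpu_decapsulate(packet: bytes) -> tuple[int, bytes]:
    _flags, msg_type, length, teid = struct.unpack("!BBHI", packet[:8])
    assert msg_type == 0xFF, "not a G-PDU"
    return teid, packet[8:8 + length]

inner = b"\x45" + b"\x00" * 19          # placeholder for an IPv4 packet
tunneled = gtpu_encapsulate(0x1234ABCD, inner)
assert gtpu_decapsulate(tunneled) == (0x1234ABCD, inner)
```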
The AMF can receive the connection and session information from the wireless devices 106 or the RAN and can handle connection and mobility management tasks. For example, the AMF can manage handovers between base stations in the RAN. In some examples the AMF can be considered as the access point to the 5G core, by terminating certain RAN control plane and wireless device 106 traffic. The AMF can also implement ciphering and integrity protection algorithms.
The SMF can handle session establishment or modification, for example by creating, updating and removing Protocol Data Unit (PDU) sessions and managing session context within the UPF. The SMF can also implement Dynamic Host Configuration Protocol (DHCP) and IP Address Management (IPAM). The SMF can be implemented as a cloud native network function using modern microservices methodologies.
Various network functions to implement the radio-based network 103 may be deployed in distributed computing devices 112, which may correspond to general-purpose computing devices configured to perform the network functions. For example, the distributed computing devices 112 may execute one or more virtual machine instances and/or containers that are configured in turn to execute one or more services that perform the network functions. In one embodiment, the distributed computing devices 112 are ruggedized machines that are deployed at each cell site. The distributed computing devices 112 may be operated as an extension of a cloud provider network, with DU functions being executed, for example, by a container cluster upon the distributed computing devices 112. Further, the distributed computing devices 112 may be managed by the cloud provider network.
One or more centralized computing devices 115 may perform various network functions at a central site. For example, the centralized computing devices 115 may be centrally located on premises of the customer in a conditioned server room. The centralized computing devices 115 may execute one or more virtual machine instances that are configured in turn to execute one or more services that perform the network functions. In some cases, the centralized computing devices 115 may be located in a data center of a cloud provider network, rather than upon a customer's premises.
In one or more embodiments, network traffic from the radio-based network 103 is backhauled to one or more core computing devices 118 that may be located at one or more data centers situated remotely from the customer's site. The core computing devices 118 may also perform various network functions, including routing network traffic to and from the network 121, which may correspond to the Internet and/or other external public or private networks. The core computing devices 118 may perform functionality related to the management of the communication network 100 (e.g., billing, mobility management, etc.) and transport functionality to relay traffic between the communication network 100 and other networks. The core network sits between the RAN and external networks, such as the Internet and the public switched telephone network, and performs features such as authentication of UE, secure session management, user accounting, and handover of mobile UE between different RAN sites. As described herein, the core network functions typically performed by the core computing devices 118 may instead be performed by the distributed computing devices 112.
Collectively, the radio unit (RU), distributed unit (DU), and central unit (CU) convert the analog radio signal received from the antenna into a digital packet that can be routed over a network, and similarly they convert digital packets into radio signals that can be transmitted by the antenna. This signal transformation is accomplished by a sequence of network functions which can be distributed amongst the RU, DU, and CU in various ways to achieve different balances of latency, throughput, and network performance. These are referred to as “functional splits” of the RAN.
The network functions implemented in the RAN correspond to the lowest three network layers in the seven-layer OSI model of computer networking. The physical layer, PHY, or layer 1 (L1) is the first and lowest layer in the OSI model. In a radio-based network 103, the PHY is the layer that sends and receives radio signals. This can be split into two portions: a “high PHY” and a “low PHY.” Each of these can be considered a network function. The high PHY converts binary bits into electrical pulses that represent the binary data, and the low PHY then converts these electrical pulses into radio waves to be transmitted wirelessly by the antennae. The PHY similarly converts received radio waves into a digital signal. This layer may be implemented by a specialized PHY chip.
The PHY interfaces with the data link layer, or layer 2 (L2), in the OSI model. The primary task of the L2 is to provide an interface between the higher transport layers and the PHY. The 5G L2 has three sublayers: Medium Access Control (MAC), Radio Link Control (RLC), and Packet Data Convergence Protocol (PDCP). Each of these can be considered a network function. The PDCP provides security of radio resource control (RRC) traffic and signaling data, sequence numbering and sequential delivery of RRC messages and IP packets, and IP packet header compression. The RLC protocol provides control of the radio link. The MAC protocol maps information between logical and transport channels.
The data link layer interfaces with layer 3 (L3) in the OSI model, the network layer. The 5G L3 is also referred to as the Radio Resource Control (RRC) layer and is responsible for functions such as packet forwarding, quality of service management, and the establishment, maintenance, and release of an RRC connection between the UE and the RAN.
Various functional splits can be chosen for a RAN. The functional splits define different sets of the L1 and L2 functions that are run on the RU versus on the CU and DU. The L3 is also run on the CU. In a RAN architecture following split 7, for example, the functionality of the baseband unit (BBU) used in previous wireless network generations is split into two functional units: the DU, which is responsible for real-time L1 and L2 scheduling functions, and the CU, which is responsible for non-real-time, higher L2 and L3 functions. By contrast, in a RAN architecture following split 2, for example, only the PDCP from L2 is handled by the DU and CU, while RLC, MAC, PHY, and radio-frequency signals (RF) are handled by the RU. In split 5, for example, the DU and CU handle PDCP, RLC, and part of the MAC functions, while the RU handles part of the MAC as well as PHY and RF. In split 6, for example, the DU and CU handle PDCP, RLC, and MAC, while the RU handles only PHY and RF. In split 8, for example, the DU and CU handle PDCP, RLC, MAC, and PHY, while the RU handles just RF.
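The split options enumerated above lend themselves to a simple lookup of which functions land on the RU versus the DU and CU. The following sketch encodes the splits as described in this paragraph (split 7 is shown with the low PHY on the RU, the common “7-2” variant, which is an assumption rather than a detail stated above):

```python
# Sketch: RAN functional splits as described above, mapping each split to
# the functions handled by the DU/CU versus the RU. L3 (RRC) always runs
# on the CU. Split 7 is shown with the low PHY on the RU (the common
# "7-2" variant), which is an assumption rather than a stated detail.
FUNCTIONAL_SPLITS = {
    2: {"du_cu": ["PDCP"],
        "ru":    ["RLC", "MAC", "PHY", "RF"]},
    5: {"du_cu": ["PDCP", "RLC", "MAC (part)"],
        "ru":    ["MAC (part)", "PHY", "RF"]},
    6: {"du_cu": ["PDCP", "RLC", "MAC"],
        "ru":    ["PHY", "RF"]},
    7: {"du_cu": ["PDCP", "RLC", "MAC", "high PHY"],
        "ru":    ["low PHY", "RF"]},
    8: {"du_cu": ["PDCP", "RLC", "MAC", "PHY"],
        "ru":    ["RF"]},
}

def ru_functions(split: int) -> list[str]:
    """Return the functions a radio unit must implement for a given split."""
    return FUNCTIONAL_SPLITS[split]["ru"]

print(ru_functions(8))    # ['RF'] -- the simplest possible RU
```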
The cloud provider network 203 can provide on-demand, scalable computing platforms to users through a network, for example, allowing users to have at their disposal scalable “virtual computing devices” via their use of the compute servers (which provide compute instances via the usage of one or both of central processing units (CPUs) and graphics processing units (GPUs), optionally with local storage) and block store servers (which provide virtualized persistent block storage for designated compute instances). These virtual computing devices have attributes of a personal computing device including hardware (various types of processors, local memory, random access memory (RAM), hard-disk, and/or solid-state drive (SSD) storage), a choice of operating systems, networking capabilities, and pre-loaded application software. Each virtual computing device may also virtualize its console input and output (e.g., keyboard, display, and mouse). This virtualization allows users to connect to their virtual computing device using a computer application such as a browser, API, software development kit (SDK), or the like, in order to configure and use their virtual computing device just as they would a personal computing device. Unlike personal computing devices, which possess a fixed quantity of hardware resources available to the user, the hardware associated with the virtual computing devices can be scaled up or down depending upon the resources the user requires.
As indicated above, users can connect to virtualized computing devices and other cloud provider network 203 resources and services, and configure and manage telecommunications networks such as 5G networks, using various interfaces 206 (e.g., APIs) via intermediate network(s) 212. An API refers to an interface 206 and/or communication protocol between a client device 215 and a server, such that if the client makes a request in a predefined format, the client should receive a response in a specific format or cause a defined action to be initiated. In the cloud provider network context, APIs provide a gateway for customers to access cloud infrastructure by allowing customers to obtain data from or cause actions within the cloud provider network 203, enabling the development of applications that interact with resources and services hosted in the cloud provider network 203. APIs can also enable different services of the cloud provider network 203 to exchange data with one another. Users can choose to deploy their virtual computing systems to provide network-based services for their own use and/or for use by their customers or clients.
The cloud provider network 203 can include a physical network (e.g., sheet metal boxes, cables, rack hardware) referred to as the substrate. The substrate can be considered as a network fabric containing the physical hardware that runs the services of the provider network. The substrate may be isolated from the rest of the cloud provider network 203, for example it may not be possible to route from a substrate network address to an address in a production network that runs services of the cloud provider, or to a customer network that hosts customer resources.
The cloud provider network 203 can also include an overlay network of virtualized computing resources that run on the substrate. In at least some embodiments, hypervisors or other devices or processes on the network substrate may use encapsulation protocol technology to encapsulate and route network packets (e.g., client IP packets) over the network substrate between client resource instances on different hosts within the provider network. The encapsulation protocol technology may be used on the network substrate to route encapsulated packets (also referred to as network substrate packets) between endpoints on the network substrate via overlay network paths or routes. The encapsulation protocol technology may be viewed as providing a virtual network topology overlaid on the network substrate. As such, network packets can be routed along a substrate network according to constructs in the overlay network (e.g., virtual networks that may be referred to as virtual private clouds (VPCs), port/protocol firewall configurations that may be referred to as security groups). A mapping service (not shown) can coordinate the routing of these network packets. The mapping service can be a regional distributed look up service that maps the combination of overlay internet protocol (IP) and network identifier to substrate IP so that the distributed substrate computing devices can look up where to send packets.
To illustrate, each physical host device (e.g., a compute server, a block store server, an object store server, a control server) can have an IP address in the substrate network. Hardware virtualization technology can enable multiple operating systems to run concurrently on a host computer, for example as virtual machines (VMs) on a compute server. A hypervisor, or virtual machine monitor (VMM), on a host allocates the host's hardware resources amongst various VMs on the host and monitors the execution of the VMs. Each VM may be provided with one or more IP addresses in an overlay network, and the VMM on a host may be aware of the IP addresses of the VMs on the host. The VMMs (and/or other devices or processes on the network substrate) may use encapsulation protocol technology to encapsulate and route network packets (e.g., client IP packets) over the network substrate between virtualized resources on different hosts within the cloud provider network 203. The encapsulation protocol technology may be used on the network substrate to route encapsulated packets between endpoints on the network substrate via overlay network paths or routes. The encapsulation protocol technology may be viewed as providing a virtual network topology overlaid on the network substrate. The encapsulation protocol technology may include the mapping service that maintains a mapping directory that maps IP overlay addresses (e.g., IP addresses visible to customers) to substrate IP addresses (IP addresses not visible to customers), which can be accessed by various processes on the cloud provider network 203 for routing packets between endpoints.
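A minimal sketch of the mapping directory lookup described above follows, assuming an in-memory dictionary keyed by overlay IP and network identifier; an actual mapping service would be a regional distributed lookup service rather than a local data structure, and all addresses and identifiers shown are invented:

```python
# Sketch of the substrate-overlay mapping lookup. The directory maps an
# overlay (customer-visible) IP plus a virtual network identifier to the
# substrate IP of the host where the resource lives. All addresses and
# identifiers are invented examples.
MAPPING_DIRECTORY = {
    ("10.0.1.5", "vpc-111"): "172.16.9.21",
    ("10.0.1.5", "vpc-222"): "172.16.4.87",   # same overlay IP, other VPC
    ("10.0.2.9", "vpc-111"): "172.16.9.21",
}

def substrate_next_hop(overlay_ip: str, network_id: str) -> str:
    """Resolve where on the substrate an encapsulated packet should go."""
    try:
        return MAPPING_DIRECTORY[(overlay_ip, network_id)]
    except KeyError:
        raise LookupError(f"no mapping for {overlay_ip} in {network_id}")

# Overlapping overlay address spaces resolve correctly because the network
# identifier participates in the lookup key.
assert substrate_next_hop("10.0.1.5", "vpc-111") == "172.16.9.21"
assert substrate_next_hop("10.0.1.5", "vpc-222") == "172.16.4.87"
```

Because the network identifier is part of the key, different customers can reuse the same private overlay addresses without ambiguity.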
As illustrated, the traffic and operations of the cloud provider network substrate may broadly be subdivided into two categories in various embodiments: control plane traffic carried over a logical control plane 218 and data plane operations carried over a logical data plane 221. While the data plane 221 represents the movement of user data through the distributed computing system, the control plane 218 represents the movement of control signals through the distributed computing system. The control plane 218 generally includes one or more control plane components or services distributed across and implemented by one or more control servers. Control plane traffic generally includes administrative operations, such as establishing isolated virtual networks for various customers, monitoring resource usage and health, identifying a particular host or server at which a requested compute instance is to be launched, provisioning additional hardware as needed, and so on. The data plane 221 includes customer resources that are implemented on the cloud provider network (e.g., computing instances, containers, block storage volumes, databases, file storage). Data plane traffic generally includes non-administrative operations such as transferring data to and from the customer resources.
The control plane components are typically implemented on a separate set of servers from the data plane servers, and control plane traffic and data plane traffic may be sent over separate/distinct networks. In some embodiments, control plane traffic and data plane traffic can be supported by different protocols. In some embodiments, messages (e.g., packets) sent over the cloud provider network 203 include a flag to indicate whether the traffic is control plane traffic or data plane traffic. In some embodiments, the payload of traffic may be inspected to determine its type (e.g., whether control or data plane). Other techniques for distinguishing traffic types are possible.
As illustrated, the data plane 221 can include one or more compute servers, which may be bare metal (e.g., single tenant) or may be virtualized by a hypervisor to run multiple VMs (sometimes referred to as “instances”) or microVMs for one or more customers. These compute servers can support a virtualized computing service (or “hardware virtualization service”) of the cloud provider network 203. The virtualized computing service may be part of the control plane 218, allowing customers to issue commands via an interface 206 (e.g., an API) to launch and manage compute instances (e.g., VMs, containers) for their applications. The virtualized computing service may offer virtual compute instances with varying computational and/or memory resources. In one embodiment, each of the virtual compute instances may correspond to one of several instance types. An instance type may be characterized by its hardware type, computational resources (e.g., number, type, and configuration of CPUs or CPU cores), memory resources (e.g., capacity, type, and configuration of local memory), storage resources (e.g., capacity, type, and configuration of locally accessible storage), network resources (e.g., characteristics of its network interface and/or network capabilities), and/or other suitable descriptive characteristics. Using instance type selection functionality, an instance type may be selected for a customer, e.g., based (at least in part) on input from the customer. For example, a customer may choose an instance type from a predefined set of instance types. As another example, a customer may specify the desired resources of an instance type and/or requirements of a workload that the instance will run, and the instance type selection functionality may select an instance type based on such a specification.
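The instance type selection functionality might be sketched as follows, choosing the cheapest catalog entry that satisfies a customer's stated workload requirements; the instance type names, sizes, and prices are hypothetical:

```python
# Sketch of instance type selection from a predefined catalog based on a
# customer's workload requirements. Names, sizes, and prices are invented.
CATALOG = [
    {"name": "small",  "vcpus": 2,  "mem_gib": 4,  "hourly": 0.05},
    {"name": "medium", "vcpus": 4,  "mem_gib": 16, "hourly": 0.17},
    {"name": "large",  "vcpus": 16, "mem_gib": 64, "hourly": 0.68},
]

def select_instance_type(min_vcpus: int, min_mem_gib: int) -> str:
    """Return the cheapest instance type meeting the stated requirements."""
    candidates = [t for t in CATALOG
                  if t["vcpus"] >= min_vcpus and t["mem_gib"] >= min_mem_gib]
    if not candidates:
        raise ValueError("no instance type satisfies the workload")
    return min(candidates, key=lambda t: t["hourly"])["name"]

print(select_instance_type(min_vcpus=4, min_mem_gib=8))    # 'medium'
```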
The data plane 221 can also include one or more block store servers, which can include persistent storage for storing volumes of customer data as well as software for managing these volumes. These block store servers can support a managed block storage service of the cloud provider network 203. The managed block storage service may be part of the control plane 218, allowing customers to issue commands via the interface 206 (e.g., an API) to create and manage volumes for their applications running on compute instances. The block store servers include one or more servers on which data is stored as blocks. A block is a sequence of bytes or bits, usually containing some whole number of records, having a maximum length of the block size. Block data is normally stored in a data buffer and read or written a whole block at a time. In general, a volume can correspond to a logical collection of data, such as a set of data maintained on behalf of a user. User volumes, which can be treated as an individual hard drive ranging for example from 1 gigabyte (GB) to 1 terabyte (TB) or more in size, are made of one or more blocks stored on the block store servers. Although treated as an individual hard drive, it will be appreciated that a volume may be stored as one or more virtualized devices implemented on one or more underlying physical host devices. Volumes may be partitioned a small number of times (e.g., up to 16) with each partition hosted by a different host. The data of the volume may be replicated between multiple devices within the cloud provider network, in order to provide multiple replicas of the volume (where such replicas may collectively represent the volume on the computing system). Replicas of a volume in a distributed computing system can beneficially provide for automatic failover and recovery, for example by allowing the user to access either a primary replica of a volume or a secondary replica of the volume that is synchronized to the primary replica at a block level, such that a failure of either the primary or secondary replica does not inhibit access to the information of the volume. The role of the primary replica can be to facilitate reads and writes (sometimes referred to as “input output operations,” or simply “I/O operations”) at the volume, and to propagate any writes to the secondary (preferably synchronously in the I/O path, although asynchronous replication can also be used). The secondary replica can be updated synchronously with the primary replica and provide for seamless transition during failover operations, whereby the secondary replica assumes the role of the primary replica, and either the former primary is designated as the secondary or a new replacement secondary replica is provisioned. Although certain examples herein discuss a primary replica and a secondary replica, it will be appreciated that a logical volume can include multiple secondary replicas. A compute instance can virtualize its I/O to a volume by way of a client. The client represents instructions that enable a compute instance to connect to, and perform I/O operations at, a remote data volume (e.g., a data volume stored on a physically separate computing device accessed over a network). The client may be implemented on an offload card of a server that includes the processing units (e.g., CPUs or GPUs) of the compute instance.
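The primary/secondary replication and failover behavior described above can be sketched as follows; this simplification propagates writes synchronously and ignores network partitions, partial writes, and the provisioning of a replacement secondary:

```python
# Sketch of a volume with primary and secondary replicas. Writes go to the
# primary and propagate synchronously to the secondary, so failover can
# promote the secondary without data loss. Partitions, partial writes, and
# provisioning of a replacement secondary are ignored.
class ReplicatedVolume:
    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}      # block index -> block data
        self.secondary: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        self.secondary[block] = data             # synchronous propagation

    def read(self, block: int) -> bytes:
        return self.primary[block]               # primary serves all I/O

    def failover(self) -> None:
        """Promote the secondary; it assumes the role of the primary."""
        self.primary, self.secondary = self.secondary, self.primary

vol = ReplicatedVolume()
vol.write(0, b"blockdata")
vol.failover()                                   # e.g., primary host lost
assert vol.read(0) == b"blockdata"               # reads still succeed
```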
The data plane 221 can also include one or more object store servers, which represent another type of storage within the cloud provider network. The object storage servers include one or more servers on which data is stored as objects within resources referred to as buckets and can be used to support a managed object storage service of the cloud provider network. Each object typically includes the data being stored, a variable amount of metadata that enables various capabilities for the object storage servers with respect to analyzing a stored object, and a globally unique identifier or key that can be used to retrieve the object. Each bucket is associated with a given user account. Customers can store as many objects as desired within their buckets, can write, read, and delete objects in their buckets, and can control access to their buckets and the objects contained therein. Further, in embodiments having a number of different object storage servers distributed across different ones of the regions described above, users can choose the region (or regions) where a bucket is stored, for example to optimize for latency. Customers may use buckets to store objects of a variety of types, including machine images that can be used to launch VMs, and snapshots that represent a point-in-time view of the data of a volume.
An edge server 224 provides resources and services of the cloud provider network 203 within a separate network, such as a telecommunications network, thereby extending functionality of the cloud provider network 203 to new locations (e.g., for reasons related to latency in communications with customer devices, legal compliance, security, etc.). In some implementations, an edge server 224 can be configured to provide capacity for cloud-based workloads to run within the telecommunications network. In some implementations, an edge server 224 can be configured to provide the core and/or RAN functions of the telecommunications network, and may be configured with additional hardware (e.g., radio access hardware). Some implementations may be configured to allow for both, for example by allowing capacity unused by core and/or RAN functions to be used for running cloud-based workloads.
As indicated, such edge servers 224 can include cloud provider network-managed edge servers 227 (e.g., formed by servers located in a facility such as a customer's premises or a cellular communication network separate from those associated with the cloud provider network 203, but where such servers are still managed by the cloud provider) and customer-managed edge servers 233 (e.g., formed by servers located on-premise in a customer or partner facility), among other possible types of substrate extensions.
As illustrated in the example edge server 224, an edge server 224 can similarly include a logical separation between a control plane 236 and a data plane 239, respectively extending the control plane 218 and data plane 221 of the cloud provider network 203. The edge server 224 may be pre-configured, e.g., by the cloud provider network operator, with an appropriate combination of hardware, software, and/or firmware elements to support various types of computing-related resources, and to do so in a manner that mirrors the experience of using the cloud provider network 203. For example, one or more edge server location servers can be provisioned by the cloud provider for deployment within an edge server 224. As described above, the cloud provider network 203 may offer a set of predefined instance types, each having varying types and quantities of underlying hardware resources. Each instance type may also be offered in various sizes. In order to enable customers to continue using the same instance types and sizes in an edge server 224 as they do in the region, the servers can be heterogeneous servers. A heterogeneous server can concurrently support multiple instance sizes of the same type and may also be reconfigured to host whatever instance types are supported by its underlying hardware resources. The reconfiguration of the heterogeneous server can occur on-the-fly using the available capacity of the servers, that is, while other VMs are still running and consuming other capacity of the edge server location servers. This can improve utilization of computing resources within the edge location by allowing for better packing of running instances on servers, and also provides a seamless experience regarding instance usage across the cloud provider network 203 and the cloud provider network-managed edge server 227.
The edge servers can host one or more compute instances. Compute instances can be VMs, or containers that package up code and all its dependencies, so that an application can run quickly and reliably across computing environments (e.g., including VMs and microVMs). In addition, the servers may host one or more data volumes, if desired by the customer. In the region of a cloud provider network 203, such volumes may be hosted on dedicated block store servers. However, due to the possibility of having a significantly smaller capacity at an edge server 224 than in the region, an optimal utilization experience may not be provided if the edge server 224 includes such dedicated block store servers. Accordingly, a block storage service may be virtualized in the edge server 224, such that one of the VMs runs the block store software and stores the data of a volume. Similar to the operation of a block storage service in the region of a cloud provider network 203, the volumes within an edge server 224 may be replicated for durability and availability. The volumes may be provisioned within their own isolated virtual network within the edge server 224. The compute instances and any volumes collectively make up a data plane 239 extension of the provider network data plane 221 within the edge server 224.
The servers within an edge server 224 may, in some implementations, host certain local control plane components, for example, components that enable the edge server 224 to continue functioning if there is a break in the connection back to the cloud provider network 203. Examples of these components include a migration manager that can move compute instances between edge servers if needed to maintain availability, and a key-value data store that indicates where volume replicas are located. However, generally the control plane 236 functionality for an edge server 224 will remain in the cloud provider network 203 in order to allow customers to use as much resource capacity of the edge server 224 as possible.
Server software running at an edge server 224 may be designed by the cloud provider to run on the cloud provider substrate network, and this software may be enabled to run unmodified in an edge server 224 by using local network manager(s) 242 to create a private replica of the substrate network within the edge location (a “shadow substrate”). The local network manager(s) 242 can run on edge server 224 servers and bridge the shadow substrate with the edge server 224 network, for example, by acting as a virtual private network (VPN) endpoint or endpoints between the edge server 224 and the proxies 245, 248 in the cloud provider network 203 and by implementing the mapping service (for traffic encapsulation and decapsulation) to relate data plane traffic (from the data plane proxies 248) and control plane traffic (from the control plane proxies 245) to the appropriate server(s). By implementing a local version of the provider network's substrate-overlay mapping service, the local network manager(s) 242 allow resources in the edge server 224 to seamlessly communicate with resources in the cloud provider network 203. In some implementations, a single local network manager 242 can perform these actions for all servers hosting compute instances in an edge server 224. In other implementations, each of the servers hosting compute instances may have a dedicated local network manager 242. In multi-rack edge locations, inter-rack communications can go through the local network managers 242, with local network managers maintaining open tunnels to one another.
Edge server locations can utilize software-defined networking and secure networking tunnels through the edge server 224 network to the cloud provider network 203, for example, to maintain security of customer data when traversing the edge server 224 network and any other intermediate network (which may include the public internet). Within the cloud provider network 203, these tunnels are composed of virtual infrastructure components including isolated virtual networks (e.g., in the overlay network), control plane proxies 245, data plane proxies 248, and substrate network interfaces. Such proxies 245, 248 may be implemented as containers running on compute instances. In some embodiments, each server in an edge server 224 location that hosts compute instances can utilize at least two tunnels: one for control plane traffic (e.g., Constrained Application Protocol (CoAP) traffic) and one for encapsulated data plane traffic. A connectivity manager (not shown) within the cloud provider network 203 manages the cloud provider network-side lifecycle of these tunnels and their components, for example, by provisioning them automatically when needed and maintaining them in a healthy operating state. In some embodiments, a direct connection between an edge server 224 location and the cloud provider network 203 can be used for control and data plane communications. As compared to a VPN through other networks, the direct connection can provide constant bandwidth and more consistent network performance because of its relatively fixed and stable network path.
A control plane (CP) proxy 245 can be provisioned in the cloud provider network 203 to represent particular host(s) in an edge location. CP proxies 245 are intermediaries between the control plane 218 in the cloud provider network 203 and control plane targets in the control plane 236 of edge server 224. That is, CP proxies 245 provide infrastructure for tunneling management API traffic destined for edge servers out of the region substrate and to the edge server 224. For example, a virtualized computing service of the cloud provider network 203 can issue a command to a VMM of a server of an edge server 224 to launch a compute instance. A CP proxy 245 maintains a tunnel (e.g., a VPN) to a local network manager 242 of the edge server 224. The software implemented within the CP proxies 245 ensures that only well-formed API traffic leaves from and returns to the substrate. CP proxies 245 provide a mechanism to expose remote servers on the cloud provider substrate while still protecting substrate security materials (e.g., encryption keys, security tokens) from leaving the cloud provider network 203. The one-way control plane traffic tunnel imposed by the CP proxies 245 also prevents any (potentially compromised) devices from making calls back to the substrate. CP proxies 245 may be instantiated one-for-one with servers at an edge server 224 or may be able to manage control plane traffic for multiple servers in the same edge server.
A data plane (DP) proxy 248 can also be provisioned in the cloud provider network 203 to represent particular server(s) in an edge server 224. The DP proxy 248 acts as a shadow or anchor of the server(s) and can be used by services within the cloud provider network 203 to monitor the health of the host (including its availability, used/free compute capacity, used/free storage capacity, and network bandwidth usage/availability). The DP proxy 248 also allows isolated virtual networks to span edge servers 224 and the cloud provider network 203 by acting as a proxy for server(s) in the cloud provider network 203. Each DP proxy 248 can be implemented as a packet-forwarding compute instance or container. As illustrated, each DP proxy 248 can maintain a VPN tunnel with a local network manager 242 that manages traffic to the server(s) that the DP proxy 248 represents. This tunnel can be used to send data plane traffic between the edge server(s) and the cloud provider network 203. Data plane traffic flowing between an edge server 224 and the cloud provider network 203 can be passed through DP proxies 248 associated with that edge server 224. For data plane traffic flowing from an edge server 224 to the cloud provider network 203, DP proxies 248 can receive encapsulated data plane traffic, validate it for correctness, and allow it to enter the cloud provider network 203. DP proxies 248 can forward encapsulated traffic from the cloud provider network 203 directly to an edge server 224.
Local network manager(s) 242 can provide secure network connectivity with the proxies 245, 248 established in the cloud provider network 203. After connectivity has been established between the local network manager(s) 242 and the proxies 245, 248, customers may issue commands via the interface 206 to instantiate compute instances (and/or perform other operations using compute instances) using edge server resources in a manner analogous to the way in which such commands would be issued with respect to compute instances hosted within the cloud provider network 203. From the perspective of the customer, the customer can now seamlessly use local resources within an edge server 224 (as well as resources located in the cloud provider network 203, if desired). The compute instances set up on a server at an edge server 224 may communicate both with electronic devices located in the same network, as well as with other resources that are set up in the cloud provider network 203, as desired. A local gateway 251 can be implemented to provide network connectivity between an edge server 224 and a network associated with the extension.
There may be circumstances that necessitate the transfer of data between the object storage service and an edge server 224. For example, the object storage service may store machine images used to launch VMs, as well as snapshots representing point-in-time backups of volumes. An object gateway can be provided on an edge server or a specialized storage device to provide customers with configurable, per-bucket caching of object storage bucket contents in their edge server 224, minimizing the impact of edge server-region latency on the customer's workloads. The object gateway can also temporarily store snapshot data from snapshots of volumes in the edge server 224 and then sync with the object servers in the region when possible. The object gateway can also store machine images that the customer designates for use within the edge server 224 or on the customer's premises. In some implementations, the data within the edge server 224 may be encrypted with a unique key, and the cloud provider can limit keys from being shared from the region to the edge server 224 for security reasons. Accordingly, data exchanged between the object store servers and the object gateway may utilize encryption, decryption, and/or re-encryption in order to preserve security boundaries with respect to encryption keys or other sensitive data. A transformation intermediary can perform these operations, and an edge server bucket can be created (on the object store servers) to store snapshot data and machine image data using the edge server encryption key.
In the manner described above, an edge server 224 forms an edge location, in that it provides the resources and services of the cloud provider network 203 outside of a traditional cloud provider data center and closer to customer devices. An edge location, as referred to herein, can be structured in several ways. In some implementations, an edge location can be an extension of the cloud provider network substrate including a limited quantity of capacity provided outside of an availability zone (e.g., in a small data center or other facility of the cloud provider that is located close to a customer workload and that may be distant from any availability zones). Such edge locations may be referred to as “local zones,” “edge zones,” or “distributed cloud edge zones” (due to being near to customer workloads at the “edge” of the network). An edge zone may be connected in various ways to a publicly accessible network such as the Internet, for example directly, via another network, or via a private connection to a region. Although typically an edge zone would have more limited capacity than a region, in some cases an edge zone may have substantial capacity, for example thousands of racks or more.
In some implementations, an edge location may be an extension of the cloud provider network substrate formed by one or more servers located on-premise in a customer or partner facility, wherein such server(s) communicate over a network (e.g., a publicly-accessible network such as the Internet) with a nearby availability zone or region of the cloud provider network. This type of substrate extension located outside of cloud provider network data centers can be referred to as an “outpost” of the cloud provider network. Some outposts may be integrated into communications networks, for example as a multi-access edge computing (MEC) site having physical infrastructure spread across telecommunication data centers, telecommunication aggregation sites, and/or telecommunication base stations within the telecommunication network. In the on-premise example, the limited capacity of the outpost may be available for use only by the customer who owns the premises (and any other accounts allowed by the customer). In the telecommunications example, the limited capacity of the outpost may be shared amongst a number of applications (e.g., games, virtual reality applications, healthcare applications) that send data to users of the telecommunications network.
An edge location can include data plane capacity controlled at least partly by a control plane of a nearby availability zone of the provider network. As such, an availability zone group can include a “parent” availability zone and any “child” edge locations homed to (e.g., controlled at least partly by the control plane of) the parent availability zone. Certain limited control plane functionality (e.g., features that require low latency communication with customer resources, and/or features that enable the edge location to continue functioning when disconnected from the parent availability zone) may also be present in some edge locations. Thus, in the above examples, an edge location refers to an extension of at least data plane capacity that is positioned at the edge of the cloud provider network, close to customer devices and/or workloads.
In the example of
In 5G wireless network development efforts, edge locations may be considered a possible implementation of MEC. Such edge locations can be connected to various points within a 5G network that provide a breakout for data traffic as part of the UPF. Older wireless networks can incorporate edge locations as well. In 3G wireless networks, for example, edge locations can be connected to the packet-switched network portion of a communication network 100, such as to a Serving General Packet Radio Services Support Node (SGSN) or to a Gateway General Packet Radio Services Support Node (GGSN). In 4G wireless networks, edge locations can be connected to a Serving Gateway (SGW) or Packet Data Network Gateway (PGW) as part of the core network or evolved packet core (EPC). In some embodiments, traffic between an edge server 224 and the cloud provider network 203 can be broken out of the communication network 100 without routing through the core network.
In some embodiments, edge servers 224 can be connected to more than one communication network associated with respective customers. For example, when two communication networks of respective customers share or route traffic through a common point, an edge server 224 can be connected to both networks. For example, each customer can assign some portion of its network address space to the edge server 224, and the edge server 224 can include a router or gateway 251 that can distinguish traffic exchanged with each of the communication networks 100. For example, traffic destined for the edge server 224 from one network might have a different destination IP address, source IP address, and/or virtual local area network (VLAN) tag than traffic received from another network. Traffic originating from the edge server 224 to a destination on one of the networks can be similarly encapsulated to have the appropriate VLAN tag, source IP address (e.g., from the pool allocated to the edge server 224 from the destination network address space) and destination IP address.
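For purposes of illustration only, the following Python sketch shows one way a gateway 251 might attribute inbound traffic to one of two customer networks using a VLAN tag and source address, as described above. The network names, VLAN tags, and address ranges are hypothetical and not part of any particular deployment.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_address, ip_network

@dataclass(frozen=True)
class CustomerNetwork:
    name: str
    vlan_tag: int
    address_space: IPv4Network  # portion of the customer's address space assigned to the edge server

NETWORKS = [
    CustomerNetwork("customer-a", 100, ip_network("10.1.0.0/16")),  # hypothetical allocations
    CustomerNetwork("customer-b", 200, ip_network("10.2.0.0/16")),
]

def classify(source_ip: str, vlan_tag: int) -> str:
    """Attribute inbound traffic to a customer network by VLAN tag and source address."""
    for net in NETWORKS:
        if vlan_tag == net.vlan_tag and ip_address(source_ip) in net.address_space:
            return net.name
    raise ValueError("traffic does not match any attached customer network")

print(classify("10.1.4.2", 100))  # -> customer-a
```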
The network service/function catalog 268 is also referred to as the NF Repository Function (NRF). In a Service Based Architecture (SBA) 5G network, the control plane functionality and common data repositories can be delivered by way of a set of interconnected network functions built using a microservices architecture. The NRF can maintain a record of available NF instantiations and their supported services, allowing other NF instantiations to subscribe and be notified of registrations from NF instantiations of a given type. The NRF thus can support service discovery by receipt of discovery requests from NF instantiations, and details which NF instantiations support specific services. The network function orchestrator 270 can perform NF lifecycle management including instantiation, scale-out/in, performance measurements, event correlation, and termination. The network function orchestrator 270 can also onboard new NFs, manage migration to new or updated versions of existing NFs, identify NF sets that are suitable for a particular network slice or larger network, and orchestrate NFs across different computing devices and sites that make up the radio-based network 103 (
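As a minimal sketch of the register/discover/subscribe pattern the NRF supports, consider the following toy repository. The class and method names are illustrative assumptions and do not reflect the actual 3GPP service-based interfaces.

```python
from collections import defaultdict

class NFRepository:
    """Toy NF Repository Function: tracks NF instantiations and notifies subscribers."""

    def __init__(self):
        self.instances = defaultdict(set)     # nf_type -> set of instance ids
        self.subscribers = defaultdict(list)  # nf_type -> notification callbacks

    def register(self, nf_type, instance_id):
        self.instances[nf_type].add(instance_id)
        for callback in self.subscribers[nf_type]:
            callback(nf_type, instance_id)    # notify subscribers of the new registration

    def subscribe(self, nf_type, callback):
        self.subscribers[nf_type].append(callback)

    def discover(self, nf_type):
        """Answer a discovery request with the instantiations supporting a given type."""
        return sorted(self.instances[nf_type])

nrf = NFRepository()
nrf.subscribe("UPF", lambda t, i: print(f"new {t} instance: {i}"))
nrf.register("UPF", "upf-001")
print(nrf.discover("UPF"))  # -> ['upf-001']
```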
The control plane cell 257 may be in communication with one or more cell sites 272 by way of a RAN interface 273, one or more customer local data centers 274, one or more local zones 276, and one or more regional zones 278. The RAN interface 273 may include an application programming interface (API) that facilitates provisioning or releasing capacity in a RAN operated by a third-party communication service provider at a cell site 272. The cell sites 272 include computing hardware 280 that executes one or more distributed unit (DU) network functions 282. The customer local data centers 274 include computing hardware 283 (e.g., an edge server 224) that executes one or more central unit (CU) network functions 284, a network controller 285, a UPF 286, one or more edge applications 287 corresponding to customer workloads, and/or other components.
The local zones 276, which may be in a data center operated by a cloud service provider, may execute one or more core network functions 288, such as an AMF, an SMF, a network exposure function (NEF) that securely exposes the services and capabilities of other network functions, and a unified data management (UDM) function that manages subscriber data for authorization, registration, and mobility management. The local zones 276 may also execute a UPF 286, a service for metric processing 289, and one or more edge applications 287. In some implementations, such core network functions may be run on an edge server 224 that is closer to the edge server 224 running the DU network functions 282 and CU network functions 284, for example the same edge server 224 or another edge server 224 collocated at the same facility.
The regional zones 278, which may be in a data center operated by a cloud service provider, may execute one or more core network functions 288; a UPF 286; an operations support system (OSS) 290 that supports network management systems, service delivery, service fulfillment, service assurance, and customer care; an internet protocol multimedia subsystem (IMS) 291; a business support system (BSS) 292 that supports product management, customer management, revenue management, and/or order management; one or more portal applications 293, and/or other components.
In this example, the communication network 100 employs a cellular architecture to reduce the blast radius of individual components. At the top level, the control plane is divided into multiple control plane cells 257 to prevent an individual control plane failure from impacting all deployments.
Within each control plane cell 257, multiple redundant stacks can be provided with the control plane shifting traffic to secondary stacks as needed. For example, a cell site 272 may be configured to utilize a nearby local zone 276 as its default core network. In the event that the local zone 276 experiences an outage, the control plane can redirect the cell site 272 to use the backup stack in the regional zone 278. Traffic that would normally be routed from the internet to the local zone 276 can be shifted to endpoints for the regional zones 278. Each control plane cell 257 can implement a “stateless” architecture that shares a common session database across multiple sites (such as across availability zones or edge sites).
Each region 306 can include two or more availability zones (AZs) connected to one another via a private high-speed network such as, for example, a fiber communication connection. An availability zone refers to an isolated failure domain including one or more data center facilities with separate power, separate networking, and separate cooling relative to other availability zones. A cloud provider may strive to position availability zones within a region 306 far enough away from one another such that a natural disaster, widespread power outage, or other unexpected event does not take more than one availability zone offline at the same time. Customers can connect to resources within availability zones of the cloud provider network 203 via a publicly accessible network (e.g., the Internet, a cellular communication network, a communication service provider network). Transit Centers (TC) are the primary backbone locations linking customers to the cloud provider network 203 and may be co-located at other network provider facilities (e.g., Internet service providers, telecommunications providers). Each region 306 can operate two or more TCs for redundancy. Regions 306 are connected to a global network which includes private networking infrastructure (e.g., fiber connections controlled by the cloud service provider) connecting each region 306 to at least one other region. The cloud provider network 203 may deliver content from points of presence (PoPs) outside of, but networked with, these regions 306 by way of edge locations 303 and regional edge cache servers. This compartmentalization and geographic distribution of computing hardware enables the cloud provider network 203 to provide low-latency resource access to customers on a global scale with a high degree of fault tolerance and stability.
In comparison to the number of regional data centers or availability zones, the number of edge locations 303 can be much higher. Such widespread deployment of edge locations 303 can provide low-latency connectivity to the cloud for a much larger group of end user devices (in comparison to those that happen to be very close to a regional data center). In some embodiments, each edge location 303 can be peered to some portion of the cloud provider network 203 (e.g., a parent availability zone or regional data center). Such peering allows the various components operating in the cloud provider network 203 to manage the compute resources of the edge location 303. In some cases, multiple edge locations 303 may be sited or installed in the same facility (e.g., separate racks of computer systems) and managed by different zones or data centers 309 to provide additional redundancy. Note that although edge locations 303 are typically depicted herein as within a communication service provider network or a radio-based network 103 (
As indicated herein, a cloud provider network 203 can be formed as a number of regions 306, where each region 306 represents a geographical area in which the cloud provider clusters data centers 309. Each region 306 can further include multiple (e.g., two or more) availability zones (AZs) connected to one another via a private high-speed network, for example, a fiber communication connection. An AZ may provide an isolated failure domain including one or more data center facilities with separate power, separate networking, and separate cooling from those in another AZ. Preferably, AZs within a region 306 are positioned far enough away from one another such that a same natural disaster (or other failure-inducing event) should not affect or take more than one AZ offline at the same time. Customers can connect to an AZ of the cloud provider network 203 via a publicly accessible network (e.g., the Internet, a cellular communication network).
The parenting of a given edge location 303 to an AZ or region 306 of the cloud provider network 203 can be based on a number of factors. One such parenting factor is data sovereignty. For example, to keep data originating from a communication network in one country within that country, the edge locations 303 deployed within that communication network can be parented to AZs or regions 306 within that country. Another factor is availability of services. For example, some edge locations 303 may have different hardware configurations such as the presence or absence of components such as local non-volatile storage for customer data (e.g., solid state drives), graphics accelerators, etc. Some AZs or regions 306 might lack the services to exploit those additional resources; thus, an edge location 303 could be parented to an AZ or region 306 that supports the use of those resources. Another factor is the latency between the AZ or region 306 and the edge location 303. While the deployment of edge locations 303 within a communication network has latency benefits, those benefits might be negated by parenting an edge location 303 to a distant AZ or region 306 that introduces significant latency for the edge location 303 to region traffic. Accordingly, edge locations 303 are often parented to nearby (in terms of network latency) AZs or regions 306.
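One way to combine these parenting factors is a filter-then-rank pass: filter candidates by data sovereignty and service availability, then rank by latency. The following Python sketch is a hypothetical illustration; the field names and zone identifiers are assumptions.

```python
from dataclasses import dataclass

@dataclass
class CandidateZone:
    zone_id: str
    country: str
    supported_services: frozenset  # services available to exploit edge hardware
    latency_ms: float              # measured edge-to-zone network latency

def pick_parent(candidates, edge_country, required_services):
    """Filter by data sovereignty and service availability, then rank by latency."""
    eligible = [
        z for z in candidates
        if z.country == edge_country and required_services <= z.supported_services
    ]
    if not eligible:
        raise LookupError("no eligible parent zone for this edge location")
    return min(eligible, key=lambda z: z.latency_ms)

zones = [
    CandidateZone("az-east-1", "US", frozenset({"gpu", "nvme"}), 12.0),
    CandidateZone("az-west-2", "US", frozenset({"nvme"}), 4.0),
]
# The GPU requirement excludes the lower-latency zone, so az-east-1 is chosen.
print(pick_parent(zones, "US", frozenset({"gpu"})).zone_id)  # -> az-east-1
```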
With reference to
The computing environment 403 may comprise, for example, a server computer or any other system providing computing capacity. Alternatively, the computing environment 403 may employ a plurality of computing devices that may be arranged, for example, in one or more server banks or computer banks or other arrangements. Such computing devices may be located in a single installation or may be distributed among many different geographical locations. For example, the computing environment 403 may include a plurality of computing devices that together may comprise a hosted computing resource, a grid computing resource, and/or any other distributed computing arrangement. In some cases, the computing environment 403 may correspond to an elastic computing resource where the allotted capacity of processing, network, storage, or other computing-related resources may vary over time. For example, the computing environment 403 may correspond to a cloud provider network 203, where customers are billed according to their computing resource usage based on a utility computing model.
In some embodiments, the computing environment 403 may correspond to a virtualized private network within a physical network comprising virtual machine instances executed on physical computing hardware, e.g., by way of a hypervisor. The virtual machine instances and any containers running on these instances may be given network connectivity by way of virtualized network components enabled by physical network components, such as routers and switches.
Various applications and/or other functionality may be executed in the computing environment 403 according to various embodiments. Also, various data is stored in a data store 415 that is accessible to the computing environment 403. The data store 415 may be representative of a plurality of data stores 415 as can be appreciated. The data stored in the data store 415, for example, is associated with the operation of the various applications and/or functional entities described below.
The computing environment 403 as part of a cloud provider network offering utility computing services includes computing devices 418 and other types of computing devices. The computing devices 418 may correspond to different types of computing devices 418 and may have different computing architectures. The computing architectures may differ by utilizing processors having different architectures, such as x86, x86_64, ARM, Scalable Processor Architecture (SPARC), PowerPC, and so on. For example, some computing devices 418 may have x86 processors, while other computing devices 418 may have ARM processors. The computing devices 418 may differ also in hardware resources available, such as local storage, graphics processing units (GPUs), machine learning extensions, and other characteristics.
The computing devices 418 may have various forms of allocated computing capacity 421, which may include virtual machine (VM) instances, containers, serverless functions, and so forth. The VM instances may be instantiated from a VM image. Customers may specify that a virtual machine instance should be launched on a particular type of computing device 418 as opposed to other types of computing devices 418. In various examples, one VM instance may be executed singularly on a particular computing device 418, or a plurality of VM instances may be executed on a particular computing device 418. Also, a particular computing device 418 may execute different types of VM instances, which may offer different quantities of resources available via the computing device 418. For example, some types of VM instances may offer more memory and processing capability than other types of VM instances.
The components executed on the computing environment 403, for example, include one or more network functions 422, a network health service 423, a state synchronization service 424, a network controller 425, one or more service function forwarders 426, one or more AI assistant services 427, and other applications, services, processes, systems, engines, or functionality not discussed in detail herein.
The network functions 422 may correspond to various service functions implemented in a radio-based network 103, such as a 4G, 5G, or 6G network. In various examples, the network functions 422 may include one or more DU network functions 282 (
The network health service 423 may be executed to monitor the health and performance of the radio-based network 103. This may include receiving and analyzing metrics related to network links, networking components, and network functions 422. As such, the network health service 423 may be able to detect overload conditions and predict failures or service degradations. The network health service 423 may also detect when a network function 422 has become unresponsive, e.g., due to a failure of the network function 422 to respond to a health status check from the network health service 423, or when the network function 422 has failed to transmit an expected periodic health status update to the network health service 423. In some embodiments, the network health service 423 may utilize a hierarchy of health monitoring agents deployed in various components of the radio-based network 103, which may collect and aggregate health information for further processing by the network health service 423. In one implementation, the network health service 423 may utilize simple network management protocol (SNMP) to collect information from various components of the radio-based network 103. Ultimately, as a result of processing the collected health information, the network health service 423 may generate adverse health events for reporting to the network controller 425.
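As a minimal sketch of the heartbeat-based liveness check described above, the following Python fragment flags a network function whose periodic health status update is overdue. The thresholds, identifiers, and event format are hypothetical.

```python
import time

class HealthMonitor:
    """Flags a network function as unresponsive when its periodic heartbeat is overdue."""

    def __init__(self, heartbeat_interval_s=30, missed_allowed=3):
        self.deadline = heartbeat_interval_s * missed_allowed
        self.last_seen = {}

    def record_heartbeat(self, nf_id):
        self.last_seen[nf_id] = time.monotonic()

    def adverse_events(self):
        """Yield adverse health events for reporting to the network controller."""
        now = time.monotonic()
        for nf_id, seen in self.last_seen.items():
            if now - seen > self.deadline:
                yield {"nf": nf_id, "event": "unresponsive", "overdue_s": round(now - seen, 1)}

monitor = HealthMonitor()
monitor.record_heartbeat("du-cell-272")
print(list(monitor.adverse_events()))  # -> [] while the heartbeat is fresh
```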
The state synchronization service 424 is executed to maintain synchronization of network function state in a radio-based network 103. In this way, the state synchronization service 424 may exchange state synchronization messages with various RAN-enabled edge servers in a radio-based network 103, aggregate the messages, and develop a canonical copy of the DU state, CU state, and/or core state. Corresponding state synchronization agents may also be executed on the RAN-enabled edge servers to facilitate synchronization. In some cases, a centralized state synchronization service 424 is absent and replaced with the state synchronization agents on the RAN-enabled edge servers. By synchronizing or replicating network function state, a particular instantiation of a network function 422 can be swapped out for another instantiation of the network function 422 and/or stateful load balancing may be implemented.
The network controller 425 may be executed to manage a radio-based network 103. In various embodiments, the network controller 425 may manage self-healing activities and resilience-building activities. The network controller 425 may also manage service function chaining and initiate modifications as appropriate to service function chains, e.g., to implement load balancing or service restoration activities.
The service function forwarders 426 are configured to route and forward network traffic (e.g., control plane network traffic) between a source (e.g., a UE or a radio unit) and other network functions 422 in a radio-based network 103. The operation of the service function forwarders 426 is configured to implement one or more service function chains, or chains of network functions 422 that are to process the network traffic. The service function forwarders 426 may determine a next-hop network function 422 based at least in part on a network service header added to the network traffic through encapsulation. Additional components involved in a system that implements service function chaining are described in U.S. patent application Ser. No. 18/066,072, entitled “SERVICE FUNCTION CHAINING IN RADIO-BASED NETWORKS,” and filed on Dec. 14, 2022, which is incorporated herein by reference in its entirety.
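A toy model of next-hop selection follows: the service path identifier and service index stand in for fields carried in a network service header, and the forwarder walks the chain by incrementing the index after each hop. This is a simplification for illustration, not the actual NSH wire format.

```python
from typing import Optional

# Each service path identifier maps to an ordered chain of network functions.
SERVICE_FUNCTION_CHAINS = {
    7: ["firewall", "nat", "upf"],  # hypothetical chain
}

def next_hop(service_path_id: int, service_index: int) -> Optional[str]:
    """Return the next network function for the traffic, or None at end of chain."""
    chain = SERVICE_FUNCTION_CHAINS[service_path_id]
    return chain[service_index] if service_index < len(chain) else None

assert next_hop(7, 0) == "firewall"
assert next_hop(7, 2) == "upf"
assert next_hop(7, 3) is None  # end of chain: deliver to the destination
```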
The AI assistant service 427 may be executed to assist in intent-driven deployment, configuration, and management of radio-based networks 103. In various embodiments, the AI assistant service 427 may, in response to a natural language prompt, generate templates and/or other configuration data for deploying network functions 422 and resources in the cloud provider network 203 that would implement the network functions 422. In various embodiments, the AI assistant service 427 may, in response to a natural language prompt, provide observational information about the radio-based network 103 such as status, health, bottlenecks, security issues, and so forth.
The data stored in the data store 415 includes, for example, one or more network plans 439, one or more cellular topologies 442, one or more spectrum assignments 445, device data 448, one or more RBN health metrics 451, customer billing data 454, network function topology and orchestration templates 463, one or more network function workloads 466, one or more service function chains 469, and potentially other data.
The network plan 439 is a specification of a radio-based network 103 to be deployed for a customer. For example, a network plan 439 may include premises locations or geographic areas to be covered, a number of cells, device identification information and permissions, a desired maximum network latency, a desired bandwidth or network throughput for one or more classes of devices, one or more quality of service parameters for applications or services, one or more routes to be covered by the RBN 103, a schedule of coverage for the RBN 103 or for portions of the RBN 103, a periodic schedule of coverage for the RBN 103 or for portions of the RBN 103, a start time for the RBN 103 or for portions of the RBN 103, an end time for the RBN 103 or for portions of the RBN 103, and/or other parameters that can be used to create a radio-based network 103. A customer may manually specify one or more of these parameters via a user interface. One or more of the parameters may be prepopulated as default parameters. In some cases, a network plan 439 may be generated for a customer based at least in part on automated site surveys using unmanned aerial vehicles. Values of the parameters that define the network plan 439 may be used as a basis for a cloud service provider billing the customer under a utility computing model. For example, the customer may be billed a higher amount for lower latency targets and/or higher bandwidth targets in a service-level agreement (SLA), and the customer can be charged on a per-device basis, a per-cell basis, based on a geographic area served, based on spectrum availability, etc. In some cases, the network plan 439 may incorporate thresholds and reference parameters determined at least in part on an automated probe of an existing private network of a customer.
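For illustration, a network plan 439 might be captured in a structured document like the following hypothetical YAML, embedded here as a Python string and parsed for validation. The field names are assumptions, not a published schema.

```python
import yaml  # requires PyYAML

NETWORK_PLAN = """
network_plan:
  customer: example-manufacturing
  coverage:
    sites:
      - name: plant-1
        cells: 4
    geographic_area: "40.71,-74.00 radius 2km"
  targets:
    max_latency_ms: 20       # lower latency targets may be billed at a higher rate
    throughput_mbps:
      handheld: 50
      sensor: 1
  schedule:
    start: "2024-07-01T00:00:00Z"
"""

plan = yaml.safe_load(NETWORK_PLAN)["network_plan"]
total_cells = sum(site["cells"] for site in plan["coverage"]["sites"])
print(f'{plan["customer"]}: {total_cells} cells planned, '
      f'{plan["targets"]["max_latency_ms"]} ms latency target')
```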
The cellular topology 442 includes an arrangement of a plurality of cells for a customer that takes into account reuse of frequency spectrum where possible given the location of the cells. The cellular topology 442 may be automatically generated given a site survey. In some cases, the number of cells in the cellular topology 442 may be automatically determined based on a desired geographic area to be covered, availability of backhaul connectivity at various sites, signal propagation, available frequency spectrum, and/or on other parameters. For radio-based networks 103, the cellular topology 442 may be developed to cover one or more buildings in an organizational campus, one or more schools in a school district, one or more buildings in a university or university system, and other areas.
The spectrum assignments 445 include frequency spectrum that is available to be allocated for radio-based networks 103 as well as frequency spectrum that is currently allocated to radio-based networks 103. The frequency spectrum may include spectrum that is publicly accessible without restriction, spectrum that is individually owned or leased by customers, spectrum that is owned or leased by the provider, spectrum that is free to use but requires reservation, and so on.
The device data 448 corresponds to data describing wireless devices 106 that are permitted to connect to the radio-based network 103. This device data 448 includes corresponding users, account information, billing information, data plans, permitted applications or uses, an indication of whether the wireless device 106 is mobile or fixed, a location, a current cell, a network address, device identifiers (e.g., International Mobile Equipment Identity (IMEI) number, Equipment Serial Number (ESN), Media Access Control (MAC) address, Subscriber Identity Module (SIM) number, etc.), and so on.
The RBN health metrics 451 include various metrics or statistics that indicate the performance or health of the radio-based network 103. Such RBN health metrics 451 may include bandwidth metrics, dropped packet metrics, signal strength metrics, latency metrics, and so on. The RBN health metrics 451 may be aggregated on a per-device basis, a per-cell basis, a per-customer basis, etc.
The customer billing data 454 specifies charges that the customer is to incur for the operation of the radio-based network 103 for the customer by the provider. The charges may include fixed costs based upon equipment deployed to the customer and/or usage costs based upon utilization as determined by usage metrics that are tracked. In some cases, the customer may purchase the equipment up-front and may be charged only for bandwidth or backend network costs. In other cases, the customer may incur no up-front costs and may be charged purely based on utilization. With the equipment being provided to the customer based on a utility computing model, the cloud service provider may choose an optimal configuration of equipment in order to meet customer target performance metrics while avoiding overprovisioning of unnecessary hardware.
The network function topology and orchestration (NFTO) templates 463 correspond to templates that configure the deployment and/or operation of various network functions 422 for the radio-based network 103. In various embodiments, the network functions 422 may be deployed in VM instances or containers located in computing devices 418 that are at cell sites, at customer aggregation sites, or in data centers remotely located from the customer. In some examples, the network functions 422 represented in the NFTO templates 463 may be implemented in cloud infrastructure that spans multiple regions 306 and edge locations 303, such as local zones. In some embodiments, the NFTO templates 463 employ a grammar based upon Yet Another Markup Language (YAML), Extensible Markup Language (XML), JAVASCRIPT Object Notation (JSON), and/or Topology and Orchestration Specification for Cloud Applications (TOSCA).
TOSCA is an open standard from the Organization for the Advancement of Structured Information Standards (OASIS) that provides a structured way to define and manage cloud applications and their infrastructure. TOSCA offers templates for describing the components, relationships, properties, and deployment plans of cloud-based services. TOSCA simplifies the deployment, scaling, and orchestration of complex cloud applications by ensuring interoperability and portability across various cloud environments and automation tools. TOSCA serves as a foundation for creating, sharing, and automating cloud application blueprints.
In some embodiments, the NFTO templates 463 may use a grammar that is specific to, or contains enhancements directed to, a particular infrastructure environment. To illustrate, a version of TOSCA that is customized for elements of a particular cloud provider network 203 may be used. For example, a cloud provider network 203 may have proprietary infrastructures, services, types of resources, APIs, and so on. A version of TOSCA with modifications to describe these provider-specific elements may be employed in the NFTO templates 463.
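As a rough illustration of the shape of such a template, the following TOSCA-style YAML (embedded as a Python string and parsed for validation) sketches a single DU node hosted on a compute pool. The `example.nodes.DistributedUnit` node type and the property names are hypothetical stand-ins for provider-specific grammar; `tosca.nodes.Compute` is a normative TOSCA type.

```python
import yaml  # requires PyYAML

NFTO_TEMPLATE = """
tosca_definitions_version: tosca_simple_yaml_1_3
topology_template:
  node_templates:
    du_network_function:
      type: example.nodes.DistributedUnit   # hypothetical provider-specific node type
      properties:
        cell_site: site-272
        instance_type: compute.metal-edge    # hypothetical edge instance family
      requirements:
        - host: edge_server_pool
    edge_server_pool:
      type: tosca.nodes.Compute
"""

template = yaml.safe_load(NFTO_TEMPLATE)
nodes = template["topology_template"]["node_templates"]
print(sorted(nodes))  # -> ['du_network_function', 'edge_server_pool']
```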
The network function workloads 466 correspond to machine images, containers, or functions to be launched in the allocated computing capacity 421 to perform one or more of the network functions 422.
The service function chains 469 correspond to an ordered sequence of network functions 422 that are to process network traffic (e.g., control plane traffic and/or data plane traffic) for a particular source (e.g., a UE or a radio unit). The service function chains 469 may be implemented by the network controller 425 configuring one or more service function forwarders 426 to forward traffic associated with particular service function chains 469 to a next destination in the service function chain 469.
The client device 406 is representative of a plurality of client devices 406 that may be coupled to the network 412. The client device 406 may comprise, for example, a processor-based system such as a computer system. Such a computer system may be embodied in the form of a desktop computer, a laptop computer, a personal digital assistant, a cellular telephone, a smartphone, a set-top box, a music player, a web pad, a tablet computer system, a game console, an electronic book reader, a smartwatch, a head mounted display, a voice interface device, or another device. The client device 406 may include a display comprising, for example, one or more devices such as liquid crystal display (LCD) displays, gas plasma-based flat panel displays, organic light emitting diode (OLED) displays, electrophoretic ink (E ink) displays, LCD projectors, or other types of display devices, etc.
The client device 406 may be configured to execute various applications such as a client application 436 and/or other applications. The client application 436 may be executed in a client device 406, for example, to access network content served up by the computing environment 403 and/or other servers, thereby rendering a user interface on the display. To this end, the client application 436 may comprise, for example, a browser, a dedicated application, etc., and the user interface may comprise a network page, an application screen, etc. The client device 406 may be configured to execute applications beyond the client application 436 such as, for example, email applications, social networking applications, word processors, spreadsheets, and/or other applications.
Turning now to
The generative AI system 503 implements the generative AI functionality of the AI assistant service 427. The generative AI system 503 may employ machine learning and artificial intelligence in order to generate the resulting work product. Machine learning refers to a discipline by which computer systems can be trained to recognize patterns through repeated exposure to training data. In unsupervised learning, a self-organizing algorithm learns previously unknown patterns in a data set without any provided labels. In supervised learning, this training data includes an input that is labeled (either automatically, or by a human annotator) with a “ground truth” of the output that corresponds to the input. A portion of the training data set is typically held out of the training process for purposes of evaluating performance of the trained model. The learned parameters of the model can be considered as an encoding of meaningful patterns in the training data, such that the trained model can then recognize these same patterns in new data. The use of a trained model in production is often referred to as “inference,” during which the model receives new data that was not in its training data set and provides an output based on its learned parameters. The training and validation process may be repeated periodically or intermittently, by using new training data to refine previously learned parameters of a production model and deploy a new production model for inference, in order to mitigate degradation of model accuracy over time. In contrast to machine learning (ML), artificial intelligence (AI) refers to a human perception of a computer system as possessing a capability typically considered to require intelligence.
The generative AI system 503 may include, for example, a language model 515, a code generator 518, an optimization recommendation engine 519, a forecasting engine 520, a visualization module 521, and/or other components. A language model is a type of AI model that is trained on textual data to generate coherent and contextually relevant text. A “large” language model (LLM) refers to a language model that has been trained on an extensive dataset and has a high number of parameters, enabling it to capture complex language patterns and perform a wider range of tasks. LLMs are designed to handle a wide range of natural language processing tasks, such as text completion, translation, summarization, and even conversation. The specific parameter count required for a model to be considered an LLM can vary depending on context and technological advancements. Typically, however, large language models have millions to billions of parameters.
The language model 515 may comprise a general purpose LLM that may be used for conversations with the customer. The language model 515 in various embodiments may comprise a commercially available LLM such as AMAZON TITAN, ANTHROPIC CLAUDE, META LLAMA 2, and/or other models.
The code generator 518 may be a commercially available LLM that is trained specifically on code, such as AMAZON CODEWHISPERER or STARCODER, so that the code generator 518 can generate code. Some implementations may use only one language model 515 instead of a separate code generator 518, since coding LLMs can also be capable of natural language conversations and general purpose LLMs have some coding abilities, while other implementations may use both a language model 515 and the code generator 518.
The code generator 518 may be configured to generate NFTO templates 463 and/or other data for configuration and deployment of the radio-based network 103. In this regard, the code generator 518 may be specially trained by retrieval augmented generation (RAG) and/or fine-tuning to generate the data according to the grammar of one or more NFTO templates 463. The code generator 518 may also be trained to generate TOSCA in general, but specifically the TOSCA employed to describe network functions 422 and cloud network resources in the NFTO templates 463. That is to say, the code generator 518 may be trained on a deployment configuration grammar that is specific to, or contains enhancements directed to, a particular infrastructure environment.
In some embodiments, the code generator 518 may be trained on customer data, such as NFTO templates 463, RBN health metrics 451, and so on, in order to generate the NFTO templates 463. Modifications by customers to the output data may then be used as feedback to the system for further training. In some embodiments, the code generator 518 and/or the language model 515 may include suppression to make sure that outputs similar to the training data are not shown to users. In some embodiments, the code generator 518 and/or the language model 515 may include reference tracking to show outputs that are similar to the training data but with provenance information about the source of the similar training data.
RAG retrieves data from outside the language model and augments the prompts by adding the relevant retrieved data in context. RAG can help reduce model hallucinations by guiding the output to be similar to or based on the retrieved information.
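A minimal sketch of the retrieve-then-augment step follows. Retrieval here is naive keyword overlap for brevity; a production system would typically use embedding-based similarity search. The document contents are illustrative.

```python
def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Rank documents by naive keyword overlap with the query and keep the top k."""
    q_terms = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:k]

def augment_prompt(query: str, documents: list) -> str:
    """Prepend retrieved context so the model grounds its answer in it."""
    context = "\n".join(retrieve(query, documents))
    return f"Use only the following context:\n{context}\n\nQuestion: {query}"

docs = [
    "NFTO templates use TOSCA node_templates to declare network functions.",
    "Billing is computed per device under the utility computing model.",
]
print(augment_prompt("How do NFTO templates declare network functions?", docs))
```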
Fine-tuning is a machine learning technique used to improve the performance of a pre-trained model on a specific task. Fine-tuning involves taking a neural network model that has already been trained on a large dataset for a general task, such as language understanding or image recognition, and then further training it on a smaller dataset that is specific to the task at hand. Fine-tuning can include both the deployment configuration grammar as well as question and answer pairs about deploying radio-based networks 103 at least partly implemented on a cloud provider network 203. Such question and answer pairs may help teach the model the intent. Also, if the model were not pretrained on a type of language used in the deployment configurations, such as TOSCA, the fine-tuning data would include that language.
A language model context window refers to the range of text that a language model considers when processing or generating a specific word or token within a given sequence of text. The context window represents the surrounding words or tokens that the model uses to understand the context of the current word or token. The size of this context window is determined by the architecture of the language model. The context window is helpful for language models to generate coherent and contextually accurate responses in various natural language processing tasks, such as text completion, translation, question answering, and more. It enables the model to consider the broader context and semantic meaning of words, ensuring that its output aligns with the input text's intended meaning. The size of this window varies depending on the model but typically includes both preceding and following words in a text sequence. Larger windows may accommodate additional information, such as relevant information added via RAG. However, larger context windows can be computationally expensive, so the size of the context window may vary depending on the specific language model architecture and resource constraints.
The optimization recommendation engine 519 may be a machine learning model trained to optimize a deployment configuration based upon various factors, such as performance, reliability, cost, and so on. The optimization recommendation engine 519 may be trained based upon historical RBN health metrics 451 and cost data associated with radio-based networks 103 having particular deployment configurations. The training may be customer-specific based on the customer's own historical data, or with customer consent such as an “opt-in” from the customer, across the historical data of multiple customers. The optimization recommendation engine 519 may be used as part of pre-validation for an automatically generated initial deployment configuration, and/or as part of optimizing existing deployments of radio-based networks 103 using configuration modifications.
The forecasting engine 520 may be a machine learning model trained to predict future observability metrics based upon historical data. The training may be customer-specific based on the customer's own historical data, or with customer consent such as an “opt-in” from customers, across the historical data of multiple customers. For example, an RBN health metric 451 that is currently in an acceptable range may be trending in a direction that the forecasting engine 520 can predict will be unacceptable. Accordingly, proactive configuration changes can be made in advance of failures or other service issues.
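As a minimal sketch of this kind of trend prediction, a least-squares line fit can estimate when a metric that is in range today will cross its threshold; a real forecasting engine would use a trained model and account for seasonality. The metric values and threshold are hypothetical.

```python
from typing import Optional

def steps_until_threshold(history: list, threshold: float) -> Optional[float]:
    """Fit a least-squares line to the metric history and project the crossing."""
    n = len(history)
    x_mean, y_mean = (n - 1) / 2, sum(history) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None  # metric is flat or improving; no predicted breach
    return (threshold - history[-1]) / slope  # intervals until predicted breach

latency_ms = [12.0, 12.6, 13.1, 13.9, 14.5]    # in range today, but trending upward
print(steps_until_threshold(latency_ms, 20.0))  # ~8.7 reporting intervals from now
```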
The visualization module 521 may be used for generation of visualizations relating to observability of a radio-based network 103. The visualizations may include network functions, cells, and/or other components of the radio-based network 103 as well as components from the infrastructure layer such as compute instances, container clusters, etc. Such visualizations may include a topology graph based upon logical or physical topology. For example, the topology graph may show logical or physical connections among a plurality of infrastructure components. The visualizations may comprise a control plane representation and/or a data plane representation.
The prompt engineering system 506 may store the session state in the session state store 512. The session state store 512 may store the context of the language model 515 and/or code generator 518 as well as other information retrieved from customer account data for the observability and management scenarios.
The prompt engineering system 506 may include, for example, a sanitizing agent 522, an API caller 524, a prompt enricher 527, and/or other components. The API caller 524 would obtain information about the network functions 422 of a radio-based network 103 by calling APIs of other services. In this way, the API caller 524 can get information about the resources that are running in the customer's account, as well as call a metrics service (e.g., the network health service 423) to get data about performance of those resources. For example, the API caller 524 may call various storage services, compute services, content delivery services, analytics services, and so forth, in the cloud provider network 203 to obtain information about the customer's radio-based network 103.
The sanitizing agent 522 validates inputs and outputs to make sure that they are appropriate, that inputs are related to the purpose of the AI assistant rather than something unrelated, and that the outputs are free of toxicity and bias.
The prompt enricher 527 includes the various agents that the AI assistant service 427 would use on the backend in different use cases to guide the language model 515 to the desired output in a more specific way than the customer's general input/question. For example, one agent may add prompts to configure the model to generate templates, while other agents may give different prompts to help in the observability and management scenarios. In RAG scenarios, an agent may get information on the grammar used in the NFTO templates 463 and add the information to the prompt, or the agent may get information on the actual running resources of the customer (e.g., through a data representation or by way of various API calls) and provide this information to the language model 515. These agents may also condense large amounts of information into an appropriate format to fit in the context window of the language model 515.
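The following Python sketch illustrates the agent pattern just described: each agent contributes context for a given use case, and the enricher trims the assembled prompt to fit the model's context window. The agent names are hypothetical, and token counting is approximated by a crude word count.

```python
def grammar_agent(user_prompt: str) -> str:
    return "Grammar: NFTO templates are TOSCA YAML with provider-specific node types."

def resources_agent(user_prompt: str) -> str:
    return "Running resources: upf-001 (local zone), cu-002 (edge server)."

def enrich(user_prompt: str, agents, max_words: int = 120) -> str:
    """Concatenate agent context ahead of the user prompt, trimmed to the window."""
    context = " ".join(agent(user_prompt) for agent in agents)
    words = (context + " " + user_prompt).split()
    return " ".join(words[-max_words:])  # keep the most recent window's worth

print(enrich("Generate a highly available UPF template.",
             [grammar_agent, resources_agent]))
```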
The user interfaces 509 can include a canvas user interface 530, a chat user interface 533, and/or other user interfaces. The canvas user interface 530 presents a visualization of the network functions 422 in a radio-based network 103 via an architecture diagram using icons for different resources. In a first example, the diagram may represent a proposed architecture of the customer's network function workloads that will be run based on the generated templates. In a second example, the diagram may correspond to a visualization of the customer's existing radio-based network 103 architecture annotated with performance issues and/or suggested changes based on observations of the radio-based network 103 by the AI assistant service 427.
The chat user interface 533 is where the customer would provide their inputs to the AI assistant service 427 and receive its outputs. Examples of outputs may include, for example, conversational text, configuration files such as NFTO templates 463, or other code-containing files that can be opened in the customer's development environment of choice. As non-limiting examples, a user may express their requests with intents like “deploy a network function in high availability mode,” “consider data plane development kit features in the deployment,” or “consider efficient latency,” or may specify nominal throughput ranges to take into account during the deployments. With respect to observability, a customer may, for example, query sanity checks of the security rules, ask to list the API calls or sequences and drill down on some of them, ask for an analysis of the API call outputs, and so on. In some cases, a customer can use recursive prompts to append to NFTO templates 463 and build out portions of the radio-based network 103 iteratively.
Referring next to
Beginning with box 603, the AI assistant service 427 teaches an AI language model, such as the code generator 518 or the language model 515, to recognize deployment configuration grammar for deploying radio-based networks 103. In some scenarios, the deployment configuration grammar may be used for radio-based networks 103 deployed at least partly on infrastructure of a cloud provider network 203. In other scenarios, the deployment configuration grammar may be used for radio-based networks 103 deployed on infrastructure of a communication service provider. The grammar may be specific to, or contain enhancements directed to, a particular infrastructure environment, such as a particular cloud provider network 203.
The AI language model may be pretrained to recognize and/or generate code in a particular language in some implementations. In some scenarios, the AI language model may be taught only using deployment configurations of a specific customer so as to protect proprietary configurations of one customer from being used by other customers of the cloud provider network 203. In some implementations, the AI language model may be taught based on receiving a training prompt that defines a scope of expertise for the AI language model to include the infrastructure of the cloud provider network 203 and the types of resources therein. Such a training prompt may also define the scope of expertise to include the deployment configuration grammar, e.g., as expressed in documentation for the deployment configurations. Such a training prompt may direct the AI language model to respond only in the provided context. The teaching may include RAG, training or fine-tuning, or a combination of both. Fine-tuning may encompass parameter-efficient fine-tuning, such as using Low-Rank Adaptation of Large Language Models (LoRA).
The AI language model may also be taught to understand specific network configurations that correspond to expressed intents. For example, the AI language model may be taught to understand high availability network configurations, low latency network configurations, network configurations that provide specific throughput, and so on, in order to be able to generate deployment configurations that align with expressed intents.
In box 606, the AI assistant service 427 receives a prompt from the customer (e.g., via the chat UI 533) to generate a deployment configuration for a network function in a radio-based network 103 of the customer. For example, the deployment configuration may correspond to one or more NFTO templates 463 in YAML and TOSCA. A non-limiting example of a prompt may express the intent to deploy a network function in a highly available way. In box 609, the AI assistant service 427 using the AI language model generates the deployment configuration according to the intent expressed in the prompt. For example, for the prompt to deploy a network function in a highly available way, the deployment configuration may deploy the network function in a plurality of availability zones in the cloud provider network 203 in order to make the network function highly available. In addition, the AI language model can answer questions about, and provide explanations of, the service or the templates that it generates. For example, a customer may query “Which resources are required to build a high availability UPF, respond with YAML examples of resources,” and the AI language model will respond “NodeGroups in at least two availability zones” while also providing a sample deployment configuration.
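For concreteness, a sample deployment configuration accompanying such an answer might resemble the following hypothetical YAML, shown here embedded in a Python check that the high-availability intent is satisfied. The `example.nodes.NodeGroup` type, zone names, and property fields are illustrative assumptions, not an actual provider grammar.

```python
import yaml  # requires PyYAML

HA_UPF_TEMPLATE = """
topology_template:
  node_templates:
    upf_node_group_a:
      type: example.nodes.NodeGroup          # hypothetical node type
      properties: {availability_zone: az-1}
    upf_node_group_b:
      type: example.nodes.NodeGroup
      properties: {availability_zone: az-2}  # second AZ makes the UPF highly available
"""

groups = yaml.safe_load(HA_UPF_TEMPLATE)["topology_template"]["node_templates"]
zones = {g["properties"]["availability_zone"] for g in groups.values()}
assert len(zones) >= 2, "high availability requires at least two availability zones"
print(f"UPF node groups span {len(zones)} availability zones")
```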
In box 612, the AI assistant service 427 receives a subsequent prompt (e.g., via the chat UI 533) with a specific intent to modify the deployment configuration. For example, the customer may desire to make an existing function highly available, or replicate an existing function in a different region 306. In box 615, the AI assistant service 427 using the AI language model generates a modification to the deployment configuration according to the intent expressed in the subsequent prompt.
In box 618, the AI assistant service 427 receives a manual modification to the deployment configuration from the customer. For example, the customer may manually edit the NFTO template 463. In box 621, the AI assistant service 427 may teach the AI language model based at least in part on the manual modification identified in box 618 and/or the automatically generated modification from box 615. Customer consent may be elicited for use of the customer's data in training the AI language model. For example, the customer may choose to opt-in to sharing the data for training purposes, with informed consent that the model may be used for other customers. Alternatively, the customer's data may be used in training the AI language model for the customer's use only.
In box 624, the AI assistant service 427 may automatically deploy one or more network functions 422 in the radio-based network 103 according to the deployment configuration. For example, the network controller 425 or another orchestration agent may ingest the deployment configuration and instantiate various cloud resources for use by the radio-based network 103 as specified in the configuration. Network function workloads 466 may be executed on the cloud resources. Thereafter, the operation of the portion of the AI assistant service 427 ends.
Referring next to
Beginning with box 630, the AI assistant service 427 teaches an AI language model, such as the code generator 518 or the language model 515, to recognize deployment configuration grammar for deploying radio-based networks 103. In some scenarios, the deployment configuration grammar may be used for radio-based networks 103 deployed at least partly on infrastructure of a cloud provider network 203. In other scenarios, the deployment configuration grammar may be used for radio-based networks 103 deployed on infrastructure of a communication service provider. The grammar may be specific to, or contain enhancements directed to, a particular infrastructure environment, such as a particular cloud provider network 203.
The AI language model may be pretrained to recognize and/or generate code in a particular language in some implementations. In some implementations, the AI language model may be taught based on receiving a training prompt that defines a scope of expertise for the AI language model to include the infrastructure of the cloud provider network 203 and the types of resources therein. Such a training prompt may also define the scope of expertise to include the deployment configuration grammar, e.g., as expressed in documentation for the deployment configurations. Such a training prompt may direct the AI language model to respond only in the provided context. The teaching may include RAG, training or fine-tuning, or a combination of both. Fine-tuning may encompass parameter-efficient fine-tuning, such as using Low-Rank Adaptation of Large Language Models (LoRA).
In box 633, the AI assistant service 427 receives a deployment configuration for a radio-based network 103. For example, the deployment configuration may correspond to one or more NFTO templates 463 in YAML and TOSCA. The deployment configuration may be provided by a customer in advance of a deployment of a radio-based network 103 for the purposes of pre-validation and optimization. Alternatively, the deployment configuration may correspond to one that has already been used to deploy a radio-based network 103 for the purpose of optimizing an existing radio-based network 103. In box 634, the AI assistant service 427 receives a prompt from the customer expressing an intent to modify the deployment configuration. The prompt may relate to a general improvement of the deployment configuration, or the prompt may include a specific intent to improve in terms of cost, performance, connectivity, and so on. In some cases, the prompt from the customer may be omitted, and the AI assistant service 427 may execute with the general intent to improve the deployment configuration.
In box 636, the AI assistant service 427 analyzes the deployment configuration using the AI language model. The analysis may determine various connectivity issues, computing resource issues, cost optimizations, security policy issues (e.g., permissions that are not sufficiently restrictive), network reachability issues, and so forth, with respect to the deployment configuration. In some use cases, the AI language model may analyze a HELM chart in the deployment configuration to determine resources used.
In box 639, the AI assistant service 427 using the AI language model generates a modification to the deployment configuration according to the customer intent expressed in the prompt. The modification may resolve a network reachability issue or a connectivity issue in the deployment configuration. For example, connectivity issues may be automatically detected due to configuration settings with regard to virtual private clouds, subnets, security groups, internet gateways, routing tables, transit gateways, and so on. The modification may resolve a security policy issue in the deployment configuration, where the deployment configuration is determined to be not aligned with security best practices. For example, the modification may involve increasing security restrictions or removing unnecessary permissions. The modification may resolve a resource allocation issue in the deployment configuration. For example, the modification may include changing a computing instance type used in the deployment configuration (e.g., changing an instance from one with low computational resources to one with high computational resources). In some cases, the modification may be to reduce cost associated with the radio-based network. For example, the resources specified may be unnecessarily powerful and costly for the current usage. The modification may also involve reallocating a network function to a different location.
In box 642, the AI assistant service 427 presents the proposed modification to the deployment configuration for approval by the customer. For example, the AI assistant service 427 may present metrics, such as improved performance or reduced cost, that are predicted to result from the modified deployment configuration as compared to the previous deployment configuration. In box 645, upon receiving approval from the customer, the AI assistant service 427 initially deploys the radio-based network 103 using the modified deployment configuration. In the case of existing radio-based networks 103, the AI assistant service 427 may reconfigure the existing network according to the modified deployment configuration, including migrating existing network functions 422. Thereafter, the operation of the portion of the AI assistant service 427 ends.
Referring next to
Beginning with box 703, the AI assistant service 427 teaches an AI language model 515 to aggregate and recognize status information and/or configuration information for radio-based networks 103. Such information may include RBN health metrics 451, NFTO templates 463, descriptive information from cloud provider APIs, stored API call information, and/or other information. The teaching may be performed using RAG, training or fine-tuning, or a combination.
In box 706, the AI assistant service 427 receives a prompt from the customer (e.g., via the chat UI 533) to provide a type of status information. For example, the prompt may identify a particular region of the cloud provider network 203 or a particular type of network function 422 as the scope. In another example, the prompt may be to determine one or more cell sites in the radio-based network that are associated with a particular status. The type of status information may be security audit information, network function health information, network function topology information, and so forth. Examples of prompts may include “tell me the status of my network,” “list which network function clusters are not healthy,” “perform a security audit on my network,” “what are the cells with the highest number of active users, return also the location,” etc. In some embodiments, the user interface for providing the prompt may also include other input components (e.g., sliders, checkboxes, radio buttons, etc.) to specify a temperature, an intensity, or another parameter that can affect the processing of the prompt.
In box 709, the AI assistant service 427 interprets the prompt and obtains the status information relevant to answering the prompt. The AI assistant service 427 may generate a suitable query to submit to a database or a service to obtain the status information. The AI assistant service 427 may query various APIs via the API caller 524 to obtain this information. Such APIs may be specific to managing RBNs 103 in the cloud provider network 203, or the APIs may relate to cloud resources generally, such as storage and compute resources. In some cases, the AI assistant service 427 may automatically generate a database query (e.g., in SQL) to query one or more databases to obtain this information. In some cases, the AI assistant service 427 may obtain predictions of future information using the forecasting engine 520.
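As one concrete, hedged illustration of this step, the sketch below runs a generated SQL query against an in-memory SQLite database; the cell_status schema and sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical schema holding per-cell status for a radio-based network.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_status "
    "(cell_id TEXT, latitude REAL, longitude REAL, active_users INTEGER)"
)
conn.executemany(
    "INSERT INTO cell_status VALUES (?, ?, ?, ?)",
    [("cell-001", 47.61, -122.33, 812), ("cell-002", 47.62, -122.35, 431)],
)

# A query such as the AI assistant service might generate for the
# prompt "what are the cells with the highest number of active
# users, return also the location."
generated_sql = (
    "SELECT cell_id, latitude, longitude, active_users "
    "FROM cell_status ORDER BY active_users DESC LIMIT 5"
)
for row in conn.execute(generated_sql):
    print(row)
```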
In box 712, the AI assistant service 427 generates a response to the prompt that includes the type of status information and/or predictions of future status information requested according to the intent expressed in the prompt. In some examples, the status information is presented via the canvas UI 530. In some cases, the status information may indicate a present or predicted issue or problem in the radio-based network 103, such as problematic compute instances, problematic network functions 422, problematic container clusters, and so on. The status information may include information about network functions 422 of the radio-based network 103 as well as information about the operation of components in the underlying infrastructure layer. The AI assistant service 427 may also cause the response to be presented to the customer, e.g., via a network page, an application user interface, etc.
In some cases, the status information may include visualizations, which may correspond to a logical or physical topology graph, locations of network or infrastructure components, and/or a map of a geographic area showing the locations and associated status of network or infrastructure components. For example, in response to a prompt of “what are the cells with the highest number of active users, return also the location,” a listing of information about a number of cells may be provided, including the cell identifier, location coordinates, and the number of active users, in addition to a map of the locations. In some cases, a database query or API query that can be used to obtain the same status information may be provided to the user in the user interface for future reference or modification. A written explanation of the database query in plain language may also be generated by the AI assistant service 427 and provided.
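The map-style visualization could, for instance, be fed by converting such query results into GeoJSON that a map component in the canvas UI 530 renders; the sketch below and its field names are illustrative assumptions.

```python
import json

# Convert per-cell status rows into a GeoJSON FeatureCollection
# that a map component could render. Field names are hypothetical.
rows = [
    {"cell_id": "cell-001", "lat": 47.61, "lon": -122.33, "active_users": 812},
    {"cell_id": "cell-002", "lat": 47.62, "lon": -122.35, "active_users": 431},
]
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"cell_id": r["cell_id"], "active_users": r["active_users"]},
        }
        for r in rows
    ],
}
print(json.dumps(feature_collection, indent=2))
```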
In box 715, the AI assistant service 427 receives a subsequent prompt (e.g., via the chat UI 533) to improve a configuration of the radio-based network 103. For example, the prompt may express an intent to resolve an issue or generally improve the configuration. The prompt may express an intent to improve in terms of network reachability, connectivity, resource allocation, cost, security policy, and so on. In box 718, the AI assistant service 427 generates a configuration modification according to the intent to improve expressed in the subsequent prompt. For example, this may result in automatically deploying an additional network function in the radio-based network 103 according to the configuration modification in order to scale the network function and resolve an excess load issue. Thereafter, the operation of the portion of the AI assistant service 427 ends.
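A configuration modification that scales out an overloaded network function might look like the following sketch; the replicas field and the load threshold are hypothetical.

```python
# Hypothetical sketch: scale out an overloaded network function by
# increasing its replica count in the deployment configuration.

def scale_out_if_overloaded(config: dict, load: float,
                            threshold: float = 0.8) -> dict:
    """Return a modified configuration adding a replica when the
    observed load exceeds the threshold."""
    modified = dict(config)
    if load > threshold:
        # Deploy an additional network function instance to absorb
        # the excess load.
        modified["replicas"] = config.get("replicas", 1) + 1
    return modified

du_config = {"network_function": "DU", "replicas": 2}
print(scale_out_if_overloaded(du_config, load=0.93))
# -> {'network_function': 'DU', 'replicas': 3}
```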
With reference to
Stored in the memory 806 are both data and several components that are executable by the processor 803. In particular, stored in the memory 806 and executable by the processor 803 are the network functions 422, the network health service 423, the state synchronization service 424, the network controller 425, the service function forwarders 426, the AI assistant service 427, and potentially other applications. Also stored in the memory 806 may be a data store 415 and other data. In addition, an operating system may be stored in the memory 806 and executable by the processor 803.
It is understood that there may be other applications that are stored in the memory 806 and are executable by the processor 803 as can be appreciated. Where any component discussed herein is implemented in the form of software, any one of a number of programming languages may be employed such as, for example, C, C++, C#, Objective C, Java®, JavaScript®, Perl, PHP, Visual Basic®, Python®, Ruby, Flash®, or other programming languages.
A number of software components are stored in the memory 806 and are executable by the processor 803. In this respect, the term “executable” means a program file that is in a form that can ultimately be run by the processor 803. Examples of executable programs may be, for example, a compiled program that can be translated into machine code in a format that can be loaded into a random access portion of the memory 806 and run by the processor 803, source code that may be expressed in proper format such as object code that is capable of being loaded into a random access portion of the memory 806 and executed by the processor 803, or source code that may be interpreted by another executable program to generate instructions in a random access portion of the memory 806 to be executed by the processor 803, etc. An executable program may be stored in any portion or component of the memory 806 including, for example, random access memory (RAM), read-only memory (ROM), hard drive, solid-state drive, universal serial bus (USB) flash drive, memory card, optical disc such as compact disc (CD) or digital versatile disc (DVD), floppy disk, magnetic tape, or other memory components.
The memory 806 is defined herein as including both volatile and nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power. Thus, the memory 806 may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, or a combination of any two or more of these memory components. In addition, the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), or magnetic random access memory (MRAM) and other such devices. The ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other like memory device.
Also, the processor 803 may represent multiple processors 803 and/or multiple processor cores and the memory 806 may represent multiple memories 806 that operate in parallel processing circuits, respectively. In such a case, the local interface 809 may be an appropriate network that facilitates communication between any two of the multiple processors 803, between any processor 803 and any of the memories 806, or between any two of the memories 806, etc. The local interface 809 may comprise additional systems designed to coordinate this communication, including, for example, performing load balancing. The processor 803 may be of electrical or of some other available construction.
Although the network functions 422, the network health service 423, the state synchronization service 424, the network controller 425, the service function forwarders 426, the AI assistant service 427, and other various systems described herein may be embodied in software or code executed by general purpose hardware as discussed above, as an alternative the same may also be embodied in dedicated hardware or a combination of software/general purpose hardware and dedicated hardware. If embodied in dedicated hardware, each can be implemented as a circuit or state machine that employs any one of or a combination of a number of technologies. These technologies may include, but are not limited to, discrete logic circuits having logic gates for implementing various logic functions upon an application of one or more data signals, application specific integrated circuits (ASICs) having appropriate logic gates, field-programmable gate arrays (FPGAs), or other components, etc. Such technologies are generally well known by those skilled in the art and, consequently, are not described in detail herein.
The flowcharts of
Although the flowcharts of
Also, any logic or application described herein, including the network functions 422, the network health service 423, the state synchronization service 424, the network controller 425, the service function forwarders 426, or the AI assistant service 427, that comprises software or code can be embodied in any non-transitory computer-readable medium for use by or in connection with an instruction execution system such as, for example, a processor 803 in a computer system or other system. In this sense, the logic may comprise, for example, statements including instructions and declarations that can be fetched from the computer-readable medium and executed by the instruction execution system. In the context of the present disclosure, a “computer-readable medium” can be any medium that can contain, store, or maintain the logic or application described herein for use by or in connection with the instruction execution system.
The computer-readable medium can comprise any one of many physical media such as, for example, magnetic, optical, or semiconductor media. More specific examples of a suitable computer-readable medium would include, but are not limited to, magnetic tapes, magnetic floppy diskettes, magnetic hard drives, memory cards, solid-state drives, USB flash drives, or optical discs. Also, the computer-readable medium may be a random access memory (RAM) including, for example, static random access memory (SRAM) and dynamic random access memory (DRAM), or magnetic random access memory (MRAM). In addition, the computer-readable medium may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other type of memory device.
Further, any logic or application described herein, including the network functions 422, the network health service 423, the state synchronization service 424, the network controller 425, the service function forwarders 426, or the AI assistant service 427, may be implemented and structured in a variety of ways. For example, one or more applications described may be implemented as modules or components of a single application. Further, one or more applications described herein may be executed in shared or separate computing devices or a combination thereof. For example, a plurality of the applications described herein may execute in the same computing device 800, or in multiple computing devices 800 in the same computing environment 403.
Unless otherwise explicitly stated, articles such as “a” or “an”, and the term “set”, should generally be interpreted to include one or more described items. Accordingly, phrases such as “a device configured to” are intended to include one or more recited devices. Such one or more recited devices can also be collectively configured to carry out the stated recitations. For example, “a processor configured to carry out recitations A, B, and C” can include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.
Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.
Any process descriptions, elements or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or elements in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown, or discussed, including substantially concurrently or in reverse order, depending on the functionality involved as would be understood by those skilled in the art.
Embodiments of the present disclosure may be described by one or more of the following clauses:
Clause 1. A system, comprising: an artificial intelligence (AI) language model taught to recognize a deployment configuration grammar for deploying radio-based networks at least partly on infrastructure of a cloud provider network and to understand specific network configurations that correspond to expressed intents, wherein the AI language model is taught based at least in part on fine-tuning; and a computing device configured to at least: receive a prompt from a customer to generate at least a portion of a deployment configuration for a network function in a radio-based network; generate, using the AI language model, the at least a portion of the deployment configuration according to an intent expressed in the prompt; allocate a computing resource in the cloud provider network according to the deployment configuration; and deploy the network function in the radio-based network on the computing resource according to the deployment configuration.
Clause 2. The system of clause 1, wherein the computing device is further configured to at least: analyze, by the AI language model, the deployment configuration; and generate, by the AI language model, a modification to the deployment configuration to improve the deployment configuration.
Clause 3. The system of clauses 1 to 2, wherein the computing device is further configured to at least: receive a modification to the deployment configuration from the customer; and teach the AI language model based at least in part on the modification.
Clause 4. The system of clauses 1 to 3, wherein the cloud provider network comprises a private cloud of the customer or a public cloud serving a plurality of customers.
Clause 5. A computer-implemented method, comprising: teaching an artificial intelligence (AI) language model to recognize a deployment configuration grammar for deploying radio-based networks; receiving a prompt from a customer to generate at least a portion of a deployment configuration for a network function in a radio-based network; and generating, using the AI language model, the at least a portion of the deployment configuration according to an intent expressed in the prompt.
Clause 6. The computer-implemented method of clause 5, wherein teaching the AI language model further comprises receiving a training prompt to define a scope of expertise for the AI language model to include infrastructure of a cloud provider network.
Clause 7. The computer-implemented method of clauses 5 to 6, wherein teaching the AI language model further comprises receiving a training prompt to define a scope of expertise for the AI language model to include the deployment configuration grammar, wherein the deployment configuration grammar is enhanced for a particular infrastructure environment.
Clause 8. The computer-implemented method of clauses 5 to 7, further comprising: receiving a subsequent prompt to modify the deployment configuration for the network function; and generating, using the AI language model, a modification to the deployment configuration according to an intent expressed in the subsequent prompt.
Clause 9. The computer-implemented method of clauses 5 to 8, wherein the prompt expresses the intent to make the network function highly available, and the deployment configuration deploys the network function in a plurality of availability zones to make the network function highly available.
Clause 10. The computer-implemented method of clauses 5 to 9, wherein the AI language model is taught based at least in part on at least one of: retrieval augmented generation (RAG) or fine-tuning.
Clause 11. The computer-implemented method of clauses 5 to 10, wherein before the AI language model is taught to recognize the deployment configuration grammar, the AI language model is pretrained to generate code in a particular language.
Clause 12. The computer-implemented method of clause 11, wherein the particular language is Topology and Orchestration Specification for Cloud Applications (TOSCA).
Clause 13. The computer-implemented method of clauses 5 to 12, further comprising automatically deploying the network function in the radio-based network according to the deployment configuration.
Clause 14. The computer-implemented method of clauses 5 to 13, further comprising: receiving a modification to the deployment configuration from the customer; and teaching the AI language model based at least in part on the modification.
Clause 15. A computer-implemented method, comprising: teaching an artificial intelligence (AI) language model to recognize a deployment configuration grammar for deploying radio-based networks; receiving a deployment configuration for a radio-based network; receiving a prompt from a customer with a specific intent to modify the deployment configuration; analyzing, by the AI language model, the deployment configuration; and generating, by the AI language model, a modification to the deployment configuration according to the specific intent expressed in the prompt.
Clause 16. The computer-implemented method of clause 15, wherein the modification resolves at least one of: a network reachability issue or a connectivity issue in the deployment configuration.
Clause 17. The computer-implemented method of clauses 15 to 16, wherein the modification resolves a resource allocation issue in the deployment configuration, and the modification comprises changing a computing instance type used in the deployment configuration.
Clause 18. The computer-implemented method of clauses 15 to 17, wherein the modification changes a security policy associated with the radio-based network.
Clause 19. The computer-implemented method of clauses 15 to 18, wherein the modification reduces a cost associated with the radio-based network.
Clause 20. The computer-implemented method of clauses 15 to 19, wherein the modification comprises reallocating a network function to a different location.
Clause 21. A system, comprising: an artificial intelligence (AI) language model taught to aggregate and recognize status information for radio-based networks that are implemented at least partly on infrastructure of a cloud provider network; and a computing device configured to at least: receive a prompt from a customer to provide a type of status information for at least a portion of a radio-based network; generate, using the AI language model, a suitable query to submit to a database or a service to obtain the status information; obtain the status information using the query; generate, using the AI language model, a response to the prompt that includes or summarizes the type of status information using the obtained status information according to an intent expressed in the prompt; and cause the response to be presented to the customer.
Clause 22. The system of clause 21, wherein the response to the prompt further includes the query.
Clause 23. The system of clauses 21 to 22, wherein the response to the prompt further includes an explanation of the query.
Clause 24. The system of clauses 21 to 23, wherein the response to the prompt further includes a topology graph showing one or more elements of the radio-based network.
Clause 25. The system of clauses 21 to 24, wherein the prompt identifies at least one of: a particular region of the cloud provider network, or a particular type of network function in the radio-based network.
Clause 26. A computer-implemented method, comprising: teaching an artificial intelligence (AI) language model to aggregate and recognize status information for radio-based networks; receiving a prompt from a customer to provide a type of status information for at least a portion of a radio-based network; obtaining the status information; and generating, using the AI language model, a response to the prompt that includes or summarizes the type of status information using the obtained status information according to an intent expressed in the prompt.
Clause 27. The computer-implemented method of clause 26, wherein the status information includes respective locations of one or more cell sites in the radio-based network, and the response to the prompt includes a visualization depicting the respective locations of the one or more cell sites.
Clause 28. The computer-implemented method of clauses 26 to 27, further comprising: generating a database query to obtain the status information; and returning the database query to the customer in conjunction with the response to the prompt.
Clause 29. The computer-implemented method of clauses 26 to 28, further comprising: teaching the AI language model to recognize configuration information for the radio-based networks; receiving a subsequent prompt from the customer to improve a configuration of the radio-based network; and generating, using the AI language model, a configuration modification according to an intent to improve the configuration expressed in the prompt.
Clause 30. The computer-implemented method of clause 29, further comprising automatically deploying an additional network function in the radio-based network according to the configuration modification.
Clause 31. The computer-implemented method of clauses 26 to 30, wherein the prompt identifies a particular region of a cloud provider network in which the radio-based network is at least partly implemented.
Clause 32. The computer-implemented method of clauses 26 to 31, wherein the prompt identifies a particular type of network function in the radio-based network.
Clause 33. The computer-implemented method of clauses 26 to 32, wherein obtaining the status information further comprises querying a plurality of application programming interfaces (APIs) to obtain the status information.
Clause 34. The computer-implemented method of clauses 26 to 33, wherein the type of status information includes security audit information.
Clause 35. The computer-implemented method of clauses 26 to 34, wherein the type of status information includes network function health information.
Clause 36. The computer-implemented method of clauses 26 to 35, wherein the type of status information includes network function topology information.
Clause 37. The computer-implemented method of clauses 26 to 36, wherein the AI language model is taught based at least in part on at least one of: retrieval augmented generation (RAG) or fine-tuning.
Clause 38. A computer-implemented method, comprising: teaching an artificial intelligence (AI) language model to aggregate and recognize status information for radio-based networks; receiving a prompt from a customer to determine a plurality of infrastructure components in a radio-based network that are associated with a particular type of status; obtaining status information for the radio-based network; and generating, using the AI language model and the status information, a response to the prompt that includes a visualization of the plurality of infrastructure components that are determined to be associated with the particular type of status.
Clause 39. The computer-implemented method of clause 38, wherein the visualization comprises a map of a plurality of physical locations corresponding to the plurality of infrastructure components.
Clause 40. The computer-implemented method of clauses 38 to 39, wherein the visualization comprises a topology graph showing logical or physical connections among the plurality of infrastructure components.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
This application claims the benefit of, and priority to, U.S. Provisional Patent Application No. 63/582,506, entitled “CONFIGURING AND MANAGING RADIO-BASED NETWORKS VIA AN ARTIFICIAL INTELLIGENCE ASSISTANT,” and filed on Sep. 13, 2023, which is incorporated herein by reference in its entirety.