The present application relates generally to the field of communication networks and more specifically to techniques that facilitate use of commercial off-the-shelf (COTS) computing systems, such as cloud infrastructure, for real-time processing associated with physical layer (PHY) communications in a wireless network (e.g., 5G network).
In general, “cloud computing” refers to the delivery of remote computing services, such as application servers, storage, databases, networking, analytics, and intelligence, to users over the Internet. “Cloud infrastructure” generally refers to the hardware and software components, such as servers, storage, networking, etc., that are needed to support the computing requirements of a cloud computing model. Cloud infrastructure also typically includes an abstraction layer that virtualizes the hardware resources and logically presents them to users (e.g., as “virtual machines”) through application program interfaces (APIs). Such virtualized resources are typically hosted by a service provider and delivered to users over the public Internet or a private network. Publicly available cloud infrastructure can be referred to as “infrastructure as a service”. Cloud infrastructure is typically built on top of large-scale commodity servers, often based on the well-known Intel x86 architecture used in personal computing.
Cloud technology has swiftly transformed information and communications technology (ICT) and is continuing to spread to new areas. Many traditional ICT applications, such as web-based services, are suitable for cloud deployment because they have relaxed timing and/or performance requirements. These services are typically based on Hypertext Transfer Protocol (HTTP) signaling. A common platform used to provide cloud-based web services is Kubernetes, which can coordinate a highly available cluster of connected computers (also referred to as “processing elements” or “hosts”) to work as a single unit. Kubernetes deploys applications packaged in “containers” (e.g., via its “container runtime”) to decouple them from individual computing hosts. These Kubernetes abstractions facilitate deploying applications to a cloud-based computing cluster without tying them to specific computing machines. In this manner, containerized applications are more flexible and available than applications installed directly onto specific machines. Such containerized applications are referred to as “cloud-native” applications.
However, many other types of applications and services have stricter timing and/or performance requirements that have prevented their migration to cloud infrastructure. These include applications that have “real-time” requirements. For example, an application or service has a “hard real-time” requirement if missing a deadline for completing an operation or producing a result causes a catastrophic failure of the application and/or an associated physical system. Examples include factory automation, networking, vehicle control, etc. In contrast, an application or service has a “soft real-time” requirement if missing a deadline results in reduced performance without catastrophic failure. Examples include media streaming, financial transaction processing, etc.
In general, new scheduling techniques and hardware accelerators are needed to make cloud infrastructure suitable for more mission-critical applications with real-time requirements, such as networking and communications. For example, the European Telecommunications Standards Institute (ETSI) network function virtualization (NFV) initiative is standardizing a virtual networking infrastructure and a virtual network function (VNF) architecture. Even so, this effort is focused on higher protocol layers and/or network functions with soft real-time requirements, such as the IP Multimedia Subsystem (IMS).
As another example, cloud radio access network (Cloud RAN, or C-RAN) is a centralized, cloud computing-based architecture for radio access networks that is intended to support 2G, 3G, 4G, and possibly future systems standardized by 3GPP. C-RAN uses open platforms and real-time virtualization technology from cloud computing to achieve dynamic shared resource allocation and support multi-vendor, multi-technology environments. C-RAN systems include remote radio heads (RRHs) that connect to baseband units (BBUs) over a standard Common Public Radio Interface (CPRI). Even so, BBUs are typically purpose-built using specialized, vendor-proprietary hardware platforms to meet the hard real-time requirements of lower protocol layers in wireless networks.
To further reduce costs for network operators, it is ultimately desirable to also migrate digital processing for lower protocol layers, including the physical layer (PHY, also called L1) and the medium access control (MAC, also called L2) layer, to cloud infrastructure. Although commercial off-the-shelf (COTS) servers, processing units, and/or virtualization software used in cloud infrastructure are continually improving, they still lack the capability to support the hard real-time requirements on small time scales found in L1/L2 processing. Improvements are needed to achieve these goals, such as new techniques for determining resource needs and scheduling resources for L1/L2 processing on cloud infrastructure.
Embodiments of the present disclosure provide specific improvements to implementation of digital (or baseband) processing of lower protocol layers in a wireless network on COTS computing infrastructure by facilitating solutions to overcome the exemplary problems, issues, and/or difficulties summarized above and described in more detail below.
Embodiments include methods (e.g., procedures) for scheduling processing resources for physical layer (PHY) communications in a wireless network. For example, such methods can be performed by a task resource scheduler (TRS) that is communicatively coupled to a resource management function for the processing resources (e.g., physical or virtual processing units).
These exemplary methods can include estimating processing resources needed, during a subsequent second duration, for PHY communications in one or more cells of the wireless network. The estimate can be based on the following: a first transmission timing configuration for the one or more cells; current workloads of radio units (RUs) serving the one or more cells; and information about user data traffic scheduled for transmission or reception in the one or more cells during a first duration. These exemplary methods can also include sending, to the resource management function, a request for the estimated processing resources during the second duration.
In some embodiments, the processing resources comprise a plurality of COTS processing units, the resource management function is an operating system (OS) or a virtualization layer executing on the processing units, and the TRS also executes on the processing units. In some embodiments, the first transmission timing configuration can be received from a cell management function in the wireless network, the current workload can be received from the respective RUs, and the information about scheduled user data traffic can be received from a user plane scheduler in the wireless network.
In some embodiments, for each cell, the first transmission timing configuration includes one or more of the following: time-division duplexing (TDD) configuration of a plurality of slots in each subframe; relative or absolute timing of an initial slot in each subframe; and relative or absolute timing of an initial symbol in each slot.
In some embodiments, the first duration includes a plurality of slots and the request is sent at least the scheduling delay before the second duration. In some embodiments, the second duration is based on hard real-time deadlines associated with the transmission or reception in the one or more cells by the RUs.
In some embodiments, the first duration includes one or more subframes. In such embodiments, the information about user data traffic includes traffic load for each of the following channels or signals during each of the one or more subframes: physical uplink control channel (PUCCH), physical uplink shared channel (PUSCH), physical downlink control channel (PDCCH), physical downlink shared channel (PDSCH), and sounding reference signals (SRS). In some of these embodiments, each subframe includes a plurality of slots and the information about user data traffic includes traffic load for each of the signals or channels during each of the plurality of slots (i.e., in each subframe). In some of these embodiments, the information about user data traffic also includes requirements during each of the one or more subframes for beam forming and/or beam pairing associated with the user data traffic.
In some embodiments, estimating the processing resources needed can be further based on information about user data traffic scheduled for transmission or reception in the one or more cells during a plurality of durations before the first duration. In other embodiments, estimating the processing resources needed can be further based on estimated processing resources needed during a plurality of durations before the second duration.
In some embodiments, estimating the processing resources needed can include estimating the processing resources needed in each particular cell based on a cost in processing resources per unit of data traffic for each signal or channel associated with the user data traffic; and a number of traffic units for each signal or channel in the particular cell. In some embodiments, estimating the processing resources needed can also include scaling the estimated amount of processing resources needed for the respective cells based on a function of respective current workloads of the RUs serving the respective cells. In some embodiments, estimating the processing resources needed can also include summing the scaled estimated amounts of processing resources needed for the respective cells, and adding to the sum a minimum per-slot processing resource for each cell.
In some embodiments, the one or more cells can include a plurality of cells and the exemplary methods can also include sending, to a cell management function, one or more of the following: information about estimated processing resources needed in each slot of a subframe for each of the cells; and a transmission timing offset to be applied to at least one of the cells.
In some of these embodiments, these exemplary methods can also include receiving, from the cell management function, a second transmission timing configuration for the plurality of cells. For at least one of the cells, the second transmission timing configuration can include a transmission timing offset (e.g., some number of slots) relative to the first transmission timing configuration. In such embodiments, these exemplary methods can also include estimating further processing resources needed, during a subsequent third duration, for PHY communications in the plurality of cells based on the second transmission timing configuration; and sending, to the resource management function, a request for the estimated further processing resources during the third duration. In some of these embodiments, the further processing resources have reduced variation across slots of a subframe relative to the processing resources estimated based on the first transmission timing configuration.
Other embodiments include a TRS configured to perform operations corresponding to any of the exemplary methods described herein. Other embodiments include non-transitory, computer-readable media storing (or computer program products comprising) computer-executable instructions that, when executed by processing circuitry associated with a TRS, configure the TRS to perform operations corresponding to any of the exemplary methods described herein.
Other embodiments include a processing system for PHY communications in a wireless network. The processing system can include a plurality of processing units and one or more memories storing executable instructions corresponding to the TRS and a resource management function arranged to allocate the processing units for software tasks associated with the PHY communications. Execution of the instructions by the processing units configures the TRS to perform operations corresponding to any of the exemplary methods described herein. In some embodiments, the processing units and the resource management function can be COTS.
Other embodiments include a wireless network comprising one or more virtualized distributed units (vDUs) and a plurality of RUs, each serving one or more cells. Each vDU can include an embodiment of the processing system and can be communicatively coupled to a different portion of the RUs.
These and other objects, features, and advantages of embodiments of the present disclosure will become apparent upon reading the following Detailed Description in view of the Drawings briefly described below.
Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject matter disclosed herein; the disclosed subject matter should not be construed as limited to only the embodiments set forth herein. Rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.
Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where a step must necessarily follow or precede another step due to some dependency. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features, and advantages of the enclosed embodiments will be apparent from the following description.
Furthermore, the following terms are used throughout the description given below:
Note that the description herein focuses on a 3GPP cellular communications system and, as such, 3GPP terminology or terminology similar to 3GPP terminology is oftentimes used. However, the concepts disclosed herein are not limited to a 3GPP system. Furthermore, although the term “cell” is used herein, it should be understood that (particularly with respect to 5G NR) beams may be used instead of cells and, as such, concepts described herein apply equally to both cells and beams.
In the present disclosure, the term “service” is used generally to refer to a set of data, associated with one or more applications, that is to be transferred via a network with certain specific delivery requirements that need to be fulfilled in order to make the applications successful. In the present disclosure, the term “component” is used generally to refer to any component needed for the delivery of the service. Examples of components are cloud infrastructure with related resources such as computation and storage.
Currently, the fifth generation (“5G”) of cellular systems, also referred to as New Radio (NR), is being standardized within the Third-Generation Partnership Project (3GPP). NR is developed for maximum flexibility to support multiple and substantially different use cases. These include enhanced mobile broadband (eMBB), machine-type communications (MTC), ultra-reliable low-latency communications (URLLC), sidelink device-to-device (D2D), and several other use cases.
NG-RAN 199 is layered into a Radio Network Layer (RNL) and a Transport Network Layer (TNL). The NG-RAN architecture, i.e., the NG-RAN logical nodes and interfaces between them, is defined as part of the RNL. For each NG-RAN interface (NG, Xn, F1) the related TNL protocol and the functionality are specified. The TNL provides services for user plane transport and signaling transport. In some exemplary configurations, each gNB is connected to all 5GC nodes within an “AMF Region,” which is defined in 3GPP TS 23.501. If security protection for CP and UP data on TNL of NG-RAN interfaces is supported, NDS/IP shall be applied.
The NG-RAN logical nodes are shown in the accompanying figure.
A gNB-CU connects to gNB-DUs over respective F1 logical interfaces, such as interfaces 122 and 132 shown in the accompanying figure.
Each of the gNBs 210 can support the NR radio interface including frequency division duplexing (FDD), time division duplexing (TDD), or a combination thereof. Each of ng-eNBs 220 can support the fourth-generation (4G) Long-Term Evolution (LTE) radio interface. Unlike conventional LTE eNBs, however, ng-eNBs 220 connect to the 5GC via the NG interface. Each of the gNBs and ng-eNBs can serve a geographic coverage area including one or more cells, such as cells 211a-b and 221a-b shown in the accompanying figure.
5G/NR technology shares many similarities with LTE technology. For example, NR uses CP-OFDM (Cyclic Prefix Orthogonal Frequency Division Multiplexing) in the DL and both CP-OFDM and DFT-spread OFDM (DFT-S-OFDM) in the UL. As another example, in the time domain, NR DL and UL physical resources are organized into equal-sized 1-ms subframes. A subframe is further divided into multiple slots of equal duration, with each slot including multiple OFDM-based symbols. However, time-frequency resources can be configured much more flexibly for an NR cell than for an LTE cell. For example, rather than a fixed 15-kHz OFDM sub-carrier spacing (SCS) as in LTE, NR SCS can range from 15 to 240 kHz, with even greater SCS considered for future NR releases.
In addition to providing coverage via cells as in LTE, NR networks also provide coverage via “beams.” In general, a downlink (DL, i.e., network to UE) “beam” is a coverage area of a network-transmitted reference signal (RS) that may be measured or monitored by a UE. In NR, for example, RS can include any of the following: synchronization signal/PBCH block (SSB), channel state information RS (CSI-RS), tertiary reference signals (or any other sync signal), positioning RS (PRS), demodulation RS (DMRS), phase-tracking reference signals (PTRS), etc. In general, SSB is available to all UEs regardless of the state of their connection with the network, while other RS (e.g., CSI-RS, DM-RS, PTRS) are associated with specific UEs that have a network connection.
On the UP side, Internet protocol (IP) packets arrive to the PDCP layer as service data units (SDUs), and PDCP creates protocol data units (PDUs) to deliver to RLC. When each IP packet arrives, PDCP starts a discard timer. When this timer expires, PDCP discards the associated SDU and the corresponding PDU. If the PDU was delivered to RLC, PDCP also indicates the discard to RLC.
The RLC layer transfers PDCP PDUs to the MAC through logical channels (LCHs). RLC provides error detection/correction, concatenation, segmentation/reassembly, sequence numbering, and reordering of data transferred to/from the upper layers. If RLC receives a discard indication from PDCP for a PDCP PDU, it will discard the corresponding RLC SDU (or any segment thereof) if it has not been sent to lower layers.
The MAC layer provides mapping between LCHs and PHY transport channels, LCH prioritization, multiplexing into or demultiplexing from transport blocks (TBs), hybrid ARQ (HARQ) error correction, and dynamic scheduling (on gNB side). The PHY layer provides transport channel services to the MAC layer and handles transfer over the NR radio interface, e.g., via modulation, coding, antenna mapping, and beam forming.
On the UP side, the Service Data Adaptation Protocol (SDAP) layer handles quality-of-service (QoS). This includes mapping between QoS flows and Data Radio Bearers (DRBs) and marking QoS flow identifiers (QFI) in UL and DL packets. On the CP side, the non-access stratum (NAS) layer is between the UE and the AMF and handles UE/gNB authentication, mobility management, and security control.
The RRC layer sits below NAS in the UE but terminates in the gNB rather than the AMF. RRC controls communications between UE and gNB at the radio interface as well as the mobility of a UE between cells in the NG-RAN. RRC also broadcasts system information (SI) and performs establishment, configuration, maintenance, and release of DRBs and Signaling Radio Bearers (SRBs) used by UEs. Additionally, RRC controls addition, modification, and release of carrier aggregation (CA) and dual-connectivity (DC) configurations for UEs. RRC also performs various security functions such as key management.
After a UE is powered ON it will be in the RRC_IDLE state until an RRC connection is established with the network, at which time the UE will transition to RRC_CONNECTED state (e.g., where data transfer can occur). The UE returns to RRC_IDLE after the connection with the network is released. In RRC_IDLE state, the UE's radio is active on a discontinuous reception (DRX) schedule configured by upper layers. During DRX active periods (also referred to as “DRX On durations”), an RRC_IDLE UE receives SI broadcast in the cell where the UE is camping, performs measurements of neighbor cells to support cell reselection, and monitors a paging channel on PDCCH for pages from 5GC via gNB. An NR UE in RRC_IDLE state is not known to the gNB serving the cell where the UE is camping. However, NR RRC includes an RRC_INACTIVE state in which a UE is known (e.g., via UE context) by the serving gNB. RRC_INACTIVE has some properties similar to a “suspended” condition used in LTE.
An NR UE can be configured with up to four carrier bandwidth parts (BWPs) in the DL with a single DL BWP being active at a given time. A UE can be configured with up to four BWPs in the UL with a single UL BWP being active at a given time. If a UE is configured with a supplementary UL, the UE can be configured with up to four additional BWPs in the supplementary UL, with a single supplementary UL BWP being active at a given time. In this manner, a UE can be configured with a narrow BWP (e.g., 10 MHz) and a wide BWP (e.g., 100 MHz), each starting at a particular CRB, but only one BWP can be active for the UE at a given point in time.
NR supports various sub-carrier spacings (SCS) Δf = (15×2^μ) kHz, where μ ∈ {0, 1, 2, 3, 4} are referred to as “numerologies.” Numerology μ=0 (i.e., Δf = 15 kHz) provides the basic (or reference) SCS that is also used in LTE. The symbol duration, cyclic prefix (CP) duration, and slot duration are inversely related to SCS or numerology. For example, there is one (1-ms) slot per subframe for Δf = 15 kHz, two 0.5-ms slots per subframe for Δf = 30 kHz, etc. In addition, the maximum carrier bandwidth is directly related to numerology according to 2^μ×50 MHz. Table 1 below summarizes the supported NR numerologies and associated parameters. Different DL and UL numerologies can be configured by the network.
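As a minimal illustrative sketch (the function name and output layout are assumptions, not part of the application), the numerology-dependent parameters described above can be computed as follows:

```python
# Illustrative sketch: NR numerology parameters derived from the relations
# above: SCS = 15 * 2**mu kHz, slot duration scales inversely with SCS, and
# the maximum carrier bandwidth is 2**mu * 50 MHz.

def nr_numerology(mu: int) -> dict:
    """Return basic parameters for NR numerology mu (0..4)."""
    if mu not in range(5):
        raise ValueError("numerologies mu = 0..4 are supported")
    return {
        "scs_khz": 15 * 2**mu,          # sub-carrier spacing
        "slots_per_subframe": 2**mu,    # a 1-ms subframe holds 2**mu slots
        "slot_duration_ms": 1 / 2**mu,  # slot duration shrinks with SCS
        "symbols_per_slot": 14,         # normal cyclic prefix
        "max_bw_mhz": 50 * 2**mu,       # maximum carrier bandwidth
    }

for mu in range(5):
    print(mu, nr_numerology(mu))
```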
In general, an NR physical channel corresponds to a set of REs carrying information that originates from higher layers. Downlink (DL, i.e., gNB to UE) physical channels include Physical Downlink Shared Channel (PDSCH), Physical Downlink Control Channel (PDCCH), and Physical Broadcast Channel (PBCH).
PDSCH is the main physical channel used for unicast DL data transmission, but also for transmission of RAR (random access response), certain system information blocks (SIBs), and paging information. PBCH carries the basic system information (SI) required by the UE to access a cell. PDCCH is used for transmitting DL control information (DCI) including scheduling information for DL messages on PDSCH, grants for UL transmission on PUSCH, and channel quality feedback (e.g., CSI) for the UL channel.
Uplink (UL, i.e., UE to gNB) physical channels include Physical Uplink Shared Channel (PUSCH), Physical Uplink Control Channel (PUCCH), and Physical Random-Access Channel (PRACH). PUSCH is the uplink counterpart to the PDSCH. PUCCH is used by UEs to transmit uplink control information (UCI) including HARQ feedback for gNB DL transmissions, channel quality feedback (e.g., CSI) for the DL channel, scheduling requests (SRs), etc. PRACH is used for random access preamble transmission.
Within the NR DL, certain REs within each subframe are reserved for the transmission of reference signals (RS). These include demodulation reference signals (DM-RS), which are transmitted to aid the UE in the reception of an associated PDCCH or PDSCH. Other DL reference signals include positioning reference signals (PRS) and CSI reference signals (CSI-RS), the latter of which are monitored by the UE for the purpose of providing channel quality feedback (e.g., CSI) for the DL channel. Additionally, phase-tracking RS (PTRS) are used by the UE to identify common phase error (CPE) present in sub-carriers of a received DL OFDM symbol.
Other RS-like DL signals include the Primary Synchronization Sequence (PSS) and Secondary Synchronization Sequence (SSS), which facilitate the UE's time and frequency synchronization and acquisition of system parameters (e.g., via PBCH). The PSS, SSS, and PBCH are collectively referred to as an SS/PBCH block (SSB).
The NR UL also includes DM-RS, which are transmitted to aid the gNB in the reception of an associated PUCCH or PUSCH, and PTRS, which are used by the gNB to identify CPE present in sub-carriers of a received UL OFDM symbol. The NR UL also includes sounding reference signals (SRS), which perform a similar function in the UL as CSI-RS in the DL.
A CORESET can include one or more RBs (i.e., multiples of 12 REs) in the frequency domain and 1-3 OFDM symbols in the time domain. The smallest unit used for defining a CORESET is a resource element group (REG), which spans one RB (i.e., 12 REs) in frequency and one OFDM symbol in time. CORESET resources can be indicated to a UE by RRC signaling. In addition to PDCCH, each REG in a CORESET contains DM-RS to aid in the estimation of the radio channel over which that REG was transmitted.
NR data scheduling can be performed dynamically, e.g., on a per-slot basis. In each slot, the base station (e.g., gNB) transmits downlink control information (DCI) over PDCCH that indicates which UE is scheduled to receive data in that slot, as well as which RBs will carry that data. A UE first detects and decodes DCI and, if the DCI includes DL scheduling information for the UE, receives the corresponding PDSCH based on the DL scheduling information. DCI formats 1_0 and 1_1 are used to convey PDSCH scheduling. Likewise, DCI on PDCCH can include UL grants that indicate which UE is scheduled to transmit data on PUSCH in that slot, as well as which RBs will carry that data. A UE first detects and decodes DCI and, if the DCI includes an uplink grant for the UE, transmits the corresponding PUSCH on the resources indicated by the UL grant.
Scheduling of workloads for High Performance Computing (HPC) and cloud systems is a well-researched technical field with a large body of work. However, the main focus has long been on batch processing rather than stream processing. In general, throughput and fairness are key criteria for batch processing, whereas timeliness and/or latency are more important for stream processing. In this context, “latency” refers generally to the amount of time between a request for computing resources for a workload (or task) and the scheduled time for the workload to run on the allocated computing resources. Alternately, latency can be the time required to make a scheduling decision for a resource request. However, workload scheduling for HPC typically does not address hard real-time constraints and/or timing guarantees associated with data streams. Real-time computing is another well-researched area with a large body of work.
Historically, real-time systems were scheduled by cyclic executives, typically constructed in an ad hoc manner. During the 1970s and 1980s, real-time computing infrastructure was developed based on fixed-priority scheduling theory, in which each computing workload is assigned a priority via some policy. In real-time computing vernacular, a task may consist of several jobs, with each job of the same task assigned the same priority. Contention for resources is resolved in favor of the job with the higher priority that is ready to run. Even so, there has been little (if any) work on adaptive scheduling of streaming data (i.e., with real-time requirements) that varies over time to meet variations in application workload.
There are several existing frameworks for adaptive resource management, most of them at very early stages of development.
Another work specifically targeted for scheduling of stream processing is known as Cutting edge Reconfigurable ICs for Stream Processing (CRISP). In particular, CRISP includes a multiprocessor system-on-a-chip (MPSoC) that supports dynamic reconfiguration of computing resources (i.e., at the hardware level) according to application resource requirements.
The accompanying figure shows an exemplary 5G network that includes virtualized distributed units (vDUs) 720 and 730.
Each vDU also communicates with a plurality of radio units (RUs), each of which contains transceivers, antennas, etc. that provide the NR (and optionally LTE) radio interface in respective coverage areas (e.g., cells). For example, vDU 720 communicates with RUs 721-723 and vDU 730 communicates with RUs 731-733. vDU-RU communication can be based on the Common Public Radio Interface (CPRI) over high-speed optical communications technology.
To further reduce costs for network operators, it is desirable to migrate digital processing for lower protocol layers (e.g., PHY/L1 and MAC/L2) to cloud infrastructure based on COTS servers, processing units, and/or virtualization software. Efficient management of concurrent computation is challenging in general, and even more difficult when running 5G workloads on COTS. Scheduling of compute resources for processing of radio workload needs to be done such that all hard real-time deadlines are met while optimizing utilization of available hardware and software resources.
One current approach for scheduling radio workloads (e.g., streams of radio data) on computing resources is based on processes, which rely on an operating system (e.g., Linux) scheduler. The workload consists of multiple streams of radio data with varying requirements on compute capacity, timeliness, etc. Using different processes to handle different types of streams makes for a straightforward solution that leverages the OS capabilities and provides a flexible and scalable solution. Processes can be dynamically allocated to different processing units (e.g., cores, stream processors, etc.), which facilitates high utilization of available hardware. However, handling of processes requires significant computation and communication resources, since data must be copied between user and kernel space.
Alternately, a kernel-bypass approach with static allocation of processing units can be used to address some of the drawbacks of the process-based approach. In the kernel-bypass approach, a portion of the available processing units are not controlled and scheduled by the OS but instead provide raw compute capabilities. Each of these processing units can process only one type of request and is associated with a queue of incoming packets (or streams). The processing units repeatedly query (or poll) their associated queues for new workload to process. The processing software running on these processing units is subject to very low communication overhead and can thus provide low-latency processing.
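For illustration only, the polling pattern described above can be sketched as follows (the queue type and handler name are hypothetical placeholders, not the application's implementation):

```python
# Minimal sketch of the kernel-bypass pattern: each statically assigned
# processing unit busy-polls a single queue holding one workload type.
from queue import Queue, Empty

def poll_loop(rx_queue: Queue, handler) -> None:
    """Busy-poll rx_queue and process each packet with handler.

    The loop runs even when the queue is empty, which is the energy and
    utilization drawback noted below."""
    while True:
        try:
            packet = rx_queue.get_nowait()  # poll; no blocking syscall
        except Empty:
            continue                        # spin: burns cycles with no work
        handler(packet)
```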
A drawback of the kernel-bypass approach is that statically assigned processing units cannot be dynamically configured and/or reallocated to match changes in workload. For example, it is not possible and/or feasible to increase the number (or portion) of processing units handling a given type of workload when needed. This leads to poor utilization and lower-than-desirable capacity. Furthermore, all processing units must continually poll their queues, such that they are running even when there is no workload to process. This leads to increased energy consumption and unnecessary costs.
Even though COTS components are continually improving, they still lack the capability to support hard real-time requirements on the small time scales found in 5G L1/L2 processing. For example, graphics processing units (GPUs) typically include a large number of stream (or vector) multiprocessors that have very high raw computing capabilities. However, GPUs are optimized for processing at video frame rates such as 60 Hz, 120 Hz, etc. In contrast, 5G L1/L2 real-time processing demands occur at rates that can be much higher than video rates, e.g., per-slot rates on the order of 1-16 kHz and per-symbol rates on the order of 14-224 kHz, depending on the numerology described above.
Accordingly, improvements are needed to be able to use COTS hardware and software for 5G L1/L2 processing, such as new techniques for determining resource needs and scheduling computing resources for such processing.
Exemplary embodiments of the present disclosure address these and other problems, issues, and/or difficulties by providing a flexible application-aware resource scheduler that takes advantage of 5G- and RAN-specific information to allocate processing resources according to user traffic load, scheduling of physical signals and channels, cell configurations, etc. This scheduler can be referred to as an L1 (or PHY) Task Resource Scheduler (TRS). By incorporating such information into scheduling of processing resources, the TRS can provide superior performance relative to generic task schedulers, both in terms of meeting hard real-time deadlines associated with PHY processing and in terms of better utilization of the underlying hardware. Such improved performance facilitates use of COTS processing hardware and software for 5G baseband processing, which can provide a more competitive product (e.g., DU) in terms of cost, energy efficiency, etc.
Based on these inputs, the TRS can determine a specific deadline for each workload and predict and/or estimate processing resource demand in upcoming time intervals (e.g., next N slots). Accordingly, the TRS can notify a processing resource management function 810 (e.g., OS, virtualizer, etc.) sufficiently in advance to facilitate actual allocation of the needed processing resources to meet hard real-time deadlines for L1/L2 processing, while avoiding over-dimensioning of the system to account for peak demands (as traditionally done in purpose-built systems). This notification can be in the form of a resource request, resource release, resource allocation request, resource deallocation request, etc. In some cases, the resource management function may respond to such a notification, e.g., with an acknowledgement or confirmation of the allocation, an indication of an error condition that prevents the allocation, etc.
In general, the frequency or rate of resource allocation and deallocation of computing resources is far lower than changes in the instantaneous traffic demands. This is because embodiments of the TRS use predictive techniques to anticipate future resource requirements based on current and past information. In other words, according to the principles of these embodiments, resources are not allocated simply based on instantaneous demand (e.g., when queues are full) but based on demand predicted from cell configuration, current and past traffic load, RU workload or processing margin, etc. As such, the TRS may sometimes request allocation of more resources than needed in a particular slot, but still fewer than if the system were overprovisioned for peak traffic demand, as done conventionally.
The processing resource estimation (or prediction) algorithm can be implemented in various ways, such as by auto-regressive (AR) filters, moving-average (MA) filters, ARMA filters, rule-based systems, artificial intelligence (AI), machine learning (ML), etc. An exemplary processing resource estimation function is given below, where the output P expresses the processing resource demand (e.g., in processing units, threads, cores, etc.) for the next N slots (e.g., the latency or delay for processing resource allocation):

P = Σ_{i=1..n_cell} R_i · ( Σ_j cost_j · load_{i,j} ) + C · n_cell

where:
n_cell is the number of cells served by the processing resources;
i indexes the cells and j indexes the signals or channels associated with the user data traffic (e.g., PUCCH, PUSCH, PDCCH, PDSCH, and SRS);
cost_j is the cost in processing resources per unit of data traffic for signal or channel j;
load_{i,j} is the number of traffic units for signal or channel j in cell i;
R_i is a scaling factor based on the current workload of the RU serving cell i; and
C is a minimum per-slot processing resource for each cell.
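As a concrete illustration of this estimation function, the following sketch (an assumption-laden example; the data layout, names, and numbers are invented, not the application's implementation) evaluates P for a set of cells:

```python
# Illustrative sketch of the estimation function above.
CHANNELS = ("PUCCH", "PUSCH", "PDCCH", "PDSCH", "SRS")

def estimate_processing_demand(cost, load, ru_scale, c_min):
    """Estimate demand P for the next N slots.

    cost:     {channel: processing cost per traffic unit}    -> cost_j
    load:     per-cell {channel: scheduled traffic units}    -> load_{i,j}
    ru_scale: per-cell scaling from current RU workload      -> R_i
    c_min:    minimum per-slot processing resource per cell  -> C
    """
    n_cell = len(load)
    per_cell = (
        ru_scale[i] * sum(cost[ch] * load[i].get(ch, 0) for ch in CHANNELS)
        for i in range(n_cell)
    )
    return sum(per_cell) + c_min * n_cell

# Example: two cells with different scheduled loads and RU margins.
cost = {"PUCCH": 0.2, "PUSCH": 1.0, "PDCCH": 0.3, "PDSCH": 0.8, "SRS": 0.4}
load = [{"PUSCH": 12, "PDSCH": 20, "PDCCH": 4},
        {"PUSCH": 3, "PDSCH": 5, "PUCCH": 2, "SRS": 1}]
print(estimate_processing_demand(cost, load, ru_scale=[1.1, 0.9], c_min=0.5))
```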
In some embodiments, each factor load_{i,j} can be an average over a particular number (e.g., M) of most recent subframes. This can be considered a moving-average (MA) filter and can be computed using equal weights for each of the M subframes, or by assigning a weight (or coefficient) to each subframe based on its recency (e.g., more recent subframes are weighted more heavily).
In some embodiments, the output P for a particular N slots can be averaged with previous outputs P for a particular number (e.g., M) of most recent durations of N slots. This can be considered an autoregressive (AR) filter and can be computed using equal weights for each of the N-slot durations, or by assigning a weight (or coefficient) to each N-slot duration based on its recency (e.g., more recent durations are weighted more heavily) and/or on other considerations. For example, known techniques can be used to find AR prediction coefficients based on historical load input data.
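A minimal sketch of these MA/AR weighting options is given below (the weights and values are illustrative assumptions only):

```python
# Illustrative sketch: a weighted average over the M most recent inputs
# (MA filter over load_{i,j}) and the same smoothing applied to successive
# demand estimates P (AR-style filter).

def weighted_average(history, weights=None):
    """Average the most recent values; weights can favor recent entries."""
    if weights is None:
        weights = [1.0] * len(history)  # equal weighting
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)

# MA filter over load_{i,j}: last M=4 subframes, most recent weighted most.
load_history = [10, 12, 9, 15]          # oldest ... newest
load_ij = weighted_average(load_history, weights=[0.1, 0.2, 0.3, 0.4])

# AR filter over previous outputs P for recent N-slot durations.
p_history = [42.0, 40.5, 47.0]          # oldest ... newest
p_smoothed = weighted_average(p_history, weights=[0.2, 0.3, 0.5])
print(load_ij, p_smoothed)
```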
In some embodiments, scheduling of radio resources (e.g., in multiple cells) can be based on, or influenced by, TRS estimation of processing resource demand. This is illustrated in the accompanying figure.
A PHY/L1 process 910 runs on processing system 930. For example, this process can handle the 5G baseband processing such as currently performed in DUs, vDUs, and/or BBUs. PHY/L1 process 910 includes a plurality of queues, each holding a different kind of 5G workload waiting to be processed, for example, separate queues for different physical channels and signals.
TRS 920 creates one or more worker threads for each queue/workload type, depending on the amount of processing required. These worker threads can be considered “virtualized processors” from the viewpoint of the PHY/L1 process. TRS 920 predicts or estimates future workload based on its inputs and, from that, can determine when threads can be moved from active state to idle state (e.g., yielded back to OS 950) until they need to be re-activated. These actions by the TRS free up physical processing units for other types of workloads in PHY/L1 process 910, as well as for other processes. The latency for changing thread state is generally lower than the latency for creating and destroying threads. Furthermore, even if changing thread state is too slow for certain real-time deadlines (e.g., symbol time), TRS 920 can ensure that there are enough active threads to address such time-sensitive deadlines.
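For illustration, the per-queue thread activation decision described above might be sketched as follows (the capacity model, names, and numbers are assumptions, not the application's implementation):

```python
# Illustrative sketch: keep enough worker threads active per workload type
# to meet predicted demand, idling the rest rather than destroying them,
# since re-activating an idle thread is cheaper than creating a new one.
import math

def plan_worker_threads(predicted_load, capacity_per_thread, min_active=1):
    """Return the number of active worker threads per queue/workload type."""
    plan = {}
    for queue_name, load in predicted_load.items():
        needed = math.ceil(load / capacity_per_thread[queue_name])
        plan[queue_name] = max(min_active, needed)  # keep some threads hot
    return plan

predicted = {"PUSCH": 18.0, "PDSCH": 25.0, "SRS": 2.0}
capacity = {"PUSCH": 4.0, "PDSCH": 5.0, "SRS": 2.0}
print(plan_worker_threads(predicted, capacity))
# {'PUSCH': 5, 'PDSCH': 5, 'SRS': 1}
```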
As discussed above, TRS 920 can notify OS 950 sufficiently in advance to facilitate actual allocation of the needed processing units 940 to meet hard real-time deadlines for PHY/L1 processing, while avoiding over-dimensioning of processing resources to account for peak demands (as traditionally done in purpose-built systems).
Processing units 1030 are preferably COTS units, such as graphics processing units (GPUs), rack-mounted x86 server boards, reduced instruction-set computer (RISC, e.g., ARM) boards, etc. Each processing unit 1030 can include processing circuitry 1060 and memory 1090. Memory 1090 can include non-persistent memory 1090-1 (e.g., for temporary storage) and persistent memory 1090-2 (e.g., for permanent or semi-permanent storage), each of which can store instructions 1095 (also referred to as software or computer program product).
Memory 1090 can store instructions 1095 executable by processing circuitry 1060 whereby various applications 1010 and/or 1020 can be operative for various features, functions, procedures, etc. of the embodiments disclosed herein. For example, instructions 1095 can include program instructions that, when executed by processing circuitry 1060, can configure processing unit 1030 to perform operations corresponding to the methods or procedures described herein, including those related to embodiments of the TRS.
Memory 1090 can also store instructions 1095 executable by processing circuitry 1060 to instantiate one or more virtualization layers 1050 (also referred to as hypervisor or virtual machine monitor, VMM). In some embodiments, virtualization layer 1050 can be used to provide a plurality of virtual machines (VMs) 1040 that are abstracted from the underlying processing units 1030. For example, virtualization layer 1050 can present a virtual operating platform that appears like computing hardware to containers and/or pods hosted by environment 1000. Moreover, each VM (e.g., as facilitated by virtualization layer 1050) can manifest itself as a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each VM can have dedicated processing units 1030 or can share resources of one or more processing units 1030 with other VMs.
Memory 1090 can store software to execute each VM 1040 as well as software allowing a VM 1040 to execute functions, features, and/or benefits described in relation with some embodiments described herein. VMs 1040 can include virtual processing, virtual memory, virtual networking or interface, and virtual storage, and can be run by virtualization layer 1050, as shown in the accompanying figure.
As a specific example, applications 1010 can be implemented in WebAssembly, a binary instruction format designed as a portable compilation target for programming languages. In other words, virtualization layer 1050 can provide VMs 1040 that are capable of running applications, such as PHY/L1 application 1010, that are compiled into WebAssembly executables. As another specific example, virtualization layer 1050 can provide Java VMs 1040 that are capable of running applications (e.g., PHY/L1 application 1010) written in the Java programming language or written in other programming languages and compiled into Java byte code.
In other embodiments, virtualization layer 1050 can host various applications 1020 arranged in pods. Each pod can include one or more containers 1021, such as 1021a-b shown for a particular application 1020 in the accompanying figure.
Processing circuitry 1060 can include general-purpose or special-purpose hardware devices, such as one or more Intel x86-family processors (or equivalent), reduced instruction-set computing (RISC) processors (e.g., ARM), stream or vector multiprocessors, application-specific integrated circuits (ASICs), or any other type of processing circuitry including digital or analog hardware components. Each processing unit 1030 can include one or more high-speed communication interfaces 1070, each of which can include a physical network interface 1080. The respective communication interfaces 1070 can be used for communication among the processing units 1030, and/or with other computing hardware internal and/or external to system 1000.
For a particular DL slot, the baseband processing must be completed during the slot before the particular DL slot is transmitted over the air (OTA). For a particular UL slot, the baseband processing must be completed during the slot after the particular UL slot is received OTA. These relationships are illustrated by the arrows in the accompanying figure.
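These timing relations can be sketched as follows (a simplified, slot-indexed model under an assumed fixed slot duration; the function name is ours, not from the application):

```python
# Illustrative sketch of the deadlines above: DL baseband processing for
# slot n must finish within slot n-1 (before OTA transmission of slot n);
# UL processing for slot n runs within slot n+1 (after OTA reception).

def processing_window(slot_n: int, slot_ms: float, direction: str):
    """Return (start_ms, deadline_ms) for baseband processing of slot n."""
    if direction == "DL":
        # Process during slot n-1, before OTA transmission of slot n.
        return ((slot_n - 1) * slot_ms, slot_n * slot_ms)
    if direction == "UL":
        # Process during slot n+1, after OTA reception of slot n.
        return ((slot_n + 1) * slot_ms, (slot_n + 2) * slot_ms)
    raise ValueError("direction must be 'DL' or 'UL'")

print(processing_window(10, 0.5, "DL"))  # (4.5, 5.0) for 0.5-ms slots
print(processing_window(10, 0.5, "UL"))  # (5.5, 6.0)
```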
As mentioned above, in some embodiments scheduling of radio resources can be based on, or influenced by, TRS estimation of processing resource demand. For example, the TRS can feed back information about processing resource needs for the current transmission timing in multiple cells to a cell management function (e.g., 830), which can adapt the TDD pattern, slot 0 timing, etc. for one or more cells whose processing resources are managed by the TRS.
Although the processing resources required for the two cells are not uniform across slots, the shifting of slot 0 serves to reduce the amount of variation.
The embodiments described above can be further illustrated by the exemplary method (e.g., procedure) for scheduling processing resources for physical layer (PHY) communications in a wireless network, shown in the accompanying figure.
The method can include the operations of block 1510, in which the TRS can estimate processing resources needed, during a subsequent second duration, for PHY communications in one or more cells of the wireless network. The estimate can be based on the following: a first transmission timing configuration for the one or more cells; current workloads of radio units (RUs) serving the one or more cells; and information about user data traffic scheduled for transmission or reception in the one or more cells during a first duration. The method can also include sending, to the resource management function, a request for the estimated processing resources during the second duration.
In some embodiments, the processing resources comprise a plurality of commercial off-the-shelf (COTS) processing units, the resource management function is an operating system (OS) or a virtualization layer executing on the processing units, and the TRS also executes on the processing units. In some embodiments, the first transmission timing configuration can be received from a cell management function in the wireless network, the current workload can be received from the respective radio units (RUs), and the information about scheduled user data traffic can be received from a user plane scheduler in the wireless network. An example is shown in the accompanying figure.
In some embodiments, for each cell, the first transmission timing configuration includes one or more of the following: time-division duplexing (TDD) configuration of a plurality of slots in each subframe; relative or absolute timing of an initial slot in each subframe; and relative or absolute timing of an initial symbol in each slot.
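For illustration only, such a per-cell transmission timing configuration might be represented as follows (the field names and units are assumptions, not from the application):

```python
# Illustrative sketch of a per-cell transmission timing configuration
# carrying the elements listed above.
from dataclasses import dataclass
from typing import List

@dataclass
class CellTimingConfig:
    tdd_pattern: List[str]    # per-slot direction in a subframe, e.g. "D"/"U"
    slot0_offset_us: float    # relative timing of the initial slot in a subframe
    symbol0_offset_us: float  # relative timing of the initial symbol in a slot

cfg = CellTimingConfig(tdd_pattern=["D", "D", "D", "U"],
                       slot0_offset_us=0.0, symbol0_offset_us=0.0)
```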
In some embodiments, the first duration includes a plurality of slots and the request is sent at least the scheduling delay before the second duration. In some embodiments, the second duration is based on hard real-time deadlines associated with the transmission or reception in the one or more cells by the RUs.
In some embodiments, the first duration includes one or more subframes. In such embodiments, the information about user data traffic includes traffic load for each of the following channels or signals during each of the one or more subframes: physical uplink control channel (PUCCH), physical uplink shared channel (PUSCH), physical downlink control channel (PDCCH), physical downlink shared channel (PDSCH), and sounding reference signals (SRS). In some of these embodiments, each subframe includes a plurality of slots and the information about user data traffic includes traffic load for each of the signals or channels during each of the plurality of slots (i.e., in each subframe). In some of these embodiments, the information about user data traffic also includes requirements during each of the one or more subframes for beam forming and/or beam pairing associated with the user data traffic.
In some embodiments, estimating the processing resources needed (e.g., in block 1510) can be further based on information about user data traffic scheduled for transmission or reception in the one or more cells during a plurality of durations before the first duration. This is exemplified by a moving average (MA) filter. In other embodiments, estimating the processing resources needed can be further based on estimated processing resources needed during a plurality of durations before the second duration. This is exemplified by an autoregressive (AR) filter. Combinations of these embodiments are also possible, e.g., an ARMA filter.
In some embodiments, estimating the processing resources needed (e.g., in block 1510) can include the operations of sub-block 1511, where the TRS can estimate the processing resources needed in each particular cell based on a cost in processing resources per unit of data traffic for each signal or channel associated with the user data traffic; and a number of traffic units for each signal or channel in the particular cell. This is exemplified by the factors cost_j and load_{i,j} discussed above. In some embodiments, estimating the processing resources needed can also include the operations of sub-block 1512, where the TRS can scale the estimated amount of processing resources needed for the respective cells based on a function of respective current workloads of the RUs serving the respective cells. This is exemplified by the factor R_i discussed above.
In some embodiments, estimating the processing resources needed (e.g., in block 1510) can also include the operations of sub-blocks 1513-1514. In sub-block 1513, the TRS can sum the scaled estimated amounts of processing resources needed for the respective cells. In sub-block 1514, the TRS can add to the sum a minimum per-slot processing resource for each cell. This is exemplified by the factor C·n_cell discussed above.
In some embodiments, the one or more cells can include a plurality of cells and the exemplary method can also include the operations of block 1530, where the TRS can send, to a cell management function, one or more of the following: information about estimated processing resources needed in each slot of a subframe for each of the cells; and a transmission timing offset to be applied to at least one of the cells.
In some of these embodiments, the exemplary method can also include the operations of blocks 1540-1560. In block 1540, the TRS can receive, from the cell management function, a second transmission timing configuration for the plurality of cells. For at least one of the cells, the second transmission timing configuration can include a transmission timing offset (e.g., some number of slots) relative to the first transmission timing configuration. In block 1550, the TRS can estimate further processing resources needed, during a subsequent third duration, for PHY communications in the plurality of cells based on the second transmission timing configuration. In block 1560, the TRS can send, to the resource management function, a request for the estimated further processing resources during the third duration.
In some of these embodiments, the further processing resources have reduced variation across slots of a subframe relative to the processing resources estimated based on the first transmission timing configuration. An example is shown in the accompanying figure.
Additionally, the operations corresponding to the method (including any blocks and sub-blocks) can also be embodied in a non-transitory, computer-readable medium storing computer-executable instructions. The operations corresponding to the method (including any blocks and sub-blocks) can also be embodied in a computer program product storing computer-executable instructions. In either case, when such instructions are executed by processing circuitry associated with a TRS, they can configure the TRS to perform operations corresponding to the method described above.
Similarly, embodiments can also include a processing system for PHY communications in the wireless network. The exemplary processing system can include a plurality of processing units and one or more memories storing executable instructions corresponding to the TRS and a resource management function arranged to allocate the processing units for software tasks associated with the PHY communications. An example is illustrated by the accompanying figure.
In various embodiments of the processing system, the processing units can be any of the following: graphics processing units (GPUs); Intel x86 processors or equivalent; or reduced instruction-set computing (RISC) processors (e.g., ARM processors). In various embodiments of the processing system, the resource management function can be a virtualization layer or an operating system. Examples are shown in the accompanying figures.
In some embodiments, such a processing system can also be part of a wireless network comprising one or more virtualized distributed units (vDUs) and a plurality of radio units (RUs), each serving one or more cells. Each vDU can include the processing system and can be communicatively coupled to a different portion of the RUs (i.e., than other vDUs). An example is shown in the accompanying figure.
The foregoing merely illustrates the principles of the disclosure. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems, arrangements, and procedures that, although not explicitly shown or described herein, embody the principles of the disclosure and can be thus within the spirit and scope of the disclosure. Various exemplary embodiments can be used together with one another, as well as interchangeably therewith, as should be understood by those having ordinary skill in the art.
The term unit, as used herein, can have conventional meaning in the field of electronics, electrical devices, and/or electronic devices and can include, for example, electrical and/or electronic circuitry, devices, modules, processors, memories, logic, solid-state and/or discrete devices, computer programs or instructions for carrying out respective tasks, procedures, computations, outputs, and/or displaying functions, and so on, such as those that are described herein.
Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include digital signal processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as Read-Only Memory (ROM), Random-Access Memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.
As described herein, device and/or apparatus can be represented by a semiconductor chip, a chipset, or a (hardware) module comprising such chip or chipset; this, however, does not exclude the possibility that a functionality of a device or apparatus, instead of being hardware implemented, be implemented as a software module such as a computer program or a computer program product comprising executable software code portions for execution or being run on a processor. Furthermore, functionality of a device or apparatus can be implemented by any combination of hardware and software. A device or apparatus can also be regarded as an assembly of multiple devices and/or apparatuses, whether functionally in cooperation with or independently of each other. Moreover, devices and apparatuses can be implemented in a distributed fashion throughout a system, so long as the functionality of the device or apparatus is preserved. Such and similar principles are considered as known to a skilled person.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In addition, certain terms used in the present disclosure, including the specification and drawings, can be used synonymously in certain instances (e.g., “data” and “information”). It should be understood, that although these terms (and/or other terms that can be synonymous to one another) can be used synonymously herein, there can be instances when such words can be intended to not be used synonymously. Further, to the extent that the prior art knowledge has not been explicitly incorporated by reference herein above, it is explicitly incorporated herein in its entirety. All publications referenced are incorporated herein by reference in their entireties.