The complexity of telecommunications networks is increasing with the presence of multiple Radio Access Technologies (such as 3G, 4G and 5G) and with major network transformations toward virtualized, software-defined and cloud-based infrastructure. As such, telecommunications networks are becoming more dynamic and self-organized, and they now also need to meet stricter service level agreements for applications such as enhanced mobile broadband, essential services and massive Internet of Things. Delivering highly reliable services in such a context requires monitoring systems with advanced troubleshooting capacity to efficiently resolve any service performance degradation or outage.
The network monitoring data can take various forms depending on the networks being supervised. Examples include Call Traces for the Radio Access Network (RAN), Call Data Records or Session Data Records for the Mobile Core Network, metrics and key performance indicators (KPIs) for the core network, and log files and metrics for data center networks. In all cases, the quantity of information is typically very high, and isolating the cause of any performance degradation or outage is difficult. Increasingly, manual investigation is prohibitively difficult. Thus, automatic analysis is required to increase the efficiency of the analysis and reduce resolution time.
Telecommunications network analysis systems that aim to uncover problems in the network are commonly called root cause analysis (RCA) systems and have been used for decades. These RCA systems often have a wide range of implementation strategies, ranging from expert systems to statistical approaches. Although some of the techniques employed in RCA could be used across domains and applications, identifying the source of a fault often requires more specific knowledge of the context.
Uncovering possible causes for faults in modern telecommunication networks remains an area of open research due to the complexity and evolving nature of this type of network. First, a diagnosis system should work for various types of data logs (e.g., voice calls, data, multimedia sessions, system telemetry, and other operational aspects), as communication networks carry large amounts of data traffic along with traditional voice signals for a call, which might include network operations and entities outside of the immediate context for the call or session. Second, a diagnosis solution should work with the increasing number of features. Logs can include features related to the service (e.g., the content provider, the quality and priority classes), the network (e.g., the Radio Access Technology and the involved gateways), and/or the user (e.g., the handset type and the handset manufacturer). Further, these features can depend on each other due to the architecture of the network and services. Third, a diagnosis solution should address the complex interplay between features, for example, an OS version not supporting a particular service. Both the service and the OS can behave normally in a majority of sessions when scrutinized independently; however, the issue might only be diagnosed in logs containing both. Finally, the diagnosis solution should focus on problems that have an actual impact on the network performance. A problem that happens sporadically in a device used by millions of users can have a greater importance than a problem that occurs regularly in a device used by only hundreds of users. The balance between number of occurrences and inefficiency is a matter of prioritizing mitigation actions.
Amongst conventional RCA solutions applied in telecommunications networks, two common approaches can typically be distinguished. In one approach, analysis or diagnosis is implemented by scrutinizing one feature in particular. The other main approach, which may cover a range of techniques, involves analysis of network topological structure.
Single feature analysis is the most popular approach, as this can most easily be applied and interpreted by telecom network experts when attempting to isolate a possible cause for a network inefficiency. As used herein, the term “network inefficiency” or “inefficiency” applied to a feature, network entity, or the like, refers to one or another form of degradation or failure of network operation or performance. Although single feature analysis is the simplest approach for exploration, excluding scenarios where more than one feature may be contributing to the problem can miss important insights. The main drawback of this approach is that it does not properly account for possible additive effects or incompatibilities between multiple features that degrade network performance. Network performance can only be fully explained by combining different network elements, as the impact of a single network element might be insufficient to produce noticeable degradation. As such, failure to explore network interactions reduces the likelihood of isolating the root cause of a problem.
Thus, diagnosis based on an isolated feature approach, while understandable and manageable by telecommunication experts, has limits in that it typically does not account for feature dependencies and interactions. For example, the cells connected to a low performing Base Station Controller (BSC) may appear as inefficient. Approaches evaluating one feature at a time may be limited in that they ignore all the problems produced by multiple features, such as incompatibility between components.
Traditional RCA for telecommunication networks using network topological structure offers the advantage of leveraging network domain knowledge for quick assessment of topologically connected elements. However, a fixed topology may limit the discovery of interactions between distant, seemingly unrelated nodes. Topology based strategies also require knowledge of the telecommunication network topology which, as telecom networks become more complex, becomes increasingly problematic. Thus, a solution that does not depend on topology might better leverage connections between distant problematic network elements and generalize to increasingly complex telecommunication networks.
To summarize, the growing complexity of telecommunication networks has made traditional applications of RCA largely impractical. Investigations at the level of network elements can miss more complex arrangements, while predetermined/defined topologies will generally not account for other possible interactions in the network; these account for the traditional statistical and rule-based approaches, respectively, which are becoming too simplistic relative to the complexity of the data. Thus, there is a need for new approaches and techniques that can accurately and reliably account for the effects on network performance of interactions between the many operational elements as they contribute to network performance, while at the same time providing an analysis which relates particular operational elements to network degradation in a non-linear manner. Further, these new systems are needed to improve the automated discovery of topological relations, discovering and controlling for the context of the interactions between the operational elements.
Accordingly, the inventor has recognized (1) that modern machine learning (ML) techniques can be applied to modeling communication network operations in a manner that learns new, and incorporates known, interactions of the operational elements and features that are inputs to an ML model, and (2) that analytical model interpretation techniques can be applied to the ML models themselves to yield data that can effectively explain the ML model predictions in terms of individual model inputs. Modern techniques in machine learning and deep learning are designed to represent large amounts of data, and the inventor has recognized that machine learning can be utilized to devise novel techniques for and approaches to performance and fault analysis of communication systems. In particular, as described herein, machine learning can be used to construct performance and fault analysis systems, referred to hereinafter as “ML-based PFA systems.” Since machine learning does not necessarily enforce causal relationships, the description of ML-based PFA systems herein largely avoids the RCA terminology that is implicitly or explicitly associated with “cause.” However, example embodiments of ML-based PFA systems nevertheless significantly expand on what traditional correlation-based RCA systems do.
Machine learning and deep learning strategies offer state-of-the-art performance for predictions. Given sufficient data, a machine learning model offers a finely-tuned mapping (i.e., function) between the data it was trained on and some target variable (e.g., latency). Yet the complexity of modern machine learning models can often make interpretability difficult (e.g., deep learning models), while simpler models are often easier to understand (e.g., linear models, tree-based models). Consequently, modelling using more interpretable models may offer particular advantages as part of a ML-based PFA system.
Example embodiments described herein attain the high-prediction performance of complex models while remaining interpretable, where prediction performance may be measured in one or more ways, including but not limited to metrics such as prediction accuracy, precision, and recall. This is accomplished by using extreme gradient boosting algorithms to form tree-based models, a strategy that commonly offers good prediction performance for structured data like that generated by telecommunications systems (e.g., call data records, session data records), while also remaining interpretable using a state-of-the-art interpretability strategy called SHapley Additive exPlanations. Advanced machine learning interpretability strategies such as SHAP have not generally (if at all) been used previously to explain telecommunication networks data. Moreover, the specific application for fault detection and analysis using the model to represent relationships, extract and highlight the relevant differences in relative feature contributions is novel.
The strategy involves first representing telecom data in a non-linear model, such as a decision tree structure formed using a gradient boosting algorithm, to represent relationships and interactions between a set of inputs (e.g., call detail record, or CDR, dimensions and metrics) and a target variable outcome (e.g., average throughput, call status) that might serve as a key performance indicator (KPI). In other words, in a CDR example, a model is formed that is a non-linear function mapping CDR inputs to a target field, ensuring that complex interactions in the inputs are not ignored as they would be in conventional linear correlational and expert rule approaches. Once the representation is formed, the quality of which can be determined by the model prediction performance on samples of unseen data, the strategy involves probing the model to gauge the importance of different input features or elements (e.g., a continuous or discrete value, setting, or category identifier for a feature) on specific outcomes given a problematic context (e.g., a problematic region, a dropped call status, etc.). Taken together, the strategy produces a non-linear model that captures relationships to a target variable outcome and uses this model to understand problems given particular contexts of telecom network data, a strategy that more effectively incorporates the complexity and context dependence of telecom network data than traditional approaches such as linear correlation and expert rules.
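As a rough illustration of why a boosted tree model can capture an interaction that a single-feature view misses, the following minimal sketch fits a toy gradient boosting regressor in plain Python. It is a simplified stand-in for an extreme gradient boosting library, not the implementation used in any embodiment. The two binary inputs (`os_old`, `video_service`), the throughput values, and all function names are hypothetical.

```python
from statistics import mean

def fit_tree(X, r, depth):
    """Fit a small regression tree on binary features; leaves hold mean residuals."""
    if depth == 0 or len(set(r)) == 1:
        return mean(r)
    best = None
    for j in range(len(X[0])):
        lo = [i for i, x in enumerate(X) if x[j] == 0]
        hi = [i for i, x in enumerate(X) if x[j] == 1]
        if not lo or not hi:
            continue  # this feature cannot split the current rows
        sse = 0.0
        for side in (lo, hi):
            m = mean(r[i] for i in side)
            sse += sum((r[i] - m) ** 2 for i in side)
        if best is None or sse < best[0]:
            best = (sse, j, lo, hi)
    if best is None:
        return mean(r)
    _, j, lo, hi = best
    return (j,
            fit_tree([X[i] for i in lo], [r[i] for i in lo], depth - 1),
            fit_tree([X[i] for i in hi], [r[i] for i in hi], depth - 1))

def predict_tree(tree, x):
    """Walk from the root to a leaf; internal nodes are (feature, low, high) tuples."""
    while isinstance(tree, tuple):
        j, lo, hi = tree
        tree = hi if x[j] == 1 else lo
    return tree

def fit_gbm(X, y, n_trees=50, lr=0.3, depth=2):
    """Squared-error gradient boosting: each tree is fit to the current residuals."""
    base = mean(y)
    trees, pred = [], [base] * len(y)
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, depth)
        trees.append(tree)
        pred = [p + lr * predict_tree(tree, x) for p, x in zip(pred, X)]
    return lambda x: base + lr * sum(predict_tree(t, x) for t in trees)

# Hypothetical records: columns are (os_old, video_service); the target is throughput.
# Throughput drops only when BOTH features are present (an interaction).
X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
y = [10.0, 10.0, 10.0, 2.0] * 5
model = fit_gbm(X, y)
```

Because each tree may split on both inputs (depth 2), the fitted model reproduces the low throughput for the combination `(1, 1)` while leaving the other three combinations near their normal level.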
To attribute individual feature contributions to the outcome (e.g., how much individual locations, dimensions and metrics were associated with a KPI such as average throughput or the final call status), a strategy called SHapley Additive exPlanations (SHAP) is used (e.g., Scott M. Lundberg and Su-In Lee, “A Unified Approach to Interpreting Model Predictions,” in Advances in Neural Information Processing Systems 30, 2017, pp. 4768-4777). The SHAP technique computes Shapley values of the marginal or conditional expectation of the model output, given feature values in the context of a specific example, by analyzing a machine learning model. This approach provides fair contributions of each feature-value pair to the model prediction. The conventional use of SHAP is for model interpretability, to understand and explain why a model is making specific predictions, but not as part of an ML-based PFA system. To our knowledge, such a system has not been used for performance and fault analysis in telecommunications networks.
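To make the notion of a fair contribution concrete, the sketch below computes exact Shapley values by enumerating every feature subset, using a marginal expectation over a background dataset as the value function. This brute-force enumeration is exponential in the number of features and is shown only for illustration; it is not the optimized algorithm of the SHAP library. The model `throughput_model`, its two binary inputs, and the background records are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact Shapley values for one example x, by enumerating all feature subsets."""
    n = len(x)

    def value(subset):
        # Marginal expectation: features in `subset` are fixed to x,
        # the remaining features are averaged over the background records.
        total = 0.0
        for b in background:
            z = tuple(x[k] if k in subset else b[k] for k in range(n))
            total += model(z)
        return total / len(background)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature j.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                contribution += weight * (value(s | {j}) - value(s))
        phi.append(contribution)
    return phi

def throughput_model(x):
    # Hypothetical model: throughput drops only when both features are present.
    os_old, video_service = x
    return 10.0 - 8.0 * os_old * video_service

background = [(0, 0), (0, 1), (1, 0), (1, 1)]
phi = shapley_values(throughput_model, (1, 1), background)
expected = sum(throughput_model(b) for b in background) / len(background)
```

For the example `(1, 1)`, the two features share the drop equally, and the contributions sum to the difference between the prediction for that example and the average prediction over the background.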
Accordingly, in one aspect, example embodiments may involve a computer-implemented method. The method may include: obtaining a set of computer-readable training data records that each characterize operation of a communication network, wherein each given training data record includes a plurality of operational features of the communication network and one or more observed performance characteristics of the communication network, and wherein each operational feature is associated with one or more feature-value pairs specific to the given training record, and each of the one or more observed performance characteristics corresponds to an observation specific to the given training record; using at least a portion of the set of training data records to train a machine learning (ML) model of network performance to predict expected performance characteristics given the plurality of operational features in the training data records as input and the one or more observed performance characteristics as ground truths, wherein the ML model is configured for computing mappings of given input feature-value pairs to output predicted performance characteristics, and wherein, for each input training data record, the mappings represent relationships and/or interactions between one or more combinations among the plurality of operational features and one or more predicted performance characteristics; for each input data record of a first subset of the set of training data records, computing a fair distribution of first respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model, wherein the first subset includes at least those training data records sufficient to represent a baseline of observed performance characteristics; for each input data record of a second subset of the set of training data records, computing a fair distribution of second respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model, wherein the second subset includes only those training data records representing at least one problematic observed performance characteristic; and comparing the first and second respective quantitative contributions to determine a respective degradation metric for associating each of the plurality of operational features of the second subset with the at least one problematic observed performance characteristic of the second subset.
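The final comparing step of the method can be sketched as follows. The text does not fix a formula for the degradation metric at this point, so this sketch assumes one simple choice: the mean contribution of each feature-value pair in the problematic subset minus its mean contribution in the baseline subset. The per-record contributions, the feature names, and the function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_contributions(rows):
    """Average the per-record contribution of every (feature, value) element."""
    acc = defaultdict(list)
    for row in rows:
        for element, phi in row.items():
            acc[element].append(phi)
    return {element: mean(vals) for element, vals in acc.items()}

def degradation_metric(baseline_rows, problem_rows):
    """Assumed metric: problematic mean contribution minus baseline mean contribution."""
    base = mean_contributions(baseline_rows)
    prob = mean_contributions(problem_rows)
    return {element: prob[element] - base.get(element, 0.0) for element in prob}

# Hypothetical per-record contributions to predicted throughput (first subset: baseline).
baseline_rows = [
    {("cell", "C1"): -1.0, ("handset", "H1"): 0.5},
    {("cell", "C2"): 0.5, ("handset", "H1"): 0.5},
    {("cell", "C1"): -1.0, ("handset", "H2"): 0.0},
]
# Second subset: only records with a problematic observed performance characteristic.
problem_rows = [
    {("cell", "C1"): -4.0, ("handset", "H1"): 0.5},
    {("cell", "C1"): -3.0, ("handset", "H2"): 0.0},
]

metric = degradation_metric(baseline_rows, problem_rows)
# Most negative shift first: elements most associated with the degradation.
ranked = sorted(metric.items(), key=lambda kv: kv[1])
```

Under this assumed metric, an element whose contribution is much more negative in the problematic subset than in the baseline rises to the top of the ranking, while elements that contribute the same in both subsets score zero.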
In another aspect, example embodiments may involve a system having one or more processors; and memory configured for storing instructions that, when executed by the one or more processors, cause the system to carry out various operations. The operations may include: obtaining a set of computer-readable training data records that each characterize operation of a communication network, wherein each given training data record includes a plurality of operational features of the communication network and one or more observed performance characteristics of the communication network, and wherein each operational feature is associated with one or more feature-value pairs specific to the given training record, and each of the one or more observed performance characteristics corresponds to an observation specific to the given training record; using at least a portion of the set of training data records to train a machine learning (ML) model of network performance to predict expected performance characteristics given the plurality of operational features in the training data records as input and the one or more observed performance characteristics as ground truths, wherein the ML model is configured for computing mappings of given input feature-value pairs to output predicted performance characteristics, and wherein, for each input training data record, the mappings represent relationships and/or interactions between one or more combinations among the plurality of operational features and one or more predicted performance characteristics; for each input data record of a first subset of the set of training data records, computing a fair distribution of first respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model, wherein the first subset includes at least those training data records sufficient to represent a baseline of observed performance characteristics; for each input data record of a second subset of the set of training data records, computing a fair distribution of second respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model, wherein the second subset includes only those training data records representing at least one problematic observed performance characteristic; and comparing the first and second respective quantitative contributions to determine a respective degradation metric for associating each of the plurality of operational features of the second subset with the at least one problematic observed performance characteristic of the second subset.
In yet another aspect, example embodiments may involve an article of manufacture including a non-transitory computer-readable medium, having stored thereon program instructions that, when executed by one or more processors of a system, cause the system to carry out various operations. The operations may include: obtaining a set of computer-readable training data records that each characterize operation of a communication network, wherein each given training data record includes a plurality of operational features of the communication network and one or more observed performance characteristics of the communication network, and wherein each operational feature is associated with one or more feature-value pairs specific to the given training record, and each of the one or more observed performance characteristics corresponds to an observation specific to the given training record; using at least a portion of the set of training data records to train a machine learning (ML) model of network performance to predict expected performance characteristics given the plurality of operational features in the training data records as input and the one or more observed performance characteristics as ground truths, wherein the ML model is configured for computing mappings of given input feature-value pairs to output predicted performance characteristics, and wherein, for each input training data record, the mappings represent relationships and/or interactions between one or more combinations among the plurality of operational features and one or more predicted performance characteristics; for each input data record of a first subset of the set of training data records, computing a fair distribution of first respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model, wherein the first subset includes at least those training data records sufficient to represent a baseline of observed performance characteristics; for each input data record of a second subset of the set of training data records, computing a fair distribution of second respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model, wherein the second subset includes only those training data records representing at least one problematic observed performance characteristic; and comparing the first and second respective quantitative contributions to determine a respective degradation metric for associating each of the plurality of operational features of the second subset with the at least one problematic observed performance characteristic of the second subset.
In still another aspect, example embodiments may involve a system that may include various means for carrying out each of the operations of the first and/or second example embodiment.
These, as well as other embodiments, aspects, advantages, and alternatives, will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings. Further, this summary and other descriptions and figures provided herein are intended to illustrate embodiments by way of example only and, as such, numerous variations are possible. For instance, structural elements and process steps can be rearranged, combined, distributed, eliminated, or otherwise changed, while remaining within the scope of the embodiments as claimed.
Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features unless stated as such. Thus, other embodiments can be utilized and other changes can be made without departing from the scope of the subject matter presented herein.
Accordingly, the example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations.
Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.
Additionally, any enumeration of elements, blocks, or steps in this specification or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are carried out in a particular order.
A. Example Data, Notation, and Overview
Traditional automatic root cause analysis for service performance degradation and outages in telecommunications networks, together referred to as network inefficiencies, may be configured to exploit data collected by monitoring entities (e.g., physical and virtual probes, logging systems, etc.) within telecommunication networks. Because the same types of data may be used in the performance and fault analysis techniques and example PFA systems described herein, a general review of the data collected by various monitoring entities of telecommunication systems is provided below.
The data provided by the monitoring entities form a dataset that may be used for performance and fault analysis and for root cause analysis. A dataset may be described as a collection of feature vectors, where a feature vector is a list of feature-value pairs. Each feature-value pair is also referred to as an element of the feature vector. A feature refers to a measurable property, such as utilization or load, or may refer to a tag or label identifying an operational component, such as a device, service, program, or IP address of the network. A value may be either categorical (e.g., a device brand or model) or numerical (e.g., a measured and/or detected parameter value, which may be discrete or continuous, or a Boolean value). In practice, there may be a plurality of features, and a respective plurality of possible values for each feature. In some illustrative discussions, features may be denoted as f_j, j = 1, ..., n, where n specifies the number of features, and values may be denoted by v_k, k = 1, ..., m_j, where m_j specifies the number of values for feature f_j. A feature-value pair may also be referred to as an element, e_i, where e_i = (f_i, v_k), k = 1, ..., m_j, for each j = i.
Table 1 shows a simplified example of a dataset, where the features describe the attributes of the parties involved in mobile communications, such as a call or session. In this example, there are six features (n=6). The number of possible values for each feature is not necessarily indicated, but it may be seen that there are at least two values for each feature. Each row of the table includes a feature vector followed by an associated performance metric, which, for purposes of example, is a response time. There could be different and/or additional performance metrics logged for each record. Each row of the table may also correspond to a record of a database or dataset of performance data that may be obtained by one or more monitoring devices in or of a communication network. The vertical ellipses in the last row indicate that there may be more entries in the table (i.e., more records containing feature vectors). In particular, the statistical analyses described are generally applied to the performance metrics. As such, it may generally be assumed that there are sufficient numbers of records to help ensure the validity and/or accuracy of the statistical analyses. In practice, this may typically be the case, as the number of call records, session logs, performance logs, and the like usually stretches into the hundreds, thousands, or more over typical collection time spans. The four records shown in Table 1 thus serve to illustrate concepts of analysis relating to various data selection criteria, with the assumption that the number of actual records involved may be much larger.
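The notation can be illustrated with a minimal sketch that derives the number of features n, the number of values m_j per feature, and the set of elements from a small dataset. The records, feature names, and values below are hypothetical and are not taken from any table in this description.

```python
# Hypothetical dataset: each record is a feature vector of feature-value pairs.
records = [
    {"Service Type": "Web", "Handset": "A"},
    {"Service Type": "VoIP", "Handset": "A"},
    {"Service Type": "VoIP", "Handset": "B"},
    {"Service Type": "Video", "Handset": "B"},
]

def values_per_feature(records):
    """Collect the distinct values observed for each feature f_j."""
    domain = {}
    for rec in records:
        for feature, value in rec.items():
            domain.setdefault(feature, set()).add(value)
    return domain

domain = values_per_feature(records)
n = len(domain)                                             # number of features
m = {feature: len(vals) for feature, vals in domain.items()}  # m_j for each f_j
# Every distinct feature-value pair is one element.
elements = sorted((f, v) for f, vals in domain.items() for v in vals)
```

Here the same value label (for example "A") is only meaningful together with its feature, which is why elements are kept as (feature, value) pairs rather than bare values.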
For convenience in the discussion herein, each row is labeled with a record number (“Rec No.” in the table), although this label may not necessarily be included in an actual implementation of the table. It should be understood that the form and content of Table 1 is an example for illustrative purposes of the discussion herein, and should not be interpreted as limiting with respect to example embodiments herein.
The organization of records containing feature vectors and performance metrics into a table, such as Table 1, may serve to describe certain aspects of the analysis described below. Specifically, it may be seen that each feature corresponds to a column in the table, and that the features of each row correspond to feature vectors. The entries in the feature columns correspond to values, and the column heading (i.e., the feature) plus a given value corresponds to an element. For example, the pair (Service Type, VoIP) is an element that is present in both the second and third rows or data records. In later descriptions, when the term “feature” is used, it will usually refer to an entire column. Reference to a set of data containing only a specific element will be used to mean a subset of records each containing only feature vectors having the specific feature-value pair corresponding to that element. For example, a subset of the data containing only the element (Service Type, VoIP) would be a subset of only the second and third records. In the continuous case, as with performance metrics, the element can be a particular value or range of values (e.g., a quantile) depending on the use case and what represents the most meaningful grouping. In addition, subsets of data need not necessarily be separate from Table 1. Rather, they may be viewed as Table 1 with ancillary information specifying which rows and/or columns are under consideration.
One of the main goals of both automatic root cause analysis and performance and fault analysis is determining which feature-value pairs are most associated with network inefficiencies, where, as noted, “network inefficiency” is a term used herein to describe degradation (including possible failure) of one or more aspects of network performance below some threshold or statistically meaningful level. Traditional root cause analysis attempts to attribute feature-value pairs as causes of inefficiencies, while PFA may identify correlations and not make causal claims. A network inefficiency may also be described as a statistically meaningful negative contribution to one or more aspects of network performance. Thus, a feature-value pair is considered to be inefficient if it causes or is associated with a statistically meaningful negative contribution to one or more aspects of network performance.
In the context of ML-based PFA systems described herein, feature-value pairs are inputs to a ML model of network performance, while one or more predicted performance characteristics are outputs of the model. As described below, training such an ML model involves iteratively adjusting model parameters to achieve some prescribed level of agreement between predicted performance characteristics and observed performance characteristics, given feature-value pairs as inputs. In accordance with example embodiments, data records corresponding to actual communication network operation may include and/or be associated with sets of feature-value pairs (such as feature vectors described above) as well as observed performance characteristics collected (e.g., via monitoring) during operation, for example. Thus, data records may be considered as providing both input data, as well as “ground truth” data for training an exemplary ML model of network performance. By way of example, data records may be or include call detail records (CDRs) and/or session detail records (SDRs).
As also described below, while the ML model can be trained to accurately predict various performance characteristics given input feature-value pairs, a goal of example embodiments herein is to interpretatively analyze a trained model in order to quantitatively evaluate how specific features and interactions between features impact and/or influence the predicted performance characteristics that are the outputs of the model. More particularly, the ML model may be constructed to incorporate complex interactions among and between the model inputs as they relate to the outputs of the model. As such, training may yield a ML model that accurately predicts outputs, but that may also be too complex to enable practical (or even tractable) direct analysis that explains the connections between the inputs (including their interactions) and the outputs. Example embodiments further involve techniques for computing for a ML model a form of diagnostic or explanatory data that can be used to obtain the quantitative contributions of features and feature interactions to network performance.
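Selecting a subset by element can be sketched as below, where a subset is kept as a list of row indices into the table rather than as a copy, in the spirit of viewing a subset as the table plus ancillary row information. The four rows, the response times, and the function names are hypothetical; only the placement of VoIP in the second and third records follows the example in the text.

```python
# Hypothetical table: one row per record, with a response time as the metric.
table = [
    {"Rec No.": 1, "Service Type": "Web", "Response Time (ms)": 120.0},
    {"Rec No.": 2, "Service Type": "VoIP", "Response Time (ms)": 310.0},
    {"Rec No.": 3, "Service Type": "VoIP", "Response Time (ms)": 95.0},
    {"Rec No.": 4, "Service Type": "Video", "Response Time (ms)": 240.0},
]

def rows_with_element(table, feature, value):
    """Indices of rows whose feature vector contains the element (feature, value)."""
    return [i for i, row in enumerate(table) if row.get(feature) == value]

def rows_in_range(table, feature, low, high):
    """Indices of rows whose numeric value falls in [low, high), a range-type element."""
    return [i for i, row in enumerate(table) if low <= row[feature] < high]

voip_rows = rows_with_element(table, "Service Type", "VoIP")
voip_rec_nos = [table[i]["Rec No."] for i in voip_rows]
slow_rows = rows_in_range(table, "Response Time (ms)", 200.0, float("inf"))
```

Keeping indices instead of copies lets several subsets (for example a baseline and a problematic subset) refer to the same underlying records.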
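One way to turn data records into model inputs and ground truths is sketched below. It assumes a one-hot encoding, in which each categorical feature-value pair becomes one binary input column; this encoding is an illustrative assumption and not a requirement of the description. The CDR fields, values, and the `throughput_mbps` target name are hypothetical.

```python
def build_training_arrays(records, target):
    """Split records into one-hot inputs X (one column per element) and ground truth y."""
    elements = sorted(
        {(f, v) for rec in records for f, v in rec.items() if f != target}
    )
    X = [[1 if rec.get(f) == v else 0 for f, v in elements] for rec in records]
    y = [rec[target] for rec in records]
    return elements, X, y

# Hypothetical CDRs: two operational features plus one observed performance characteristic.
cdrs = [
    {"base station": "BS1", "radio access technology": "4G", "throughput_mbps": 42.0},
    {"base station": "BS2", "radio access technology": "5G", "throughput_mbps": 180.0},
    {"base station": "BS1", "radio access technology": "5G", "throughput_mbps": 150.0},
]

elements, X, y = build_training_arrays(cdrs, target="throughput_mbps")
```

With this layout, each input column corresponds to exactly one feature-value pair, so a per-column contribution computed later can be read directly as a per-element contribution.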
In the discussion herein, a data record (such as a CDR or SDR) may be described as including one or more “operational features” and one or more “observed performance characteristics” of a communication network. Further, “operational features” may be described as being “associated with one or more feature-value pairs” specific to the data record. This terminology should be understood to mean that a data record may include a data label, variable, or parameter name that identifies an operational feature of the network. A feature-value pair associated with an operational feature in a particular data record thus assigns a specific value (or values) to that operational feature for the particular record. For example, a set of CDRs may all include “base station” as an operational feature, while each respective CDR of the set may have a specific value assigned to the “base station” of the respective CDR, the specific value identifying a specific actual base station of the network that handled a call associated with the respective CDR. The label “base station” and the assigned value in each respective CDR forms a feature-value pair of the respective CDR. Similarly, the term “performance characteristic” may be considered a label or name of an observable, detectable, and/or measurable characteristic of network performance, while the one or more observed performance characteristics included in (or associated with) each respective CDR records actual observations, detections, and/or measurements of performance characteristics of the network during the call associated with the respective CDR.
In accordance with example embodiments, the trained ML model may provide an accurate (and complex) mapping of input feature-value pairs to output predicted performance characteristics, while the analytical interpretation of the model may provide a quantitative evaluation of how specific feature-value pairs contribute to the output predicted performance characteristics. By ensuring that the ML model as trained is an accurate predictor of performance characteristics—i.e., that the predictions of the trained model match the observed performance characteristics to some specified confidence level, for example—the quantitative evaluation of the trained model may thus yield an effective performance and fault analysis for the network with respect to any one or more input feature-value pair combinations.
In the discussion herein, the term “value” is sometimes dropped from “feature-value” in combination. For example, a “feature” described as contributing to a predicted performance characteristic can refer to a class or group of the same type of operational entity, such as a base station. Or a “feature” described as contributing to a predicted performance characteristic can refer to a specific instance of a class or group of the same type of operational entity, such as a particular base station. It should be clear from context whether the discussion applies to the feature generally, e.g., as in the contribution of a group of base stations to throughput, for example, or to the contribution of a particular base station (i.e., feature-value pair) to dropped calls, for example. As described below, PFA generally proceeds on a per-feature-value pair basis, such that results for group or class of features may be derived by aggregating results from specific instances of a given feature type, for example.
Non-limiting examples of features of CDRs and/or of SDRs include: base station ID, cell ID, sector ID, radio frequency, public land mobile network (PLMN) ID, and radio access technology. Non-limiting examples of performance characteristics include: call completion rate, dropped call count, data throughput, call quality, and signal-to-noise ratio.
In accordance with example embodiments, the ML model may be trained using a full representation of the data (i.e., data associated with both normal and problematic performance). In this way, the model may be trained using both normal and problematic inputs, resulting in a baseline against which exclusively problematic inputs may be compared. This strategy allows for specific questions to be posed in connection with input data. Specifically, problematic samples may be selected for further investigation, using SHAP to analyze a particular sample/problem set and compare it against the regular data used to produce the baseline, which thereby serves as a form of control. As such, selecting a problematic sample and comparing it to a representative control/baseline without this problem offers a contrastive view that highlights the individual features and elements that contributed most to the predicted problematic outcome, i.e., the characteristics most associated with the problem in the context of the problematic sample.
In short, given a particular set of input feature values, Shapley values assign a contribution to each of the input features to the model predicted output value as compared to the average output value across all examples used to form the model. This contribution quantifies how much each input feature contributed to the difference between the particular output value and the average, expected, output value. Non-linear models capture possible inter-dependencies between input features and the associated output value, and SHAP values provide a fair attribution to each feature that is independent of the order in which the input features are applied (should the model be sensitive to such orderings) and whether all features are present.
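The attribution described above can be illustrated with a minimal sketch that computes exact Shapley values by enumerating every feature coalition. This brute-force approach is exponential in the number of features and is shown only to make the definition concrete; it is not the TreeExplainer algorithm. The toy model, the feature names ("handset", "cell"), and the background data are assumptions made up for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, record, background):
    """Exact Shapley values for one record by enumerating feature coalitions.

    The value of a coalition is the mean model output when the coalition's
    features take the record's values and all other features take the values
    found in each background record in turn.
    """
    names = list(record)
    n = len(names)

    def coalition_value(coalition):
        total = 0.0
        for row in background:
            mixed = {f: (record[f] if f in coalition else row[f]) for f in names}
            total += model(mixed)
        return total / len(background)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size (order-independent).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = coalition_value(set(subset) | {f})
                without_f = coalition_value(set(subset))
                contribution += weight * (with_f - without_f)
        phi[f] = contribution
    return phi


def toy_model(x):
    """Hypothetical non-linear throughput model with a handset/cell interaction."""
    throughput = 10.0
    if x["handset"] == "H2":
        throughput -= 1.0
    if x["cell"] == "C9":
        throughput -= 2.0
    if x["handset"] == "H2" and x["cell"] == "C9":
        throughput -= 4.0  # degradation that appears only for the combination
    return throughput


background = [{"handset": h, "cell": c} for h in ("H1", "H2") for c in ("C1", "C9")]
record = {"handset": "H2", "cell": "C9"}

phi = shapley_values(toy_model, record, background)
expected_output = sum(toy_model(r) for r in background) / len(background)
# Additivity: contributions sum to (prediction - average prediction).
gap = toy_model(record) - expected_output
```

In this sketch the contributions sum to the difference between the record's predicted output and the average output over the background data, and the interaction penalty is shared between the two features that jointly cause it.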
There are advantages to this strategy in drawing correct conclusions when performing analysis. For instance, given a particular feature that occurs in a high proportion of problematic cases, it may not necessarily be possible to conclude that this feature is itself problematic, since it may occur in an even higher proportion of normal cases. This can be a serious flaw with any strategy that only observes the subset of data associated with a problem and attempts to draw conclusions. Because the ML model has been trained on normal and problematic cases (and given a sufficiently large dataset and a constrained number of model parameters), the model will not be narrowly focused on the specific input elements found in the sample to draw its conclusions. Instead, it can draw from those input elements as they pertain to the telecommunications dataset used to form the model more generally. Put another way, the model has a general representation encoding the way input elements interact and are mapped to the predicted target variable. This idea extends to the use of controls and baselines with SHAP, since a set of one or more features (e.g., cell frequency) can be held constant for both a problematic sample and an otherwise random sample, allowing for a fair comparison across features and input elements as they contribute to the target between the problematic sample and baseline.
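The pitfall described above can be made concrete with a short sketch. The records, the feature-value pair, and the proportions below are invented for illustration: a feature-value pair that appears in most problematic records is shown to be even more common among normal records, so its prevalence in the problematic subset alone does not single it out.

```python
def prevalence(records, element):
    """Fraction of records that contain the given feature-value pair."""
    if not records:
        return 0.0
    return sum(1 for r in records if element in r) / len(records)


# Hypothetical records, each represented as a set of (feature, value) pairs.
element = ("cell frequency", "1800 MHz")
problem = [{element}] * 8 + [set()] * 2   # present in 8 of 10 problematic records
normal = [{element}] * 9 + [set()] * 1    # present in 9 of 10 normal records

problem_rate = prevalence(problem, element)
normal_rate = prevalence(normal, element)

# Looking only at the problematic subset would flag the element; comparing
# against the normal records shows it is not over-represented there.
over_represented = problem_rate > normal_rate
```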
Accordingly, then, example embodiments provide for fault characterization using a detection and analysis system that employs a model, which can be non-linear, to capture relationships in telecommunications data as they relate to measures of possible faults. The system works by forming a model to learn the relationships, and “unpacks” the model with SHAP to perform an analysis on problematic samples of interest. To assess what is problematic in a given sample, the relative contributions of features and elements may be compared to a control/baseline sample.
B. Example System Architecture
In
When the processing unit is a digital device, the components 106, 108, 110, 112, 114 may be communicatively coupled via a local interface. The local interface can be, for example, but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface can have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, among many others, to enable communications. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the components.
The network interface may be used to enable the processing device to communicate on a network, such as the Internet. The network interface may include, for example, an Ethernet card or adapter (e.g., 10BaseT, Fast Ethernet, Gigabit Ethernet, 10 GbE) or a wireless local area network (WLAN) card or adapter (e.g., 802.11a/b/g/n/ac). The network interface may include address, control, and/or data connections to enable appropriate communications on the network.
A processor is used as a hardware device for executing software instructions within processing device 100. The processor can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the processing device, a semiconductor-based microprocessor (in the form of a microchip or chip set), or generally any device for executing software instructions. When the processing device is in operation, the processor is configured to execute software stored within the memory, to communicate data to and from the memory, and to generally control operations of the processing device pursuant to the software instructions. In an exemplary embodiment, the processor may include a mobile-optimized processor, such as one optimized for power consumption and mobile applications.
The I/O interfaces, including user interface 116, can be used to receive user input and/or to provide system output. User input can be provided via, for example, a keypad, a touch screen, a scroll ball, a scroll bar, buttons, and the like. System output can be provided via a display device such as a liquid crystal display (LCD), touch screen, and the like. System output may also be provided via a display device and a printer. The I/O interfaces can also include, for example, a serial port, a parallel port, a small computer system interface (SCSI), an infrared (IR) interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, and the like. The I/O interfaces can include a graphical user interface (GUI) that enables a user to interact with the processing device 100.
The data store may be used to store data. The data store may include any of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, and the like)), nonvolatile (non-transitory computer-readable media) memory elements (e.g., ROM, hard drive, tape, CDROM, and the like), and combinations thereof. Moreover, the data store may incorporate electronic, magnetic, optical, and/or other types of storage media.
The memory may include any of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)), nonvolatile memory elements (e.g., ROM, hard drive, etc.), and combinations thereof. Moreover, the memory may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory may have a distributed architecture, where various components are situated remotely from one another but can be accessed by the processor.
The software in memory can include one or more software programs, each of which includes an ordered listing of executable instructions for implementing logical functions. In the example of
The processing device can be incorporated in test equipment or be in communication with test equipment. The test equipment can include different physical media test modules. The physical media test modules include ports and connectors to interface to networks for monitoring and troubleshooting. In an embodiment, a mobile device can execute an application which communicates with the test equipment. The mobile device can communicate with the test equipment via Bluetooth, Wi-Fi, wired Ethernet, USB, combinations thereof, or the like. The mobile device is configured to communicate to the Internet via cellular, Wi-Fi, etc.
Still referring to
C. Example Computing Devices and Cloud-Based Computing Environments
In this example, computing device 200 includes a processor 202, a data storage 204, a network interface 206, and an input/output function 208, all of which may be coupled by a system bus 210 or a similar mechanism. Processor 202 can include one or more CPUs, such as one or more general purpose processors and/or one or more dedicated processors (e.g., application specific integrated circuits (ASICs), graphical processing units (GPUs), digital signal processors (DSPs), network processors, etc.).
Data storage 204, in turn, may comprise volatile and/or non-volatile data storage and can be integrated in whole or in part with processor 202. Data storage 204 can hold program instructions, executable by processor 202, and data that may be manipulated by these instructions to carry out the various methods, processes, or functions described herein. Alternatively, these methods, processes, or functions can be defined by hardware, firmware, and/or any combination of hardware, firmware and software. By way of example, the data in data storage 204 may contain program instructions, perhaps stored on a non-transitory, computer-readable medium, executable by processor 202 to carry out any of the methods, processes, or functions disclosed in this specification or the accompanying drawings.
Network interface 206 may take the form of a wireline connection, such as an Ethernet, Token Ring, or T-carrier connection. Network interface 206 may also take the form of a wireless connection, such as IEEE 802.11 (Wi-Fi), BLUETOOTH®, or a wide-area wireless connection. However, other forms of physical layer connections and other types of standard or proprietary communication protocols may be used over network interface 206. Furthermore, network interface 206 may comprise multiple physical interfaces.
Input/output function 208 may facilitate user interaction with example computing device 200. Input/output function 208 may comprise multiple types of input devices, such as a keyboard, a mouse, a touch screen, and so on. Similarly, input/output function 208 may comprise multiple types of output devices, such as a screen, monitor, printer, or one or more light emitting diodes (LEDs). Additionally or alternatively, example computing device 200 may support remote access from another device, via network interface 206 or via another interface (not shown), such as a universal serial bus (USB) or high-definition multimedia interface (HDMI) port.
In some embodiments, one or more computing devices may be deployed in a networked architecture. The exact physical location, connectivity, and configuration of the computing devices may be unknown and/or unimportant to client devices. Accordingly, the computing devices may be referred to as “cloud-based” devices that may be housed at various remote locations.
For example, server devices 306 can be configured to perform various computing tasks of computing device 200. Thus, computing tasks can be distributed among one or more of server devices 306. To the extent that these computing tasks can be performed in parallel, such a distribution of tasks may reduce the total time to complete these tasks and return a result.
Cluster data storage 308 may be data storage arrays that include disk array controllers configured to manage read and write access to groups of hard disk drives and/or solid state drives. The disk array controllers, alone or in conjunction with server devices 306, may also be configured to manage backup or redundant copies of the data stored in cluster data storage 308 to protect against disk drive failures or other types of failures that prevent one or more of server devices 306 from accessing units of cluster data storage 308.
Cluster routers 310 may include networking equipment configured to provide internal and external communications for the server clusters. For example, cluster routers 310 may include one or more packet-switching and/or routing devices configured to provide (i) network communications between server devices 306 and cluster data storage 308 via cluster network 312, and/or (ii) network communications between the server cluster 304 and other devices via communication link 302 to network 300.
Additionally, the configuration of cluster routers 310 can be based at least in part on the data communication requirements of server devices 306 and cluster data storage 308, the latency and throughput of the local cluster network 312, the latency, throughput, and cost of communication link 302, and/or other factors that may contribute to the cost, speed, fault-tolerance, resiliency, efficiency and/or other design goals of the system architecture.
As noted, server devices 306 may be configured to transmit data to and receive data from cluster data storage 308. This transmission and retrieval may take the form of SQL queries or other types of database queries, and the output of such queries, respectively. Additional text, images, video, and/or audio may be included as well. Furthermore, server devices 306 may organize the received data into web page or web application representations. Such a representation may take the form of a markup language, such as the hypertext markup language (HTML), the extensible markup language (XML), or some other standardized or proprietary format. Moreover, server devices 306 may have the capability of executing various types of computerized scripting languages, such as but not limited to Perl, Python, PHP Hypertext Preprocessor (PHP), Active Server Pages (ASP), JAVASCRIPT®, and so on. Computer program code written in these languages may facilitate the providing of web pages to client devices, as well as client device interaction with the web pages. Alternatively or additionally, JAVA® or other languages may be used to facilitate generation of web pages and/or to provide web application functionality.
D. Example Model Construction and Analysis Procedures
In accordance with example embodiments, ML-based PFA can be used to address specific questions about network performance. Non-limiting examples of possible use cases for telecommunications data could include:
Using an example embodiment of the ML-based PFA system described herein can offer specific advantages over conventional techniques. Some specific advantages include at least:
Construction, implementation, and application of an example ML-based PFA system may be summarized as follows.
A model may be created by learning a representation of telecommunication data, as well as any other data sources that can be used to augment or inform the telecommunication data. This corresponds to the first phase 401.
The representation of the model may be assessed by applying it to unseen data to confirm that representation generalizes and, thus, can be extended to an analysis of unseen data. This corresponds to the (optional) second phase 403.
Analysis of problematic data samples may be performed by producing a SHAP explainer from the learned model and using it to provide explanations of data in problematic samples. By framing specific questions, where samples of interest are compared against a representative baseline or control sample, the divergence between the relative importance of different features can be quantified to highlight specific problems in the telecom data for the problematic sample. This corresponds to the third, feature analysis phase 405.
Following feature analysis in the third phase 405, visualization of the impact and severity of features and feature-value pairs, based on aggregate SHAP contributions to network degradation for the problematic sample, may be carried out or generated. The analysis and visualization can yield results for either a particular fixed outcome (e.g., dropped calls) or the magnitude of the degradation for a continuous case (e.g., lower average download throughput). The analysis and visualization results may also provide a basis for corrective actions in the network to mitigate or resolve identified problems.
One aspect of the ML-based approach described herein that differs from customary application of machine learning strategies is how the trained model is used in evaluating the connections between operational features of a communication network and network performance. More specifically, while the trained ML model of the ML-based PFA system can be used to predict performance of runtime or test data (or unseen data), its primary function in the context of certain performance and fault analyses is to provide an accurate representation of the operation and performance of the communication network, given the operational features of the training data and their mutual interactions as model inputs in contributing to target outcomes. If the trained model is determined to be sufficiently representative of the relationship between operational features and target outcomes, the contributions of the operational features to predicted performance characteristics can be determined using SHAP, and the results can be applied to the quantitative estimation of the impact of individual operational features on the complex interactions among features that yield the observed network performance. As such, studies and evaluations of the possible impact of features on performance may be undertaken by applying the trained ML model to one or more subsets (or even all) of the training data. This is because such studies rely on the ML model to be representative of the data being evaluated in a way that contrasts problematic samples with non-problematic samples, but do not rely on it to produce subsequent predictions from runtime or unseen data as inputs. Thus, while it is possible to overfit the model so that it does not properly represent future states of the system, the analysis on sufficiently large data may still provide insight into problem diagnosis and facilitate the localization of problems.
Nevertheless, the trained ML model can also be used in applications for which predicting performance characteristics is also, or primarily, a goal or purpose. One example is the optional second phase 403 shown in
Further details of training, assessment, and analysis are discussed below.
Model Training
As shown in
In accordance with example embodiments, after data preparation into a tabular form, a gradient boosting algorithm (e.g., XGBoost, LightGBM, CatBoost) is used to form a tree-based model. This operation may be carried out in a conventional way of creating a machine learning model; such algorithms are particularly performant in modeling structured data.
Model Evaluation
In accordance with example embodiments, model evaluation involves applying the trained model (from the model training phase) to test data in an evaluation operation 402-b, as shown in
Model evaluation is standard practice for machine learning, as the goal is typically to create a model to perform predictions on unseen data. This step operates as a check to ensure the model has generalized and not simply memorized the data. However, because the goal of a feature analysis system is to use the model to aggregate or summarize the data for interpretation, this validation step may not always be necessary. Consequently, there are two possible approaches:
One approach evaluates the model against seen data while reasonably constraining the model size (i.e., there should be fewer model parameters than data points, to prevent the model from memorizing the training data). This purely analytical approach uses the model to gain a representation of the data with the intent of evaluating it using SHAP. Then, optionally, the approach could rely on SHAP feature importances only when the input-output associations were correctly mapped, or could attach a measure of contribution confidence based on the correctness of the prediction. Adding this optional step makes the approach more robust in ensuring that the feature importances used in assessing a fault are correct, and likely makes the most sense for this system. Checking against unseen data may still be beneficial in ensuring that the model does not overfit the data in some cases, as described in the approach below.
A second approach, more traditional for machine learning, is to test and evaluate against unseen data prior to using the model representation. This is a more robust approach in that it gives a sense of the model's ability to generalize to new data, better ensuring that the representation generalizes to other states of the system. Generalization of the model can be particularly important when evaluating feature importances for unseen data. With such an approach, a model would not necessarily need to be retrained to be used in analyzing new problems on the system from which the original training data was obtained. Again, optionally, the approach could rely on SHAP feature importances only when the input-output associations were correctly mapped, or could attach a measure of contribution confidence based on the correctness of the prediction, which is more likely to deviate given unseen data.
In accordance with example embodiments, a novel model evaluation for which a test set or validation set is not needed may be used, in contrast to traditional machine learning, at least when the primary goal is not to predict values for unseen data, but is instead to use the model as a form of aggregation to explain what's “going on” in the data. Unless there is egregious overfitting, the entire dataset may be used to form the ML model.
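The optional step mentioned in both approaches, relying on SHAP feature importances only where the model's prediction was correct, might be sketched as a simple filter. The record layout (keys "observed", "predicted", "contributions") and the tolerance value are assumptions chosen for the example, not a prescribed format.

```python
def reliable_contributions(records, tolerance):
    """Keep per-record contributions only where prediction and observation agree.

    Each record is a dict with hypothetical keys "observed", "predicted", and
    "contributions" (feature -> SHAP value). A record is kept when the absolute
    prediction error does not exceed `tolerance`.
    """
    kept = []
    for rec in records:
        error = abs(rec["predicted"] - rec["observed"])
        if error <= tolerance:
            kept.append(rec["contributions"])
    return kept


# Hypothetical records for a binary "dropped call" target.
records = [
    {"observed": 1.0, "predicted": 0.9, "contributions": {"cell": -0.4}},
    {"observed": 1.0, "predicted": 0.2, "contributions": {"cell": 0.3}},
]
kept = reliable_contributions(records, tolerance=0.25)
```

A softer variant could weight each record's contributions by a confidence derived from the prediction error rather than discarding records outright.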
Feature Analysis
In the feature analysis phase 405, the trained model may be applied separately to a control sample of the training data (essentially a random sample, with constraints applied if a particular comparison demands it) and to a sample of interest that represents some problematic aspect of performance that is also present in the training data. The control sample could be a subset of randomly selected data records from the training data records, or could be the entire training data set. In either case, it is taken to be representative of overall network performance, such that it provides a baseline or control sample of predicted network performance against which predicted performance of the sample of interest can be compared. When taking a random sample from the entire training set, the mean value of each feature's contribution is expected to be zero (0), and, consequently, the sample is not explicitly required in cases where only differences in mean feature contributions are compared. In other words, when comparing mean values and a random sample is an appropriate control, a SHAP value of zero (0) can be used as the baseline feature contribution instead of explicitly drawing a number of random records as a control sample, which is why the control sample is shown as a dashed line.
In accordance with example embodiments, the control sample and a sample of interest may be input to feature-contribution derivation operation 408. This operation first applies the trained ML model to the two input sets to separately compute predicted performance characteristics. The predicted performance characteristics of the control sample (which, again, may be the entire training data set when the model itself is considered) yield a representation of baseline performance. The predicted performance characteristics of the problematic sample yield a representation of performance that is problematic in some sense. More specifically, the problematic sample may be selected specifically on the basis of an observed performance characteristic that is considered problematic, suboptimal, or otherwise representative of degraded performance. For example, the observed dropped call rate in a particular region may be unacceptably high compared to an observed dropped call rate averaged over all regions. The data composing the problematic sample may thus be selected to include all records of calls in the particular region that were dropped. This is an example of how posing a question may be translated to problematic sample selection tailored to evaluation and/or investigation of specific feature contributions using the ML model and explainer analysis.
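The comparison of mean feature contributions against either an explicit control sample or an implied zero baseline can be sketched as follows. The function name and the SHAP values are hypothetical; the sketch assumes each record's contributions are already available as a mapping from feature to SHAP value.

```python
from statistics import mean


def mean_contribution_shift(sample, control=None):
    """Per-feature mean SHAP contribution in `sample` minus that in `control`.

    When `control` is None, a baseline of zero is assumed for every feature,
    which corresponds to using a random draw from the training data as control.
    """
    features = sorted({f for row in sample for f in row})
    shift = {}
    for f in features:
        sample_mean = mean(row.get(f, 0.0) for row in sample)
        control_mean = mean(row.get(f, 0.0) for row in control) if control else 0.0
        shift[f] = sample_mean - control_mean
    return shift


# Hypothetical per-record SHAP contributions for a sample of interest.
sample = [{"cell": -2.0, "handset": -0.5}, {"cell": -1.0, "handset": 0.5}]
# Hypothetical control whose per-feature means happen to be zero.
control = [{"cell": 0.5, "handset": 0.0}, {"cell": -0.5, "handset": 0.0}]

shift_zero_baseline = mean_contribution_shift(sample)
shift_explicit_control = mean_contribution_shift(sample, control)
```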
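The dropped-call example above can be sketched as a selection step. The record fields ("region", "dropped") and the data are assumptions for illustration: the question "why are calls dropping in this region?" is translated into the set of dropped-call records for that region.

```python
def dropped_rate(records):
    """Observed fraction of records flagged as dropped calls."""
    return sum(r["dropped"] for r in records) / len(records) if records else 0.0


def select_problem_sample(records, region):
    """All records from `region` whose call was dropped."""
    return [r for r in records if r["region"] == region and r["dropped"]]


# Hypothetical CDRs with a region label and a dropped-call flag.
cdrs = [
    {"region": "north", "dropped": True},
    {"region": "north", "dropped": True},
    {"region": "north", "dropped": False},
    {"region": "south", "dropped": False},
    {"region": "south", "dropped": False},
    {"region": "south", "dropped": False},
]

overall_rate = dropped_rate(cdrs)
north_rate = dropped_rate([r for r in cdrs if r["region"] == "north"])

# The region's rate exceeds the overall rate, so its dropped calls are
# selected as the problematic sample for feature-contribution analysis.
problem_sample = select_problem_sample(cdrs, "north") if north_rate > overall_rate else []
```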
Evaluation of the ML model is applied separately to each data record (e.g., each CDR and/or SDR) of each data set which is used to characterize the control sample data set and the problematic sample data set. In this way, the contributions of input elements are computed for each record of both input data sets. To do this, an algorithm such as SHAP is applied to each record, to provide a computational analysis to determine a fair distribution of quantitative contributions of each of the operational features to the one or more of the predicted performance characteristics computed by the ML model for the respective record. Because complex models can produce a non-linear mapping of input feature-value pairs to output performance characteristics, these quantitative contributions cannot be derived by linear analytical methods. Rather, the fair distribution criteria in an additive explanation strategy like SHAP values ensures that for each given predicted performance characteristic, the sum of the contributions of each of the input feature-value pairs adds up to the difference between the predicted performance characteristic for the record and the mean (i.e., expected) performance characteristic of all records in the model training set. As such, the fair-distribution approach effectively yields an empirically derived quantitative contribution for each input feature-value pair in a record.
Carrying out this analysis record-by-record for specific performance characteristics generates a collection of feature-contribution data to which a variety of statistical analyses may be applied in order to investigate a range of questions relating to the impact and/or influence of individual feature-value pairs and/or classes or categories of features on network performance. Further, the feature-contribution data generated from the control sample of data records can provide baseline feature contributions for one or more predicted performance characteristics output by the ML model. Similarly, the feature-contribution data generated from the problematic sample of data records can provide a form of diagnostic feature contributions for one or more predicted performance characteristics for which corresponding observed performance characteristics have been deemed problematic or suboptimal according to some criteria. The baseline contributions values and the problematic sample (diagnostic) feature contributions may be input to performance/fault analysis operation 410, also shown in
In accordance with example embodiments, results and outcomes from the performance/fault analysis operation 410 may be applied to further evaluations, operational interventions, and/or adjustment of actual network components. These operations are grouped collectively as further analyses/actions 412 in
In accordance with example embodiments, determining a fair distribution of quantitative contributions of input feature-value pairs to output predicted performance characteristics—and more generally a fair distribution of quantitative contributions of inputs to output of a ML model—may be accomplished using an interpretability algorithm called SHapley Additive exPlanations to derive Shapley values. In an example implementation, a SHAP explainer may be created for a model based on an open-source framework using the SHAP framework available online at the URL https://github.com/slundberg/shap, which is licensed to allow commercial use under MIT as specified at the URL https://github.com/slundberg/shap/blob/master/LICENSE.
In particular, an implementation called TreeExplainer may be used for polynomial time computation, which makes this strategy feasible for a fault analysis system at the scale of the telecommunications data considered herein. (See Lundberg et al., “From local explanations to global understanding with explainable AI for trees,” Nature Machine Intelligence, VOL 2, January 2020, pp. 56-67.)
A fault analysis system may work by first framing an initial question for the system. Non-limiting examples of such initial questions include:
SHAP values may then be computed for the set of data records that satisfies the question criteria, and fault analysis may be performed on that set. By way of example, each record of data could be a CDR or SDR. SHAP values offer a fair contribution for each feature-value pair in the evaluated record to the prediction(s) of the model. Thus, calculating SHAP values over a set of records can be used to produce distributions of relative feature importance, allowing for the characterization of a larger sample of records.
The characterization of a fault works by evaluating the relative importance of specific feature contributions to a problematic outcome as compared to the normal contribution of that feature in a representative baseline control group. To do this, feature impact may be evaluated by comparing SHAP value distributions from the sample of interest to an appropriate baseline, which can be a randomly sampled baseline or a baseline constructed to be representative by selective sampling from a non-problematic data set (i.e., control sample). The difference can be quantified, for example, by computing an effect size (e.g., Hedges' g) that compares the importance of a specific feature in the problematic sample to its importance in the baseline. This offers an objective way to gauge the impact of feature importance given the full context presented in the problematic sample. Distributions of sample input features and elements can also be compared using non-parametric distances (e.g., Wasserstein distances), or other distribution comparison strategies, allowing the quantification of how dissimilar a problematic sample is from a representative baseline.
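The two comparison measures named above can be sketched with the standard library alone. The SHAP values are invented for illustration, Hedges' g is computed with the common small-sample correction 1 - 3/(4(n1 + n2) - 9), and the Wasserstein computation is a simplification that assumes one-dimensional samples of equal size.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(sample, baseline):
    """Effect size between two samples, with small-sample bias correction."""
    n1, n2 = len(sample), len(baseline)
    pooled_sd = sqrt(
        ((n1 - 1) * variance(sample) + (n2 - 1) * variance(baseline)) / (n1 + n2 - 2)
    )
    cohens_d = (mean(sample) - mean(baseline)) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction


def wasserstein_1d(sample, baseline):
    """1-Wasserstein distance between two equal-size one-dimensional samples."""
    if len(sample) != len(baseline):
        raise ValueError("this sketch assumes samples of equal size")
    return mean(abs(a - b) for a, b in zip(sorted(sample), sorted(baseline)))


# Hypothetical SHAP values of one feature in the sample of interest and baseline.
problem_shap = [-2.0, -1.5, -2.5, -1.0]
baseline_shap = [0.1, -0.1, 0.2, -0.2]

effect_size = hedges_g(problem_shap, baseline_shap)
distance = wasserstein_1d(problem_shap, baseline_shap)
```

A strongly negative effect size together with a large distance would indicate that the feature pushes the prediction toward degradation far more in the sample of interest than in the baseline.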
Visualization
Conventional visualization libraries included in the SHAP project are primarily distribution-based, without aggregation across input element importances, and do not typically include much in the way of feature-value pair specific comparisons across groups to interpret the contributions of individual features. Yet this is a particularly important framing when trying to qualify specific values like instances of categorical variables (e.g., handset type, cell-ID) and in reducing the dimensionality of the problem to provide a more concise answer. For example, knowing that handset types used in a particular region were problematic is not as valuable as knowing which handset types were problematic. Similarly, by grouping metrics as quantiles, the SHAP values can be considered for specific ranges.
One approach to show the most significant input elements contributing to degradation for a problematic sample is by computing impact and severity for a problematic sample. An example is shown in Table 2. Specifically, a region of interest was chosen from coordinates on a map and CDRs falling in this region were selected for analysis. In the example, severity is used to indicate a throughput reduction (i.e., negative SHAP value) attributed by the model to examples containing the particular input element or quantile for a metric. Because not all CDRs contain this input element, the severity may be scaled by the fraction of CDRs containing the element (i.e., by multiplying the severity by the fraction) to produce the impact of this input element on throughput reduction across the sample of interest. In an investigation of problematic throughput, the highest impact input elements may therefore be of interest in diagnosing the source of the problem based on the model representation.
In Table 2, the items in the column labeled “Elements” are feature-value pairs. For example, the first two table rows each show severity and impact for a feature labeled “End Cell eNodeB” that corresponds to an eNodeB that terminates a call. The feature-value pairs for the two rows are different: each feature value identifies a specific eNodeB. The impact of each given eNodeB on throughput is a severity, determined from the average SHAP value for that feature-value pairing, scaled by the fraction of data records (e.g., CDRs) corresponding to call traces that used the given eNodeB.
The impact relationships can also be visualized with a Sankey flow-diagram. For the example of throughput degradation, such a diagram shows how the total degradation relates to the expected throughput.
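The severity and impact calculation described for Table 2 can be sketched as below. The record layout (a dict mapping each feature to a value and its SHAP value) and all numbers are hypothetical and are not taken from Table 2.

```python
from collections import defaultdict


def severity_and_impact(records):
    """Per feature-value pair: mean SHAP (severity), record fraction, impact.

    records: list of dicts mapping feature name -> (value, shap_value).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        for feature, (value, shap_value) in rec.items():
            key = (feature, value)
            sums[key] += shap_value
            counts[key] += 1

    total = len(records)
    rows = []
    for key, count in counts.items():
        severity = sums[key] / count   # average SHAP where the pair occurs
        fraction = count / total       # share of records containing the pair
        rows.append({
            "element": key,
            "severity": severity,
            "fraction": fraction,
            "impact": severity * fraction,
        })
    rows.sort(key=lambda r: r["impact"])  # most negative impact first
    return rows


# Hypothetical CDR-level SHAP values for download throughput.
cdrs = [
    {"End Cell eNodeB": ("1001", -2.0), "handset": ("A", -0.5)},
    {"End Cell eNodeB": ("1001", -4.0), "handset": ("A", -0.5)},
    {"End Cell eNodeB": ("1001", -3.0), "handset": ("A", -0.5)},
    {"End Cell eNodeB": ("1002", -8.0), "handset": ("A", -0.5)},
]
table = severity_and_impact(cdrs)
```

Here eNodeB 1002 has the larger severity, but eNodeB 1001 has the larger impact because it appears in three of the four records.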
To facilitate visual inspection, superpositions of the distributions could be used for control and sample of interest distributions. Moreover, manual or visual inspection is not required if, instead, statistical comparison strategies are used to compare the distributions of input element importances automatically. In an automated system like this, visualizations could be used as support for users of the system.
D. Extensions to Analysis Techniques
Contrasting Representative Baselines
In accordance with example embodiments, a ML-based PFA system provides a contrastive view of sample baseline performance versus problematic sample performance. Machine learning has a further advantage that it can be used to represent large amounts of data within which there are many complex interactions with a KPI. Because training and testing large machine learning models can be expensive, it can be impractical to devise specific models to represent every possible baseline. In accordance with example embodiments, a general model may be developed to learn a representation of the data for an outcome of interest (e.g., download throughput). Then when, for example, a particular cell of the network appears to be problematic relative to a representative baseline (e.g., the surrounding cells, or other cells connected to a particular eNodeB, or cells of the same carrier frequency, or the surrounding geographical area), the model can be applied as described above to evaluate how the particular cell and other operational features interact to impact the problematic performance. While a randomly sampled baseline of the network will offer the average outcome for the observed performance characteristic (i.e., an effective SHAP contribution of zero for each feature), a more selective baseline may offer a more appropriate comparison by controlling for certain input elements.
For example, a cell carrier frequency may have physical properties that will allow for better throughput transmission. Thus, selecting CDRs that were connected to a nearby cell of the same frequency may be a better choice in isolating true performance differences and possible problems. This can be used to correct for relative contributions of specific features that may be relevant with respect to all samples introduced to the model, but not necessarily relevant when compared to an appropriate baseline.
The strategies and techniques described above may be extended to provide further analyses and investigative capabilities. Some such extensions are discussed below.
In applying a ML-based PFA system as an analytical tool, it may not always be possible to know the specific question or questions to ask a priori. In such instances, a possible alternative approach may be:
This strategy could be used, for example, by domain experts to ask specific questions while controlling for known differences in possible outcomes by the model.
Evaluating Differences Between the Samples
Evaluating differences between the control sample and problematic sample can be done at different levels. Some examples follow.
Quantifying Deviation Between a Baseline Composition to Sample of Interest
Here a goal is to evaluate whether there is a difference between the two samples based on the general composition of contributions across different features.
In particular, the important composition difference between the problematic sample and the baseline sample will indicate the most substantial differences in the problematic sample relative to some control state of the system. When comparing different problematic samples to each other, having feature importance compositions that are sufficiently different can be used to differentiate between underlying problems or use-cases. While SHAP on its own can provide the expected contributions of features to the outcome, a change in the general composition of feature importances can be used to better characterize whether and where a problem has occurred. This comparison can allow the automation to determine whether samples are different and by how much based on composition and prior to investigating individual feature contributions.
Non-limiting examples of possible distribution comparison strategies between feature importance distributions may include:
Quantify Difference Between Individual Feature, Feature-Value Pair Contributions
Here the goal is to evaluate whether there is a difference between the two samples based on the general composition of contributions for a particular feature or feature-value pair.
This is the more straightforward comparison strategy that compares the distributions of SHAP contributions of particular input features or elements in one sample against that same feature's contributions in another sample to see whether there is a meaningful difference. Non-limiting examples of possible comparison strategies may include:
Fingerprinting
Fingerprinting extends the idea of qualifying and quantifying the deviation of a sample of interest from the baseline for business applications. A fingerprint may be defined as a set of input element importances, which may cover a subset of the set of all possible input elements and include relationships between these input element importances, used to identify particular use cases. A fingerprint describes the set of key input elements which can be used to distinguish a problematic sample from a baseline/control sample. Further, a fingerprint set of elements may also be used to distinguish between problematic sample types or use cases.
These input element sets can be formed in different ways, such as using the ratio of contributing input elements in order to distinguish between specific use cases and a normal state of the system, as well as between use cases that may share certain input element properties but not others. This approach is useful in further automating the diagnosis and classification of problems in the system as the resulting fingerprint of element contributions can be applied as business rules that map a set of contributing elements and/or the relative contributions of these elements to degradation on a system into a human interpretable characterization of a system deficiency or problem.
In order to uncover the correct composition and ratios for fingerprints, domain experts may characterize expected degradation features to create business rules for particular problems or use-cases. These fingerprints are more general than typical rule-based systems, as the learning model will establish the relative contributions of key components while the business rules will establish what these relative contributions of key components mean as a diagnostic.
Alternatively, simulating the problem in a real system under variable loads and recording the impacting degradation could be used to generate fingerprints for problems or use-cases. In order to classify a fingerprint, the model interpretation will provide the relative contributions used to distinguish between use cases separately from the severity of the use case (i.e., two use cases may vary in severity while having the same relative contributions in fingerprint by their respective input elements).
Comparing between use cases can be done in several ways. One way is to set a minimum distance (e.g., Wasserstein distance) between the fingerprint result obtained for a problem case and a use case template fingerprint, where the template was created by evaluating the differences between the baseline and the sample. Another is to compare the input element importances for classifying a use case using a rule-based procedure (e.g., the relative importance of input elements A>B>Others in a layer implies use case 1 and B>A>Others implies use case 2). Further, classification strategies, for instance through the application of new learned models trained on the determined input element importances, could be used to compare, contrast and evaluate fingerprints. Note that fingerprints can be composed of input elements within a layer and between different layers, where the layer filters, such as the composition features to monitor and include, can be established by domain experts. Furthermore, provided a generating process for system problems in a simulated network environment, high-level classification labels from the generating process could be coupled with any classification algorithms.
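A minimal sketch of template-based fingerprint matching follows. It normalizes importances into relative contributions, so that composition is separated from severity, and then picks the nearest template. A plain L1 distance is used here as a stand-in for the Wasserstein distance mentioned above, and the template names, group names and threshold are all hypothetical.

```python
def relative_contributions(importances):
    """Normalize absolute importances so they sum to 1 (composition only)."""
    total = sum(abs(v) for v in importances.values())
    if total == 0:
        raise ValueError("fingerprint has no non-zero importances")
    return {k: abs(v) / total for k, v in importances.items()}


def l1_distance(fp_a, fp_b):
    """Sum of absolute differences over the union of input element groups."""
    keys = set(fp_a) | set(fp_b)
    return sum(abs(fp_a.get(k, 0.0) - fp_b.get(k, 0.0)) for k in keys)


def classify(importances, templates, max_distance=0.5):
    """Return (use_case, distance); use_case is None if nothing is close enough."""
    fingerprint = relative_contributions(importances)
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        dist = l1_distance(fingerprint, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist


# Hypothetical use case templates expressed as relative contributions.
templates = {
    "noisy_neighbor_main": {"main_cores": 0.7, "sibling_cores": 0.2, "others": 0.1},
    "noisy_neighbor_sibling": {"main_cores": 0.2, "sibling_cores": 0.7, "others": 0.1},
}

# Same composition, different severity: both should map to the same use case.
mild = {"main_cores": -1.4, "sibling_cores": -0.4, "others": -0.2}
severe = {"main_cores": -14.0, "sibling_cores": -4.0, "others": -2.0}
unknown = {"main_cores": -1.0, "sibling_cores": -1.0, "others": -8.0}

mild_case, mild_dist = classify(mild, templates)
severe_case, severe_dist = classify(severe, templates)
unknown_case, unknown_dist = classify(unknown, templates)
```

Because the fingerprint is normalized, the mild and severe samples receive the same label, while the third sample, dominated by other elements, is left unclassified.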
Noisy-Neighbor Fingerprint Example
In the example fingerprint graphed in
Because there may be unequal numbers of input elements in each group, an alternative representation bar chart in
Pie chart and bar chart representations in
Discovering Clusters of Problematic Input Elements
The following strategy was developed to find clusters of input elements that occur and are problematic together. Evaluating characteristics as they occur together can better explain problems in the sample. Presenting characteristics individually for their contribution to degradation with a Shapley additive explanation (SHAP) provides a fair attribution for each feature's contribution to the prediction of the model. However, without deliberate filtering on combinations of characteristics, SHAP will not indicate the importance of clusters as they appear together.
Effectively, this proposed strategy is used to find clusters of input elements based on occurrence frequency in the sample set, in order of importance (e.g., SHAP) extracted from the machine learning model. First, frequent pattern mining (FPM), also known as Association Rule Mining, is used to find the most prevalent patterns of co-occurrence in records of the sample of interest to ensure the patterns occur frequently enough to explain degradation in the sample. After discovering frequent patterns to establish “characteristic clusters,” the importance of the characteristic cluster is determined by summing the importances of input elements in the cluster, as determined by the learned model, and multiplying by the fraction of cases in the sample having the specific pattern or characteristic cluster. A non-limiting example sequence of operations is described below.
Finding Candidate Characteristic Clusters
1. Prepare data by transforming continuous feature metrics to feature categories for FPM (e.g., by applying a binning strategy like using quantile ranges), while features that are already categorical can be left unchanged.
2. (Optional) Prefilter features based on satisfying a:
3. Get frequently occurring patterns of input elements as they co-occur in records across the sample of interest by using a FPM algorithm such as FP-growth.
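Steps 1 and 3 can be sketched as follows. For brevity, the sketch enumerates item combinations by brute force rather than using FP-growth, so it is only suitable for small records. The rank-based binning (ties broken by position), the support threshold and the example data are assumptions.

```python
from collections import Counter
from itertools import combinations


def quantile_bin(values, n_bins=4):
    """Label each value with a rank-based quantile bin: 'q1' .. 'q<n_bins>'."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = f"q{rank * n_bins // len(values) + 1}"
    return labels


def frequent_patterns(records, min_support=0.3, max_size=3):
    """Brute-force frequent itemsets (a stand-in for FP-growth).

    records: list of dicts mapping feature name -> categorical value.
    Returns {tuple of (feature, value) pairs: support}.
    """
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(records)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Hypothetical sample of interest with one continuous metric.
prb_bins = quantile_bin([0.99, 0.97, 0.50, 0.30, 0.98, 0.20], n_bins=2)
handsets = ["A", "A", "B", "A", "A", "B"]
cells = ["c1", "c1", "c2", "c1", "c1", "c2"]
records = [
    {"handset": h, "cell": c, "prb_usage": p}
    for h, c, p in zip(handsets, cells, prb_bins)
]
patterns = frequent_patterns(records, min_support=0.5)
```

Each surviving pattern is a candidate characteristic cluster. Its support is the fraction used later when the cluster impact is computed.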
Evaluate Cluster Feature Importance
Given a tabular sample with input feature columns and one record per row, for each cluster, until obtaining the desired characteristic cluster size, do:
4. Select only sample rows with characteristic cluster by filtering.
5. Average per-row contributions of each input element in the characteristic cluster for the rows selected in step 4.
6. Sum the average input element contributions from step 5 and multiply these by the fraction of rows in the sample of interest with the characteristic cluster to get the characteristic cluster impact (i.e., sum(cluster_importances)×fraction_with_cluster=cluster impact).
Sorted in order of decreasing impact, the top characteristic clusters will provide combinations of input elements that most impacted the system by: 1) reducing the characteristic cluster impact of those characteristic clusters with infrequent occurrence in the sample of interest, which will ensure clusters do not get too large; and 2) providing an additive contribution by the model of the severity attributed to those input elements when these occur together.
Further pruning can be done to improve the diversity in the presented characteristic clusters. Pruning can include, but is not limited to, enforcing that parent characteristic clusters (i.e., those having a subset of the input elements) are ignored when child characteristic clusters have higher feature importances or impact on the sample of interest.
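Steps 4 through 6 can be sketched as a single function. The row layout (feature mapped to a value and its SHAP value) and the numbers are hypothetical.

```python
def cluster_impact(rows, cluster):
    """Return (severity, fraction, impact) for one characteristic cluster.

    rows: list of dicts mapping feature name -> (value, shap_value).
    cluster: dict mapping feature name -> required value.
    """
    # Step 4: keep only the rows that contain every element of the cluster.
    matching = [
        r for r in rows
        if all(f in r and r[f][0] == v for f, v in cluster.items())
    ]
    if not matching:
        return 0.0, 0.0, 0.0
    # Step 5: average the per-row contribution of each element in the cluster.
    averages = [sum(r[f][1] for r in matching) / len(matching) for f in cluster]
    # Step 6: sum the averages and scale by the fraction of rows with the cluster.
    severity = sum(averages)
    fraction = len(matching) / len(rows)
    return severity, fraction, severity * fraction


# Hypothetical SHAP contributions to download throughput.
rows = [
    {"handset": ("A", -2.0), "cell": ("c1", -4.0)},
    {"handset": ("A", -4.0), "cell": ("c1", -6.0)},
    {"handset": ("A", -1.0), "cell": ("c2", 0.5)},
    {"handset": ("B", 0.2), "cell": ("c1", -1.0)},
    {"handset": ("B", 0.1), "cell": ("c2", 0.3)},
]
pair = cluster_impact(rows, {"handset": "A", "cell": "c1"})
single = cluster_impact(rows, {"handset": "A"})
```

In this example the two-element cluster occurs in fewer rows than handset A alone, yet it has the larger impact because the elements are far more severe when they occur together.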
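One possible form of the parent/child pruning rule is sketched below. It drops any cluster whose input elements are a proper subset of another cluster with a larger impact magnitude. The comparison on absolute impact and the example values are assumptions, not a prescribed rule.

```python
def prune_parents(clusters):
    """Drop parent clusters that are dominated by a child cluster.

    clusters: dict mapping frozenset of (feature, value) pairs -> impact,
    where a negative impact means degradation.
    """
    kept = {}
    for items, impact in clusters.items():
        dominated = any(
            items < other and abs(other_impact) > abs(impact)
            for other, other_impact in clusters.items()
        )
        if not dominated:
            kept[items] = impact
    return kept


# Hypothetical cluster impacts.
grandparent = frozenset({("handset", "A")})
parent = frozenset({("handset", "A"), ("cell", "c1")})
child = frozenset({("handset", "A"), ("cell", "c1"), ("prb_usage", "q2")})
unrelated = frozenset({("handset", "B")})

clusters = {grandparent: -1.4, parent: -3.2, child: -3.5, unrelated: -0.1}
pruned = prune_parents(clusters)
```

Only the most specific cluster of the handset A family survives here, together with the unrelated cluster, which keeps the presented results diverse.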
Comparison to Analysis Structure Prior to Extension
There are a few key differences between the PFA described prior to the clustering section and the Cluster PFA described in this extension. In regular PFA, the input features or elements are evaluated or aggregated individually. In Cluster PFA, a frequent pattern mining operation is performed to determine input element clusters in records prior to other analysis of input elements. Thus, in Cluster PFA the grouping of characteristic clusters may involve two steps: 1) input elements are evaluated based on their co-occurrence in the sample of interest to find candidate characteristic clusters; and 2) the importance value of these characteristic clusters is attributed based on the model importance, which is used to find the most impactful characteristic clusters.
In PFA without clustering, mean severity represents the characteristic contribution relative to a baseline (e.g., mean SHAP difference for input elements in the sample of interest relative to the control sample), the fraction represents the number of occurrences of specific input elements over the total possible occurrences in records of the sample of interest, and the impact on the sample is the calculated fraction multiplied by the severity. In Cluster PFA, the mean severity represents the input element(s) group contribution relative to the baseline sample, the cluster fraction represents the fraction of occurrence of the cluster combination of input elements over the total possible occurrences in records of that sample, and the cluster impact is the cluster fraction multiplied by the cluster severity.
Note, clusters composed of one single input element in Cluster PFA will produce the same result as PFA. Thus, if clusters composed of one single input element are allowed, the results of the Cluster PFA are a superset of the results of PFA.
Example Output for Importance Cluster 1
An example characteristic cluster that contains a common pattern of input elements for a geographical sample area with low download throughput (i.e., a sample of interest) is shown in Table 3:
These three input elements appear in 36% of the CDRs of the sample of interest which was selected by highlighting a problematic area of the network. The model importance strategy attributes a total reduction of 19.72% relative to a baseline level of performance from the input elements in the characteristic cluster alone.
A dimension/metric view in Table 7 shows the degradation associated with all input elements in the sample after applying the filter described in step 4. This can provide a more nuanced view by including other possible problems that were not selected in the cluster, such as prb_usage_percent metric levels indicating heavy cell usage and possible congestion. In general, the binning strategy used to transform continuous feature metric values into categorical values may change the impact of the cluster and, thus, different binning strategies may be used when appropriate for the use case. For instance, one strategy would be to dynamically decide the number of bins for a metric based on underlying statistical and distribution properties instead of only relying on fixed quantile ranges.
Example baseline statistics are shown in Table 4:
An example subset of sample with a specific characteristic cluster is shown in Table 5:
Table 6 shows example reductions specific to input elements in the characteristic cluster, accounting for 19.72% of the reduction in Throughput based on the model expectation for the cluster subset:
After discovering a cluster, the other input elements in the set given the characteristic cluster can be analyzed. An example of dimension/metric view is shown in Table 7:
Example Output for Importance Cluster 2
Another example of a characteristic cluster in the same geographical sample area with low download throughput is shown in Table 8:
The four input elements in the characteristic cluster appear in 20% of the CDRs of the sample which was selected by highlighting a problematic area of the network. This characteristic cluster shares all the input elements of its parent/superset in the previously described characteristic cluster example, but has an additional quantile range characteristic for PRB Usage Fraction of (0.966, 0.987]. The subset of the sample of interest with the characteristic cluster has a lower impact, as indicated by way of example in Table 9 and Table 10:
The system could be tuned to favor larger explanations like this through pruning by, for example, setting a minimum ratio of impact_reduction/fraction_reduction from parent to child. Limiting the prb_usage_percent to the quantile range (0.966, 0.987] decreased the fraction of affected cases to 20% from 36% in the previous example, but is more precise in indicating the problem. Such pruning constraints can ensure that the clusters chosen are more descriptive of the problem observed in the sample.
Real-Time and Time-Based Analysis
Example embodiments of PFA techniques and PFA systems may also be extended to be able to provide real-time analysis of communication networks as they operate, as well as time-based analyses in which the input data may further be treated as time series data, or the like. In accordance with example embodiments, a time-based algorithm could be used for network monitoring, problem analysis and optimization when analyzing data covering an extended time span. Such an algorithm could also be used as part of real-time operational analytics for communications networks. In an example embodiment, an implementation of the algorithm could be executed periodically or continuously, and can feed an operational analytics system configured for deriving the evolution of possible causes and/or correlations in network performance over time.
When the data are represented across time, the key difference is that, instead of representing input elements uniquely as a discrete record instance (e.g., a CDR trace on a network), the input elements represent the aggregate performance of metrics and/or other system properties over a period of time (e.g., an aggregate state per minute). By aligning different inputs to the system temporally with an outcome of interest, this strategy can be used to evaluate the relationship between distant inputs, or even layers of analysis, with outcomes.
This strategy can be employed using time-series data provided by network monitoring systems and, thus, has a wide range of applications for communication systems. Given a machine learning model that is trained periodically on normal and on abnormal or problematic data, an event, such as degradation of an observed performance characteristic below a threshold of acceptable performance or the detection of an anomaly, can be triggered and used to select an event start and end time on which to perform further analysis through model interpretation. As such, interpretation of the model for the input elements (i.e., the context) for the duration of the time window when the observed performance characteristic degradation or anomaly occurred is used to explain possible causes for the event occurrence.
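A threshold-based trigger of the kind described above can be sketched as follows. The sketch assumes a KPI for which lower values are worse (e.g., throughput), and the timestamps, values and threshold are hypothetical.

```python
def find_event_windows(series, threshold):
    """Return (start, end) timestamps of runs where the KPI is below threshold.

    series: list of (timestamp, kpi_value) tuples in time order.
    """
    windows = []
    start = None
    last_below = None
    for timestamp, value in series:
        if value < threshold:
            if start is None:
                start = timestamp      # degradation begins
            last_below = timestamp
        elif start is not None:
            windows.append((start, last_below))  # degradation ended
            start = None
    if start is not None:
        windows.append((start, last_below))      # still degraded at the end
    return windows


# Hypothetical per-minute throughput aggregates (Mbps).
kpi = [
    ("12:36", 50), ("12:37", 48), ("12:38", 20), ("12:39", 18),
    ("12:40", 22), ("12:41", 47), ("12:42", 15),
]
events = find_event_windows(kpi, threshold=30)
```

Each returned window gives the event start and end time over which the model explanations would then be evaluated.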
Service Level Assurance Use Case
One particularly useful application for this model interpretation over time-series is in monitoring and troubleshooting systems. In the telecommunication field, such an approach can be used to offer service level assurance, where the service layer is monitored continuously for degradation and anomalies. The underlying telemetry from layers of the system can then be mapped to the result of one or more service tests qualifying the service layer using a learned model. Given the input elements as context over the time points where degradation or an anomaly occurred, model interpretation can be used to explain the degradation using different layers of the input elements. This allows for the localization of problems to specific layers of the communications network and can help direct further investigation.
For instance, given a 5G telecommunications service running an application layer that provides functionality for end-users, the supporting layers can include:
Telemetry data from one or more layers can be used as input elements to train models that associate the telemetry with one or more observed performance characteristics, each being a key performance indicator (KPI) used to qualify the service. These KPIs can include tests for performance, quality of experience, etc. This strategy offers a way to manage the high dimensionality of inputs, providing visibility and the ability to focus on those input elements that were most associated with certain problematic KPI test outcomes. Taken more broadly, layers can present different levels of analysis in which localizing an issue is meaningful for further exploration and remediation, which can be especially challenging in complex communication networks. These layers can be represented by separate models when the goal is fine-grained and specific analysis of the most relevant layers, or by a unified model when the goal is to better establish the point of failure given multiple possible problems across the layers.
The Model Updater 804 and Explanation Extractor 806 run in parallel to the Triggering System 802. The Model Updater 804 is a process that may be used to train a model 805 on timepoints from t0 to tn that maps timepoints for telemetry sources 803 to one or more service layer metrics 801. The Explanation Extractor 806 gets an explainer function 807 for the trained model 805 that can be used by the Event Characterizer. The Model Updater 804 and Explanation Extractor 806 can run periodically or on demand from users or another process.
The Event Characterizer 808 evaluates telemetry sources 803 for the event times produced by the Triggering System 802 using the explainer function 807. Evaluation by the Event Characterizer 808 can include characterization at distinct timepoints, producing degradation by timepoint 811 from tk to tn. These results can also determine the aggregate event degradation 813 as severity, fraction and impact for the entirety of the event (i.e., input element importances in the telemetry sources aggregated across event times 809). Finally, both degradation by timepoint 811 and aggregate event degradation 813 can be processed by a Result Analyzer 810 to make the results more useful by providing one or more additional functions such as updating dashboards, evaluating fingerprints in the input elements associated with degradations, providing remediation options for system users, and triggering new analysis.
Service Level Assurance on 5G Network Example
For demonstration and development, among other purposes, a prototype of the described system was built using a 5G server environment. For the use case, CPU cores were pinned to specific tasks on the system. Perturbations in which heavy loads were applied to specific cores were used to generate noisy-neighbor use cases, either on the group of main cores running the 5G service or on the group of sibling cores sharing resources with the core running the 5G service via hyper-threading. A test of the KPI for the 5G service, the connection time, was used as a target for the model and was monitored for degradation. Graphical representations of example results using the described strategy are displayed in
The graph in
The features attribution graph in
At time 12:40 the example of
Using Groundtruth Feedback to Provide Confidence Bounds for Model Error
SHAP attributes establish relative importances of each input element in their contribution to predictions for performance characteristics values made by the model. A goal of the fault analysis system described above is to evaluate samples subject to a fault, assessing the features and feature-value pairs most associated with problems in the sample. This sample is commonly framed or thought of as a question (e.g., given a particular region and data CDRs, what explains the poor average throughput observed?), where the answer can be better characterized by a model with a representation of a wider area of the telecommunication network.
However, models that are sufficiently complex with regard to real-world data are generally not perfect. As such, in making predictions there will tend to be a difference (i.e., error) between the model predicted value and actual value of a target for any given example. The average error on a set of examples might be taken to represent the quality of the model in representing the relationships between the inputs and target variable for these data.
The described fault analysis strategy uses SHAP to understand data in a sample believed to have an underlying fault, which is unlike some other applications of SHAP which aim primarily to understand the model. When a SHAP explainer is used to evaluate an individual case, a SHAP value is typically given to represent the relative importance of each feature-value pair in the prediction made by the model. This can sometimes lead to some ambiguity in the approach: if model error is high, SHAP values will be divided between the features summing together to the predicted value which will be quite different than the actual value.
Thus, by summarizing the SHAP values to represent a sample, the resulting feature or feature-value pair contributions may be misleading if the errors in individual cases are not accounted for in some way. For instance, a systematic error for a subset of cases may lead to the over- or under-representation of the importance of a specific feature. Providing visibility into this error may be helpful in providing confidence bounds for the contributions of individual features.
To address this problem, a feature importance confidence level may be established based on a prediction error distributed across SHAP importances. Confidence intervals are often used to represent a range of possible deviation from the expected outcome. In this case, however, a determination is being made of confidence for the SHAP result of any given feature based on the error of the overall prediction.
Because the prediction error for the target variable can be known, but whether this error just represents noise in the measurements or contributions from unknown variables may not be known, a reasonable assumption is that the error is evenly distributed across the features by their SHAP magnitude. There is an implicit assumption in such a confidence level assignment, however, that the relationship between the inputs as determined by the model and SHAP is correct. In the absence of a confidence boundary this implicit assumption may also be reasonable.
Non-limiting examples of implementation for these confidence bounds could include a linear-based feedback, where prediction error is distributed evenly between the features used in the prediction. Alternatively, another possibility would be to scale the feature importance down or up depending on an overprediction or underprediction, respectively. Ultimately, these strategies uncover divergences between the local model and the ground truth target values observed in the sample being analyzed and, thus, provide a good feedback signal when the model is consistently making incorrect predictions in the presence of certain features.
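The following minimal sketch shows one way the prediction error for a single record could be spread across its SHAP values to form per-feature bounds. Both the proportional-to-magnitude split and the equal split are illustrative interpretations of the strategies above, and the function name, feature names and numbers are hypothetical.

```python
def shap_confidence_bounds(shap_values, predicted, actual, proportional=True):
    """Spread the prediction error over features to bound each SHAP value.

    shap_values: dict mapping feature name -> SHAP value for one record.
    Returns {feature: (lower, upper)}, where one end of each interval is
    the original SHAP value and the other includes its share of the error.
    """
    error = actual - predicted
    total_magnitude = sum(abs(v) for v in shap_values.values())
    n_features = len(shap_values)
    bounds = {}
    for feature, phi in shap_values.items():
        if proportional and total_magnitude > 0:
            share = error * abs(phi) / total_magnitude  # larger SHAP, larger share
        else:
            share = error / n_features                  # equal split
        adjusted = phi + share
        bounds[feature] = (min(phi, adjusted), max(phi, adjusted))
    return bounds


# Hypothetical record: the model predicted 10 Mbps but 8 Mbps was observed.
shap_values = {"cell": -6.0, "handset": -3.0, "rat": -1.0}
by_magnitude = shap_confidence_bounds(shap_values, predicted=10.0, actual=8.0)
equal_split = shap_confidence_bounds(
    shap_values, predicted=10.0, actual=8.0, proportional=False
)
```

Wide intervals that recur for the same feature across many records would point to a systematic model error in the presence of that feature.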
Broader System Analysis
While the focus of this invention is primarily on the analysis of faults and system degradation, isolating periods of above average system performance could also be used to discover protective network characteristics to inform system remediation strategies.
The embodiments of
The example method 1200 may also be embodied as instructions executable by one or more processors of the one or more server devices of the system or virtual machine or container. For example, the instructions may take the form of software and/or hardware and/or firmware instructions. In an example embodiment, the instructions may be stored on a non-transitory computer readable medium. When executed by one or more processors of the one or more servers, the instructions may cause the one or more servers to carry out various operations of the example method.
Block 1202 of example method 1200 may involve obtaining a set of computer-readable training data records that each characterize operation of a communication network. Each given training data record may include a plurality of operational features of the communication network and one or more observed performance characteristics of the communication network. Further, each operational feature may be associated with one or more feature-value pairs specific to the given training record, and each of the one or more observed performance characteristics corresponds to an observation specific to the given training record.
Block 1204 of example method 1200 may involve using at least a portion of the set of training data records to train a machine learning (ML) model of network performance to predict expected performance characteristics given the plurality of operational features in the training data records as input and the one or more observed performance characteristics as ground truths. The ML model may be configured for computing mappings of given input feature-value pairs to output predicted performance characteristics. Additionally, for each input training data record, the mappings may represent relationships and/or interactions between one or more combinations among the plurality of operational features and one or more predicted performance characteristics.
Block 1206 of example method 1200 may involve, for each input data record of a first subset of the set of training data records, computing a fair distribution of first respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model. The first subset may include at least those training data records sufficient to represent a baseline of observed performance characteristics.
Block 1208 of example method 1200 may involve, for each input data record of a second subset of the set of training data records, computing a fair distribution of second respective quantitative contributions of each of the plurality of operational features to the one or more predicted performance characteristics of the trained ML model. The second subset may include only those training data records representing at least one problematic observed performance characteristic. In accordance with example embodiments, a training record in the second subset does not necessarily have to include a problematic observed performance characteristic in order to “represent” at least one problematic observed performance characteristic. Rather, each training record in the second subset may include one or more operational features that associate the record in some way with an observed problematic performance characteristic. For example, a given record may be associated with a base station or cell that had low average throughput (an observed problematic performance characteristic), but the given record itself may not necessarily be associated with the resulting performance degradation. However, other records may be associated with both the problematic base station or cell as well as the degraded performance.
Finally, block 1210 of example method 1200 may involve comparing the first and second respective quantitative contributions to determine a respective degradation metric for associating each of the plurality of operational features of the second subset with the at least one problematic observed performance characteristic of the second subset.
In some embodiments, block 1206 may be omitted, such that the fair distributions computed at block 1208 may be used to directly evaluate the degradation metrics, without necessarily comparing to a baseline.
In accordance with example embodiments, computing the fair distribution for each input data record of the first subset of the set of training data records may involve computing respective first Shapley Additive Explanations (SHAP) values for each of the plurality of operational features in each input data record of the first subset. Each given SHAP value may indicate a quantitative contribution of a given operational feature to a given predicted performance characteristic. With this arrangement, computing the fair distribution of second respective quantitative contributions for each input data record of the second subset of the set of training data records may involve computing respective second SHAP values for each of the plurality of operational features in each input data record of the second subset.
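To make the notion of a fair distribution concrete, the sketch below computes exact Shapley values for a single record by enumerating every coalition of features. It is an illustration under stated assumptions, not the disclosed implementation: features outside a coalition are filled in from a small set of background records and the predictions averaged, the toy `predict` function and its numbers are invented, and exhaustive enumeration is practical only for a handful of features (SHAP implementations typically rely on approximations or model-specific algorithms).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, record, background):
    """Exact Shapley value of each feature of `record` for `predict`."""
    names = sorted(record)
    n = len(names)

    def value(coalition):
        # Mean prediction when only the coalition's features take the
        # record's values; the rest come from each background record.
        total = 0.0
        for ref in background:
            mixed = {k: (record[k] if k in coalition else ref[k]) for k in names}
            total += predict(mixed)
        return total / len(background)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {name}) - value(s))
        phi[name] = contribution
    return phi


def predict(f):
    """Toy model: throughput collapses only for OS v2 with the video service."""
    return 5.0 if (f["os"] == "v2" and f["service"] == "video") else 30.0


background = [{"os": "v1", "service": "video"}, {"os": "v2", "service": "web"}]
record = {"os": "v2", "service": "video"}
phi = shapley_values(predict, record, background)
baseline = sum(predict(b) for b in background) / len(background)
```

Here the two features share the 25 Mbps shortfall equally, and the contributions sum to the difference between the record's prediction and the background average, which is the property that makes the distribution fair.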
In accordance with example embodiments, comparing the first and second respective quantitative contributions to determine the respective degradation metric for associating each of the plurality of operational features of the second subset with the at least one problematic observed performance characteristic may involve, for each respective operational feature of the second subset, computing a respective severity metric based on the second respective aggregation of SHAP values across the second subset for the respective operational feature. Then, for each respective operational feature of the second subset, the respective severity metric may be scaled by a fraction of the total number of data records in the second subset having feature-value pairs associated with the respective operational feature.
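One possible reading of this severity-and-scaling computation is sketched below. The sketch assumes that the aggregation is a mean of SHAP values over the records sharing a feature-value pair and that the scaling fraction is the share of second-subset records having that pair; both choices, the function name, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def degradation_by_feature_value(records, shap_rows):
    """Return {(feature, value): severity * prevalence} for the second subset.

    records:   list of dicts mapping feature name -> value
    shap_rows: parallel list of dicts mapping feature name -> SHAP value
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for features, shap in zip(records, shap_rows):
        for name, value in features.items():
            sums[(name, value)] += shap[name]
            counts[(name, value)] += 1

    total = len(records)
    metrics = {}
    for pair, count in counts.items():
        severity = sums[pair] / count   # mean SHAP value for this pair
        prevalence = count / total      # fraction of records with this pair
        metrics[pair] = severity * prevalence
    return metrics


# Made-up second subset: cell B appears in three of four records.
records = [{"cell": "B"}, {"cell": "B"}, {"cell": "B"}, {"cell": "A"}]
shap_rows = [{"cell": -8.0}, {"cell": -6.0}, {"cell": -10.0}, {"cell": -1.0}]
metrics = degradation_by_feature_value(records, shap_rows)
```

With these numbers, cell B scores -8.0 × 0.75 = -6.0 and cell A scores -1.0 × 0.25 = -0.25, so the widespread, severe contributor ranks well ahead of the rare, mild one.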
In accordance with example embodiments, comparing the first and second respective quantitative contributions to determine the respective degradation metric for associating each of the plurality of operational features of the second subset with the at least one problematic observed performance characteristic may involve, for each respective operational feature of the first subset, computing a respective first statistical distribution of respective first SHAP values across the first subset, and for each respective operational feature of the second subset, computing a respective second statistical distribution of respective second SHAP values across the second subset. Then, for each respective operational feature in common in both the first and second subsets, the respective second statistical distribution and the respective first statistical distribution may be compared.
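The comparison of the two statistical distributions could take many forms. The sketch below assumes two simple comparisons for a single operational feature: the shift in the median SHAP value and the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions). These particular statistics, the function names, and the sample values are illustrative assumptions.

```python
from statistics import median


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


def compare_feature(baseline_shap, problem_shap):
    """Compare baseline (first subset) and problem (second subset) SHAP values."""
    return {
        "median_shift": median(problem_shap) - median(baseline_shap),
        "ks": ks_statistic(baseline_shap, problem_shap),
    }


# Made-up SHAP values of one feature in each subset.
baseline_shap = [-1.0, 0.0, 0.5, 1.0]
problem_shap = [-9.0, -8.0, -7.5, -6.0]
comparison = compare_feature(baseline_shap, problem_shap)
```

A statistic of 1.0 means the two samples do not overlap at all, which in this toy case flags the feature as behaving very differently in the problematic records than in the baseline.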
In accordance with example embodiments, the example method may further entail determining respective clusters of operational features within records of the second subset, determining a respective frequency among the records of each respective cluster, and identifying respective operational clusters as all respective clusters having respective frequencies above a threshold. Next, for each respective operational cluster of the second subset, a respective severity metric may be computed based on the second respective aggregation of SHAP values for operational features that are part of the respective operational cluster across the second subset. Then, for each respective operational cluster of the second subset, the respective severity metric may be scaled by a fraction of the total number of training data records in the second subset having the exact combinations of feature-value pairs associated with the respective operational cluster.
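The cluster-based variant can be sketched as follows, assuming that a cluster is a combination of two feature-value pairs that co-occur in a record, that frequency is the fraction of second-subset records containing the exact combination, and that severity is the mean of the summed SHAP values of the cluster's features. These definitions, the function names, and the sample data are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def operational_clusters(records, min_fraction, size=2):
    """Return {cluster: frequency} for clusters with frequency above the threshold."""
    counts = Counter()
    for features in records:
        for combo in combinations(sorted(features.items()), size):
            counts[combo] += 1
    total = len(records)
    return {c: n / total for c, n in counts.items() if n / total > min_fraction}


def cluster_metrics(records, shap_rows, clusters):
    """Scale each cluster's mean summed SHAP value by the cluster's frequency."""
    metrics = {}
    for combo, fraction in clusters.items():
        names = [name for name, _ in combo]
        matching = [shap for features, shap in zip(records, shap_rows)
                    if all(features.get(n) == v for n, v in combo)]
        severity = sum(sum(shap[n] for n in names) for shap in matching) / len(matching)
        metrics[combo] = severity * fraction
    return metrics


# Made-up second subset: OS v2 combined with the video service is the problem.
records = [
    {"os": "v2", "service": "video"}, {"os": "v2", "service": "video"},
    {"os": "v2", "service": "web"}, {"os": "v1", "service": "video"},
]
shap_rows = [
    {"os": -4.0, "service": -6.0}, {"os": -5.0, "service": -5.0},
    {"os": 0.5, "service": 0.5}, {"os": 0.0, "service": 1.0},
]
clusters = operational_clusters(records, min_fraction=0.4)
metrics = cluster_metrics(records, shap_rows, clusters)
```

Only the combination of OS v2 with the video service clears the 0.4 threshold, and its metric is -10.0 × 0.5 = -5.0, even though neither the OS nor the service looks problematic in the other two records.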
In accordance with example embodiments, the example method may further entail identifying respective operational events of the second subset as time windows during which a performance characteristic is observed as being problematic. With this arrangement, comparing the first and second respective quantitative contributions may involve, for each respective operational event of the second subset, computing a respective severity metric for each respective operational feature based on the second respective aggregation of SHAP values across the second subset during the respective operational event. Then, for each respective operational feature of the second subset, the respective severity metric may be scaled by the total number of timepoints of the respective operational event.
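The event-based variant can be sketched as follows, assuming that an event is a run of consecutive timepoints at which a performance indicator falls below a threshold, and that the severity of a feature during an event is its mean SHAP value over that run. The threshold rule, the function names, and the sample series are assumptions for illustration.

```python
def find_events(series, threshold):
    """Return (start, end) index pairs, end exclusive, of runs below threshold."""
    events, start = [], None
    for i, value in enumerate(series):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(series)))
    return events


def event_metrics(shap_series, events):
    """For each event, scale each feature's mean SHAP value by the event length.

    shap_series: dict mapping feature name -> list of SHAP values per timepoint
    """
    results = []
    for start, end in events:
        duration = end - start
        results.append({
            name: (sum(values[start:end]) / duration) * duration
            for name, values in shap_series.items()
        })
    return results


# Made-up throughput series (Mbps) and per-timepoint SHAP values for one feature.
throughput = [20.0, 21.0, 4.0, 3.0, 5.0, 22.0, 2.0, 19.0]
shap_series = {"cell": [0.0, 0.0, -6.0, -8.0, -7.0, 0.0, -9.0, 0.0]}
events = find_events(throughput, threshold=10.0)
metrics = event_metrics(shap_series, events)
```

The three-timepoint event scores -7.0 × 3 = -21.0, while the single-timepoint event scores -9.0, so a longer degradation can outrank a briefer but individually more severe one.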
In accordance with example embodiments, the example method may further entail identifying problematic case baselines according to the determined respective degradation metrics of specific operational features of the second subset as measured by their association with one or more observed performance characteristics. Templates of operational features may then be created according to at least one of: (i) a magnitude of the measured associations of operational features with the one or more observed performance characteristics, or (ii) a relative magnitude of the measured associations between operational features with respect to one or more observed performance characteristics, or (iii) the positive or negative relationship of the measured associations of operational features with the one or more observed performance characteristics. Templates may be compared to categorize problematic performance.
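Template comparison could be realized in several ways. The sketch below assumes that a template is the vector of per-feature degradation metrics normalized to unit length, so that it preserves the relative magnitudes and signs of the associations, and that templates are compared by cosine similarity. The category labels, the function names, and all values are hypothetical.

```python
from math import sqrt


def make_template(metrics):
    """Normalize per-feature degradation metrics to a unit-length template."""
    norm = sqrt(sum(v * v for v in metrics.values()))
    return {k: v / norm for k, v in metrics.items()} if norm else dict(metrics)


def similarity(a, b):
    """Cosine similarity of two unit-length templates."""
    return sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))


def categorize(case_metrics, known_templates):
    """Return the label of the known template most similar to the new case."""
    case = make_template(case_metrics)
    return max(known_templates, key=lambda label: similarity(case, known_templates[label]))


# Made-up templates for two previously diagnosed problem categories.
known_templates = {
    "cell outage": make_template({"cell": -9.0, "device": -1.0, "service": 0.5}),
    "device firmware": make_template({"cell": -0.5, "device": -8.0, "service": -1.0}),
}
new_case = {"cell": -4.0, "device": -0.2, "service": 0.1}
label = categorize(new_case, known_templates)
```

Because the templates are normalized, the new case matches the cell outage category on the shape of its contributions even though its absolute magnitudes are smaller.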
In accordance with example embodiments, the example method may further entail computing a model prediction error in the second subset and using the prediction error to adjust an attributed importance of respective operational features, and then qualifying an accuracy of representation based on computed model prediction errors.
In accordance with example embodiments, comparing the first and second respective quantitative contributions to determine the respective degradation metric may involve generating a visualization of a comparison of the second respective quantitative contributions to a baseline corresponding to the first respective quantitative contributions. The visualization may be a digital display presented on a display device, or a printed graphic produced by a printing device.
In accordance with example embodiments, the communication network may be a telecommunications network and/or a data communications network. In this arrangement, each training data record may include a communication history record, the communication history record being at least one of a call detail record or a session detail record, and the plurality of operational features may include a base station ID, cell ID, sector ID, radio frequency, PLMN ID, signal-to-noise ratio, call quality, radio access technology, user terminal device type, geographical coordinates, and/or user terminal device manufacturer. In addition, the observed performance characteristics may include one or more observed instances of defined performance characteristics in a performance list consisting of at least one of: call completion status, dropped call status, blocked call status, data throughput rate, or call quality.
In accordance with example embodiments, the communication network may be a telecommunications network and/or a data communications network. Further, each training data record may include a communication history record or system telemetry from one or more network layers of the communication network, where the one or more network layers are a 5G Core, a RAN, a User Plane, a Control Plane, a virtualization layer, and/or a physical infrastructure layer of the communication network. With this arrangement, the example method may further involve monitoring one or more performance characteristics observed during runtime operations of the communication network, and then localizing a fault to the operational features of one or more communication network layers.
The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those described herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims.
The above detailed description describes various features and operations of the disclosed systems, devices, and methods with reference to the accompanying figures. The example embodiments described herein and in the figures are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations.
With respect to any or all of the message flow diagrams, scenarios, and flow charts in the figures and as discussed herein, each step, block, and/or communication can represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, operations described as steps, blocks, transmissions, communications, requests, responses, and/or messages can be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or operations can be used with any of the message flow diagrams, scenarios, and flow charts discussed herein, and these message flow diagrams, scenarios, and flow charts can be combined with one another, in part or in whole.
A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data). The program code can include one or more instructions executable by a processor for implementing specific logical operations or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including RAM, a disk drive, a solid state drive, or another storage medium.
The computer readable medium can also include non-transitory computer readable media such as computer readable media that store data for short periods of time like register memory and processor cache. The computer readable media can further include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the computer readable media may include secondary or persistent long term storage, like ROM, optical or magnetic disks, solid state drives, compact-disc read only memory (CD-ROM), for example. The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.
Moreover, a step or block that represents one or more information transmissions can correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions can be between software modules and/or hardware modules in different physical devices.
The particular arrangements shown in the figures should not be viewed as limiting. It should be understood that other embodiments can include more or fewer of each element shown in a given figure. Further, some of the illustrated elements can be combined or omitted. Yet further, an example embodiment can include elements that are not illustrated in the figures.
In addition to the illustrations presented in
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Ser. No. 63/222,040, filed on Jul. 15, 2021, which is incorporated herein in its entirety by reference.
Number | Date | Country
---|---|---
63222040 | Jul 2021 | US