Since the creation of computerized systems, a need has existed to identify problems with system functionality and to derive solutions to repair undesirable artifacts such as transmission delays, data corruption, etc., or to predict when system failure may occur or when maintenance is needed. Such work has been referred to as “automated solutions,” “quality of service (QoS),” “predictive services,” etc.
A foundational part of automated system solutions is the act of gathering data and extracting relevant pieces of information in a correlated manner. Performance is measured, and the resulting data are analyzed to determine whether performance deficiencies exist, how any such deficiencies may be remedied, and whether performance problems may arise in the future. Such work requires significant human interaction. Furthermore, many enterprises do not have the ability to support the kind of trained professionals able to do such work, and they are often left to hire specialized contractors to manage portions of the work related to supporting enterprise systems.
Automated solutions exist that monitor and analyze the performance of systems and provide information that helps system technicians identify and resolve problems or proactively identify future issues. Although such solutions reduce human activity and interaction, they are complex and rely on heuristic models to a significant degree. This requires extensive effort to build and fine-tune logic for each automated solution; moreover, each solution is directed to solving problems on a particular system and is typically difficult to adapt to different environments. Furthermore, such systems typically focus only on a specific aspect of an operation and do not operate with a holistic view of a system.
The Detailed Description, below, makes reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
The techniques described herein relate to generalizing the creation of applications, based on artificial intelligence (i.e., machine learning), that classify problems in managed stages, identify a problem, and, in some cases, recommend one or more solutions. Using stages in a classification process requires less human interaction while increasing the likelihood that results will be meaningful. Such techniques can be used to create system solutions applications that are able to find a root cause of a problem and provide one or more possible solutions to the problem. The tools described herein can be used to support an application development process, from machine learning models to the user interface widgets used to train a system. Such tools that use staged machine learning can be used to more easily create logic directed to a particular problem.
Typical application of machine learning involves receiving a data set, running a machine learning algorithm, recognizing patterns, and reporting issues. Supervised learning posits a structure, i.e., a model, that usually comprises a set of categories and Key Performance Indicators (KPIs) specified by a subject matter expert. Examples of supervised algorithms include Naïve Bayes, Support Vector Machines (SVM), Logistic Regression, Random Forest, etc. Unsupervised learning lets the machine learning algorithm find its own patterns.
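For reference only, the named supervised algorithms are available in common libraries. The following is a minimal sketch using scikit-learn, an assumed tooling choice for illustration and not part of the described techniques:

```python
# Illustrative instantiation of the named supervised algorithms using
# scikit-learn (an assumed library choice). X holds KPI vectors; y holds
# expert-assigned category labels.
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

CLASSIFIERS = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(probability=True),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100),
}

def train_supervised(X, y, name="random_forest"):
    """Fit one of the named supervised algorithms on categorized KPI data."""
    model = CLASSIFIERS[name]
    model.fit(X, y)
    return model
```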
One problem that can arise with supervised learning is that if the structure used is too complex (i.e., there are too many categories), the data will not converge to a meaningful solution. Patterns will be detected, but confidence in the results will not be statistically significant.
In the techniques described herein, a version of supervised learning is described that uses stages. Rather than performing supervised learning in a single stage, a machine learning algorithm is applied using only a partial structure made up of a number of categories and KPIs specified by an expert. Because the structure is simpler, convergence is more likely. The machine learning algorithm is applied and, based on the results, another stage is selected. The latter stage uses a different partial structure. This process is repeated until reliable results are obtained.
With each stage, granularity gets finer. For example, if a system relates to an automobile, an initial stage, or model, may indicate that there is a problem with the automobile. A subsequent, more granular stage may indicate that there is a problem with a specific sub-system of the automobile, such as an engine cooling system. Working with increasingly granular models in stages allows the process to progressively narrow in on a problem.
Some of the features of the described techniques are: (1) the machine learning algorithm can automatically pick which structure (categories and KPIs) to use when moving on to the subsequent stage; (2) the machine learning algorithm can let a subject matter expert intervene and add new categories and KPIs; (3) the machine learning algorithm can automatically suggest new categories and KPIs (similar to unsupervised learning); and (4) when a new structure is made, the new structure can be automatically trained with derived data.
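A minimal sketch of this staged approach follows. Names such as `Stage` and `classify_in_stages` are hypothetical, introduced only for illustration, and the 0.6 reliability threshold is an assumption; the point is how the result of one stage automatically picks the partial structure for the next:

```python
# Hypothetical sketch of staged supervised learning. Each Stage pairs a
# partial structure (categories + KPIs) with a model trained on only those
# categories. Names and thresholds are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Stage:
    name: str
    kpis: list                 # KPI names forming this stage's feature set
    categories: list           # categories this partial structure covers
    model: object = None       # classifier trained on only this structure
    next_stages: dict = field(default_factory=dict)  # category -> finer Stage

def classify_in_stages(stage, sample, min_reliability=0.6):
    """Walk the stage chain until the result is final or unreliable."""
    while True:
        x = [[sample[kpi] for kpi in stage.kpis]]
        probs = stage.model.predict_proba(x)[0]
        best = probs.argmax()
        category = stage.model.classes_[best]
        if probs[best] < min_reliability:
            return category, "low reliability - possible new category"
        nxt = stage.next_stages.get(category)
        if nxt is None:
            return category, "final"
        stage = nxt  # descend to the more granular partial structure
```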
The process of solving system problems can typically be broken down into categories. As initial questions are answered, new dimensions of the problem become apparent. For example, once it is known that there is a problem due to alarms in a site, a question arises as to whether this sort of a problem requires escalation. For another example, if an initial problem is detected in a certain geographical area (e.g., a cluster), a question arises as to whether the problem is localized or if it is part of a wider problem.
The number and type of information pieces (i.e., "features") needed to resolve a problem depends on the specific issue that needs to be addressed. In the example above, regarding determining whether a problem is localized or on a larger scale, a new set of Key Performance Indicators (KPIs) is required to resolve the issue, possibly including common core, transport, etc.
As an example, consider a two-stage scenario. Input to a first model in the example includes: DL Power Level, UL Power Level, Channel Quality Index, Channel Utilization, Drop Rate, Block Rate, Alarms in Site, etc. An output from the first model may indicate that there is an interference problem. Subsequently, input to a second model may include: DL Power Level, UL Power Level, Power from Outside Sectors, Power in the Edge, Power in the Core, etc. Output from the second model may indicate that there is a problem of interference due to an overshooter cell. Through the addition of a secondary stage, the problem was able to be recognized at a finer level of granularity.
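Expressed with the hypothetical `Stage` structure from the sketch above, the two-stage scenario might look like the following. The KPI names come from the example; category names other than the interference outcomes are illustrative assumptions:

```python
# The two-stage scenario as partial structures for the earlier sketch.
# Categories beyond "interference"/"overshooter_cell" are assumptions.
stage2 = Stage(
    name="interference_drilldown",
    kpis=["dl_power_level", "ul_power_level", "power_from_outside_sectors",
          "power_in_the_edge", "power_in_the_core"],
    categories=["overshooter_cell", "external_interference"],
)

stage1 = Stage(
    name="initial_triage",
    kpis=["dl_power_level", "ul_power_level", "channel_quality_index",
          "channel_utilization", "drop_rate", "block_rate", "alarms_in_site"],
    categories=["interference", "congestion", "hardware_fault"],
    next_stages={"interference": stage2},  # finer granularity on this outcome
)
```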
To simplify creation of such logic chains, the following set of generic utilities, each described in detail below, is provided herein:
1. A feature definition component. The feature definition component is a generic utility that permits definition of new features based on a configuration/IDE (Integrated Development Environment) approach.
2. UI utilities. The user interface utilities enable creation and training of the model via supporting user interface (UI) screens.
3. A feature adjustment component that automatically pre-processes input data based on generic characteristics, e.g., types of data and ranges, number of available training samples for each category, etc.
4. A feature simplification component that attempts to determine the most relevant feature set for each model, to simplify convergence and ongoing training.
5. A new category detector. The new category detector is a utility that detects, once the model has been trained, if a new sample would likely belong to a new category that has not yet been covered.
6. A reliability calculator configured to calculate how ready a machine learning stage is to provide accurate recommendations, and to estimate the reliability of a given answer.
The example computing device 200 includes one or more processors 202 that process computer-executable instructions. Each of the one or more processors 202 may be a single-core processor or a multi-core processor. The example computing device 200 also includes user interfaces 204 and one or more communication interfaces 206. The user interfaces 204 comprise hardware components that provide an interface between a user and the example computing device 200. The user interfaces 204 can include a display monitor, knobs, dials, readouts, printers, keyboards, styluses, etc.
The communication interfaces 206 facilitate communication with components located outside the example computing device 200, and provide networking capabilities for the example computing device 200. For example, the computing device 200, by way of the communication interfaces 206, may exchange data with other electronic devices (e.g., laptops, computers, etc.) via one or more networks, such as a private network, the Internet, etc. Communications between the example computing device 200 and other electronic devices may utilize any sort of communication protocol known in the art for sending and receiving data and/or voice communications.
The example computing device 200 also includes miscellaneous hardware 208. The miscellaneous hardware 208 includes hardware components and associated software and/or firmware used to carry out device operations. Included in the miscellaneous hardware 208 are one or more user interface hardware components not shown individually, such as a keyboard, a mouse, a display, a microphone, a camera, and/or the like, that support user interaction with the example computing device 200.
The example computing device 200 also includes memory 210 that stores data, executable instructions, modules, components, data structures, etc. The memory 210 can be implemented using computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer storage media and communications media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. Computer storage media may also be referred to as “non-transitory” media. Although in theory, all storage media are transitory, the term “non-transitory” is used to contrast storage media from communication media, and refers to a tangible component that can store computer-executable programs, applications, instructions, etc. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. Communication media may also be referred to as “transitory” media, in which electronic data may only be stored in a non-tangible form.
An operating system 212 is stored in the memory 210 of the example computing device 200. The operating system 212 controls functionality of the processor 202, the user interfaces 204, the communication interfaces 206, the miscellaneous hardware 208, and memory operations. Furthermore, the operating system 212 includes components that enable the example computing device 200 to receive and transmit data via various inputs (e.g., user controls, network interfaces, and/or memory devices), as well as process data using the processor 202 to generate output. The operating system 212 can include a presentation component that controls presentation of output (e.g., display the data on an electronic display, store the data in memory, transmit the data to another electronic device, etc.). Additionally, the operating system 212 can include other components that perform various additional functions generally associated with a typical operating system. The memory 210 also stores miscellaneous software applications 214, or programs, that provide or support functionality for the example computing device 200, or provide a general or specialized device user function that may or may not be related to the example computing device 200 per se. The software applications 214 can include system software applications and executable applications that carry out non-system functions.
A multi-stage machine learning application 216 is stored in the memory and drives the multi-stage machine learning operations described herein. The multi-stage machine learning application 216 includes a feature definition component 218, user interface (UI) utilities 220, and an automatic feature adjustment component 222. The multi-stage machine learning application 216 also includes a feature simplification component 224, a new category detector 226, and a reliability calculator 228. A database 230 is also stored in the memory 210 and is configured to store data from and provide data to the multi-stage machine learning application 216 and other components of the computing device 200.
The components and features of the multi-stage machine learning application 216 will be described in greater detail below, with respect to one or more subsequent figures. In the following discussion, continuing reference is made to the elements and reference numerals shown in FIG. 2.
To simplify the complexity of the machine learning algorithms, relatively complex features are created, derived from the specific domain knowledge of network technicians. Instead of simply feeding in data inputs such as KPIs, parameters, etc., the techniques described herein contemplate digesting this information into the kinds of information bits that technicians typically use in a decision-making process. Examples of such features include, but are not limited to: (a) whether or not a node on a network is congested; (b) whether or not a system terminal supports the LTE 700 band; (c) whether a membrane in a water desalinization system is operating at low efficiency; and (d) whether or not there was a critical alarm active in a system element for a previous period of time.
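A minimal sketch of such feature derivation follows; the field names and the 80% utilization threshold are illustrative assumptions, not values from the described techniques:

```python
# Derive technician-style decision features from raw network element data.
# Field names and the 80% utilization threshold are illustrative assumptions.
def derive_features(element):
    return {
        "node_congested": element["channel_utilization"] > 0.80,
        "supports_lte_700": "LTE 700" in element.get("supported_bands", []),
        "critical_alarm_active": any(
            alarm["severity"] == "critical"
            for alarm in element.get("recent_alarms", [])
        ),
    }
```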
Implementations of the techniques described herein support the following generic utilities: (a) Definition of special metrics based on text variables, e.g., the operating system name contains "Android," a description includes the phrase "at home," etc.; (b) Calculation of KPIs at various aggregation levels, for various metrics such as counters, alarms, user Call Detail Records (CDRs), etc.; (c) Calculation of alerts for any given KPI; (d) Calculation of anomalies for any given KPI, comparing a specific hour/day to the previous x weeks for the same hour/day period; and (e) Any combination of results from any of the previously defined functions.
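For item (d), one plausible implementation of the anomaly comparison is sketched below; the z-score form and threshold are assumptions for the sketch:

```python
import statistics

# Compare a KPI value for a specific hour/day against the same hour/day
# slot over the previous x weeks. The z-score threshold is an assumption.
def is_anomalous(current_value, same_slot_history, z_threshold=3.0):
    """same_slot_history: KPI values for the same hour/day in prior weeks."""
    mean = statistics.mean(same_slot_history)
    std = statistics.stdev(same_slot_history)
    if std == 0:
        return False
    return abs(current_value - mean) / std > z_threshold
```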
The example user interface 300 displays a set of incidences 302, e.g., detected system issues, customer complaints, etc., together with a high-level summary of relevant metrics or features that help a user decide what a potential resolution should be. A feature can include complex representations of a combination of data feeds. The example user interface 300 also includes an “OK” button 304, a “Train” button 306, and a “Review Performance” button 308.
When the “OK” button 304 is actuated, a currently displayed resolution is accepted. When the “Train” button 306 is actuated, a new training sample is created. When the “Review Performance” button 308 is actuated, an overall performance of the machine learning system is presented for review.
Once a category has been created, the user has the ability to add new categories within the model using a user interface element 500 similar to that shown and described with respect to FIG. 4.
At least a part of the example user interface 600 shown in FIG. 6 may be derived as follows.
Once the machine learning model has been trained and is in operation, the system will detect when a given new data sample does not seem to fit within one of the existing categories. This is indicated to the user when a resolution field shows “unknown resolution—potential new category.” The user is then able to create a new category and add the sample to the training set.
On the model training analysis screen 800, a user can see an overall performance of a specific model: the training samples used, the training error, and the overall accuracy. It also has utilities to select a different model, to modify the current feature set (add/remove), or to retrain the model. It is noted that models and training data can be stored for each unique user. Furthermore, a master model may be utilized that is common to multiple users, and user-specific training data may be applied to the master model.
The model training analysis screen 800 indicates the training samples as well as new data samples. The model training analysis screen 800 is further configured to invoke functions that are described in detail below. A "Modify Features" button 802 is also included that, when selected, presents the display shown and described with respect to FIG. 9.
The feature implementation interface 900 also provides a utility 904 to remove all features having a score lower than a certain threshold.
The automatic feature adjustment module 222 (FIG. 2) is configured to perform the following functions, a sketch of which appears after the list.
Preparation of a scaling function. Based on the training set, the feature adjustment module analyzes the types of data and value ranges for each individual feature. A mean and a standard deviation are then derived for each feature.
Application of the scaling function. For each data sample (in both training and new data sets), a normalized data set is calculated. The normalization is user defined. For example, a user may set the normalized value to (x - mean)/std.
Balancing of Categories. In cases where the training data presents a serious imbalance between categories (e.g., there are 10 times more samples for category 1 than for category 2), the system may produce inaccurate results, typically favoring the category that has more data samples. A “Balancing of Categories” function is configured to calculate a number of training samples in each category, and if a serious imbalance is found, it will oversample the less frequent categories, copying random samples from the less frequent categories. The deviation that must be present to be considered a “serious” imbalance is configurable.
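The three functions above might be sketched as follows; NumPy is an assumed library choice, and the 10x imbalance threshold mirrors the example given:

```python
import random
import numpy as np

def fit_scaler(X_train):
    """Derive per-feature mean and standard deviation from the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against division by zero for constant features
    return mean, std

def apply_scaler(X, mean, std):
    """Normalize each sample as (x - mean)/std, as in the example above."""
    return (X - mean) / std

def balance_categories(samples_by_category, max_ratio=10):
    """Oversample less frequent categories by copying random samples."""
    largest = max(len(s) for s in samples_by_category.values())
    for samples in samples_by_category.values():
        if largest / len(samples) > max_ratio:
            samples.extend(random.choices(samples, k=largest - len(samples)))
    return samples_by_category
```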
The automatic feature simplification module 224 (FIG. 2) is configured to determine the most relevant feature set for each model, in order to simplify convergence and ongoing training.
The automatic feature simplification module 224 is also configured to provide a user option to automatically simplify the feature set based on relative scores. If the number of features is higher than a specified threshold, features with an absolute weight less than a configured threshold (e.g., 10%) of the average absolute weight of the top x features (e.g., 3, etc.) may be eliminated.
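A sketch of that simplification rule follows; the parameter defaults mirror the examples in the text:

```python
# Drop features whose absolute weight is below a configured fraction
# (e.g., 10%) of the mean absolute weight of the top-x features.
def simplify_features(weights, max_features, fraction=0.10, top_x=3):
    """weights: dict mapping feature name -> model weight."""
    if len(weights) <= max_features:
        return dict(weights)
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    top_mean = sum(abs(w) for _, w in ranked[:top_x]) / top_x
    return {f: w for f, w in weights.items() if abs(w) >= fraction * top_mean}
```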
The reliability calculator 228 (FIG. 2) is configured to calculate how ready a machine learning stage is to provide accurate recommendations and to estimate the reliability of a given answer, based on the metrics described below.
A model accuracy is calculated as the sum of true positives plus true negatives divided by a total number of validation samples. A model recall feature is included that is configured to provide statistics (“Recall”) of True Positives divided by the sum of True Positives plus False Negatives for the validation data set. A model precision feature provides statistics (“Precision”) of True Positives divided by all positive guesses (true plus false positives) for the validation set.
An F-Score is a harmonic mean of Recall and Precision. It can be used as a single value that represents the performance of the model. A known form of the F-Score is (2 * Precision * Recall)/(Precision + Recall). However, the formula is configurable to give more weight to Precision or Recall as desired.
A sample reliability estimation function indicates a probability of error for an estimated result of a given input data vector. A Receiver Operating Characteristic (ROC) curve summarizes classifier performance over a range of trade-offs between True Positive and False Positive error rates. The x-axis represents the percentage of False Positives (FPR = FP/(TN + FP)) and the y-axis represents the percentage of True Positives (TPR = TP/(TP + FN)).
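The metrics above can be computed directly from validation-set counts, as in the sketch below; the F-beta form is one standard way to make the F-Score weighting configurable, as noted:

```python
def accuracy(tp, tn, total):
    return (tp + tn) / total

def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def f_score(p, r, beta=1.0):
    """beta=1 gives the harmonic mean; beta > 1 favors Recall, beta < 1 Precision."""
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def roc_point(tp, fn, fp, tn):
    """One point on the ROC curve: (FPR, TPR) = (FP/(TN+FP), TP/(TP+FN))."""
    return fp / (tn + fp), tp / (tp + fn)
```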
A Projection utility to project data samples in 2-D may also be included. Such a utility provides a 2-D representation of a given set of data vectors. This is useful to display the data in the screen for analysis purposes, and is sometimes referred to as “dimensionality reduction.” This representation may be implemented based on one of various methods, including a t-SNE (t-distributed Stochastic Neighbor Embedding) method, a Sammon projection, or the like.
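A minimal sketch of the projection utility, assuming scikit-learn's t-SNE implementation as the method (a Sammon projection could serve the same purpose):

```python
from sklearn.manifold import TSNE

def project_2d(data_vectors):
    """Reduce a set of data vectors to two dimensions for on-screen display."""
    return TSNE(n_components=2).fit_transform(data_vectors)
```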
In at least one implementation, an alternative method to produce training samples during the initial stage is used ("New Category Detector" 226, FIG. 2), in which incoming data samples are automatically arranged into data groups.
Once data groups have been created, a user can decide to reclassify certain data samples. This effectively produces a new training set that can be used to train a supervised classification model. Once a stage model has been trained, new samples are classified based on the trained model. For every given sample, the model attempts to decide a corresponding category. If the reliability of the result is low (e.g., less than 60% or some other pre-defined threshold), a determination is made as to whether the sample belongs in a new category.
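A sketch of that reliability check for a single new sample follows, assuming a classifier exposing scikit-learn-style `predict_proba` and `classes_`; the 60% threshold comes from the example above:

```python
# Flag a sample as a potential new category when the best class probability
# falls below the pre-defined reliability threshold.
def check_sample(model, sample_vector, threshold=0.60):
    probs = model.predict_proba([sample_vector])[0]
    best = probs.argmax()
    if probs[best] < threshold:
        return None, "unknown resolution - potential new category"
    return model.classes_[best], "classified"
```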
The example user interface training window 1100 also includes a similar incidence table 1110 that shows information related to samples 1108 that are similar to the selected sample 1106. Additionally, a machine learning summary table 1112 is included in the training window 1100 and shows various statistics related to the incidences. Although certain statistics are shown in the machine learning summary table 1112, additional, fewer, and/or different statistics may be displayed.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
This application claims the benefit of Provisional U.S. Patent Application No. 62/471,319 filed on Mar. 14, 2017.