Anomaly detection refers to identifying data values that deviate from an observed norm. Oftentimes, a detected anomaly may indicate an issue that requires attention. For example, in the context of network traffic, anomaly detection may include identifying traffic loads that deviate from historical norms, which may indicate a service outage or a network intrusion event. Anomaly detection may also be used to identify the source of the issue so that mitigative action can be performed. One problem that arises in anomaly detection is early detection. Oftentimes, issues are identified too late: services may have been down for an extended period or a network intrusion may have already occurred by the time the issue is identified. Systems that attempt to provide early warning oftentimes produce false positive warnings.
False positive warnings may result from the massive scale of data being analyzed using a single analysis technique, from statistical fluctuations that are not real anomalies, and from model tuning that overfits the data indicative of anomalies of concern and therefore flags too many anomalies in the input data. False positive warnings may place undue burden on computer systems and networks. For example, investigating these false positive warnings may consume processing and memory resources to trace the root cause of non-existent or benign anomalies, cause network downtime, and hamper investigations into anomalies that should be mitigated.
On the other end of the spectrum, tuning detection systems in a more restrictive way may result in underfitting the data, which causes false negative reporting. False negative reporting may result in failing to detect anomalies that should be investigated, leading to extended service outages, network intrusions, or other issues in network systems or other systems in which anomalies may indicate a problem. These and other issues may exist in systems that attempt to detect anomalies.
Various systems and methods may address the foregoing and other problems. For example, to address false positive and false negative results from machine learning models for anomaly detection, the system may aggregate the outputs of a plurality of machine learning models that are each trained to detect anomalies. Each machine learning model may be trained to learn a respective behavioral pattern of a given metric to detect an anomaly. Each machine learning model may generate an anomaly score based on a respective learned behavioral pattern.
The system may generate an aggregate anomaly score based on the anomaly scores from the machine learning models, thereby detecting anomalies based on different behavioral patterns of the same metric. In this way, the system may determine whether a data value of a metric is an anomaly based on multiple learned behaviors of the metric. For example, in operation, the system may access a data value to determine whether the data value is anomalous. The system may provide the data value as input to the plurality of machine learning models, which may each output a respective anomaly score. Each anomaly score may represent a prediction, by the machine learning model that generated the anomaly score, that the data value is anomalous.
The machine learning models may use different techniques from one another that depend on the behavioral pattern that each respective machine learning model will be trained to learn. For example, a first machine learning model may be trained on a time series of historical data values of a metric to learn seasonality and trending behavior of the metric. Based on the learned seasonality and trending behavior, the first machine learning model may forecast upper and lower bounds of expected data values for a given point in time. The system may analyze a data value at a particular date or time, determine a deviation of the data value from the forecasted upper and lower bounds for the particular date or time, and generate a first anomaly score based on the deviation.
A second machine learning model may be trained on the time series of historical data values of the metric to learn rarest occurrence behavior of the metric. For example, the second machine learning model may learn data distributions from the historical data values and output a probability that a data value for the particular date and time is within the data distribution for that particular date and time. The system may generate a second anomaly score based on the probability.
A third machine learning model may be trained on multiple metrics collectively within a context, which ensures that the metrics are related to one another. The third machine learning model may be trained to learn combinations of values of the metrics that occur together. In this way, the system may determine, using the third machine learning model, whether the data value, together with other data values, is expected. The system may output a third anomaly score based on the probability that the observed data value and related other data values are expected. It should be noted that the terms “first,” “second,” and “third” do not denote an order or a requirement that each of the models be present. For example, the second machine learning model may be omitted such that only the first and third machine learning models are used.
In some examples, the system may monitor detected anomalies over time so that a duration of an anomaly for a given metric may be used in the aggregate anomaly score. For example, the system may generate a duration score that is based on a duration of time that an anomaly was previously detected for a given metric and is currently being detected. The duration score may be positively correlated with the duration of time. The system may aggregate the duration score with the anomaly scores from the machine learning models to generate the aggregate anomaly score. In this way, the aggregate anomaly score takes into account a duration in which the anomaly has persisted.
The system may implement the plurality of machine learning models using a pluggable architecture in which different machine learning models may be added and/or removed as needed based on the context in which anomaly detection is performed. For example, a user may select machine learning models that are to be used for anomaly detection. In this way, each user may specify machine learning models—and therefore which behaviors of a metric—to be used for anomaly detection.
Features of the present disclosure may be illustrated by way of example and not limitation in the following figure(s), in which like numerals indicate like elements, in which:
Each metric 101-105 may be associated with a context. A context may refer to information that indicates a source with which a metric is associated. Continuing the weather domain examples, an example of a context may be a geographic location for which the temperature, humidity, and precipitation relate. The data value of each metric 101-105 may be associated with a time value that indicates a date and/or time at which the data value occurred. For example, the temperature (value of a metric 101) at a given location (context) that occurred at a specific date and/or time (time value) may be stored in association with one another. This association may be stored in a time series for historical analysis and model training.
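For illustration, the following non-limiting sketch shows one way such an association could be stored as a time-series record. The field names, timestamps, and the in-memory list used here are hypothetical and are not prescribed by this disclosure.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MetricObservation:
    """One data value of a metric, stored with its context and time value."""
    metric_name: str      # e.g., "temperature"
    context: str          # e.g., a geographic location
    time_value: datetime  # date and/or time at which the data value occurred
    value: float

# A time series of observations usable for historical analysis and model training.
time_series = [
    MetricObservation("temperature", "Location A", datetime(2023, 6, 1, 10, 50), 72.4),
    MetricObservation("temperature", "Location A", datetime(2023, 6, 1, 10, 55), 72.9),
]
```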
The computer system 110 may access the metrics 101-105 from various sources, depending on the context of these metrics. For example, metrics 101-105 may relate to a computer network domain, as will be described in other examples throughout this disclosure. In the computer network domain, the computer system 110 may obtain a metric 101-105 from one or more network devices of a monitored system (not shown). In another example, for application-level contexts, the computer system 110 may obtain a metric 101-105 from one or more applications or services executing on the monitored system. Thus, as will be apparent, the metrics 101-105 may relate to different contexts and be accessed from a wide range of sources.
The computer system 110 may include one or more processors 112, a historical data values datastore 114, a labels and metrics datastore 116, a machine learning models datastore 118 (referred to as “ML models datastore 118”), and/or other components. The processor 112 may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Although processor 112 is shown in
As shown in
Processor 112 may be configured to execute or implement 115, 120, 130, and 140 by software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on processor 112. It should be appreciated that although 115, 120, 130, and 140 are illustrated in
The pluggable architecture 115 may include a plurality of (two or more) ML models 120A-N. Each ML model 120A-N may be trained to detect an anomaly from one or more metrics 101-105 based on a respective behavior of that metric. For example, ML model 120A may be trained to detect whether a given value of a metric 101 is anomalous based on a first behavior of the metric 101. ML model 120B may be trained to detect whether the given value of the metric 101 is anomalous based on a second behavior of the metric 101. ML model 120N may be trained to detect whether a given value of the metric 101 is anomalous based on a third behavior of the metric 101. Each of the ML models 120A-N may be trained to detect anomalous values in other metrics 103-105 as well or instead. The ML models 120A-N may be trained to detect anomalous values (also referred to herein as “anomalies”) based on historical data values of various metrics 101-105. The historical data values may be stored, for example, in the historical data values datastore 114 for training and/or re-training purposes. The historical data values stored in the historical data values datastore 114 may be based on live data values that are stored for training and/or curated data values that are selected for training and/or re-training.
Continuing the weather domain examples, the output of the ML model 120A for a real-time temperature value (which is an example of a metric 101) may represent an assessment of whether the real-time temperature value is anomalous based on a first behavior of temperature values observed in a training dataset. The output of the ML model 120B for the real-time temperature value may represent an assessment of whether the real-time temperature value is anomalous based on a second behavior of temperature values observed in the training dataset. The output of the ML model 120N for the real-time temperature value may represent an assessment of whether the real-time temperature value is anomalous based on a third behavior of temperature values observed in the training dataset. Each of the ML models 120A-N may be trained to learn respective behaviors of other metrics, such as precipitation and humidity, as well or instead.
The output of each ML model 120A-N may include an anomaly score. Each anomaly score may represent an assessment by a corresponding ML model 120A-N that a data value for a metric 101-105 being analyzed is anomalous. Each anomaly score may be normalized to be equal to a value between 0.0 and 1.0, in which 0.0 indicates a minimum likelihood of anomaly and 1.0 indicates a maximum likelihood of anomaly according to the respective behavior modeled by a given ML model 120A-N.
Thus, collectively, the anomaly scores of the ML models 120A-N provide an aggregate view of whether or not a data value for a metric 101-105 analyzed by each of the ML models 120A-N is anomalous based on multiple behaviors of that metric. The ML models 120A-N may be further executed on other metrics 103, 105, such as humidity and precipitation metrics as well or instead. Further details of the ML models 120A-N and the respective learned behaviors are described in
The anomaly scores outputted by the ML models 120A-N may be provided to the scoring subsystem 130, which may aggregate the anomaly scores and generate an aggregate anomaly score. The aggregate anomaly score refers to an assessment of whether or not a data value of a metric 101-105 is anomalous based on multiple behaviors of the metric 101-105. Each anomaly score may be weighted based on a weighting value that is assigned to one or more corresponding ML models 120A-N.
In some implementations, the scoring subsystem 130 may generate a duration score based on a duration time associated with each anomaly score. The duration time may indicate a length of time that a detected anomaly has persisted, as indicated by the date and/or time associated with each value of the metric 101-105 being analyzed. The scoring subsystem 130 may assign a larger duration score for longer duration times. For example, the scoring subsystem 130 may increment the duration score based on the duration time up to a maximum value. In a particular example, the scoring subsystem 130 may add 0.1 to the duration score for each 5 minutes of duration time, with a maximum value of 1.0. The scoring subsystem 130 may aggregate the duration score with the aggregated anomaly scores to generate the aggregate anomaly score. In some of these implementations, the duration score may be weighted by its own weighting value, similar to the manner in which the anomaly scores are weighted by their respective weighting value.
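For illustration, a minimal non-limiting sketch of such a duration score follows, assuming the particular increments described above (0.1 per 5 minutes, capped at 1.0); the function name and parameter names are hypothetical.

```python
def duration_score(duration_minutes: float,
                   increment: float = 0.1,
                   step_minutes: float = 5.0,
                   max_score: float = 1.0) -> float:
    """Add `increment` for each completed `step_minutes` of persistence, capped at `max_score`."""
    steps = int(duration_minutes // step_minutes)
    return min(increment * steps, max_score)

# An anomaly persisting for 50 minutes reaches the maximum duration score of 1.0.
print(duration_score(50))  # 1.0
print(duration_score(12))  # 0.2
```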
It should be noted that the weighting values, whether for respective ML models 120A-N or duration scores, may be adjusted as needed and/or automatically. In some implementations, the weighting values are defaulted to be equal to one another so that the anomaly scores and duration score are each weighted equally with respect to one another. In some implementations, the scoring subsystem 130 may assign a mitigation category based on the aggregate anomaly score. Further details of the scoring subsystem 130 are described in
The user interface (UI) subsystem 140 may provide the aggregate anomaly score and/or mitigative action via a user interface. Furthermore, in some implementations, the pluggable architecture 115 facilitates the addition or removal of various ML models 120A-N as needed. In operation, the UI subsystem 140 may provide a configuration UI (not illustrated) for receiving input that configures the pluggable architecture 115. For example, the configuration UI may receive a specification of one or more ML models 120A-N to include or exclude in the pluggable architecture 115. Thus, users may be able to decide which ML models 120A-N to use for anomaly detection. In some implementations, each ML model 120A-N is trained specifically for a given metric 101-105. In these implementations, the specific ML models 120A-N and corresponding metric 101-105 that are used may be configured for detecting anomalies.
The ML models 120A-N may each be trained using respective machine learning techniques to detect respective behaviors of the metric 101. For example, the ML model 120A may be trained to learn seasonality and trend in historical data values used as training data for the metric 101. To do so, ML model 120A may be trained to analyze a time series of values for the metric 101 to learn seasonality and trends in the time series. For example, the ML model 120A may analyze seasonality and trend in the data to generate a prediction of future values. Thus, for a given point in time, the ML model 120A may determine whether a data value being assessed at the given point is anomalous compared to similar points in time (seasonally or trend adjusted). ML model 120A may also generate an upper bound and a lower bound by analyzing uncertainty based on the historical seasonal and trend data. If an analyzed input data value for the metric 101 is outside the upper and lower bounds, then the ML model 120A may determine that the data value of the metric 101 is anomalous.
One example of the ML model 120A is the PROPHET forecasting model, which may be implemented using the Python or R programming languages. PROPHET may use an additive model to forecast time series data in which non-linear trends are fit with yearly, weekly, and daily seasonality. In addition, any holiday effects may be taken into account. In this way, the ML model 120A may use forecasting to predict what a data value should be at a given point in time based on historical seasonality and holiday effects (if any). For example, the ML model 120A may forecast upper and lower bounds for the given point in time. The resulting anomaly score 121A may be generated based on a deviation of the data value from the forecasted upper and lower bounds. An example of generating the anomaly score 121A using the PROPHET forecasting model is provided below for illustration. The example will use the following definitions:
Observed Value: The actual value of the data point
PROPHET Predicted Value: Predicted value provided by PROPHET model
PROPHET Upper Bound: Predicted Upper Boundary provided by PROPHET model
PROPHET Lower Bound: Predicted Lower Boundary provided by PROPHET model
Deviation: The delta between PROPHET Bound and the Observed value
Calculated Thresholds: Utilizing the PROPHET-provided bounds directly generates too much noise, so a custom threshold is used, which is the PROPHET bound minus the PROPHET Predicted Value, multiplied by the Boundary Multiplier
Boundary Multiplier: Used to scale the threshold against which observed values will be measured
Normalized Score: The Deviation divided by the respective upper or lower threshold, targeting a value from zero to one; anything greater than 1 is set to 1, with 1 being far out of range and a likely anomaly
Calculation Logic
If Observed Value > PROPHET Upper Bound, then:
Upper Threshold = (PROPHET Upper Bound − PROPHET Predicted Value) * Boundary Multiplier;
Deviation = Observed Value − PROPHET Upper Bound;
Normalized Score = Deviation / Upper Threshold;
Else if Observed Value < PROPHET Lower Bound, then:
Lower Threshold = (PROPHET Predicted Value − PROPHET Lower Bound) * Boundary Multiplier;
Deviation = PROPHET Lower Bound − Observed Value;
Normalized Score = Deviation / Lower Threshold;
Examples of Values:
Observed Value=3
PROPHET Predicted Value=0.62
PROPHET Upper Bound=0.87
PROPHET Lower Bound=0.36
Boundary Multiplier=10
Calculations:
Calculated Upper Threshold=(0.87-0.62)*10=2.5
Calculated Lower Threshold=(0.62−0.36)*10=2.6
Observed Value to PROPHET Upper Bound Deviation=3−0.87=2.13
Normalized Score=2.13/2.5=0.85
If the score is greater than 1, it may be set to 1.
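For illustration, the following non-limiting Python sketch implements the above calculation logic as a standalone function. In practice, the predicted value and bounds would come from a fitted PROPHET model's forecast output (for example, its yhat, yhat_lower, and yhat_upper values); the function and parameter names here are hypothetical, and scoring in-bound values as 0.0 is an assumption of this sketch.

```python
def prophet_anomaly_score(observed: float,
                          predicted: float,
                          upper_bound: float,
                          lower_bound: float,
                          boundary_multiplier: float = 10.0) -> float:
    """Normalized seasonality/trend anomaly score s(x) per the calculation logic above."""
    if observed > upper_bound:
        upper_threshold = (upper_bound - predicted) * boundary_multiplier
        deviation = observed - upper_bound
        score = deviation / upper_threshold
    elif observed < lower_bound:
        lower_threshold = (predicted - lower_bound) * boundary_multiplier
        deviation = lower_bound - observed
        score = deviation / lower_threshold
    else:
        score = 0.0  # within the forecast bounds: treated as not anomalous (assumption)
    return min(score, 1.0)  # anything greater than 1 is set to 1

# Example values from above: observed=3, predicted=0.62, upper=0.87, lower=0.36
print(round(prophet_anomaly_score(3, 0.62, 0.87, 0.36), 2))  # 0.85
```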
The ML model 120B may be trained to learn rarest occurrence involving the metric 101. For rarest occurrence, the ML model 120B may be trained to analyze the probability of a data distribution and generate a probability of a given value occurring in the data distribution. If the probability of a data value for the metric 101 is less than a threshold probability, then the ML model 120B may determine that the data value of the metric 101 is anomalous. Thus, the anomaly score 121B from the ML model 120B may depend on the probability that the data value being assessed is within the predicted data distribution. For example, the anomaly score 121B may be based on a deviation of the data value from the predicted distribution boundary. One example of rarest occurrence modeling may include a robust covariance approach.
Robust covariance methods may remove extreme outliers from a distribution. In one example, robust covariance models may assume that at least a portion of a distribution is “normal” and is not an outlier. Robust covariance may be trained to analyze a set of random samples to estimate statistics such as the mean, sum, and absolute sum. Distances of the samples from one another, based on these statistics, may be computed and sorted. The values with the smallest distances may be used to update the statistics for the random samples, and the subset with the lowest absolute sum is considered for computation until convergence. The estimate with the smallest absolute sum is returned as output for filtering the distribution. The ML model 120B may use the filtered distribution as the prediction distribution based on which the anomaly score 121B is determined. An example of generating the anomaly score 121B using robust covariance is provided below for illustration. The example will use the following definitions:
Observed Value: The actual value of the data point
Standard Deviation: Calculated from the Observed Value dataset
Decision Score: Equal to the shifted Mahalanobis distance provided by the robust covariance model; the higher the value, the farther the data is from the norm. The Mahalanobis distance is a measure of the distance between a point P and a distribution D.
Normalized Score: Final score, 1 being highly anomalous and 0 being normal
Calculation Logic:
Normalized Score=1−(1/(SQRT(Decision Score)/Standard Deviation))
Example of Values:
Observed Value=0.3729983333333
Decision Score=138.82374363
Standard Deviation=0.01324
Calculation:
Normalized Score=1−(1/(SQRT(138.82374363)/0.01324))=0.998
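For illustration, a non-limiting sketch of this scoring follows. The use of scikit-learn's MinCovDet as the robust covariance estimator, the use of its squared Mahalanobis distances as the decision score, and the placeholder history are assumptions of this sketch rather than requirements of the model described above.

```python
import numpy as np
from sklearn.covariance import MinCovDet  # assumed robust covariance estimator

def rarest_occurrence_score(decision_score: float, std_dev: float) -> float:
    """Normalized Score = 1 - (1 / (sqrt(Decision Score) / Standard Deviation)), clipped to [0, 1]."""
    score = 1.0 - 1.0 / (np.sqrt(decision_score) / std_dev)
    return float(np.clip(score, 0.0, 1.0))

# Example values from above: a decision score of ~138.82 with a standard deviation of
# 0.01324 yields a score close to 1.0, indicating a likely anomaly.
print(rarest_occurrence_score(138.82374363, 0.01324))

# Assumed workflow for obtaining a decision score from historical data values:
history = np.random.normal(loc=0.37, scale=0.013, size=(500, 1))  # placeholder history
model = MinCovDet().fit(history)
new_value = 0.45  # hypothetical new observation, far from the placeholder history
decision = model.mahalanobis([[new_value]])[0]  # squared Mahalanobis distance as decision score
print(rarest_occurrence_score(decision, history.std()))  # near 1.0 for this far-off value
```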
The ML model 120N may be trained to learn normal observations of a combination of two or more of the metrics 101-105. For the combination, the ML model 120N may be trained to learn a normal observation set of a combination of values for two or more metrics 101-105. Thus, the ML model 120N may detect an abnormal combination of values in a combination of real-time values. One approach to model the combination is through a Bidirectional Generative Adversarial Network (BiGAN). The anomaly score 121N generated by the ML model 120N may be based on the discriminator output of the BiGAN.
A BiGAN is a Generative Adversarial Network (GAN) with an encoder component. A GAN includes a generator and a discriminator. The generator learns how to generate data values from a latent space (noise). The objective of the discriminator is altered to classify between a real data value (in the historical data set) and a “synthetic” sample (one that is anomalous). The discriminator may also make the classification based on an encoder/decoder architecture. An encoder may encode a dataset in a way to compress the data while the decoder may attempt to recreate the original dataset from the compressed data. If decoding a test dataset is successful (as determined by being similar or identical to the original dataset), then the test dataset may be deemed to be equivalent to the original dataset. Thus, in this context, the encoder/decoder may be used to determine whether the data value is anomalous based on historical data values encoded by the encoder.
An example of generating the anomaly score 121N using a BiGAN is provided below for illustration. The example will use the following definitions:
Observed Value: The set of actual values for a given context at a given time.
Discriminator Output: The model output; closer to 1 is normal and closer to 0 is abnormal.
Normalized Score: Final score, 1 being anomalous and 0 being normal.
Normalized Score = 1 − Discriminator Output
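For illustration, a non-limiting sketch of this scoring follows, using PyTorch modules as stand-ins for the trained BiGAN components. The layer sizes, module definitions, and the assumption that the discriminator scores joint (value, latent) pairs are hypothetical choices for this sketch, and the generator and training loop are omitted.

```python
import torch
import torch.nn as nn

N_METRICS, LATENT_DIM = 3, 8  # hypothetical sizes for a three-metric context

class Encoder(nn.Module):
    """Maps an observed combination of metric values to the latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_METRICS, 16), nn.ReLU(), nn.Linear(16, LATENT_DIM))
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores joint (value, latent) pairs; outputs closer to 1 for normal combinations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_METRICS + LATENT_DIM, 16), nn.ReLU(),
                                 nn.Linear(16, 1), nn.Sigmoid())
    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

def combination_anomaly_score(disc, enc, values):
    """Normalized Score = 1 - Discriminator Output (1 = anomalous, 0 = normal)."""
    with torch.no_grad():
        d_out = disc(values, enc(values))
    return (1.0 - d_out).squeeze(-1)

# Usage with untrained (illustrative) modules; values mirror the combined-metric examples.
values = torch.tensor([[8.0, 0.0, 33.0]])
score = combination_anomaly_score(Discriminator(), Encoder(), values)
```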
In some implementations, the scoring subsystem 130 may generate a duration score based on a duration time associated with each anomaly score 121A-N. The duration time may indicate a length of time that a detected anomaly has persisted, as indicated by the date and/or time associated with the data value of metric 101. The scoring subsystem 130 may assign a larger duration score for longer duration times. For example, the scoring subsystem 130 may increment the duration score based on the duration time up to a maximum value. In a particular example, the scoring subsystem 130 may add 0.1 to the duration score for each 5 minutes of duration time, with a maximum value of 1.0. The scoring subsystem 130 may aggregate the duration score with the aggregated anomaly scores to generate the aggregate anomaly score. In some of these implementations, the duration score may be weighted by its own weighting value, similar to the manner in which the anomaly scores are weighted by their respective weighting value. Using the duration score enables real anomalies to be distinguished from statistical fluctuations, which may occur intermittently but only briefly over time.
The scoring subsystem 130 may aggregate the anomaly scores 121A-N to generate an aggregate anomaly score 131. The scoring subsystem 130 may define the aggregate anomaly score 131, (y), as a function (f(x)), which may be denoted by equation 1:
y = f(x) = (s(x) + r(x) + c(x) + d(x))/4    (1),
in which:
y = f(x) = the aggregate anomaly score 131, which may be a normalized anomaly score between 0 and 1;
s(x) = the anomaly score 121A output by the ML model 120A (seasonality and trend);
r(x) = the anomaly score 121B output by the ML model 120B (rarest occurrence);
c(x) = the anomaly score 121N output by the ML model 120N (combined related metrics); and
d(x) = the duration score indicating the duration of the anomalous reading based on the date or time associated with the metric 101.
In equation (1), the model weightings are 1 for each of s(x), r(x), c(x), and d(x). Thus, s(x), r(x), c(x), and d(x) are equally weighted according to the above. The normalization factor of 4 may be used since four normalized scores, each using the same scale (in this case, values between 0.0 and 1.0), are used for s(x), r(x), c(x), and d(x). It should be noted that other model weightings may be used to weight some anomaly scores higher than others.
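For illustration, a non-limiting sketch of the aggregation in Equation 1 follows, generalized to a weighted mean so that non-equal weightings may also be expressed; the function name and the alternative weighting shown are hypothetical.

```python
def aggregate_anomaly_score(s: float, r: float, c: float, d: float,
                            weights: tuple = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the component scores; with equal weights this reduces to Equation 1."""
    scores = (s, r, c, d)
    return sum(w * v for w, v in zip(weights, scores)) / sum(weights)

# Equal weighting, using example component scores from the tables below (label ID 1010, metric ID 3):
print(aggregate_anomaly_score(0.52, 0.0, 0.0, 0.0))  # 0.13

# Weighting the seasonality/trend score twice as heavily as the other scores:
print(aggregate_anomaly_score(0.52, 0.0, 0.0, 0.0, weights=(2.0, 1.0, 1.0, 1.0)))  # 0.208
```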
In some implementations, the scoring subsystem 130 may assign a mitigation category based on the aggregate anomaly score. For example, Table 3 illustrates non-limiting examples of the aggregate anomaly score ranges and corresponding mitigation categories.
It should be noted that the ML models 120A-N of the pluggable architecture 115 may be trained to learn respective behaviors of different metrics. For example,
Data structure 302 may store a set of label identifiers (IDs) and labels that indicate a context of a given metric that is linked with the label ID. For example, as shown, label ID 1010 identifies a label that indicates a context “Application 1” and “Service x1.” An anomalous value of a metric 101-105 associated with this label ID indicates that Application 1 and Service x1 may be the culprit of the anomaly.
Data structure 304 may store a set of metric IDs and a corresponding metric name for each metric 101-105. For example, metric ID 3 identifies a “database alert count” metric.
Data structure 306 may store a set of context IDs and corresponding context. For example, context ID 2000 identifies “Application 1 and Service x1” contexts.
Each of the data structures 302, 304, 306, 402, and 502 may be implemented in a relational database table and/or other data structure. These or other data structures may be stored in the labels and metrics datastore 116.
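For illustration, a non-limiting sketch of these mappings and a source lookup follows, using in-memory dictionaries in place of the relational tables; the dictionary layout and the function name are hypothetical.

```python
# Data structure 302: label IDs to labels indicating the context/source
labels = {1010: ("Application 1", "Service x1")}
# Data structure 304: metric IDs to metric names
metrics = {3: "database alert count"}
# Data structure 306: context IDs to contexts
contexts = {2000: "Application 1 and Service x1"}

def lookup_source(label_id: int, metric_id: int) -> str:
    """Resolve the likely source of an anomalous metric value from its label ID-metric ID pair."""
    return f"{metrics[metric_id]} anomaly attributed to {', '.join(labels[label_id])}"

print(lookup_source(1010, 3))
# database alert count anomaly attributed to Application 1, Service x1
```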
Examples of anomaly scores 121A-N based on respective ML models 120A-N will now be described with reference to the data structure 502 shown in
The ML model 120A may generate an anomaly score 121A based on seasonality and trend behaviors, as given by s(x) in Equation 1. Each label ID-metric ID pair may be analyzed by the ML model 120A to generate a corresponding anomaly score for the pair. If the anomaly score indicates an anomaly has been detected, the metric ID and corresponding label ID may be used to determine the source of the anomaly.
The following shows examples of calculated anomaly scores 121A for seasonality and trends as determined by the ML model 120A, using PROPHET time series predictions based on historical data values.
For Label ID 1010:
Label ID=1010, Metric ID=3, Time=10:50 AM, Value=8.0: s(x)=0.52.
Label ID=1010, Metric ID=4, Time=10:50 AM, Value=0: s(x)=0.0.
Label ID=1010, Metric ID=5, Time=10:50 AM, Value=33: s(x)=0.0.
For Label ID 1011:
Label ID=1011, Metric ID=3, Time=10:50 AM, Value=10.0: s(x)=0.55.
Label ID=1011, Metric ID=4, Time=10:50 AM, Value=7.0: s(x)=0.71.
Label ID=1011, Metric ID=5, Time=10:50 AM, Value=52.0: s(x)=0.30.
The ML model 120B may generate an anomaly score 121B based on rarest occurrence behavior, as given by r(x) in Equation 1. Each label ID-metric ID pair may be analyzed by the ML model 120B to generate a corresponding anomaly score for the pair. If the anomaly score indicates an anomaly has been detected, the metric ID and corresponding label ID may be used to determine the source of the anomaly.
The following shows examples of calculated anomaly scores 121B based on rarest occurrence as determined by the ML model 120B, using robust covariance based on historical data values. An anomaly score 121B may be based on the probability of the observed data value of a metric occurring based on the historical data values. For Label ID 1010:
Label ID=1010, Metric ID=3, Time=10:50 AM, Value=8.0: r(x)=0.0.
Label ID=1010, Metric ID=4, Time=10:50 AM, Value=0: r(x)=0.0.
Label ID=1010, Metric ID=5, Time=10:50 AM, Value=33: r(x)=0.0.
For Label ID 1011:
Label ID=1011, Metric ID=3, Time=10:50 AM, Value=10.0: r(x)=0.0.
Label ID=1011, Metric ID=4, Time=10:50 AM, Value=7.0: r(x)=1.0.
Label ID=1011, Metric ID=5, Time=10:50 AM, Value=52.0: r(x)=0.0.
Context/Combination Anomaly Score
The ML model 120N may generate an anomaly score 121N based on combined metrics, as given by c(x) in Equation 1. Each Context ID-metric ID pair may be analyzed by the ML model 120N to generate a corresponding anomaly score for the pair. If the anomaly score indicates an anomaly has been detected, the metric ID and corresponding Context ID may be used to determine the source of the anomaly.
The following shows examples of calculated anomaly scores 121N based on a combination of related metrics as determined by the ML model 120N, using a BiGAN on historical data values. An anomaly score 121N may be based on the probability of the observed data values of a combination of metrics occurring, based on the historical data values.
Thus, a given Context ID will be mapped to multiple metrics, and an anomaly score 121N is based on a probability that the combined values of the multiple metrics will occur in the historical data values. In other words, an anomaly score 121N represents a likelihood that the combined values of the multiple metrics are abnormal, i.e., have not been seen in the historical data values.
For example, Table 4 below shows an example of context ID 2000 in which Metric IDs 3, 4, and 5 and their paired label ID 1010 are combined and analyzed. In the result shown in Table 4, the combined label ID-metric ID pairs 1010_3, 1010_4, and 1010_5 and corresponding values resulted in a predicted anomaly score of 0.0, indicating a low probability of an anomaly. In other words, the combination of values 8.0, 0.0, and 33.0 for the respective label ID-metric ID pairs 1010_3, 1010_4, and 1010_5 is not out of the ordinary based on training from the historical data values.
For example, Table 5 below shows an example of context ID 2001 in which Metric IDs 3, 4, and 5 and their paired label ID 1011 are combined and analyzed. In the result shown in Table 5, the combined label ID-metric ID pairs 1011_3, 1011_4, and 1011_5 and corresponding values resulted in a predicted anomaly score of 1.0, indicating a high probability of an anomaly. In other words, the combination of values 10.0, 7.0, and 52.0 for the respective label ID-metric ID pairs 1011_3, 1011_4, and 1011_5 is out of the ordinary (anomalous) based on training from the historical data values.
The scoring subsystem 130 may generate a duration score based on a duration of time that a detected anomaly is active, as given by d(x) in Equation 1. Each time an anomaly is detected by any one of the ML models 120A-N, a fault start time may be recorded for the context ID (if relevant and known), label ID, and metric ID. In this way, if the anomaly persists through future iterations of anomaly detection, the future iterations may be able to determine a duration of the anomaly based on the fault start time and the current time.
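For illustration, a non-limiting sketch of this bookkeeping follows. The in-memory registry keyed by (context ID, label ID, metric ID), the function name, and the timestamps shown are hypothetical; the score increment mirrors the 0.1-per-5-minutes example described earlier.

```python
from datetime import datetime

fault_starts: dict[tuple, datetime] = {}  # (context_id, label_id, metric_id) -> fault start time

def update_duration(key: tuple, now: datetime, anomaly_detected: bool) -> float:
    """Record the fault start on first detection and return the current duration score d(x)."""
    if not anomaly_detected:
        fault_starts.pop(key, None)  # anomaly cleared; reset the fault start
        return 0.0
    start = fault_starts.setdefault(key, now)
    minutes = (now - start).total_seconds() / 60.0
    return min(0.1 * int(minutes // 5), 1.0)  # 0.1 per 5 minutes, capped at 1.0

# Example: label ID 1011, metric ID 3, fault start at 10:00, evaluated again at 10:50 -> 1.0
key = (2001, 1011, 3)
update_duration(key, datetime(2023, 6, 1, 10, 0), True)
print(update_duration(key, datetime(2023, 6, 1, 10, 50), True))  # 1.0
```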
For Label ID 1010:
Label ID=1010, Metric ID=3, Time=10:50 AM, Value=8.0: d(x)=0.0.
Label ID=1010, Metric ID=4, Time=10:50 AM, Value=0: d(x)=0.0.
Label ID=1010, Metric ID=5, Time=10:50 AM, Value=33: d(x)=0.0.
Assuming the current time for this instance of anomaly detection is 10:50 AM, the duration score d(x) (anomaly score 121N in
For Label ID 1011:
Label ID=1011, Metric ID=3, Time=10:50 AM (fault start at 10:00), Value=10.0: d(x)=1.0.
Label ID=1011, Metric ID=4, Time=10:50 AM, Value=7.0: d(x)=0.0.
Label ID=1011, Metric ID=5, Time=10:50 AM, Value=52.0: d(x)=0.0.
Assuming the current time for this instance of anomaly detection is 10:50 AM, the duration score d(x) (anomaly score 121N in
The scoring subsystem 130 may aggregate the anomaly scores 121A-N and the duration score to generate an aggregate score.
For context ID 2000, the scoring subsystem 130 may generate the following aggregate anomaly scores for each metric ID, as shown in Table 8 below. Each of the aggregate scores may be generated based on Equation 1.
For context ID 2001, the scoring subsystem 130 may generate the following aggregate anomaly scores for each metric ID, as shown in Table 9 below. Each of the aggregate scores may be generated based on Equation 1.
At 602, the method 600 may include accessing a data value of the metric for which an anomaly prediction is to be made. For example, the metric may be any of the metrics 101-105 (or other metrics).
At 604, the method 600 may include providing the data value to the pluggable plurality of machine learning models (such as the ML models 120A-N of the pluggable architecture 115).
At 606, the method 600 may include generating, via the pluggable plurality of models, a plurality of anomaly scores (such as anomaly scores 121A-N). The anomaly scores may include at least a first anomaly score (such as any one of the anomaly scores 121A-N) generated by the first model (such as any one of the ML models 120A-N) based on the first behavior of the historical data values of the metric and at least a second anomaly score generated by the second model (such as any other one of the ML models 120A-N) based on the second behavior of the historical data values of the metric. Each anomaly score from among the plurality of anomaly scores represents a prediction that the data value is anomalous based on a respective machine learning model that models a corresponding behavior of the historical data values of the metric.
At 608, the method 600 may include generating an aggregate anomaly score (such as aggregate anomaly score 131) based on the plurality of anomaly scores, the aggregate anomaly score representing an aggregate prediction that the data value is anomalous.
At 610, the method 600 may include identifying a mitigative action based on the aggregate anomaly score. For example, the mitigative actions may be mapped to aggregate anomaly scores. Table 3 shows an example of such mapping.
At 612, the method 600 may include performing a lookup of a stored association of a metric identifier and label identifier pair based on the metric identifier to identify a source of the data value using the label identifier. For example, the lookup may be performed based on a query or other data recall action against one or more of the data structures 302, 304, 306, 402, and 502. For example, the source of the data value in a computer network domain may include various network devices such as switches, routers, hubs, application server devices, bridges, access points, and/or other devices that are involved in a computer network. Examples of metrics 101-105 in this domain may include a number of active network sessions, a number of new network sessions, a number of network transactions, and so forth. In an application services domain, the source of the data value may include a software application service, a specific software application, a specific routine of a software application, and so forth.
At 614, the method 600 may include generating for display an indication of the mitigative action and the identified source of the data value based on the stored association. For example, the UI subsystem 140 may generate data, for display via a user interface of a client device 160, an indication of the mitigative action and identified source.
It should be noted that the different sources may provide different types of metrics 101-105. It should be further noted that the different sources may be arranged hierarchically, so that a data value may be associated with multiple sources. In this way, anomaly detection and reporting for the display may also be made hierarchically. For example, mitigative actions may be provided to engineers responsible for firewalls and/or to engineers responsible for specific server devices within a firewall. Likewise, mitigative actions may be provided to engineers responsible for application-level issues. Furthermore, the mitigative actions provided to various engineers may result from the same core issue. For example, an application that causes an anomaly may cause anomalous readings across a range of sources. In this example, the application may cause an endless loop of calls to an application service, which may make network calls to a server device. The endless loop of calls may therefore cause anomalous readings across a range of different sources. Each party responsible for each of the different sources may be alerted to the anomalous readings to aid in troubleshooting and mitigative efforts.
At 702, the method 700 may include accessing a data value of the metric for which an anomaly prediction is to be made, wherein the metric is identified by a metric identifier that is stored in association with a label identifier that identifies a label, the label indicating a source of the data value. At 704, the method 700 may include providing the data value to a plurality of machine learning models trained to detect anomalies based on behaviors of historical data values of the metric.
At 706, the method 700 may include generating, based on execution of the plurality of models, a plurality of anomaly scores comprising at least a first anomaly score generated by a first model trained to detect anomalies based on a first behavior of the historical data values of the metric and at least a second anomaly score generated by a second model trained to detect anomalies based on a second behavior of the historical data values of the metric. Each anomaly score from among the plurality of anomaly scores represents a prediction that the data value is anomalous based on a respective machine learning model that models a corresponding behavior of the historical data values of the metric.
At 708, the method 700 may include generating an aggregate anomaly score based on the plurality of anomaly scores, the aggregate anomaly score representing an aggregate prediction that the data value is anomalous. At 710, the method 700 may include identifying a mitigative action to take based on the aggregate anomaly score.
As indicated on each of the graphs 802, on “Date 2,” an outage occurred at approximately 1400. To test whether early warning anomalies were detectable, three metrics for each host were captured. In this example, the three metrics (examples of metrics 101-105) were: (1) number of processes running, (2) 5-minute CPU utilization, and (3) 15-minute CPU utilization. These metrics were obtained at periodic intervals from various log sources.
The computer system 110 detected anomalous readings (indicated by “anomaly start”) starting five hours prior to the outage (indicated by “outage start”). It should be noted that each point after the “anomaly start” that exhibited anomalous readings had its duration score incremented.
CPU_15 and CPU_5 both exhibited spikes that peaked at around 8:30. Using the pluggable architecture of ML models 120, the following scores were determined:
s(x)=1.0. The model 120A using seasonality and trend behavior indicated that the analyzed data value was out of the predicted range.
r(x)=1.0. The model 120B using rarest occurrence indicated that the analyzed data value was unlikely to have occurred in the historical data values.
c(x)=1.0. The multiple related metrics behavior indicated that the combination of the three metrics used was not expected.
d(x)=1.0. The duration score indicated that the issue persisted for more than an hour after the anomaly start. Thus, the total raw aggregate score was 4.0. The normalized aggregate score (with equal weighting in this example) was 1.0. Using the mitigative actions illustrated in Table 3, this anomaly would have been flagged for escalation, providing an early warning for mitigation to potentially prevent the outage.
The computer system 110 detected the anomalous data points at around 11:00 and raised a ‘warn’ level alert for the sudden spike in traffic. The roughly fifty-fold spike in traffic is usually symptomatic of a potential Denial-of-Service (DoS) attack.
The anomaly scores were:
s(x)=1.0. The model 120A using seasonality and trend behavior indicated that the analyzed data value was out of the predicted range.
r(x)=0.84. The model 120B using rarest occurrence indicated that the analyzed data value was unlikely to have occurred in the historical data values.
c(x)=0.0. The multiple related metrics behavior indicated that the combination of the three metrics used was expected (not anomalous).
Thus, accounting also for the duration score d(x) per Equation 1, the total raw aggregate score was 2.32. The normalized aggregate score (with equal weighting in this example) was 0.58. Using the mitigative actions illustrated in Table 3, this anomaly would have been flagged as “warn,” providing an early warning for mitigation to detect anomalous network traffic, including potential DoS attacks. The root cause was determined to be an application defect that caused the traffic spike, but the computer system 110 accurately warned of the anomaly.
It should be noted that the ML models 120A-N, while illustrated as being part of the pluggable architecture 115, are not necessarily limited to being pluggable. For example, in some implementations, some or all of the ML models 120A-N may not be pluggable. Instead, some or all of the ML models 120A-N may be fixed and not able to be added or removed by a user for the purpose of changing which models will be used for anomaly detection (other than by a system administrator or other user that configures the computer system 110 for anomaly detection).
Once trained, the ML models 120A-N may be stored in the ML models datastore 118. For example, the model parameters, model weights, and/or other data relating to the trained models 120A-N may be stored in the ML models datastore 118 along with model identifiers for each trained model. The metrics 101-105 and their associated labels may be stored in the labels and metrics datastore 116.
The datastores (such as 114, 116, 118) may be a database, which may include, or interface to, for example, an Oracle™ relational database sold commercially by Oracle Corporation. Other databases, such as Informix™, DB2, or other data storage, including file-based, or query formats, platforms, or resources such as OLAP (On Line Analytical Processing), SQL (Structured Query Language), a SAN (storage area network), Microsoft Access™, or others may also be used, incorporated, or accessed. The database may comprise one or more such databases that reside in one or more physical devices and in one or more physical locations. The datastores may include cloud-based storage solutions. The database may store a plurality of types of data and/or files and associated data or file descriptions, administrative information, or any other data. The various datastores may store predefined and/or customized data described herein.
Each of the computer system 110 and client devices 160 may also include memory in the form of electronic storage. The electronic storage may include non-transitory storage media that electronically stores information. The electronic storage media of the electronic storages may include one or both of (i) system storage that is provided integrally (e.g., substantially non-removable) with servers or client devices or (ii) removable storage that is removably connectable to the servers or client devices via, for example, a port (e.g., a USB port, a firewire port, etc.) or a drive (e.g., a disk drive, etc.). The electronic storages may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. The electronic storages may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). The electronic storage may store software algorithms, information determined by the processors, information obtained from servers, information obtained from client devices, or other information that enables the functionalities described herein.
The computer system 110 and the one or more client devices 160 may be connected to one another via a communication network (not illustrated), such as the Internet or the Internet in combination with various other networks, like local area networks, cellular networks, personal area networks, internal organizational networks, and/or other networks. It should be noted that the computer system 110 may transmit data, via the communication network, conveying the predictions to one or more of the client devices 160. The data conveying the predictions may be a user interface generated for display at the one or more client devices 160, one or more messages transmitted to the one or more client devices 160, and/or other types of data for transmission. Although not shown, the one or more client devices 160 may each include one or more processors, such as processor 112.
The systems and processes are not limited to the specific implementations described herein. In addition, components of each system and each process can be practiced independently and separately from other components and processes described herein. Each component and process also can be used in combination with other assembly packages and processes. The flow charts and descriptions thereof herein should not be understood to prescribe a fixed order of performing the method blocks described therein. Rather, the method blocks may be performed in any order that is practicable, including simultaneous performance of at least some method blocks. Furthermore, each of the methods may be performed by one or more of the system features illustrated in
This written description uses examples to disclose the implementations, including the best mode, and to enable any person skilled in the art to practice the implementations, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the disclosure is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.