Metric alert rules are used to proactively detect service problems. Many of today's alerts are applied on various metrics generated by a service and rely on threshold values that are manually defined. An effective alert rule alerts when the metric does not behave as expected, while on the other hand, should not create too many false positives. Configuring static thresholds is a complex task, requiring the service owner to learn the historical behavior of each metric, apply some of his deep domain knowledge of the service, and make a prediction of what value ranges should be considered within the norm. The challenge scales up when a metric has one or more dimensions slicing it to multiple time series with different normal behaviors. In the dynamic environment in which modern services operate, services undergo frequent updates, and there are frequent changes to the way services are consumed. This requires an ongoing adjustment of static thresholds which means repeating the complex task every time a change happens.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Methods, systems, apparatuses, and computer-readable storage mediums described herein are configured to provide dynamic thresholds for alerting users of anomalous resource usage of computing resources. The dynamic thresholds may be based on the historical behavior of compute metrics (or a time series obtained therefor) associated with the computing resources and a detected seasonality in that time series. The seasonality is detected based on an analysis of several, different time series combinations that are based on the original time series, which advantageously increases the probability of successful seasonality detection. Based on characteristics of the time series metric, a model for generating dynamic thresholds may be determined. The dynamic thresholds track the detected seasonality of the compute metrics, rather than being a static (or straight-line) threshold. As utilization of the computing resources continues, the determined thresholds are applied to the compute metrics. If the determined thresholds are exceeded, an alert indicating an anomalous resource usage (which may be indicative of an issue with respect to the computing resource(s)) may be provided to a user.
Further features and advantages, as well as the structure and operation of various example embodiments, are described in detail below with reference to the accompanying drawings. It is noted that the example implementations are not limited to the specific embodiments described herein. Such example embodiments are presented herein for illustrative purposes only. Additional implementations will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate example embodiments of the present application and, together with the description, further serve to explain the principles of the example embodiments and to enable a person skilled in the pertinent art to make and use the example embodiments.
The features and advantages of the implementations described herein will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
The present specification and accompanying drawings disclose numerous example implementations. The scope of the present application is not limited to the disclosed implementations, but also encompasses combinations of the disclosed implementations, as well as modifications to the disclosed implementations. References in the specification to “one implementation,” “an implementation,” “an example embodiment,” “example implementation,” or the like, indicate that the implementation described may include a particular feature, structure, or characteristic, but every implementation may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same implementation. Further, when a particular feature, structure, or characteristic is described in connection with an implementation, it is submitted that it is within the knowledge of persons skilled in the relevant art(s) to implement such feature, structure, or characteristic in connection with other implementations whether or not explicitly described.
In the discussion, unless otherwise stated, adjectives such as “substantially” and “about” modifying a condition or relationship characteristic of a feature or features of an implementation of the disclosure, should be understood to mean that the condition or characteristic is defined to within tolerances that are acceptable for operation of the implementation for an application for which it is intended.
Furthermore, it should be understood that spatial descriptions (e.g., “above,” “below,” “up,” “left,” “right,” “down,” “top,” “bottom,” “vertical,” “horizontal,” etc.) used herein are for purposes of illustration only, and that practical implementations of the structures described herein can be spatially arranged in any orientation or manner.
Numerous example embodiments are described as follows. It is noted that any section/subsection headings provided herein are not intended to be limiting. Implementations are described throughout this document, and any type of implementation may be included under any section/subsection. Furthermore, implementations disclosed in any section/subsection may be combined with any other implementations described in the same section/subsection and/or a different section/subsection in any manner
Embodiments described herein provide dynamic thresholds for alerting users of anomalous resource usage of computing resources. The dynamic thresholds may be based on the historical behavior of compute metrics (or a time series obtained therefor) associated with the computing resources and a detected seasonality in that time series. The seasonality is detected based on an analysis of several, different time series combinations that are based on the original time series, which advantageously increases the probability of successful seasonality detection. Based on characteristics of the time series, a model for generating dynamic thresholds may be determined. The dynamic thresholds track the detected seasonality of the compute metrics, rather than being a static (or straight-line) threshold. As utilization of the computing resources continue, the determined thresholds are applied to the compute metrics. If the determined thresholds are exceeded, an alert indicating an anomalous resource usage (which may be indicative of an issue with respect to the computing resource(s)) may be provided to a user.
The foregoing techniques advantageously enable the automatic detection of seasonal behavior and automatically set the thresholds such that an alert will be triggered only on deviation from the expected seasonal behavior. For example, alerts based on dynamic thresholds will not be triggered if a service is regularly idle on the weekends and then spikes every Monday. The techniques described herein recognize this seasonality and generate the dynamic thresholds based thereon. Static thresholds, on the other hand, are not very effective for such seasonal metrics. Instead, static thresholds issue alerts during spikes caused by seasonal behaviors, and as a result, unnecessary diagnostics are performed on the associated compute resource. This in turn causes significant downtime with respect to the compute resource. Accordingly, the techniques described herein improve the functionality of a system in which such compute resources are included, as any issues with the compute resources are accurately detected (and thus resolvable), while also avoiding unnecessary downtime due from false positives.
Moreover, the embodiments described herein improve the functioning of the computing devices for which the metrics are being obtained. For instance, conventional techniques that utilize static thresholds may mask legitimate issues. For instance, if the static threshold is set large enough to accommodate a large seasonal spike, then anomalous behaviors may go undetected. As such, a user may never be alerted when such a behavior occurs and subsequently remedy the issue. This may have a detrimental effect on the computing device. For instance, the computing device may be suffering from abnormal memory usage and/or network usage, which would go unnoticed by the user. Accordingly, the computing device may operate much more slowly and/or may be unable to properly handle requests. In contrast, because the embodiments described herein dynamically track metrics based on their seasonality, such a situation is avoided.
For example,
Clusters 102A, 102B and 102N may form a network-accessible server set. Each of clusters 102A, 102B and 102N may comprise a group of one or more nodes and/or a group of one or more storage nodes. For example, as shown in
In an embodiment, one or more of clusters 102A, 102B and 102N may be co-located (e.g., housed in one or more nearby buildings with associated components such as backup power supplies, redundant data communications, environmental controls, etc.) to form a datacenter, or may be arranged in other manners. Accordingly, in an embodiment, one or more of clusters 102A, 102B and 102N may be a datacenter in a distributed collection of datacenters.
Each of node(s) 108A-108N, 112A-112N and 114A-114N may be configured to execute one or more software applications (or “applications”) and/or services and/or manage hardware resources (e.g., processors, memory, etc.), which may be utilized by users (e.g., customers) of the network-accessible server set. Node(s) 108A-108N, 112A-112N and 114A-114N may also be configured for specific uses. For example, as shown in
Dynamic threshold-based alert engine 118 may be configured to determine dynamic thresholds for alerting users of anomalous resource usage of resources maintained by system 100. For instance, a monitor may obtain metrics associated with resources, such as, but not limited to, operating systems, applications, services executing on one or more of nodes 108A-108N, 112A-112N and/or 114A-114N, hardware and virtual resources maintained by the network-accessible server set (e.g., nodes 108A-108N, 112A-112N and/or 114A-114N, virtual machines, central processor units (CPUs), storage (e.g., storage disks 122), memories, etc.), and/or I/O network bandwidth, power, etc., associated therewith. The metrics may represent numerical data values that describe an aspect of such resources at a particular point of time. For example, the metrics may represent CPU usage, a number of requests issued by a particular application or service, memory or storage utilization, etc. Such metrics may be collected at regular intervals (e.g., each second, each minute, each hour, each day, etc.) and may be aggregated as a time series (i.e., a series of data points indexed in time order). The monitor may collect multiple days or weeks of worth data to obtain the historical behavior of the metric. The time series for each metric may be stored in a storage, such as storage disks 122.
Dynamic threshold-based alert engine 118 may analyze the historical behavior of the metric (i.e., the time series) to determine a seasonal pattern (i.e., a seasonality) therein. A seasonal pattern is a characteristic of the time series in which the data experiences regular or predictable changes that occur at particular time interval, such as hourly, daily, weekly, etc. Examples of seasonal patterns include, but are not limited to, increased network traffic on weekdays than compared to weekends, increased network traffic during business hours than compared to non-business hours, a daily spike in CPU and/or storage utilization (e.g., due to a backup process), etc. The foregoing may be determined by generating several different time series combinations based on the original time series. Additional details regarding determining a seasonal pattern is described below in Subsection A.
The historical time series and/or determined seasonal pattern for a given metric may be utilized by a model selector, which is configured to automatically select a modeler for generating dynamic thresholds with regards to the metric. The model selector may utilize the determined seasonal pattern and/or the diversity of values of the metric to determine which model best fits. Examples of modelers include, but are not limited to, a low dispersion-based modeler, a seasonal adjusted boxplot-based modeler and a Box-Cox transformation-based modeler. The selected modeler is utilized to determine the dynamic thresholds for the metric. Additional details regarding the model selector is described below in Subsection B.
As the computing resources continue to operate, the monitor continues to obtain computing metrics associated with such resources. The determined thresholds are applied to such computing metrics. If the determined thresholds are exceeded, an alert indicating anomalous resource usage with respect to the computing resource(s) may be provided to a user (e.g., via computing device 104).
A user may access dynamic threshold-based alert engine 118 via computing device 104, for example to enable dynamic threshold generation and/or to receive anomalous resource usage alerts. As shown in
A. Seasonal Pattern Detection
Monitor 202 may obtain metrics associated with resources 218, such as, but not limited to, operating systems, applications, services executing on one or more of nodes 108A-108N, 112A-112N and/or 114A-114N, hardware and virtual resources maintained by the network-accessible server set (e.g., nodes 108A-108N, 112A-112N and/or 114A-114N, virtual machines, central processor units (CPUs), storage (e.g., storage disks 122), memories, etc.), and/or I/O, network bandwidth, power, etc., associated therewith. Such metrics may be collected at regular intervals and may be aggregated as a time series 216. Monitor 202 may collect multiple days or weeks of worth data to obtain the historical behavior of the metric. The time series for each metric may be stored in storage 214, which may be an example of storage disk(s) 122.
Seasonality detector 204 may be configured to analyze the historical behavior of the metric (i.e., time series 216) to determine a seasonal pattern (i.e., a seasonality) therein. Using known techniques to detect seasonality in a time series is problematic in many real-world scenarios due to noise in the metric that prevents the seasonality from being detected (e.g., by using Fast Fourier Transforms (FFTs)). To overcome this, seasonality detector 204 may generate several different time series combinations that are generated based on the original time series (e.g., time series 216)), which advantageously increases the probability to detect the seasonality. An FFT may be applied to each combination to detect seasonality for each of the generated combinations. The foregoing may be performed using unsupervised machine learning-based techniques, which utilize time series 216 during a training phase in which the seasonal pattern is detected. In accordance with an embodiment, the training phase may be performed approximately every 24 hours. In accordance with such an embodiment, data newly available since the last training phase is added while trailing data is omitted. In accordance with a further embodiment, the history time span used for training may be 10 days, except when weekly seasonality is detected, in which case 28 days of historic span is used. As will be described below, once the seasonal pattern is detected (e.g., via the training phase), dynamic thresholds may be generated based thereon, and the dynamic thresholds may be applied to current compute metrics to detect anomalous behavior. Such techniques may continuously learn a particular metric's behavior and adapts to metric changes. That is, the seasonal pattern and amount of data used for training determined for a particular metric may change over time as the behavior of the metric being monitored changes.
Each time series combination may be generated based on a combination of one or more parameters. The parameter(s) may include, but are not limited to, a clipped (or non-clipped) version of the time series) and/or one or more filtered versions of the non-clipped and/or clipped version of the time series, where the time series are filtered based on different window sizes.
For instance, time series 216 may be clipped by clipper 206. Clipper 206 may be configured to remove outlying data points of time series 216 (e.g., to remove spikes in the metric). For instance, clipper 206 may be configured to remove a certain percentage of the highest and lowest values of time series 216 (e.g., 5% of the highest and lowest values) to generate a clipped time series 220. Each of time series 216 and clipped time series 220 may be provided to time-based filter(s) 208.
Time-based filter(s) 208 may be configured to perform a filtering (or smoothing) function on time series 216 based on different window sizes. The filtering function is configured to reduce the noise in the metric, while preserving the seasonal pattern. The different window sizes may be computed to match the seasonal spans that are frequently recurring in time series 216 (e.g., hourly, daily, weekly, etc.). For instance, time-based filter(s) 208 may generate a first filtered time series 222 based on time series 216 in accordance with a first window size (e.g., hourly). In particular, time-based filter(s) 208 may generate first filtered time series 222 by performing a filtering function on time series 216 that, for each data point in time series 216, combines adjacent data points with the data point to determine an average value. The average values are used to generate first filtered time series 222. Time-based filter(s) 208 may generate a second filtered time series 224 and a third filtered time series 226 based on time series 216 in accordance with a second window size (e.g., daily) and a third window size (e.g., weekly), respectively, in a similar manner as described with reference to first filtered time series 222 However, when generating second filtered time series 224, time-based filter(s) 208 may combine adjacent points for a given data point that are further in vicinity than the adjacent data points utilized to generate first filtered time series 222. Similarly, when generating third filtered time series 226, time-based filter(s) 208 may combine adjacent points for a given data point that are further in vicinity than the adjacent data points utilized to generate second filtered time series 224.
Time-based filter(s) 228 may also be configured to perform a filtering function on clipped time series 220 based on different window sizes in a similar manner as described above. For instance, as shown in
Combiner 210 may be configured to combine time series 216 with first filtered time series 222 to generate a first combined time series 234, combine time series 216 with second filtered time series 224 to generate a second combined time series 236, combine time series 216 with third filtered time series 226 to generate a third combined time series 238, combine clipped time series 220 with first filtered, clipped time series 228 to generate a fourth combined time series 240, combine clipped time series 220 with second filtered, clipped time series 230 to generate a fifth combined time series 242, and combine clipped time series 220 with third filtered, clipped time series 232 to generate a sixth combined time series 244. Combiner 210 may perform a Cartesian multiplication operation to perform the above-referenced combinations to generate first combined time series 234, second combined time series 236, third combined time series 238, fourth combined time series 240, fifth combined time series 242, and sixth combined time series 244.
Transformer 212 may be configured to perform an FFT on each of time series 216, clipped time series 220, first combined time series 234, second combined time series 236, third combined time series 238, fourth combined time series 240, fifth combined time series 242, and sixth combined time series 244 to detect seasonality for each time series 216, clipped time series 220, first combined time series 234, second combined time series 236, third combined time series 238, fourth combined time series 240, fifth combined time series 242, and sixth combined time series 244. The foregoing process considerably increases the probability of detecting the seasonality in at least one of the generated time series. It is noted that transformer 212 attempts to find seasonality in original time series (i.e., time series 216) to not hinder the seasonality detection in the event that clipped time series 220 no longer includes the seasonality to do the clipping operation performed by clipper 206. If a seasonal pattern is detected, transformer 212 outputs a detected seasonal pattern 246. In the event that more than one seasonal pattern is detected (e.g., a daily seasonality and a weekly seasonality), the longest seasonal pattern (e.g., the weekly seasonality) is used to model the data because they are multiples of each other. When modeling, for example, a weekly seasonal pattern also having a daily seasonality, the entire daily seasonality is contained within the weekly seasonal pattern.
It is noted that the window sizes utilized by time-based filter(s) 208 are purely exemplary and that any window size (e.g., monthly, yearly, etc.) may be utilized to determine seasonality for different time frames (e.g., monthly, seasonality, yearly seasonality, etc.).
Accordingly, a seasonal pattern may be determined in a time series in many ways. For example,
Flowchart 300 begins with step 302. In step 302, a predetermined percentage of the highest and lowest values from a time series is removed to generate a clipped time series. For example, with reference to
In step 304, the time series is filtered in accordance with at least one window size to generate at least one filtered time series. For example, with reference to
In step 306, the clipped time series is filtered in accordance with the at least one window size to generate at least one filtered, clipped time series. For example, with reference to
In step 308, the seasonal pattern is determined based on applying a respective transform to the time series, the clipped time series, a combination of the time series and the at least one filtered time series, and a combination of the clipped time series and the at least one filtered, clipped time series. For example, with reference to
B. Model Selection for Generating Dynamic Thresholds
Once seasonal pattern 246 is detected from time series 216 using the techniques described above with reference to Subsection A, seasonal pattern 246 and time series 216 (e.g., the most recent data values collected for time series 216) may be provided to a model selector to determine the optimal model for generating dynamic thresholds for a particular metric. For example,
Model selector 402 may be configured to perform a statistical analysis based on time series 216 and/or seasonal pattern 246 to automatically determine which modeler should be utilized to determine the automatic thresholds. For instance, model selector 402 may analyze the range, mean, variance, standard deviation, spread, etc., of time series 216 and/or seasonal pattern 246. Low dispersion modeler 406 may be selected if the analysis indicates that time series 216 and/or seasonal pattern 246 is relatively constant and/or rarely changes). If time series 216 and/or seasonal pattern 246 is relatively variable (e.g., has a sinusoidal pattern, high variance, etc.), model selector 402 may select one of a seasonal adjusted boxplot-based modeler 408 or Box-Cox transformation-based modeler 410. Model selector 402 may select Box-Cox transformation-based modeler 410 if time series 216 and/or seasonal pattern 246 has a majority of positive values (e.g., does not include values that are less than or equal to 0) and may select seasonal adjusted boxplot-based modeler 408 if time series 216 and/or seasonal pattern 246 includes values that are less than or equal to 0.
Box-Cox transformation-based modeler 410 has an inherent limitation dealing with non-positive values. Thus, non-positive values may be removed from time series 216 before applying Box-Cox transformation-based modeler 410. In some metrics, a substantial part of the data (more than 80%) is zeros. A common example for such metrics is a request count or network traffic for processes which are active only on a part of the day. In such situations, Box-Cox transformation-based modeler 410 performs poorly because too little values remain after removal of zeros. Accordingly, seasonal adjusted boxplot-based modeler 408 may be utilized in certain scenarios.
Generally, Box-Cox transformation-based modeler 410 generates more accurate thresholds than compared to seasonal adjusted boxplot-based modeler 408, as more data points are analyzed (as will be described below with reference to Subsection B.2). Seasonal adjusted boxplot-based modeler 408 may be utilized as a fallback modeler in the event that time series 216 and/or seasonal pattern 246 includes values that are less than or equal to 0.
Additional details regarding seasonal adjusted boxplot-based modeler 408 and Box-Cox transformation-based modeler 410 are described below with reference to Subsections B.1 and B.2, respectively.
After determining which modeler to utilize, the selected modeler automatically generates the dynamic thresholds. In the event that the analysis determines that none of the modelers are applicable, then no dynamic thresholds are automatically generated. It is noted that model bank 404 may store any number of modelers and that low dispersion-based modeler 406, seasonal adjusted boxplot-based modeler 408 and Box-Cox transformation-based modeler 410 are some examples of the modelers that may be utilized to generate thresholds.
Accordingly, a modeler for generating dynamic thresholds may be selected in many ways. For example,
Flowchart 500 begins with step 502. In step 502, a determination is made as to whether the data values of the time series are relatively constant or have a relatively high variance. For example, with reference to
In step 504, the low dispersion-based modeler is selected. For example, with reference to
In step 506, one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler is selected. For example, with reference to
In accordance with one or more embodiments, selecting one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler comprises determining whether the data values of the time series comprise non-positive values. Based at least on determining that the data values of the time series comprise non-positive values, the seasonal adjusted boxplot-based modeler is selected. Based at least on determining that the data values of the time series do not comprise non-positive values, the Box-Cox transformation-based modeler is selected. For example, with reference to
1. Seasonal Adjusted Boxplot Modeler
Bin-based time series generator 602 may be configured to de-seasonalize seasonal pattern 246 to generate a plurality of different time series based on seasonal pattern 246. Each generated time series may be based on a particular bin (or bucket) associated with seasonal pattern 246. A bin may represent a particular time interval. For example, each bin may represent a particular hour of a given day. In this case, the number of bins would be 24 (1 bin for each hour of the day), and therefore, the number of time series generated would also be 24. In another example, each bin may represent a ten-minute interval. In a scenario in which seasonal pattern 246 comprises 7 days of data (or 10,008 minutes' worth of data), the number of bins would be 1,008, and therefore, the number of time series generated would also be 1,008. It is noted that any time interval may be utilized.
For a particular bin, bin-based time series generator 602 may determine each of value of the metric for that bin in seasonal pattern 246. For example,
Referring again to
The most straightforward and popular method for generating thresholds is the “3-sigma” rule, according to which one estimates the sample mean and standard deviation and sets the boundaries at 3 standard deviations from the mean. However, it is well known that the sample mean and standard deviation are non-robust statistics, prone to significant errors in the presence of outliers.
To deal with this caveat, several methods based on robust statistics have been proposed. In accordance with an embodiment, threshold determiner 604 utilizes a modified version of Tukey's method (a.k.a. boxplot) to estimate the boundaries of the normal behavior of the data.
Given that P25 and P75 are the 25th and 75th percentiles of the data, the classic Tukey's test sets the higher and lower boundaries at P75+K·(P75−P25) and P25−K·(P75−P25), respectively. K in this equation is a predetermined factor, and usually K=1.5 defines boundaries for “mild outliers” and K=3 defines boundaries for “extreme outliers”.
In the modified version, since it is not very rare that the interquartile range (P75−P25) in telemetry data is 0, the inter-percentile range between the 90th and 10th percentiles (P90−P10) is used instead. Thus, the minimum and/or maximum thresholds may be set at P90+{tilde over (K)}·(P90−P10) and P10−{tilde over (K)}·(P90−P10), respectively.
The {tilde over (K)}(≈0.55) factor applied may be determined so that if the data were normally distributed, the minimum and/or maximum thresholds will be the same as those set by the classic Tukey's method with K=1.5. In addition, minimum and/or maximum thresholds for different detection sensitivity levels can be set by using a factor larger than {tilde over (K)}. In accordance with an embodiment, 1.5×{tilde over (K)} and 2×{tilde over (K)} for used for medium and extreme outliers, respectively. Furthermore, to account for possible skewness of the data, an adjusted boxplot may be used that adds a compensating factor that is a function of the medcouple.
Threshold determiner 604 provides the determined thresholds (shown as thresholds (thresholds 610A-610N) to dynamic threshold generator 606. Dynamic threshold generator 606 may be configured to compose (or combine) each of thresholds 610A-610N to generate a continuous, dynamic threshold 612, which tracks the seasonality of seasonal pattern 246. For instance, with reference to
Accordingly, a seasonal adjusted boxplot-based modeler may be utilized to generate dynamic thresholds in many ways. For example,
Flowchart 900 begins with step 902. In step 902, a plurality of bins for the seasonal pattern is determined. For example, with reference to
In step 904, for each bin of the plurality of bins, data values from the time series are selectively assigned to the bin. For example, with reference to
In step 906, for each bin of the plurality of bins, a bin-based time series is generated based on the assigned data values for the bin. For example, with reference to
In step 908, for each bin of the plurality of bins, a bin-based threshold is determined based on the bin-based time series. For example, with reference to
In step 910, each of the bin-based thresholds are combined to generate the threshold. For example, with reference to
2. Box-Cox Transformation-based Modeler
When forecasting a seasonal metric, several values of the same seasonal phase are usually required to establish a forecasting threshold; since forecasting metrics based on past behavior requires estimating the variation of metric values. Variation cannot be computed on a single value or be reliable on very few measurements. In some seasonality spans, such as weekly patterns, taking few weeks back to build a forecasting model might be wrong in terms of adapting to changes in models. So, in such cases, the use of several seasons is simply misleading. There is a need to reduce the time span used in building the forecast model.
A different approach is to use the entire data to predict the variation and make it a constant variation in all the seasonal phases. This approach is also problematic since in many cases, the variation of service metrics is somewhat correlative to the signal values. For example, the variation of traffic data is higher during business hours than during the night or weekends. As a result, after decomposing the seasonal behavior of the data, the random (or noisy) components remain. Such components might have constant variance or variance that changes over time.
By applying Box-Cox transformation-based modeler 410 to the seasonal decomposed variants of the data, all the residuals are changed to a common space on which bandwidth of the forecast belt can be estimated. At that point, an adjusted boxplot can be applied to the residuals. Therefore, a backward (or inverse) transform is applied to go back to the original metric values to have a ready model with seasonally-adjusted boxplot thresholds. The foregoing advantageously enables the creation of a forecasting seasonal model by using a very few numbers of seasons.
Seasonal decomposer 1002 may be configured to decompose the seasonal behavior of time series 216. For instance, seasonal decomposer 1002 may remove seasonal pattern 246 from time series 216. This may be performed by subtracting seasonal pattern 246 from time series 216. The remaining portion of time series 216 represents a natural random variation (or residual data) of time series 216. For instance,
The issue is the residual data is that there is a varying degree of noise. When performing statistical analysis, it is desired to have the same level of noise all the time. Accordingly, a power transformation is performed to stabilize the noise in the residual data. For example, with reference to
where yi and yi(λ) are the ith data point in the original and transformed scale, respectively. This transformation formula is defined as such so the transformation is continuous in λ as it approaches 0. The adequate λ for a given time series is estimated by maximizing the log likelihood function of the residuals given they are normally distributed.
Dynamic threshold generator 1006 may be configured generate a continuous, dynamic threshold 1012, which tracks the seasonality of seasonal pattern 246. For example, dynamic threshold generator 1006 may be configured to generate at least one of a minimum threshold and a maximum threshold for transformed residual data 1010. Because transformed residual data 1010 is relatively constant, the minimum and/or maximum thresholds may be relatively constant threshold(s). After determining the minimum and/or maximum thresholds, dynamic threshold generator 1006 may perform an inverse transformation on the minimum and maximum thresholds to reintroduce the variance (i.e., the transform performed by variance stabilizer 1004 is reversed). Thereafter, the transformed minimum and/or maximum thresholds are combined with seasonal pattern 246, thereby resulting in minimum and/or maximum thresholds (i.e., dynamic thresholds 1012) that tracks seasonal pattern 246.
Accordingly, a Box-Cox transformation-based modeler may be utilized to generate dynamic thresholds in many ways. For example,
Flowchart 1200 begins with step 1202. In step 1202, the seasonal pattern is decomposed from the time series to obtain residual data associated with the time series. For example, with reference to
In step 1204, a Box-Cox transform is applied to the residual data to stabilize a variance of the residual data to generate transformed residual data. For example, with reference with
In step 1206, a transformed residual data-based threshold is determined based on the transformed residual data. For example, with reference to
In step 1208, an inverse Box-Cox transform is applied to the transformed residual data-based threshold to reintroduce the variance, thereby generating a transformed threshold. For example, with reference to
In step 1210, the seasonal pattern is combined with the transformed threshold to generate the threshold. For example, with reference to
C. Method for Issuing Alerts Indicative of Anomalous Resource Usage based on Dynamic Thresholds
Once the dynamic thresholds are determined for a particular metric, the determined dynamic thresholds may be utilized to determine whether the metric exhibits anomalous behavior (e.g., an excessive amount or abnormally low number of requests, an excessive usage of CPU cycles, memory and/or storage, etc.) as the corresponding computing resource(s) continue to operate. If the determined thresholds are exceeded, an alert indicating anomalous resource usage with respect to the computing resource(s) may be provided to a user.
For example,
Flowchart 1300 begins with step 1302. In step 1302, a time series of data values corresponding to a metric associated with a computing resource is obtained. For example, with reference to
In step 1304, a seasonal pattern in the time series is detected. For example, with reference to
In accordance with one or more embodiments, the seasonal pattern may be determined in accordance with system 200 of
In step 1306, a statistical analysis of the time series is performed. For example, with reference to
In step 1308, a modeler is selected from among a plurality of different modelers based on results of the statistical analysis. For example, with reference to
In accordance with one or more embodiments, the plurality of different modelers comprises at least one of a low dispersion-based modeler, a seasonal adjusted boxplot-based modeler, or a Box-Cox transformation-based modeler. For example, with reference to
In accordance with one or more embodiments, the modeler may be selected in accordance with system 400 of
In step 1310, the selected modeler is utilized to generate a threshold based on the seasonal pattern. For example, with reference to
In step 1312, the metric associated with the computing resource is monitored to determine whether the metric exceeds the threshold. For example, with reference to
In step 1314, an indication is provided based at least on determining that the metric exceeds the threshold. For example, with reference to
In accordance with one or more embodiments, providing the indication includes issuing an alert. The alert (e.g., indication 1418) may be issued to computing device 104, as shown in
In accordance with one or more embodiments, indication 1418 may trigger one or more actions to be performed with respect to resources 1410. For instance, additional resources of resources 1410 may be automatically allocated to handle an excessive amount of network requests, CPU usage, memory usage, etc. to compensate for the anomalous behavior.
As shown in
System 1500 also has one or more of the following drives: a hard disk drive 1514 for reading from and writing to a hard disk, a magnetic disk drive 1516 for reading from or writing to a removable magnetic disk 1518, and an optical disk drive 1520 for reading from or writing to a removable optical disk 1522 such as a CD ROM, DVD ROM, BLU-RAY™ disk or other optical media. Hard disk drive 1514, magnetic disk drive 1516, and optical disk drive 1520 are connected to bus 1506 by a hard disk drive interface 1524, a magnetic disk drive interface 1526, and an optical drive interface 1528, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of computer-readable memory devices and storage structures can be used to store data, such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like.
A number of program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These program modules include an operating system 1530, one or more application programs 1532, other program modules 1534, and program data 1536. In accordance with various embodiments, the program modules may include computer program logic that is executable by processing unit 1502 to perform any or all of the functions and features of nodes 108A-108B, 112A-112N, and/or 114A-114N, storage node(s) 110, computing device 104, and dynamic threshold-based alert engine 118 of
A user may enter commands and information into system 1500 through input devices such as a keyboard 1538 and a pointing device 1540 (e.g., a mouse). Other input devices (not shown) may include a microphone, joystick, game controller, scanner, or the like. In one embodiment, a touch screen is provided in conjunction with a display 1544 to allow a user to provide user input via the application of a touch (as by a finger or stylus for example) to one or more points on the touch screen. These and other input devices are often connected to processing unit 1502 through a serial port interface 1542 that is coupled to bus 1506, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB). Such interfaces may be wired or wireless interfaces.
Display 1544 is connected to bus 1506 via an interface, such as a video adapter 1546. In addition to display 1544, system 1500 may include other peripheral output devices (not shown) such as speakers and printers.
System 1500 is connected to a network 1548 (e.g., a local area network or wide area network such as the Internet) through a network interface 1550, a modem 1552, or other suitable means for establishing communications over the network. Modem 1552, which may be internal or external, is connected to bus 1506 via serial port interface 1542.
As used herein, the terms “computer program medium,” “computer-readable medium,” and “computer-readable storage medium” are used to generally refer to memory devices or storage structures such as the hard disk associated with hard disk drive 1514, removable magnetic disk 1518, removable optical disk 1522, as well as other memory devices or storage structures such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like. Such computer-readable storage media are distinguished from and non-overlapping with communication media (do not include communication media or modulated data signals). Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media. Embodiments are also directed to such communication media.
As noted above, computer programs and modules (including application programs 1532 and other program modules 1534) may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 1550, serial port interface 1542, or any other interface type. Such computer programs, when executed or loaded by an application, enable system 1500 to implement features of embodiments discussed herein. Accordingly, such computer programs represent controllers of the system 1500. Embodiments are also directed to computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a data processing device(s) to operate as described herein. Embodiments may employ any computer-useable or computer-readable medium, known now or in the future. Examples of computer-readable mediums include, but are not limited to memory devices and storage structures such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, tapes, magnetic storage devices, optical storage devices, MEMs, nanotechnology-based storage devices, and the like.
In alternative implementations, system 1500 may be implemented as hardware logic/electrical circuitry or firmware. In accordance with further embodiments, one or more of these components may be implemented in a system-on-chip (SoC). The SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.
A method is described herein. The method includes: obtaining a time series of data values corresponding to a metric associated with a computing resource; detecting a seasonal pattern in the time series; performing a statistical analysis of the time series; selecting a modeler from among a plurality of different modelers based on results of the statistical analysis; utilizing the selected modeler to generate a threshold based on the seasonal pattern; monitoring the metric associated with the computing resource to determine whether the metric exceeds the threshold; and providing an indication based at least on determining that the metric exceeds the threshold.
In one implementation of the foregoing method, the plurality of different modelers comprises at least one of: a low dispersion-based modeler;
In another implementation of the foregoing method, the selecting comprises: determining whether the data values of the time series are relatively constant or have a relatively high variance; based at least on determining that the data values of the time series are relatively constant, selecting the low dispersion-based modeler; and based at least on determining that the data values of the time series have a relatively high variance, selecting one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler.
In another implementation of the foregoing method, said selecting one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler comprises: determining whether the data values of the time series comprise non-positive values; based at least on determining that the data values of the time series comprise non-positive values, selecting the seasonal adjusted boxplot-based modeler; and based at least on determining that the data values of the time series do not comprise non-positive values, selecting the Box-Cox transformation-based modeler.
In another implementation of the foregoing method, the selected modeler is the seasonal adjusted boxplot-based modeler, and wherein the generating comprises: determining a plurality of bins for the seasonal pattern; for each bin of the plurality of bins: selectively assigning data values from the time series to the bin; generating a bin-based time series based on the assigned data values for the bin; and determining a bin-based threshold based on the bin-based time series; and combining each of the bin-based thresholds to generate the threshold.
In another implementation of the foregoing method, the selected modeler is the Box-Cox transformation-based modeler, and wherein the generating comprises: decomposing the seasonal pattern from the time series to obtain residual data associated with the time series; applying a Box-Cox transform to the residual data to stabilize a variance of the residual data to generate transformed residual data; determining a transformed residual data-based threshold based on the transformed residual data; applying an inverse Box-Cox transform to the transformed residual data-based threshold to reintroduce the variance, thereby generating a transformed threshold; and combining the seasonal pattern with the transformed threshold to generate the threshold.
In another implementation of the foregoing method, said detecting comprises: removing a predetermined percentage of the highest and lowest values from the time series to generate a clipped time series; filtering the time series in accordance with at least one window size to generate at least one filtered time series; filtering the clipped time series in accordance with the at least one window size to generate at least one filtered, clipped time series; and determining the seasonal pattern based on applying a respective transform to the time series, the clipped time series, a combination of the time series and the at least one filtered time series, and a combination of the clipped time series and the at least one filtered, clipped time series.
In another implementation of the foregoing method, providing the indication includes issuing an alert.
A system is also described herein. The system includes: at least one processor circuit; and at least one memory that stores program code configured to be executed by the at least one processor circuit, the program code comprising: a monitor configured to obtain a time series of data values corresponding to a metric associated with a computing resource; a seasonality detector configured to detect a seasonal pattern in the time series; and a model selector configured to: perform a statistical analysis of the time series; and select a modeler from among a plurality of different modelers based on results of the statistical analysis, the selected modeler being utilized to generate a threshold based on the seasonal pattern; the monitor further configured to monitor the metric associated with the computing resource to determine whether the metric exceeds the threshold, and providing an indication based at least on determining that the metric exceeds the threshold.
In one implementation of the foregoing system, the plurality of different modelers comprises at least one of: a low dispersion-based modeler; a seasonal adjusted boxplot-based modeler; or a Box-Cox transformation-based modeler.
In another implementation of the foregoing system, the model selector is further configured to: determine whether the data values of the time series are relatively constant or have a relatively high variance; based at least on determining that the data values of the time series are relatively constant, select the low dispersion-based modeler; and based at least on determining that the data values of the time series have a relatively high variance, select one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler.
In another implementation of the foregoing system, the model selector is further configured to: determine whether the data values of the time series comprise non-positive values; based at least on determining that the data values of the time series comprise non-positive values, select the seasonal adjusted boxplot-based modeler; and based at least on determining that the data values of the time series do not comprise non-positive values, select the Box-Cox transformation-based modeler.
In another implementation of the foregoing system, the selected modeler is the seasonal adjusted boxplot-based modeler, and wherein the generating comprises: determining a plurality of bins for the seasonal pattern; for each bin of the plurality of bins: selectively assigning data values from the time series to the bin; generating a bin-based time series based on the assigned data values for the bin; and determining a bin-based threshold based on the bin-based time series; and combining each of the bin-based thresholds to generate the threshold.
In another implementation of the foregoing system, the selected modeler is the Box-Cox transformation-based modeler, and wherein the generating comprises: decomposing the seasonal pattern from the time series to obtain residual data associated with the time series; applying a Box-Cox transform to the residual data to stabilize a variance of the residual data to generate transformed residual data; determining a transformed residual data-based threshold based on the transformed residual data; applying an inverse Box-Cox transform to the transformed residual data-based threshold to reintroduce the variance, thereby generating a transformed threshold; and combining the seasonal pattern with the transformed threshold to generate the threshold.
In another implementation of the foregoing system, said detecting comprises: removing a predetermined percentage of the highest and lowest values from the time series to generate a clipped time series; filtering the time series in accordance with at least one window size to generate at least one filtered time series; filtering the clipped time series in accordance with the at least one window size to generate at least one filtered, clipped time series; and determining the seasonal pattern based on applying a respective transform to the time series, the clipped time series, a combination of the time series and the at least one filtered time series, and a combination of the clipped time series and the at least one filtered, clipped time series.
A computer-readable storage medium having program instructions recorded thereon that, when executed by at least one processor, perform a method. The method includes: obtaining a time series of data values corresponding to a metric associated with a computing resource; detecting a seasonal pattern in the time series; performing a statistical analysis of the time series; selecting a modeler from among a plurality of different modelers based on results of the statistical analysis; utilizing the selected modeler to generate a threshold based on the seasonal pattern; monitoring the metric associated with the computing resource to determine whether the metric exceeds the threshold; and providing an indication based at least on determining that the metric exceeds the threshold.
In another implementation of the foregoing computer-readable storage medium, the plurality of different modelers comprises at least one of: a low dispersion-based modeler; a seasonal adjusted boxplot-based modeler; or a Box-Cox transformation-based modeler.
In another implementation of the foregoing computer-readable storage medium, the selecting comprises: determining whether the data values of the time series are relatively constant or have a relatively high variance; based at least on determining that the data values of the time series are relatively constant, selecting the low dispersion-based modeler; and based at least on determining that the data values of the time series have a relatively high variance, selecting one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler.
In another implementation of the foregoing computer-readable storage medium, said selecting one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler comprises: determining whether the data values of the time series comprise non-positive values; in response to determining that the data values of the time series comprise non-positive values, selecting the seasonal adjusted boxplot-based modeler; and in response to determining that the data values of the time series do not comprise non-positive values, selecting the Box-Cox transformation-based modeler.
In another implementation of the foregoing computer-readable storage medium, the selected modeler is the seasonal adjusted boxplot-based modeler, and wherein the generating comprises: determining a plurality of bins for the seasonal pattern; for each bin of the plurality of bins: selectively assigning data values from the time series to the bin; generating a bin-based time series based on the assigned data values for the bin; and determining a bin-based threshold based on the bin-based time series; and combining each of the bin-based thresholds to generate the threshold.
In another implementation of the foregoing computer-readable storage medium, the selected modeler is the Box-Cox transformation-based modeler, and wherein the generating comprises: decomposing the seasonal pattern from the time series to obtain residual data associated with the time series; applying a Box-Cox transform to the residual data to stabilize a variance of the residual data to generate transformed residual data; determining a transformed residual data-based threshold based on the transformed residual data; applying an inverse Box-Cox transform to the transformed residual data-based threshold to reintroduce the variance, thereby generating a transformed threshold; and combining the seasonal pattern with the transformed threshold to generate the threshold.
While various example embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made therein without departing from the spirit and scope of the embodiments as defined in the appended claims. Accordingly, the breadth and scope of the disclosure should not be limited by any of the above-described example embodiments, but should be defined only in accordance with the following claims and their equivalents.