CONFIDENCE APPROXIMATION-BASED DYNAMIC THRESHOLDS FOR ANOMALOUS COMPUTING RESOURCE USAGE DETECTION

Information

  • Patent Application
  • 20210026698
  • Publication Number
    20210026698
  • Date Filed
    November 06, 2019
  • Date Published
    January 28, 2021
Abstract
Embodiments described herein provide dynamic thresholds for alerting users of anomalous resource usage of computing resources. The dynamic thresholds are based on the historical behavior of compute metrics (or a time series obtained therefor) associated with the computing resources and a detected seasonality in that time series. Based on characteristics of the time series, a model for generating dynamic thresholds is determined. The dynamic thresholds track the detected seasonality of the compute metrics. As utilization of the computing resources continues, the determined thresholds are applied to the compute metrics. If the determined thresholds are exceeded, an alert indicating anomalous resource usage is provided to a user. The dynamic threshold may be adjusted (e.g., tightened or relaxed) based on a confidence level of the detected seasonality. This advantageously reduces the number of false alerts.
Description
BACKGROUND

Metric alert rules are used to proactively detect service problems. Many of today's alerts are applied on various metrics generated by a service and rely on threshold values that are manually defined. An effective alert rule fires when the metric does not behave as expected, but does not create too many false positives. Configuring static thresholds is a complex task, requiring the service owner to learn the historical behavior of each metric, apply deep domain knowledge of the service, and predict what value ranges should be considered within the norm. The challenge scales up when a metric has one or more dimensions slicing it into multiple time series with different normal behaviors. In the dynamic environment in which modern services operate, services undergo frequent updates, and there are frequent changes to the way services are consumed. This requires ongoing adjustment of static thresholds, which means repeating the complex task every time a change happens.


Forecasting future metric values based on past behavior is widely used in alerting systems, where a prediction mechanism provides not only a single predicted value for a future timestamp but also a range around that value, which represents the model's estimate of the possible error around the prediction. It is important that this uncertainty range be estimated efficiently for the system to provide valuable anomaly detections. Using too wide a range makes the prediction not useful, while making the range too narrow results in many anomalies.


SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.


Methods, systems, apparatuses, and computer-readable storage mediums described herein are configured to provide dynamic thresholds for alerting users of anomalous resource usage of computing resources. The dynamic thresholds may be based on the historical behavior of compute metrics (or a time series obtained therefor) associated with the computing resources and a detected seasonality in that time series. The seasonality is detected based on an analysis of several different time series combinations that are based on the original time series, which advantageously increases the probability of successful seasonality detection. Based on characteristics of the time series, a model for generating dynamic thresholds may be determined. The dynamic thresholds track the detected seasonality of the compute metrics, rather than being a static (or straight-line) threshold. As utilization of the computing resources continues, the determined thresholds are applied to the compute metrics. If the determined thresholds are exceeded, an alert indicating anomalous resource usage (which may be indicative of an issue with respect to the computing resource(s)) may be provided to a user. The dynamic threshold may be adjusted (e.g., tightened or relaxed) based on a confidence level of the detected seasonality. This advantageously reduces the number of false alerts.


Further features and advantages, as well as the structure and operation of various example embodiments, are described in detail below with reference to the accompanying drawings. It is noted that the example implementations are not limited to the specific embodiments described herein. Such example embodiments are presented herein for illustrative purposes only. Additional implementations will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.





BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate example embodiments of the present application and, together with the description, further serve to explain the principles of the example embodiments and to enable a person skilled in the pertinent art to make and use the example embodiments.



FIG. 1 shows a block diagram of an example network-based computing system configured to dynamically determine a threshold for alerting users of anomalous resource usage in accordance with an example embodiment.



FIG. 2 is a block diagram of a system for determining a seasonal pattern in a time series for a particular metric in accordance with an example embodiment.



FIG. 3 shows a flowchart of a method for determining a seasonal pattern in a time series in accordance with an example embodiment.



FIG. 4 is a block diagram of a system for determining a model for generating dynamic thresholds for a particular metric in accordance with an embodiment.



FIG. 5 shows a flowchart of a method for selecting a modeler for generating dynamic thresholds in accordance with an example embodiment.



FIG. 6 is a block diagram of a seasonal adjusted boxplot-based modeler in accordance with an example embodiment.



FIG. 7 shows a graph depicting a seasonal pattern in accordance with an example embodiment.



FIG. 8 depicts a graph showing a minimum threshold and a maximum threshold generated based on a seasonal pattern in accordance with an example embodiment.



FIG. 9 shows a flowchart of a method for generating dynamic thresholds using a seasonal adjusted boxplot-based modeler in accordance with an example embodiment.



FIG. 10 is a block diagram of a Box-Cox transformation-based modeler in accordance with an example embodiment.



FIG. 11A depicts a graph that shows a time series and a seasonal pattern in accordance with an example embodiment.



FIG. 11B depicts a graph that shows residual data obtained as a result of a seasonal pattern being removed from a time series in accordance with an example embodiment.



FIG. 11C depicts a graph showing transformed residual data in accordance with an example embodiment.



FIG. 12 shows a flowchart of a method for generating dynamic thresholds using a Box-Cox transformation-based modeler in accordance with an example embodiment.



FIG. 13 shows a flowchart of a method for issuing alerts indicative of anomalous resource usage based on dynamic thresholds in accordance with an example embodiment.



FIG. 14 is a block diagram of a dynamic threshold-based alert engine in accordance with an embodiment.



FIGS. 15A and 15B depict graphs illustrating relationships between a testing error and a training error in accordance with example embodiments.



FIG. 16 is a flowchart of a method for adjusting a dynamic threshold in accordance with an example embodiment.



FIG. 17 is a block diagram of a dynamic threshold-based alert engine in accordance with another embodiment.



FIG. 18 shows a flowchart of a method for adjusting a dynamic threshold in accordance with an example embodiment.



FIG. 19 is a block diagram of a dynamic threshold-based alert engine in accordance with a further embodiment.



FIG. 20 is a block diagram of an example processor-based computer system that may be used to implement various embodiments.





The features and advantages of the implementations described herein will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.


DETAILED DESCRIPTION
I. INTRODUCTION

The present specification and accompanying drawings disclose numerous example implementations. The scope of the present application is not limited to the disclosed implementations, but also encompasses combinations of the disclosed implementations, as well as modifications to the disclosed implementations. References in the specification to “one implementation,” “an implementation,” “an example embodiment,” “example implementation,” or the like, indicate that the implementation described may include a particular feature, structure, or characteristic, but every implementation may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same implementation. Further, when a particular feature, structure, or characteristic is described in connection with an implementation, it is submitted that it is within the knowledge of persons skilled in the relevant art(s) to implement such feature, structure, or characteristic in connection with other implementations whether or not explicitly described.


In the discussion, unless otherwise stated, adjectives such as “substantially” and “about” modifying a condition or relationship characteristic of a feature or features of an implementation of the disclosure, should be understood to mean that the condition or characteristic is defined to within tolerances that are acceptable for operation of the implementation for an application for which it is intended.


Furthermore, it should be understood that spatial descriptions (e.g., “above,” “below,” “up,” “left,” “right,” “down,” “top,” “bottom,” “vertical,” “horizontal,” etc.) used herein are for purposes of illustration only, and that practical implementations of the structures described herein can be spatially arranged in any orientation or manner.


Numerous example embodiments are described as follows. It is noted that any section/subsection headings provided herein are not intended to be limiting. Implementations are described throughout this document, and any type of implementation may be included under any section/subsection. Furthermore, implementations disclosed in any section/subsection may be combined with any other implementations described in the same section/subsection and/or a different section/subsection in any manner.


II. EXAMPLE IMPLEMENTATIONS

Embodiments described herein provide dynamic thresholds for alerting users of anomalous resource usage of computing resources. The dynamic thresholds may be based on the historical behavior of compute metrics (or a time series obtained therefor) associated with the computing resources and a detected seasonality in that time series. The seasonality is detected based on an analysis of several different time series combinations that are based on the original time series, which advantageously increases the probability of successful seasonality detection. Based on characteristics of the time series, a model for generating dynamic thresholds may be determined. The dynamic thresholds track the detected seasonality of the compute metrics, rather than being a static (or straight-line) threshold. As utilization of the computing resources continues, the determined thresholds are applied to the compute metrics. If the determined thresholds are exceeded, an alert indicating anomalous resource usage (which may be indicative of an issue with respect to the computing resource(s)) may be provided to a user. The dynamic threshold may be adjusted (e.g., tightened or relaxed) based on a confidence level of the detected seasonality. This advantageously reduces the number of false alerts.


The foregoing techniques advantageously enable the automatic detection of seasonal behavior and automatically set the thresholds such that an alert will be triggered only on deviation from the expected seasonal behavior. For example, alerts based on dynamic thresholds will not be triggered if a service is regularly idle on the weekends and then spikes every Monday. The techniques described herein recognize this seasonality and generate the dynamic thresholds based thereon. Static thresholds, on the other hand, are not very effective for such seasonal metrics. Instead, static thresholds issue alerts during spikes caused by seasonal behaviors, and as a result, unnecessary diagnostics are performed on the associated compute resource. This in turn causes significant downtime with respect to the compute resource. Accordingly, the techniques described herein improve the functionality of a system in which such compute resources are included, as any issues with the compute resources are accurately detected (and thus resolvable), while also avoiding unnecessary downtime due to false positives.


Moreover, the embodiments described herein improve the functioning of the computing devices for which the metrics are being obtained. For instance, conventional techniques that utilize static thresholds may mask legitimate issues. For instance, if the static threshold is set large enough to accommodate a large seasonal spike, then anomalous behaviors may go undetected. As such, a user may never be alerted when such a behavior occurs and subsequently remedy the issue. This may have a detrimental effect on the computing device. For instance, the computing device may be suffering from abnormal memory usage and/or network usage, which would go unnoticed by the user. Accordingly, the computing device may operate much more slowly and/or may be unable to properly handle requests. In contrast, because the embodiments described herein dynamically track metrics based on their seasonality, such a situation is avoided.


For example, FIG. 1 shows a block diagram of an example network-based computing system 100 configured to dynamically determine a threshold for alerting users of anomalous resource usage, according to an example embodiment. As shown in FIG. 1, system 100 includes a plurality of clusters 102A, 102B and 102N. A computing device 104 is communicatively coupled with system 100 via a network 116. Furthermore, clusters 102A, 102B and 102N are communicatively coupled to each other via network 116, as well as being communicatively coupled with computing device 104 through network 116. Network 116 may comprise one or more networks such as local area networks (LANs), wide area networks (WANs), enterprise networks, the Internet, etc., and may include one or more of wired and/or wireless portions.


Clusters 102A, 102B and 102N may form a network-accessible server set. Each of clusters 102A, 102B and 102N may comprise a group of one or more nodes and/or a group of one or more storage nodes. For example, as shown in FIG. 1, cluster 102A includes nodes 108A-108N and one or more storage nodes 110, cluster 102B includes nodes 112A-112N, and cluster 102N includes nodes 114A-114N. Each of nodes 108A-108N, 112A-112N and/or 114A-114N are accessible via network 116 (e.g., in a “cloud-based” embodiment) to build, deploy, and manage applications and services. Each of storage node(s) 110 comprises a plurality of physical storage disks 122 that is accessible via network 116 and is configured to store data associated with the applications and services managed by nodes 108A-108N, 112A-112N, and/or 114A-114N.


In an embodiment, one or more of clusters 102A, 102B and 102N may be co-located (e.g., housed in one or more nearby buildings with associated components such as backup power supplies, redundant data communications, environmental controls, etc.) to form a datacenter, or may be arranged in other manners. Accordingly, in an embodiment, one or more of clusters 102A, 102B and 102N may be a datacenter in a distributed collection of datacenters.


Each of node(s) 108A-108N, 112A-112N and 114A-114N may be configured to execute one or more software applications (or “applications”) and/or services and/or manage hardware resources (e.g., processors, memory, etc.), which may be utilized by users (e.g., customers) of the network-accessible server set. Node(s) 108A-108N, 112A-112N and 114A-114N may also be configured for specific uses. For example, as shown in FIG. 1, node 108A is configured to execute a dynamic threshold-based alert engine 118. It is noted that cluster 102B and/or cluster 102N may also include storage node(s) 110.


Dynamic threshold-based alert engine 118 may be configured to determine dynamic thresholds for alerting users of anomalous resource usage of resources maintained by system 100. For instance, a monitor may obtain metrics associated with resources, such as, but not limited to, operating systems, applications, services executing on one or more of nodes 108A-108N, 112A-112N and/or 114A-114N, hardware and virtual resources maintained by the network-accessible server set (e.g., nodes 108A-108N, 112A-112N and/or 114A-114N, virtual machines, central processor units (CPUs), storage (e.g., storage disks 122), memories, etc.), and/or I/O, network bandwidth, power, etc., associated therewith. The metrics may represent numerical data values that describe an aspect of such resources at a particular point of time. For example, the metrics may represent CPU usage, a number of requests issued by a particular application or service, memory or storage utilization, etc. Such metrics may be collected at regular intervals (e.g., each second, each minute, each hour, each day, etc.) and may be aggregated as a time series (i.e., a series of data points indexed in time order). The monitor may collect multiple days' or weeks' worth of data to obtain the historical behavior of the metric. The time series for each metric may be stored in a storage, such as storage disks 122.


Dynamic threshold-based alert engine 118 may analyze the historical behavior of the metric (i.e., the time series) to determine a seasonal pattern (i.e., a seasonality) therein. A seasonal pattern is a characteristic of the time series in which the data experiences regular or predictable changes that occur at a particular time interval, such as hourly, daily, weekly, etc. Examples of seasonal patterns include, but are not limited to, increased network traffic on weekdays as compared to weekends, increased network traffic during business hours as compared to non-business hours, a daily spike in CPU and/or storage utilization (e.g., due to a backup process), etc. The seasonal pattern may be determined by generating several different time series combinations based on the original time series. Additional details regarding determining a seasonal pattern are described below in Subsection A.


The historical time series and/or determined seasonal pattern for a given metric may be utilized by a model selector, which is configured to automatically select a modeler for generating dynamic thresholds with regard to the metric. The model selector may utilize the determined seasonal pattern and/or the diversity of values of the metric to determine which modeler best fits. Examples of modelers include, but are not limited to, a low dispersion-based modeler, a seasonal adjusted boxplot-based modeler and a Box-Cox transformation-based modeler. The selected modeler is utilized to determine the dynamic thresholds for the metric. Additional details regarding the model selector are described below in Subsection B.


As the computing resources continue to operate, the monitor continues to obtain computing metrics associated with such resources. The determined thresholds are applied to such computing metrics. If the determined thresholds are exceeded, an alert indicating anomalous resource usage with respect to the computing resource(s) may be provided to a user (e.g., via computing device 104).
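The threshold check described above can be illustrated with a minimal sketch. It assumes the dynamic thresholds have already been generated as one minimum and one maximum value per sample (as in the minimum and maximum thresholds of FIG. 8); the names `Alert` and `find_anomalies` are hypothetical and do not come from the described embodiments.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Alert:
    index: int    # position of the offending sample in the batch
    value: float  # observed metric value
    lower: float  # minimum threshold in force at that position
    upper: float  # maximum threshold in force at that position


def find_anomalies(values: Sequence[float],
                   lower: Sequence[float],
                   upper: Sequence[float]) -> List[Alert]:
    """Return an Alert for every sample that falls outside its threshold band.

    Assumption: thresholds are supplied per sample, so they can follow the
    seasonal pattern instead of being a single straight line.
    """
    if not (len(values) == len(lower) == len(upper)):
        raise ValueError("values and thresholds must have the same length")
    alerts = []
    for i, (v, lo, hi) in enumerate(zip(values, lower, upper)):
        if v < lo or v > hi:
            alerts.append(Alert(i, v, lo, hi))
    return alerts


if __name__ == "__main__":
    # The band widens for the two middle samples, mimicking a seasonal spike.
    observed = [10, 50, 95, 20]
    for alert in find_anomalies(observed, [5, 40, 40, 5], [30, 90, 90, 30]):
        print(alert)
```

Because the band is evaluated per sample, the value 50 inside the expected spike raises no alert, whereas the same value would be flagged if it occurred while the band was 5 to 30.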


A user may access dynamic threshold-based alert engine 118 via computing device 104, for example to enable dynamic threshold generation and/or to receive anomalous resource usage alerts. As shown in FIG. 1, computing device 104 includes a display screen 124 and a browser 126. A user may access dynamic threshold-based alert engine 118 by interacting with an application at computing device 104 capable of accessing dynamic threshold-based alert engine 118. For example, the user may use browser 126 to traverse a network address (e.g., a uniform resource locator) to dynamic threshold-based alert engine 118, which invokes a user interface 128 (e.g., a web page) in a browser window rendered on computing device 104. By interacting with the user interface, the user may utilize dynamic threshold-based alert engine 118 to enable dynamic threshold generation and/or to receive anomalous resource usage alerts. Computing device 104 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a Microsoft® Surface® device, a laptop computer, a notebook computer, a tablet computer such as an Apple® iPad™, a netbook, etc.), a wearable computing device (e.g., a head-mounted device including smart glasses such as Google® Glass™, etc.), or a stationary computing device such as a desktop computer or PC (personal computer).


A. Seasonal Pattern Detection



FIG. 2 is a block diagram of a system 200 for determining a seasonal pattern in a time series for a particular metric in accordance with an embodiment. For example, as shown in FIG. 2, system 200 includes a monitor 202, a seasonality detector 204, resources 218, and storage 214. Seasonality detector 204 includes a clipper 206, one or more time-based filters 208, a combiner 210, and a transformer 212. Each of monitor 202, seasonality detector 204, clipper 206, time-based filter(s) 208, combiner 210 and transformer 212 may be included in dynamic threshold-based alert engine 118, as described above with reference to FIG. 1.


Monitor 202 may obtain metrics associated with resources 218, such as, but not limited to, operating systems, applications, services executing on one or more of nodes 108A-108N, 112A-112N and/or 114A-114N, hardware and virtual resources maintained by the network-accessible server set (e.g., nodes 108A-108N, 112A-112N and/or 114A-114N, virtual machines, central processor units (CPUs), storage (e.g., storage disks 122), memories, etc.), and/or I/O, network bandwidth, power, etc., associated therewith. Such metrics may be collected at regular intervals and may be aggregated as a time series 216. Monitor 202 may collect multiple days' or weeks' worth of data to obtain the historical behavior of the metric. The time series for each metric may be stored in storage 214, which may be an example of storage disk(s) 122.


Seasonality detector 204 may be configured to analyze the historical behavior of the metric (i.e., time series 216) to determine a seasonal pattern (i.e., a seasonality) therein. Using known techniques (e.g., Fast Fourier Transforms (FFTs)) to detect seasonality in a time series is problematic in many real-world scenarios because noise in the metric prevents the seasonality from being detected. To overcome this, seasonality detector 204 may generate several different time series combinations based on the original time series (e.g., time series 216), which advantageously increases the probability of detecting the seasonality. An FFT may be applied to each combination to detect seasonality for each of the generated combinations. The foregoing may be performed using unsupervised machine learning-based techniques, which utilize time series 216 during a training phase in which the seasonal pattern is detected. In accordance with an embodiment, the training phase may be performed approximately every 24 hours. In accordance with such an embodiment, data newly available since the last training phase is added while trailing data is omitted. In accordance with a further embodiment, the history time span used for training may be 10 days, except when weekly seasonality is detected, in which case a 28-day historical span is used. As will be described below, once the seasonal pattern is detected (e.g., via the training phase), dynamic thresholds may be generated based thereon, and the dynamic thresholds may be applied to current compute metrics to detect anomalous behavior. Such techniques may continuously learn a particular metric's behavior and adapt to metric changes. That is, the seasonal pattern determined for a particular metric, and the amount of data used for training, may change over time as the behavior of the metric being monitored changes.
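The rolling training window described above (newly available data added, trailing data omitted, with a 10-day span that grows to 28 days when weekly seasonality is detected) can be sketched as follows. This is only an illustration under stated assumptions: samples are held as `(timestamp, value)` pairs, and the function name `training_window` and both constants are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Spans taken from the described embodiment; the constant names are illustrative.
DEFAULT_HISTORY = timedelta(days=10)
WEEKLY_HISTORY = timedelta(days=28)

Sample = Tuple[datetime, float]


def training_window(samples: List[Sample],
                    now: datetime,
                    weekly_seasonality: bool) -> List[Sample]:
    """Keep only the samples that fall inside the current training span.

    Re-running this at each training phase (e.g., about every 24 hours)
    admits new samples and drops those that have aged out of the span.
    """
    span = WEEKLY_HISTORY if weekly_seasonality else DEFAULT_HISTORY
    cutoff = now - span
    return [(t, v) for t, v in samples if cutoff < t <= now]


if __name__ == "__main__":
    now = datetime(2020, 1, 31)
    history = [(now - timedelta(days=k), float(k)) for k in range(40)]
    print(len(training_window(history, now, weekly_seasonality=False)))
    print(len(training_window(history, now, weekly_seasonality=True)))
```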


Each time series combination may be generated based on a combination of one or more parameters. The parameter(s) may include, but are not limited to, a clipped (or non-clipped) version of the time series and/or one or more filtered versions of the non-clipped and/or clipped version of the time series, where the time series are filtered based on different window sizes.


For instance, time series 216 may be clipped by clipper 206. Clipper 206 may be configured to remove outlying data points of time series 216 (e.g., to remove spikes in the metric). For instance, clipper 206 may be configured to remove a certain percentage of the highest and lowest values of time series 216 (e.g., 5% of the highest and lowest values) to generate a clipped time series 220. Each of time series 216 and clipped time series 220 may be provided to time-based filter(s) 208.
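A minimal sketch of the clipping step follows. It assumes the series is a plain list of values and that "removing" a percentage means dropping the corresponding data points outright; the function name `clip_time_series` and the `fraction` parameter are hypothetical. An implementation that must keep samples evenly spaced might instead clamp or interpolate the outliers, which the text does not specify.

```python
from typing import List, Sequence


def clip_time_series(values: Sequence[float], fraction: float = 0.05) -> List[float]:
    """Drop the lowest and highest `fraction` of values, preserving time order.

    Assumption: `fraction` applies separately to each tail, so 0.05 removes
    the 5% smallest and the 5% largest samples.
    """
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    k = int(len(values) * fraction)  # number of points removed from each tail
    if k == 0:
        return list(values)
    # Indices ordered by value; the first k and last k are the outliers.
    order = sorted(range(len(values)), key=lambda i: values[i])
    dropped = set(order[:k]) | set(order[-k:])
    return [v for i, v in enumerate(values) if i not in dropped]


if __name__ == "__main__":
    series = [5.0] * 18 + [100.0, -50.0]  # two spikes in an otherwise flat metric
    print(clip_time_series(series))       # the spikes are removed
```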


Time-based filter(s) 208 may be configured to perform a filtering (or smoothing) function on time series 216 based on different window sizes. The filtering function is configured to reduce the noise in the metric, while preserving the seasonal pattern. The different window sizes may be computed to match the seasonal spans that are frequently recurring in time series 216 (e.g., hourly, daily, weekly, etc.). For instance, time-based filter(s) 208 may generate a first filtered time series 222 based on time series 216 in accordance with a first window size (e.g., hourly). In particular, time-based filter(s) 208 may generate first filtered time series 222 by performing a filtering function on time series 216 that, for each data point in time series 216, combines adjacent data points with the data point to determine an average value. The average values are used to generate first filtered time series 222. Time-based filter(s) 208 may generate a second filtered time series 224 and a third filtered time series 226 based on time series 216 in accordance with a second window size (e.g., daily) and a third window size (e.g., weekly), respectively, in a similar manner as described with reference to first filtered time series 222. However, when generating second filtered time series 224, time-based filter(s) 208 may combine adjacent points for a given data point that are further in vicinity than the adjacent data points utilized to generate first filtered time series 222. Similarly, when generating third filtered time series 226, time-based filter(s) 208 may combine adjacent points for a given data point that are further in vicinity than the adjacent data points utilized to generate second filtered time series 224.


Time-based filter(s) 208 may also be configured to perform a filtering function on clipped time series 220 based on different window sizes in a similar manner as described above. For instance, as shown in FIG. 2, time-based filter(s) 208 may generate a first filtered, clipped time series 228 for a first window size (e.g., hourly) based on a filtering function that, for each data point in clipped time series 220, combines adjacent data points with the data point to determine an average value, which are used to generate first filtered, clipped time series 228. Time-based filter(s) 208 may generate a second filtered, clipped time series 230 for a second window size (e.g., daily) based on a filtering function that, for each data point in clipped time series 220, combines adjacent points for a given data point of clipped time series 220 that are further in vicinity than the adjacent data points utilized to generate first filtered, clipped time series 228. Time-based filter(s) 208 may generate a third filtered, clipped time series 232 for a third window size (e.g., weekly) based on a filtering function that, for each data point in clipped time series 220, combines adjacent points for a given data point of clipped time series 220 that are further in vicinity than the adjacent data points utilized to generate second filtered, clipped time series 230. Time series 216, clipped time series 220, first filtered time series 222, second filtered time series 224, third filtered time series 226, first filtered, clipped time series 228, second filtered, clipped time series 230, and third filtered, clipped time series 232 may be provided to combiner 210.
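The filtering function described above, in which each data point is averaged with its adjacent data points, resembles a centered moving average. The sketch below is one possible reading, not the described implementation: the function name `moving_average` is hypothetical, the window is expressed as a number of samples, and the averaging window is simply truncated at the ends of the series.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average over roughly `window` samples.

    Each output point is the mean of the input point and up to `window // 2`
    neighbors on each side, so an even `window` spans `window + 1` samples.
    Near the ends of the series only the available neighbors are used.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Assumed example: one sample per minute, so an hourly window is 60 samples
    # and a daily window is 1440 samples.
    per_minute = [float(i % 10) for i in range(180)]
    hourly = moving_average(per_minute, 60)
    print(len(hourly), round(hourly[90], 3))
```

A larger window draws on neighbors that lie farther from each data point, which corresponds to the daily and weekly filtered series using points "further in vicinity" than the hourly one.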


Combiner 210 may be configured to combine time series 216 with first filtered time series 222 to generate a first combined time series 234, combine time series 216 with second filtered time series 224 to generate a second combined time series 236, combine time series 216 with third filtered time series 226 to generate a third combined time series 238, combine clipped time series 220 with first filtered, clipped time series 228 to generate a fourth combined time series 240, combine clipped time series 220 with second filtered, clipped time series 230 to generate a fifth combined time series 242, and combine clipped time series 220 with third filtered, clipped time series 232 to generate a sixth combined time series 244. Combiner 210 may perform a Cartesian multiplication operation to perform the above-referenced combinations to generate first combined time series 234, second combined time series 236, third combined time series 238, fourth combined time series 240, fifth combined time series 242, and sixth combined time series 244.
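One way to read the Cartesian operation above is as an enumeration of every pairing of a source series (non-clipped or clipped) with a window size, which yields the six combined time series. The sketch below shows only that enumeration; how the two series in each pair are merged into one combined series is not stated in the text and is deliberately left out. All names are hypothetical labels for the reference numerals.

```python
from itertools import product
from typing import List, Tuple

# Labels stand in for time series 216 and clipped time series 220.
SOURCES = ["time series 216", "clipped time series 220"]
# Window sizes used by the time-based filter(s) in the example above.
WINDOWS = ["hourly", "daily", "weekly"]


def enumerate_combinations() -> List[Tuple[str, str]]:
    """Return each (source series, filter window) pairing, in figure order."""
    return list(product(SOURCES, WINDOWS))


if __name__ == "__main__":
    for number, (source, window) in enumerate(enumerate_combinations(), start=1):
        print(f"combined series {number}: {source} with its {window}-filtered version")
```

Under this reading, the first three pairings correspond to combined time series 234, 236, and 238, and the last three to combined time series 240, 242, and 244.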


Transformer 212 may be configured to perform an FFT on each of time series 216, clipped time series 220, first combined time series 234, second combined time series 236, third combined time series 238, fourth combined time series 240, fifth combined time series 242, and sixth combined time series 244 to detect seasonality in each of them. The foregoing process considerably increases the probability of detecting the seasonality in at least one of the generated time series. It is noted that transformer 212 also attempts to find seasonality in the original time series (i.e., time series 216) so as not to hinder the seasonality detection in the event that clipped time series 220 no longer includes the seasonality due to the clipping operation performed by clipper 206. If a seasonal pattern is detected, transformer 212 outputs a detected seasonal pattern 246. In the event that more than one seasonal pattern is detected (e.g., a daily seasonality and a weekly seasonality), the longest seasonal pattern (e.g., the weekly seasonality) is used to model the data because the periods are multiples of each other. When modeling, for example, a weekly seasonal pattern also having a daily seasonality, the entire daily seasonality is contained within the weekly seasonal pattern.
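A minimal sketch of FFT-based period detection across several candidate series is shown below, using NumPy's `numpy.fft.rfft`. It is an assumption-laden illustration rather than the described implementation: the decision rule (the strongest frequency bin must carry at least `min_strength` of the total power) and the names `detect_period`, `longest_period`, and `min_strength` are hypothetical, since the text does not say how a peak is judged significant.

```python
from typing import Iterable, Optional, Sequence

import numpy as np


def detect_period(values: Sequence[float], min_strength: float = 0.3) -> Optional[float]:
    """Return the dominant period in samples, or None if no clear peak exists.

    Assumption: a period counts as detected when its frequency bin holds at
    least `min_strength` of the total spectral power (illustrative rule).
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    if n < 4 or x.max() == x.min():
        return None  # too short or constant: nothing periodic to find
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    power[0] = 0.0  # ignore any residual constant component
    k = int(np.argmax(power))
    if k == 0 or power[k] / power.sum() < min_strength:
        return None
    return n / k  # bin k completes k cycles over n samples


def longest_period(candidates: Iterable[Sequence[float]]) -> Optional[float]:
    """Run detection on every candidate series and keep the longest period."""
    found = [p for p in (detect_period(c) for c in candidates) if p is not None]
    return max(found) if found else None


if __name__ == "__main__":
    t = np.arange(240)
    daily = np.sin(2 * np.pi * t / 24)   # repeats every 24 samples
    longer = np.sin(2 * np.pi * t / 48)  # repeats every 48 samples
    print(detect_period(daily), longest_period([daily, longer]))
```

Trying every candidate and keeping the longest detected period mirrors the rule above that a weekly pattern is preferred over a daily one when both are found.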


It is noted that the window sizes utilized by time-based filter(s) 208 are purely exemplary and that any window size (e.g., monthly, yearly, etc.) may be utilized to determine seasonality for different time frames (e.g., monthly seasonality, yearly seasonality, etc.).


Accordingly, a seasonal pattern may be determined in a time series in many ways. For example, FIG. 3 shows a flowchart 300 of a method for determining a seasonal pattern in a time series in accordance with an example embodiment. In an embodiment, flowchart 300 may be implemented by system 200 shown in FIG. 2, although the method is not limited to that implementation. Accordingly, flowchart 300 will be described with continued reference to FIG. 2. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 300 and system 200 of FIG. 2.


Flowchart 300 begins with step 302. In step 302, a predetermined percentage of the highest and lowest values from a time series is removed to generate a clipped time series. For example, with reference to FIG. 2, clipper 206 may remove a predetermined percentage of the highest and lowest values (e.g., 5% of the highest and lowest values) from time series 216 to generate clipped time series 220.
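One possible reading of step 302 is sketched below. It is assumed, for illustration only, that the clipped points are dropped from the series (rather than capped at a limit value) and that the same percentage is removed from each end; the function name `clip_extremes` and its `percent` parameter are hypothetical.

```python
def clip_extremes(values, percent=5):
    """Drop the `percent` highest and `percent` lowest values, keeping time order."""
    n = len(values)
    k = n * percent // 100  # number of points removed from each end
    if k == 0:
        return list(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices sorted by value
    dropped = set(order[:k]) | set(order[-k:])
    return [v for i, v in enumerate(values) if i not in dropped]


# Hypothetical series of 100 samples; 5 are removed from each end.
series = list(range(100))
clipped = clip_extremes(series, percent=5)
```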


In step 304, the time series is filtered in accordance with at least one window size to generate at least one filtered time series. For example, with reference to FIG. 2, time-based filter(s) 208 may filter time series 216 in accordance with at least one window size to generate at least one filtered time series. For instance, time-based filter(s) 208 may filter time series 216 in accordance with a first window size (e.g., hourly) to generate a first filtered time series 222, filter time series 216 in accordance with a second window size (e.g., daily) to generate a second filtered time series 224, and so on and so forth.


In step 306, the clipped time series is filtered in accordance with the at least one window size to generate at least one filtered, clipped time series. For example, with reference to FIG. 2, time-based filter(s) 208 may filter clipped time series 220 in accordance with at least one window size to generate at least one filtered, clipped time series. For instance, time-based filter(s) 208 may filter clipped time series 220 in accordance with a first window size (e.g., hourly) to generate a first filtered, clipped time series 228, filter clipped time series 220 in accordance with a second window size (e.g., daily) to generate a second filtered, clipped time series 230, and so on and so forth.
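The filtering of steps 304 and 306 may be illustrated with the sketch below. The kind of filter is not specified in this passage, so a trailing moving average over the window is assumed here purely for illustration; the function name `moving_average` and the sample values are likewise hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over the samples seen so far."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out


# Hypothetical series smoothed with a window of two samples.
smoothed = moving_average([1, 2, 3, 4, 5], window=2)
```

Under this assumption, the same function would be applied once per window size to both the original and the clipped time series.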


In step 308, the seasonal pattern is determined based on applying a respective transform to the time series, the clipped time series, a combination of the time series and the at least one filtered time series, and a combination of the clipped time series and the at least one filtered, clipped time series. For example, with reference to FIG. 2, transformer 212 may apply a respective transform to time series 216, clipped time series 220, a first combination of time series 216 and filtered time series 222 (i.e., first combined time series 234), a second combination of time series 216 and filtered time series 224 (i.e., second combined time series 236), a third combination of time series 216 and filtered time series 226 (i.e., third combined time series 238), a fourth combination of clipped time series 220 and filtered, clipped time series 228 (i.e., fourth combined time series 240), a fifth combination of clipped time series 220 and filtered, clipped time series 230 (i.e., fifth combined time series 242), and/or a sixth combination of clipped time series 220 and filtered, clipped time series 232 (i.e., sixth combined time series 244). The transform applied to each of time series 216, clipped time series 220, first combined time series 234, second combined time series 236, third combined time series 238, fourth combined time series 240, fifth combined time series 242, and sixth combined time series 244 may be an FFT. In the event that more than one seasonal pattern is detected (e.g., a daily seasonality determined based on an FFT applied to second combined time series 236 or fifth combined time series 242 and a weekly seasonality determined based on an FFT applied to third combined time series 238 or sixth combined time series 244), the longest seasonal pattern (e.g., the weekly seasonality) is used to model the data because the detected periods are multiples of each other.


B. Model Selection for Generating Dynamic Thresholds


Once seasonal pattern 246 is detected from time series 216 using the techniques described above with reference to Subsection A, seasonal pattern 246 and time series 216 (e.g., the most recent data values collected for time series 216) may be provided to a model selector to determine the optimal model for generating dynamic thresholds for a particular metric. For example, FIG. 4 is a block diagram of a system 400 for determining an optimal model for generating dynamic thresholds for a particular metric in accordance with an embodiment. As shown in FIG. 4, system 400 includes a model selector 402 and a model bank 404. Model bank 404 may comprise a memory that stores a plurality of different statistical modelers that are utilized to determine dynamic thresholds based on time series 216 and seasonal pattern 246. For example, as shown in FIG. 4, model bank 404 includes a low dispersion-based modeler 406, a seasonal adjusted boxplot-based modeler 408, and a Box-Cox transformation-based modeler 410. Each of model selector 402 and model bank 404 may be included in dynamic threshold-based alert engine 118, as described above with reference to FIG. 1.


Model selector 402 may be configured to perform a statistical analysis based on time series 216 and/or seasonal pattern 246 to automatically determine which modeler should be utilized to determine the dynamic thresholds. For instance, model selector 402 may analyze the range, mean, variance, standard deviation, spread, etc., of time series 216 and/or seasonal pattern 246. Low dispersion-based modeler 406 may be selected if the analysis indicates that time series 216 and/or seasonal pattern 246 has low dispersion (e.g., is relatively constant and/or rarely changes). If time series 216 and/or seasonal pattern 246 is relatively variable (e.g., has a sinusoidal pattern, high variance, etc.), model selector 402 may select one of seasonal adjusted boxplot-based modeler 408 or Box-Cox transformation-based modeler 410. Model selector 402 may select Box-Cox transformation-based modeler 410 if time series 216 and/or seasonal pattern 246 has a majority of positive values (e.g., does not include values that are less than or equal to 0) and may select seasonal adjusted boxplot-based modeler 408 if time series 216 and/or seasonal pattern 246 includes values that are less than or equal to 0.


Box-Cox transformation-based modeler 410 has an inherent limitation in dealing with non-positive values. Thus, non-positive values may be removed from time series 216 before applying Box-Cox transformation-based modeler 410. In some metrics, a substantial part of the data (more than 80%) is zeros. A common example of such metrics is a request count or network traffic for processes that are active during only part of the day. In such situations, Box-Cox transformation-based modeler 410 performs poorly because too few values remain after removal of the zeros. Accordingly, seasonal adjusted boxplot-based modeler 408 may be utilized in such scenarios.


Generally, Box-Cox transformation-based modeler 410 generates more accurate thresholds than seasonal adjusted boxplot-based modeler 408, as more data points are analyzed (as will be described below with reference to Subsection B.2). Seasonal adjusted boxplot-based modeler 408 may be utilized as a fallback modeler in the event that time series 216 and/or seasonal pattern 246 includes values that are less than or equal to 0.


Additional details regarding seasonal adjusted boxplot-based modeler 408 and Box-Cox transformation-based modeler 410 are described below with reference to Subsections B.1 and B.2, respectively.


After determining which modeler to utilize, the selected modeler automatically generates the dynamic thresholds. In the event that the analysis determines that none of the modelers are applicable, then no dynamic thresholds are automatically generated. It is noted that model bank 404 may store any number of modelers and that low dispersion-based modeler 406, seasonal adjusted boxplot-based modeler 408 and Box-Cox transformation-based modeler 410 are some examples of the modelers that may be utilized to generate thresholds.


Accordingly, a modeler for generating dynamic thresholds may be selected in many ways. For example, FIG. 5 shows a flowchart 500 of a method for selecting a modeler for generating dynamic thresholds in accordance with an example embodiment. In an embodiment, flowchart 500 may be implemented by system 400 shown in FIG. 4, although the method is not limited to that implementation. Accordingly, flowchart 500 will be described with continued reference to FIG. 4. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 500 and system 400 of FIG. 4.


Flowchart 500 begins with step 502. In step 502, a determination is made as to whether the data values of the time series are relatively constant or have a relatively high variance. For example, with reference to FIG. 4, model selector 402 may determine whether the data values of time series 216 are relatively constant or have a relatively high variance. If a determination is made that the data values of the time series are relatively constant, flow continues to step 504. Otherwise, flow continues to step 506.


In step 504, the low dispersion-based modeler is selected. For example, with reference to FIG. 4, model selector 402 selects low dispersion-based modeler 406.


In step 506, one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler is selected. For example, with reference to FIG. 4, model selector 402 selects one of seasonal adjusted boxplot-based modeler 408 or Box-Cox transformation-based modeler 410.


In accordance with one or more embodiments, selecting one of the seasonal adjusted boxplot-based modeler or the Box-Cox transformation-based modeler comprises determining whether the data values of the time series comprise non-positive values. Based at least on determining that the data values of the time series comprise non-positive values, the seasonal adjusted boxplot-based modeler is selected. Based at least on determining that the data values of the time series do not comprise non-positive values, the Box-Cox transformation-based modeler is selected. For example, with reference to FIG. 4, model selector 402 may select seasonal adjusted boxplot-based modeler 408 based at least on determining that the data values of the time series comprise non-positive values and may select Box-Cox transformation-based modeler 410 based at least on determining that the data values of the time series do not comprise non-positive values.
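The decision flow of flowchart 500 may be sketched as follows. The dispersion test shown (a coefficient of variation compared against a cutoff) and the cutoff value `cv_threshold` are assumptions for illustration, since no particular statistic or cutoff is prescribed above; the returned string labels are likewise hypothetical stand-ins for modelers 406, 408, and 410.

```python
import statistics


def select_modeler(values, cv_threshold=0.01):
    """Return a label for the modeler suggested by steps 502-506, or None."""
    if len(values) < 2:
        return None  # too little data for any modeler
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    scale = abs(mean) if mean != 0 else 1.0
    # Step 502/504: relatively constant data -> low dispersion-based modeler.
    if spread / scale < cv_threshold:
        return "low_dispersion"
    # Step 506: non-positive values rule out the Box-Cox transformation.
    if any(v <= 0 for v in values):
        return "seasonal_adjusted_boxplot"
    return "box_cox"


flat = select_modeler([5.0] * 10)
positive_wave = select_modeler([1, 5, 9, 5, 1, 5, 9, 5])
wave_with_zeros = select_modeler([0, 5, 9, 5, 0, 5, 9, 5])
```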


1. Seasonal Adjusted Boxplot Modeler



FIG. 6 is a block diagram of a seasonal adjusted boxplot-based modeler 600 in accordance with an example embodiment. Seasonal adjusted boxplot-based modeler 600 is an example of seasonal adjusted boxplot-based modeler 408, as shown in FIG. 4. As shown in FIG. 6, seasonal adjusted boxplot-based modeler 600 may include a bin-based time series generator 602, a threshold determiner 604, and a dynamic threshold generator 606.


Bin-based time series generator 602 may be configured to de-seasonalize seasonal pattern 246 to generate a plurality of different time series based on seasonal pattern 246. Each generated time series may be based on a particular bin (or bucket) associated with seasonal pattern 246. A bin may represent a particular time interval. For example, each bin may represent a particular hour of a given day. In this case, the number of bins would be 24 (1 bin for each hour of the day), and therefore, the number of time series generated would also be 24. In another example, each bin may represent a ten-minute interval. In a scenario in which seasonal pattern 246 comprises 7 days of data (or 10,080 minutes' worth of data), the number of bins would be 1,008, and therefore, the number of time series generated would also be 1,008. It is noted that any time interval may be utilized.


For a particular bin, bin-based time series generator 602 may determine each value of the metric for that bin in seasonal pattern 246. For example, FIG. 7 shows a graph 700 depicting seasonal pattern 246 in accordance with an example embodiment. In the example shown in FIG. 7, seasonal pattern 246 includes 5 days' worth of data. For a first bin, bin-based time series generator 602 determines the values of seasonal pattern 246 located at the first bin. For a second bin, bin-based time series generator 602 determines the values of seasonal pattern 246 located at the second bin, and so on and so forth. In the example shown in FIG. 7, each bin represents a particular time of the day. Because there are 5 days' worth of data, the number of data points collected for a particular bin is 5. For instance, with further reference to FIG. 7, suppose a first bin corresponds to 7:00 AM and a second bin corresponds to 1:00 PM. For the 7:00 AM bin, bin-based time series generator 602 determines the value of the metric at each 7:00 AM bin and generates a first time series based on the 5 values collected for the 7:00 AM bin. Similarly, for the 1:00 PM bin, bin-based time series generator 602 determines the value of the metric at each 1:00 PM bin and generates a second time series based on the 5 values collected for the 1:00 PM bin.
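The binning described above may be sketched as follows. Timestamps are assumed, for illustration, to be expressed as minute offsets from the start of the data, and the names `bin_index` and `bin_series` are hypothetical. The example reproduces the hourly bins over 5 days of data and the bin count for ten-minute bins over a week.

```python
def bin_index(minute_offset, bin_minutes, season_minutes):
    """Map a timestamp (minutes since the start of the data) to its seasonal bin."""
    return (minute_offset % season_minutes) // bin_minutes


def bin_series(samples, bin_minutes, season_minutes):
    """Group (minute_offset, value) samples into one time series per bin."""
    n_bins = season_minutes // bin_minutes
    bins = [[] for _ in range(n_bins)]
    for minute_offset, value in samples:
        bins[bin_index(minute_offset, bin_minutes, season_minutes)].append(value)
    return bins


# Hypothetical hourly metric over 5 days with a daily season: 24 bins of 5 values.
samples = [(hour * 60, float(hour)) for hour in range(5 * 24)]
hourly_bins = bin_series(samples, bin_minutes=60, season_minutes=24 * 60)
seven_am = hourly_bins[7]  # the five values observed at 7:00 AM

# Ten-minute bins over a weekly season of 7 * 24 * 60 minutes.
weekly_bin_count = (7 * 24 * 60) // 10
```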


Referring again to FIG. 6, bin-based time series generator 602 provides the generated bin-based time series (shown as bin-based time series 608A-608N) to threshold determiner 604. Threshold determiner 604 may be configured to generate a minimum and/or maximum threshold for each of bin-based time series 608A-608N. For instance, for each of bin-based time series 608A-608N, threshold determiner 604 may determine its minimum value and generate a minimum threshold based thereon. Similarly, threshold determiner 604 may determine its maximum value and generate a maximum threshold based thereon. Threshold determiner 604 may utilize an adjusted boxplot algorithm to determine the minimum and/or maximum values and/or thresholds.


The most straightforward and popular method for generating thresholds is the “3-sigma” rule, according to which one estimates the sample mean and standard deviation and sets the boundaries at 3 standard deviations from the mean. However, it is well known that the sample mean and standard deviation are non-robust statistics, prone to significant errors in the presence of outliers.


To deal with this caveat, several methods based on robust statistics have been proposed. In accordance with an embodiment, threshold determiner 604 utilizes a modified version of Tukey's method (a.k.a. boxplot) to estimate the boundaries of the normal behavior of the data.


Given that P25 and P75 are the 25th and 75th percentiles of the data, the classic Tukey's test sets the higher and lower boundaries at P75+K·(P75−P25) and P25−K·(P75−P25), respectively. K in this equation is a predetermined factor, and usually K=1.5 defines boundaries for “mild outliers” and K=3 defines boundaries for “extreme outliers”.


In the modified version, since it is not uncommon for the interquartile range (P75−P25) of telemetry data to be 0, the inter-percentile range between the 90th and 10th percentiles (P90−P10) is used instead. Thus, the maximum and/or minimum thresholds may be set at P90+K̃·(P90−P10) and P10−K̃·(P90−P10), respectively.


The factor K̃ (≈0.55) may be determined so that, if the data were normally distributed, the minimum and/or maximum thresholds would be the same as those set by the classic Tukey's method with K=1.5. In addition, minimum and/or maximum thresholds for different detection sensitivity levels can be set by using a factor larger than K̃. In accordance with an embodiment, 1.5×K̃ and 2×K̃ are used for medium and extreme outliers, respectively. Furthermore, to account for possible skewness of the data, an adjusted boxplot may be used that adds a compensating factor that is a function of the medcouple.
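The approximate value of the factor can be checked numerically. The sketch below assumes a standard normal distribution and solves for the factor that makes the upper boundary P90+K̃·(P90−P10) coincide with the classic Tukey boundary P75+1.5·(P75−P25). The dictionary keys naming the sensitivity levels are hypothetical labels.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

p25, p75 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
p10, p90 = std_normal.inv_cdf(0.10), std_normal.inv_cdf(0.90)

# Classic Tukey upper boundary with K = 1.5.
tukey_upper = p75 + 1.5 * (p75 - p25)

# Solve P90 + k * (P90 - P10) = tukey_upper for k.
k_tilde = (tukey_upper - p90) / (p90 - p10)

# Larger factors give wider thresholds (lower sensitivity).
sensitivity_factors = {
    "mild": k_tilde,
    "medium": 1.5 * k_tilde,
    "extreme": 2 * k_tilde,
}
```

Because the normal distribution is symmetric, the same factor also aligns the lower boundaries.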


Threshold determiner 604 provides the determined thresholds (shown as thresholds 610A-610N) to dynamic threshold generator 606. Dynamic threshold generator 606 may be configured to compose (or combine) each of thresholds 610A-610N to generate a continuous, dynamic threshold 612, which tracks the seasonality of seasonal pattern 246. For instance, FIG. 8 depicts a graph showing a minimum threshold 802 and a maximum threshold 804 generated based on seasonal pattern 246 in accordance with an example embodiment. As shown in FIG. 8, minimum threshold 802 and maximum threshold 804 dynamically track the seasonality of seasonal pattern 246 (as opposed to static, straight-line thresholds).


Accordingly, a seasonal adjusted boxplot-based modeler may be utilized to generate dynamic thresholds in many ways. For example, FIG. 9 shows a flowchart 900 of a method for generating dynamic thresholds using a seasonal adjusted boxplot-based modeler in accordance with an example embodiment. In an embodiment, flowchart 900 may be implemented by system 600 shown in FIG. 6, although the method is not limited to that implementation. Accordingly, flowchart 900 will be described with continued reference to FIG. 6. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 900 and system 600 of FIG. 6.


Flowchart 900 begins with step 902. In step 902, a plurality of bins for the seasonal pattern is determined. For example, with reference to FIG. 6, bin-based time series generator 602 may determine a plurality of bins for seasonal pattern 246.


In step 904, for each bin of the plurality of bins, data values from the time series are selectively assigned to the bin. For example, with reference to FIG. 6, bin-based time series generator 602 selectively assigns data values from time series 216 to the bin.


In step 906, for each bin of the plurality of bins, a bin-based time series is generated based on the assigned data values for the bin. For example, with reference to FIG. 6, for each bin of the plurality of bins, bin-based time series generator 602 generates a bin-based time series (shown as bin-based time series 608A-608N) based on the assigned data values for the bin.


In step 908, for each bin of the plurality of bins, a bin-based threshold is determined based on the bin-based time series. For example, with reference to FIG. 6, threshold determiner 604 determines bin-based thresholds 610A-610N based on bin-based time series 608A-608N.


In step 910, the bin-based thresholds are combined to generate the threshold. For example, with reference to FIG. 6, dynamic threshold generator 606 combines bin-based thresholds 610A-610N to generate dynamic threshold 612.
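Steps 902 through 910 may be sketched end to end as shown below. The sketch assumes evenly spaced samples so that a bin is simply a phase within the seasonal period, uses linear interpolation for percentiles, applies the approximate factor of 0.55 to the range between the 10th and 90th percentiles, and omits the medcouple-based skewness compensation. All function names and the sample data are hypothetical.

```python
K_TILDE = 0.55  # approximate factor applied to the P90-P10 range


def percentile(values, q):
    """Linearly interpolated percentile, with q given as a fraction in [0, 1]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bin_thresholds(values, k=K_TILDE):
    """Step 908: (minimum, maximum) thresholds for one bin-based time series."""
    p10, p90 = percentile(values, 0.10), percentile(values, 0.90)
    spread = p90 - p10
    return (p10 - k * spread, p90 + k * spread)


def dynamic_threshold(values, period, k=K_TILDE):
    """Steps 902-910: one (minimum, maximum) pair per seasonal phase."""
    bins = [values[phase::period] for phase in range(period)]  # steps 902-906
    return [bin_thresholds(b, k) for b in bins]                 # steps 908-910


# Hypothetical metric: a 4-sample season repeated 5 times with a slow upward drift.
base = [10, 20, 30, 20]
series = [base[t % 4] + (t // 4) for t in range(20)]
thresholds = dynamic_threshold(series, period=4)
```

The resulting list holds one pair of thresholds per bin, so reading it in bin order yields thresholds that rise and fall with the seasonal pattern.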


2. Box-Cox Transformation-Based Modeler


When forecasting a seasonal metric, several values of the same seasonal phase are usually required to establish a forecasting threshold, since forecasting a metric based on its past behavior requires estimating the variation of the metric values. Variation cannot be computed from a single value, nor reliably estimated from very few measurements. For some seasonality spans, such as weekly patterns, going back several weeks to build a forecasting model may prevent the model from adapting to changes in the metric's behavior. In such cases, the use of several seasons is misleading, and there is a need to reduce the time span used in building the forecast model.


A different approach is to use the entire data to predict the variation and make it a constant variation in all the seasonal phases. This approach is also problematic since, in many cases, the variation of service metrics is somewhat correlated with the signal values. For example, the variation of traffic data is higher during business hours than during the night or weekends. As a result, after decomposing the seasonal behavior of the data, the random (or noisy) components remain. Such components might have constant variance or variance that changes over time.


By applying Box-Cox transformation-based modeler 410 to the seasonally decomposed data, all the residuals are mapped to a common space in which the bandwidth of the forecast belt can be estimated. At that point, an adjusted boxplot can be applied to the residuals. Thereafter, a backward (or inverse) transform is applied to return to the original metric values, yielding a ready model with seasonally-adjusted boxplot thresholds. The foregoing advantageously enables the creation of a seasonal forecasting model using very few seasons.



FIG. 10 is a block diagram of a Box-Cox transformation-based modeler 1000 in accordance with an example embodiment. Box-Cox transformation-based modeler 1000 is an example of Box-Cox transformation-based modeler 410, as shown in FIG. 4. As shown in FIG. 10, Box-Cox transformation-based modeler 1000 may include a seasonal decomposer 1002, a variance stabilizer 1004, and a dynamic threshold generator 1006.


Seasonal decomposer 1002 may be configured to decompose the seasonal behavior of time series 216. For instance, seasonal decomposer 1002 may remove seasonal pattern 246 from time series 216. This may be performed by subtracting seasonal pattern 246 from time series 216. The remaining portion of time series 216 represents a natural random variation (or residual data) of time series 216. For instance, FIGS. 11A-11B depict graphs 1100A-1100B that show time series 216 and seasonal pattern 246. Graph 1100A shows time series 216 and seasonal pattern 246 superimposed thereon, before seasonal pattern 246 is removed. Graph 1100B shows residual data, which is the result of seasonal decomposer 1002 removing seasonal pattern 246 from time series 216.


The issue with the residual data is that it has a varying degree of noise. When performing statistical analysis, it is desirable to have the same level of noise at all times. Accordingly, a power transformation is performed to stabilize the noise in the residual data. For example, with reference to FIG. 10, seasonal decomposer 1002 provides residual data 1008 (which is represented in graph 1100B) to variance stabilizer 1004. Variance stabilizer 1004 performs a transform to stabilize the variance in noise in residual data 1008 to generate transformed residual data 1010. For instance, variance stabilizer 1004 may apply a non-linear, monotone transformation to residual data 1008. The transformation may be a Box-Cox-based transformation. The transformation may include both a logarithmic and a power transformation in accordance with Equation 1, which is shown below:











$$
y_i^{(\lambda)} := \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y_i), & \lambda = 0 \end{cases} \qquad \text{(Equation 1)}
$$







where $y_i$ and $y_i^{(\lambda)}$ are the ith data point in the original and transformed scale, respectively. The transformation formula is defined in this way so that the transformation is continuous in λ as λ approaches 0. The adequate λ for a given time series is estimated by maximizing the log likelihood function of the residuals given that they are normally distributed. FIG. 11C depicts a graph 1100C showing transformed residual data 1010 (i.e., after seasonal pattern 246 is subtracted from time series 216 and after applying the Box-Cox-based transformation on residual data 1008). As shown in graph 1100C, there is no apparent change in the variation during the week (i.e., transformed residual data 1010 is more evenly distributed throughout the week than residual data 1008).
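The transformation of Equation 1, its inverse, and the likelihood-based choice of λ may be sketched as follows. The sketch assumes strictly positive, non-constant input values, uses the standard Box-Cox profile log likelihood under a normality assumption, and searches a coarse grid of candidate λ values rather than running a numerical optimizer. The function names, the grid, and the sample data are hypothetical.

```python
import math


def box_cox(y, lam):
    """Equation 1 applied to a single positive value y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


def log_likelihood(values, lam):
    """Profile log likelihood of lam, assuming normal transformed values."""
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    mean = sum(transformed) / n
    variance = sum((x - mean) ** 2 for x in transformed) / n
    jacobian = (lam - 1) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(variance) + jacobian


def estimate_lambda(values, grid=None):
    """Return the grid value of lam with the highest log likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # candidates from -2.0 to 2.0
    return max(grid, key=lambda lam: log_likelihood(values, lam))


# Hypothetical data that is symmetric on a log scale.
sample = [math.exp(x / 10) for x in range(-20, 21)]
lam = estimate_lambda(sample)
round_trip = inverse_box_cox(box_cox(7.0, 0.5), 0.5)
```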


Dynamic threshold generator 1006 may be configured to generate a continuous, dynamic threshold 1012, which tracks the seasonality of seasonal pattern 246. For example, dynamic threshold generator 1006 may be configured to generate at least one of a minimum threshold and a maximum threshold for transformed residual data 1010. Because transformed residual data 1010 is relatively constant, the minimum and/or maximum thresholds may be relatively constant threshold(s). After determining the minimum and/or maximum thresholds, dynamic threshold generator 1006 may perform an inverse transformation on the minimum and/or maximum thresholds to reintroduce the variance (i.e., the transform performed by variance stabilizer 1004 is reversed). Thereafter, the transformed minimum and/or maximum thresholds are combined with seasonal pattern 246, thereby resulting in minimum and/or maximum thresholds (i.e., dynamic thresholds 1012) that track seasonal pattern 246.


Accordingly, a Box-Cox transformation-based modeler may be utilized to generate dynamic thresholds in many ways. For example, FIG. 12 shows a flowchart 1200 of a method for generating dynamic thresholds using a Box-Cox transformation-based modeler in accordance with an example embodiment. In an embodiment, flowchart 1200 may be implemented by system 1000 shown in FIG. 10, although the method is not limited to that implementation. Accordingly, flowchart 1200 will be described with continued reference to FIG. 10. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 1200 and system 1000 of FIG. 10.


Flowchart 1200 begins with step 1202. In step 1202, the seasonal pattern is decomposed from the time series to obtain residual data associated with the time series. For example, with reference to FIG. 10, seasonal decomposer 1002 decomposes (e.g., removes) seasonal pattern 246 from time series 216 to obtain residual data 1008 associated with time series 216. For instance, FIG. 11B shows an example of residual data 1008 that is obtained from removing seasonal pattern 246 from time series 216.


In step 1204, a Box-Cox transform is applied to the residual data to stabilize a variance of the residual data to generate transformed residual data. For example, with reference to FIG. 10, variance stabilizer 1004 applies a Box-Cox transform to residual data 1008 to stabilize the variance of residual data 1008 to generate transformed residual data 1010. For instance, FIG. 11C shows an example of transformed residual data 1010.


In step 1206, a transformed residual data-based threshold is determined based on the transformed residual data. For example, with reference to FIG. 10, dynamic threshold generator 1006 may generate at least one transformed residual data-based threshold (e.g., a minimum and/or a maximum threshold) based on transformed residual data 1010.


In step 1208, an inverse Box-Cox transform is applied to the transformed residual data-based threshold to reintroduce the variance, thereby generating a transformed threshold. For example, with reference to FIG. 10, dynamic threshold generator 1006 may apply an inverse Box-Cox transform to the transformed residual data-based threshold to reintroduce the variance, thereby generating a transformed threshold.


In step 1210, the seasonal pattern is combined with the transformed threshold to generate the threshold. For example, with reference to FIG. 10, dynamic threshold generator 1006 may combine seasonal pattern 246 with the transformed threshold to generate at least one dynamic threshold 1012.


C. Method for Issuing Alerts Indicative of Anomalous Resource Usage Based on Dynamic Thresholds


Once the dynamic thresholds are determined for a particular metric, the determined dynamic thresholds may be utilized to determine whether the metric exhibits anomalous behavior (e.g., an excessive amount or abnormally low number of requests, an excessive usage of CPU cycles, memory and/or storage, etc.) as the corresponding computing resource(s) continue to operate. If the determined thresholds are exceeded, an alert indicating anomalous resource usage with respect to the computing resource(s) may be provided to a user.


For example, FIG. 13 shows a flowchart 1300 of a method for issuing alerts indicative of anomalous resource usage based on dynamic thresholds in accordance with an example embodiment. In an embodiment, flowchart 1300 may be implemented by dynamic threshold-based alert engine 1400 of FIG. 14, although the method is not limited to that implementation. Accordingly, flowchart 1300 will be described with reference to FIG. 14. FIG. 14 is a block diagram of dynamic threshold-based alert engine 1400 in accordance with an embodiment. As shown in FIG. 14, dynamic threshold-based alert engine 1400 comprises a monitor 1402, a seasonality detector 1404, a model selector 1406, and one or more modeler(s) 1408. Dynamic threshold-based alert engine 1400 is an example of dynamic threshold-based alert engine 118, as described above with reference to FIG. 1. Monitor 1402 and seasonality detector 1404 are examples of monitor 202 and seasonality detector 204, as described above with reference to FIG. 2. Model selector 1406 is an example of model selector 402, as described above with reference to FIG. 4. Modeler(s) 1408 are examples of low dispersion-based modeler 406, seasonal adjusted boxplot-based modeler 408, and Box-Cox transformation-based modeler 410, as described above with reference to FIG. 4. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 1300 and dynamic threshold-based alert engine 1400.


Flowchart 1300 begins with step 1302. In step 1302, a time series of data values corresponding to a metric associated with a computing resource is obtained. For example, with reference to FIG. 14, monitor 1402 obtains a time series 1412 of data values corresponding to a metric associated with one or more resources 1410. Resource(s) 1410 are examples of resources 218, as described above with reference to FIG. 2. Time series 1412 is an example of time series 216, as described above with reference to FIG. 2.


In step 1304, a seasonal pattern in the time series is detected. For example, with reference to FIG. 14, seasonality detector 1404 may detect a seasonal pattern 1414 in time series 1412. Seasonal pattern 1414 is an example of seasonal pattern 246, as described above with reference to FIG. 2.


In accordance with one or more embodiments, the seasonal pattern may be determined in accordance with system 200 of FIG. 2 and flowchart 300 of FIG. 3.


In step 1306, a statistical analysis of the time series is performed. For example, with reference to FIG. 14, model selector 1406 may perform a statistical analysis of time series 1412.


In step 1308, a modeler is selected from among a plurality of different modelers based on results of the statistical analysis. For example, with reference to FIG. 14, model selector 1406 may select a modeler from modeler(s) 1408 based on results of the statistical analysis.


In accordance with one or more embodiments, the plurality of different modelers comprises at least one of a low dispersion-based modeler, a seasonal adjusted boxplot-based modeler, or a Box-Cox transformation-based modeler. For example, with reference to FIG. 4, the plurality of different modelers comprises at least one of low dispersion-based modeler 406, seasonal adjusted boxplot-based modeler 408, or Box-Cox transformation-based modeler 410.


In accordance with one or more embodiments, the modeler may be selected in accordance with system 400 of FIG. 4 and flowchart 500 of FIG. 5.


In step 1310, the selected modeler is utilized to generate a threshold based on the seasonal pattern. For example, with reference to FIG. 14, the selected modeler of modeler(s) 1408 is utilized to generate a dynamic threshold 1416. With reference to FIG. 6, in an embodiment in which the selected modeler is seasonal adjusted boxplot-based modeler 600, seasonal adjusted boxplot-based modeler 600 generates dynamic threshold 612 based on seasonal pattern 246. Dynamic threshold 612 may be generated in accordance with flowchart 900 of FIG. 9. With reference to FIG. 10, in an embodiment in which the selected modeler is Box-Cox transformation-based modeler 1000, Box-Cox transformation-based modeler 1000 generates dynamic threshold 1012 based on seasonal pattern 246. Dynamic threshold 1012 may be generated in accordance with flowchart 1200 of FIG. 12.


In step 1312, the metric associated with the computing resource is monitored to determine whether the metric exceeds the threshold. For example, with reference to FIG. 14, monitor 1402 may monitor the metric associated with resource(s) 1410 to determine whether the metric exceeds dynamic threshold 1416. For instance, monitor 1402 may monitor one or more data points indicative of the metric and compare the data points to dynamic threshold 1416 to determine whether such data point(s) exceed dynamic threshold 1416.
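The comparison performed in step 1312 can be sketched as follows. This is a minimal illustration rather than the described implementation: the function name `find_exceedances` and the representation of the dynamic threshold as one lower bound and one upper bound per monitored data point (in the spirit of a minimum threshold and a maximum threshold) are assumptions made for the example, and the data values are invented.

```python
from typing import List, Sequence


def find_exceedances(values: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> List[int]:
    """Return the indices of data points that fall outside [lower, upper].

    Hypothetical helper: the dynamic threshold is assumed to be supplied
    as one lower and one upper bound per monitored data point.
    """
    if not (len(values) == len(lower) == len(upper)):
        raise ValueError("values, lower and upper must have equal length")
    return [i for i, (v, lo, hi) in enumerate(zip(values, lower, upper))
            if v < lo or v > hi]


# Illustrative data only: the bounds vary over time, as a dynamic
# threshold that tracks a seasonal pattern would.
metric = [10.0, 12.0, 30.0, 11.0, 2.0]
lower_bound = [5.0, 6.0, 8.0, 6.0, 5.0]
upper_bound = [15.0, 18.0, 25.0, 18.0, 15.0]

anomalous = find_exceedances(metric, lower_bound, upper_bound)
print(anomalous)  # indices of the data points that exceed the threshold
```

In this sketch the returned indices identify which data points are anomalous, so they could accompany whatever indication is subsequently provided.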


In step 1314, an indication is provided based at least on determining that the metric exceeds the threshold. For example, with reference to FIG. 14, monitor 1402 may provide an indication 1418 based at least on determining that the metric exceeds dynamic threshold 1416. For instance, with reference to FIG. 8, if the metric exceeds minimum threshold 802 or maximum threshold 804, monitor 1402 may provide indication 1418.


In accordance with one or more embodiments, providing the indication includes issuing an alert. The alert (e.g., indication 1418) may be issued to computing device 104, as shown in FIG. 1. Examples of indication 1418 include, but are not limited to, an e-mail message, a phone call, a text message, a short messaging service (SMS) message, and/or the like. Monitor 1402 may further provide an indication that indicates which of the data point(s) are anomalous (i.e., exceeded dynamic threshold 1416).


In accordance with one or more embodiments, indication 1418 may trigger one or more actions to be performed with respect to resources 1410. For instance, additional resources of resources 1410 may be automatically allocated to handle an excessive amount of network requests, CPU usage, memory usage, etc. to compensate for the anomalous behavior.


D. Adjusting Dynamic Thresholds Based on Confidence Approximation


The dynamic threshold(s) described above may suffer from overfitting (i.e., the threshold(s) may conform too closely to the seasonality detected during the training phase). If the metric values being monitored conform to this threshold, then no issue arises. However, in many cases, the monitored metric values tend to differ from the metric values collected during the training phase. As such, this may cause false alerts to be issued. This issue becomes more apparent the more complex the detected seasonality.


In accordance with an embodiment, the dynamic threshold(s) may be adjusted based on a confidence level of the seasonality pattern detected (i.e., how confident one can be that the detected seasonality pattern is accurate). Generally, there is a gap between the baseline training error and the baseline-applied period error. The baseline training error may be represented by the interpercentile range (IPR) calculated for the metric values obtained during the training phase. The IPR may be based on the residuals of the medians of the metric values collected during the training phase. The IPR is the bandwidth around the medians that is observed during the training phase. The dynamic threshold(s) described above are based on this IPR. The baseline-applied period error is represented by the IPR estimated for metric values to be received after the training phase, on which the dynamic threshold(s) are applied. The gap may be referred to as the optimism (i.e., the optimism about the performance of the dynamic threshold(s)).
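The training-phase IPR described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the described implementation: the helper names `percentile` and `training_ipr` are hypothetical, the samples are assumed to be evenly spaced with `period` expressed as the number of samples per seasonal cycle, and the 5th and 95th percentiles are chosen only for the example because this passage does not specify which percentiles bound the IPR.

```python
import math
from statistics import median
from typing import Sequence


def percentile(sorted_vals: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile of an already sorted sequence."""
    rank = (len(sorted_vals) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def training_ipr(values: Sequence[float], period: int,
                 low_pct: float = 5.0, high_pct: float = 95.0) -> float:
    """Estimate the training IPR from residuals around seasonal medians.

    Assumptions (illustrative only): `period` is the season length in
    samples, and the 5th/95th percentiles bound the interpercentile range.
    """
    if period < 1 or len(values) < period:
        raise ValueError("need at least one full seasonal cycle of samples")
    # Median of the metric at each position within the seasonal cycle.
    medians = [median(values[p::period]) for p in range(period)]
    # Residual of every sample with respect to the median of its position.
    residuals = sorted(v - medians[i % period] for i, v in enumerate(values))
    return percentile(residuals, high_pct) - percentile(residuals, low_pct)


# Illustrative data only: a period-2 pattern with one deviating sample.
ipr_train = training_ipr([0.0, 10.0, 0.0, 10.0, 4.0, 10.0], period=2)
print(ipr_train)
```

A perfectly repeating series yields an IPR of zero in this sketch, which illustrates why a band derived only from training residuals can be overly optimistic about data received later.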


Intuitively, the more confident the algorithm is in the baseline accuracy, the tighter the dynamic threshold(s) that are generated. Formally, the thresholds are a function of the baseline and a distance between it and the training data. Seasonal patterns are a predictable change in a metric baseline in a set interval, such as hourly, daily or weekly patterns. As described above, detecting seasonal patterns reduces false positives and reduces the number of rules needed to capture metric behavior and its implication on the health of a resource. A higher order of the detected seasonal pattern increases model complexity. The influence of the number of samples (available metric history) and model complexity on the gap between the training error and the applied period error (also referred to as the test error) is visualized in FIGS. 15A and 15B.


To adjust the dynamic threshold(s), an IPR is estimated for data received after the training phase. The IPR may be determined in accordance with Equations 2-4, which are described below:










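$$\mathrm{IPR}_{test} \approx \mathrm{IPR}_{train} + 2\,\mathrm{IPR}_{train}\,\frac{period}{N} \qquad \text{(Equation 2)}$$

$$\mathrm{IPR}_{test} \approx \left(1 + 2\,\frac{period}{N}\right)\mathrm{IPR}_{train} \qquad \text{(Equation 3)}$$

$$\frac{\mathrm{IPR}_{test}}{\mathrm{IPR}_{train}} \approx \left(1 + 2\,\frac{period}{N}\right) \qquad \text{(Equation 4)}$$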







where IPRtest represents the IPR estimated for data received after the training phase, IPRtrain represents the IPR estimated for data received during the training phase, period represents the detected seasonality (e.g., weekly, daily, hourly, etc.), and N represents the number of non-null (e.g., non-zero) samples of the observed metric used to determine the seasonality.


Since the IPR measurement is more robust (less likely to overfit), the constant value ‘2’ is replaced with a smaller value. In accordance with an embodiment, a constant value of ‘1.75’ is used, as shown below in Equation 5:










$$\mathrm{IPR}_{test} \approx \left(1 + 1.75\,\frac{period}{N}\right)\mathrm{IPR}_{train} \qquad \text{(Equation 5)}$$







The value in the parenthetical of Equation 5 may be referred to as the IPR factor. Accordingly, to determine IPRtest, IPRtrain is multiplied by the IPR factor. The adjusted dynamic threshold(s) are based on IPRtest. In particular, the dynamic threshold(s) are either tightened or relaxed based on IPRtest. A relatively smaller value for IPRtest is indicative of a greater confidence in the detected seasonal pattern, and a relatively larger value for IPRtest is indicative of a lesser confidence in the detected seasonal pattern. In other words, the confidence is represented by the ratio between how many samples are utilized and the complexity of the seasonality. If the seasonality is relatively complex, but the number of samples is relatively low, the confidence will be lower. If the seasonality is relatively simple, and the number of samples is relatively high, the confidence will be higher. The higher the confidence, the tighter the adjusted dynamic threshold(s) will be. The lower the confidence, the looser the adjusted dynamic threshold(s) will be.
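The IPR factor and the resulting estimate of IPRtest can be sketched as follows. The function names `ipr_factor` and `estimate_test_ipr` are hypothetical, and the sketch assumes that the period is expressed as the number of samples per seasonal cycle so that it is in the same units as N; the example values are invented.

```python
def ipr_factor(period: int, n_samples: int, constant: float = 1.75) -> float:
    """IPR factor per Equation 5: 1 + constant * period / N.

    Assumption (illustrative): `period` is the number of samples in one
    seasonal cycle, and `n_samples` is the number of non-null training
    samples, so both are counted in samples.
    """
    if period < 1 or n_samples < 1:
        raise ValueError("period and n_samples must be positive")
    return 1.0 + constant * period / n_samples


def estimate_test_ipr(ipr_train: float, period: int, n_samples: int,
                      constant: float = 1.75) -> float:
    """Estimate IPR_test by multiplying IPR_train by the IPR factor."""
    return ipr_factor(period, n_samples, constant) * ipr_train


# Illustrative values only: a cycle of 288 samples observed over 2880 samples.
factor = ipr_factor(period=288, n_samples=2880)
ipr_test = estimate_test_ipr(ipr_train=4.0, period=288, n_samples=2880)
print(factor, ipr_test)
```

Because the factor depends only on the ratio of the period to the number of samples, collecting more history for the same seasonality drives the factor toward 1, which corresponds to a higher confidence and a tighter threshold.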


Consider the following example, in which 10 days of training data are utilized and are sampled every 5 minutes. In accordance with Equation 5, a non-seasonal pattern (i.e., period=1) would result in an IPR factor of approximately 1, an hourly seasonal pattern would result in an IPR factor of approximately 1, a daily seasonal pattern would result in an IPR factor of approximately 1.175, a first weekly seasonal pattern (e.g., 6 weeks) would result in an IPR factor of approximately 1.291, and a second weekly seasonal pattern (e.g., 3 weeks) would result in an IPR factor of approximately 1.583. Accordingly, the more complex the seasonal pattern, the greater the value of the IPR factor. The greater the value of the IPR factor, the more the dynamic threshold(s) are relaxed.
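The figures in this example can be reproduced with the short sketch below. Two readings are assumed here and are not stated explicitly in the text: the period is counted in 5-minute samples, and the parentheticals "6 weeks" and "3 weeks" are taken to be the amount of training data available for the weekly patterns. Under those assumptions the computed factors agree with the stated values up to rounding or truncation. The helper name `ipr_factor` is hypothetical.

```python
SAMPLES_PER_HOUR = 60 // 5                 # one sample every 5 minutes
SAMPLES_PER_DAY = 24 * SAMPLES_PER_HOUR    # 288 samples
SAMPLES_PER_WEEK = 7 * SAMPLES_PER_DAY     # 2016 samples


def ipr_factor(period: int, n_samples: int, constant: float = 1.75) -> float:
    """IPR factor per Equation 5, with period and N counted in samples."""
    return 1.0 + constant * period / n_samples


ten_days = 10 * SAMPLES_PER_DAY            # 2880 training samples

factors = {
    "non-seasonal": ipr_factor(1, ten_days),
    "hourly": ipr_factor(SAMPLES_PER_HOUR, ten_days),
    "daily": ipr_factor(SAMPLES_PER_DAY, ten_days),
    # Assumed reading: weekly pattern learned from 6 (or 3) weeks of data.
    "weekly (6 weeks of data)": ipr_factor(SAMPLES_PER_WEEK, 6 * SAMPLES_PER_WEEK),
    "weekly (3 weeks of data)": ipr_factor(SAMPLES_PER_WEEK, 3 * SAMPLES_PER_WEEK),
}

for name, value in factors.items():
    print(f"{name}: {value:.4f}")
```

With this reading, halving the history available for a weekly pattern from 6 weeks to 3 weeks doubles the amount by which the factor exceeds 1.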


The resulting IPR factor is multiplied with IPRtrain to determine IPRtest. The dynamic threshold(s) are determined based on the determined IPRtest.


It is noted that while the embodiments described herein disclose techniques for adjusting dynamic thresholds, the embodiments described herein are not so limited and such techniques may be utilized to adjust other entities.


Accordingly, a dynamic threshold may be adjusted in many ways. For example, FIG. 16 shows a flowchart 1600 of a method for adjusting a dynamic threshold in accordance with an example embodiment. In an embodiment, flowchart 1600 may be implemented by dynamic threshold-based alert engine 1700 of FIG. 17, although the method is not limited to those implementations. Accordingly, flowchart 1600 will be described with reference to FIG. 17. FIG. 17 is a block diagram of dynamic threshold-based alert engine 1700 in accordance with an embodiment. Dynamic threshold-based alert engine 1700 is an example of dynamic threshold-based alert engine 1400 shown in FIG. 14. Dynamic threshold-based alert engine 1700 comprises monitor 1402, seasonality detector 1404, model selector 1406, and one or more modeler(s) 1408, as described above with reference to FIG. 14. Dynamic threshold-based alert engine 1700 also comprises a dynamic threshold adjuster 1702. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 1600 and dynamic threshold-based alert engine 1700.


Flowchart 1600 begins with step 1602. In step 1602, a dynamic threshold is generated based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource. For example, with reference to FIG. 17, modeler(s) 1408 generate a dynamic threshold 1416 based on seasonal pattern 1414 detected by seasonality detector 1404. Seasonal pattern 1414 is detected in time series 1412 corresponding to a metric associated with resource(s) 1410.


In accordance with one or more embodiments, the dynamic threshold is generated during a training phase in which the seasonal pattern is detected in the time series of data values.


In step 1604, the generated dynamic threshold is adjusted based on a confidence level of the seasonal pattern. For example, with reference to FIG. 17, dynamic threshold adjuster 1702 adjusts dynamic threshold 1416 to generate an adjusted dynamic threshold 1704. Additional details regarding adjusting dynamic threshold 1416 are described below with reference to FIGS. 18 and 19.


In accordance with one or more embodiments, the confidence level is based on a number of data values in the time series and a period associated with the detected seasonal pattern.


In step 1606, the metric associated with the computing resource is monitored to determine whether the metric exceeds the adjusted dynamic threshold. For example, with reference to FIG. 17, monitor 1402 may monitor the metric associated with resource(s) 1410 to determine whether the metric exceeds adjusted dynamic threshold 1704. For instance, monitor 1402 may monitor one or more data points indicative of the metric and compare the data points to adjusted dynamic threshold 1704 to determine whether such data point(s) exceed adjusted dynamic threshold 1704.


In step 1608, an indication is provided based at least on determining that the metric exceeds the adjusted dynamic threshold. For example, with reference to FIG. 17, monitor 1402 may provide indication 1418 based at least on determining that the metric exceeds adjusted dynamic threshold 1704.


In accordance with one or more embodiments, providing the indication includes issuing an alert. The alert (e.g., indication 1418) may be issued to computing device 104, as shown in FIG. 1. Examples of indication 1418 include, but are not limited to, an e-mail message, a phone call, a text message, a short messaging service (SMS) message, and/or the like. Monitor 1402 may further provide an indication that indicates which of the data point(s) are anomalous (i.e., exceeded adjusted dynamic threshold 1704).


In accordance with one or more embodiments, the indication causes an automatic allocation of additional computing resources.


In accordance with one or more embodiments, the adjustment of the generated dynamic threshold is based on statistical features associated with the time series of data values received during the training phase. For example, FIG. 18 shows a flowchart 1800 of a method for adjusting a dynamic threshold in accordance with an example embodiment. Flowchart 1800 may be implemented by a dynamic threshold-based alert engine 1900 of FIG. 19, although the method is not limited to those implementations. Accordingly, flowchart 1800 will be described with reference to FIG. 19. FIG. 19 is a block diagram of dynamic threshold-based alert engine 1900 in accordance with an example embodiment. Dynamic threshold-based alert engine 1900 is an example of dynamic threshold-based alert engine 1700, as described above with reference to FIG. 17. As shown in FIG. 19, dynamic threshold-based alert engine 1900 includes at least seasonality detector 1404, monitor 1402, modeler(s) 1408, and dynamic threshold adjuster 1702. Other components of dynamic threshold-based alert engine 1900 are not shown for purposes of brevity. As further shown in FIG. 19, dynamic threshold adjuster 1702 includes a statistical feature determiner 1902 and a confidence level determiner 1904. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 1800 and dynamic threshold-based alert engine 1900.


Flowchart 1800 begins with step 1802. In step 1802, a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected is determined, the generated dynamic threshold being determined based on the first statistical feature. For example, with reference to FIG. 19, modeler(s) 1408 generate dynamic threshold 1416 based on time series 1412. Dynamic threshold 1416 is determined based on the first statistical feature.


In step 1804, a second statistical feature for a subsequent time series of data values to be received after the training phase completes is estimated, the second statistical feature being based on the first statistical feature and the confidence level, the adjusted dynamic threshold being determined based on the second statistical feature. For example, with reference to FIG. 19, statistical feature determiner 1902 determines adjusted dynamic threshold 1704. The adjusted dynamic threshold is determined based on the second statistical feature. Statistical feature determiner 1902 estimates adjusted dynamic threshold 1704 for a subsequent time series of data values to be received after the training phase completes, based on dynamic threshold 1416 and a confidence level 1910 provided by confidence level determiner 1904. In accordance with an embodiment, step 1804 may be performed in accordance with Equation 5.


In accordance with one or more embodiments, the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern. For example, with reference to FIG. 19, confidence level determiner 1904 determines confidence level 1910 based on one or more of the number of data values in time series 1412 or a period associated with seasonal pattern 1414. For example, seasonality detector 1404 may provide an indication that indicates the period of seasonal pattern 1414.


In accordance with one or more embodiments, the first statistical feature is a first interpercentile range associated with the time series of data values received during the training phase, and the second statistical feature is a second interpercentile range that is estimated for a subsequent time series of data values to be received after the training phase. For example, with reference to FIG. 19, the first statistical feature (i.e., dynamic threshold 1416) is a first interpercentile range associated with time series 1412, and the second statistical feature (i.e., adjusted dynamic threshold 1704) is a second interpercentile range that is estimated for a subsequent time series of data values to be received after the training phase.
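One way the two statistical features could translate into an adjusted threshold is sketched below. This is an assumption-laden illustration rather than the described implementation: it assumes that the generated threshold is a band around a seasonal baseline whose width is proportional to the first interpercentile range, so that substituting the second interpercentile range scales the distance between the baseline and each bound by the IPR factor of Equation 5. The function name `adjust_thresholds`, the band representation, and the data values are hypothetical.

```python
from typing import List, Sequence, Tuple


def adjust_thresholds(baseline: Sequence[float],
                      lower: Sequence[float],
                      upper: Sequence[float],
                      period: int,
                      n_samples: int,
                      constant: float = 1.75) -> Tuple[List[float], List[float]]:
    """Widen a threshold band around its baseline by the IPR factor.

    Assumption (illustrative): each bound lies at a distance from the
    baseline that is proportional to the training IPR, so replacing the
    training IPR with the estimated test IPR scales that distance by
    1 + constant * period / N.
    """
    factor = 1.0 + constant * period / n_samples
    adj_lower = [b - factor * (b - lo) for b, lo in zip(baseline, lower)]
    adj_upper = [b + factor * (hi - b) for b, hi in zip(baseline, upper)]
    return adj_lower, adj_upper


# Illustrative data only: two points of a seasonal baseline with bounds.
baseline = [10.0, 20.0]
lower = [8.0, 16.0]
upper = [12.0, 24.0]

adj_lower, adj_upper = adjust_thresholds(baseline, lower, upper,
                                         period=288, n_samples=2880)
print(adj_lower, adj_upper)
```

In this sketch a factor close to 1 leaves the generated band nearly unchanged, while a larger factor relaxes both bounds away from the baseline.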


In accordance with one or more embodiments, the generated dynamic threshold is adjusted a first amount based on the confidence level being relatively high, and wherein the generated dynamic threshold is adjusted a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount. For example, with reference to FIG. 19, statistical feature determiner 1902 of dynamic threshold adjuster 1702 adjusts dynamic threshold 1416 a first amount based on confidence level 1910 being relatively high and adjusts dynamic threshold 1416 a second amount based on confidence level 1910 being relatively low.


III. EXAMPLE COMPUTER SYSTEM IMPLEMENTATION


FIG. 20 depicts an example processor-based computer system 2000 that may be used to implement various embodiments described herein. For example, system 2000 may be used to implement any of nodes 108A-108B, 112A-112N, and/or 114A-114N, storage node(s) 110, computing device 104, and dynamic threshold-based alert engine 118 of FIG. 1, monitor 202, storage 214, seasonality determiner 204, clipper 206, time-based filter(s) 208, combiner 210, and transformer 212 of FIG. 2, model selector 402, model bank 404, low dispersion-based modeler 406, seasonal adjusted boxplot-based modeler 408, and Box-Cox transformation-based modeler 410 of FIG. 4, seasonal adjusted boxplot-based modeler 600, bin-based time series generator 602, threshold determiner 604, dynamic threshold generator 606 of FIG. 6, Box-Cox-based modeler 1000, seasonal decomposer 1002, variance stabilizer 1004, and dynamic threshold generator 1006 of FIG. 10, dynamic threshold-based alert engine 1400, monitor 1402, seasonality detector 1404, model selector 1406, and modeler(s) 1408 of FIG. 14, dynamic threshold-based alert engine 1700, monitor 1402, seasonality detector 1404, model selector 1406, modeler(s) 1408, and dynamic threshold adjuster 1702 of FIG. 17, and dynamic threshold-based alert engine 1900, monitor 1402, seasonality detector 1404, modeler(s) 1408, dynamic threshold adjuster 1702, statistical feature determiner 1902, and confidence level determiner 1904 of FIG. 19 and/or any of the components respectively described therein. System 2000 may also be used to implement any of the steps of any of the flowcharts of FIGS. 3, 5, 9, 10, 12, 13, 16, and 18 as described above. The description of system 2000 provided herein is provided for purposes of illustration, and is not intended to be limiting. Embodiments may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).


As shown in FIG. 20, system 2000 includes a processing unit 2002, a system memory 2004, and a bus 2006 that couples various system components including system memory 2004 to processing unit 2002. Processing unit 2002 may comprise one or more circuits, microprocessors or microprocessor cores. Bus 2006 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. System memory 2004 includes read only memory (ROM) 2008 and random access memory (RAM) 2010. A basic input/output system 2012 (BIOS) is stored in ROM 2008.


System 2000 also has one or more of the following drives: a hard disk drive 2014 for reading from and writing to a hard disk, a magnetic disk drive 2016 for reading from or writing to a removable magnetic disk 2018, and an optical disk drive 2020 for reading from or writing to a removable optical disk 2022 such as a CD ROM, DVD ROM, BLU-RAY™ disk or other optical media. Hard disk drive 2014, magnetic disk drive 2016, and optical disk drive 2020 are connected to bus 2006 by a hard disk drive interface 2024, a magnetic disk drive interface 2026, and an optical drive interface 2028, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of computer-readable memory devices and storage structures can be used to store data, such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like.


A number of program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These program modules include an operating system 2030, one or more application programs 2032, other program modules 2034, and program data 2036. In accordance with various embodiments, the program modules may include computer program logic that is executable by processing unit 2002 to perform any or all of the functions and features of nodes 108A-108B, 112A-112N, and/or 114A-114N, storage node(s) 110, computing device 104, and dynamic threshold-based alert engine 118 of FIG. 1, monitor 202, storage 214, seasonality determiner 204, clipper 206, time-based filter(s) 208, combiner 210, and transformer 212 of FIG. 2, model selector 402, model bank 404, low dispersion-based modeler 406, seasonal adjusted boxplot-based modeler 408, and Box-Cox transformation-based modeler 410 of FIG. 4, seasonal adjusted boxplot-based modeler 600, bin-based time series generator 602, threshold determiner 604, dynamic threshold generator 606 of FIG. 6, Box-Cox-based modeler 1000, seasonal decomposer 1002, variance stabilizer 1004, and dynamic threshold generator 1006 of FIG. 10, dynamic threshold-based alert engine 1400, monitor 1402, seasonality detector 1404, model selector 1406, and modeler(s) 1408 of FIG. 14, dynamic threshold-based alert engine 1700, monitor 1402, seasonality detector 1404, model selector 1406, modeler(s) 1408, and dynamic threshold adjuster 1702 of FIG. 17, and dynamic threshold-based alert engine 1900, monitor 1402, seasonality detector 1404, modeler(s) 1408, dynamic threshold adjuster 1702, statistical feature determiner 1902, and confidence level determiner 1904 of FIG. 19, and/or any of the components respectively described therein, and/or any of the steps of any of the flowcharts of FIGS. 3, 5, 9, 10, 12, 13, 16, and 18 as described above.
The program modules may also include computer program logic that, when executed by processing unit 2002, causes processing unit 2002 to perform any of the steps of any of the flowcharts of FIGS. 3, 5, 9, 10, 12, 13, 16, and 18 as described above.


A user may enter commands and information into system 2000 through input devices such as a keyboard 2038 and a pointing device 2040 (e.g., a mouse). Other input devices (not shown) may include a microphone, joystick, game controller, scanner, or the like. In one embodiment, a touch screen is provided in conjunction with a display 2044 to allow a user to provide user input via the application of a touch (as by a finger or stylus for example) to one or more points on the touch screen. These and other input devices are often connected to processing unit 2002 through a serial port interface 2042 that is coupled to bus 2006, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB). Such interfaces may be wired or wireless interfaces.


Display 2044 is connected to bus 2006 via an interface, such as a video adapter 2046. In addition to display 2044, system 2000 may include other peripheral output devices (not shown) such as speakers and printers.


System 2000 is connected to a network 2048 (e.g., a local area network or wide area network such as the Internet) through a network interface 2050, a modem 2052, or other suitable means for establishing communications over the network. Modem 2052, which may be internal or external, is connected to bus 2006 via serial port interface 2042.


As used herein, the terms “computer program medium,” “computer-readable medium,” and “computer-readable storage medium” are used to generally refer to memory devices or storage structures such as the hard disk associated with hard disk drive 2014, removable magnetic disk 2018, removable optical disk 2022, as well as other memory devices or storage structures such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like. Such computer-readable storage media are distinguished from and non-overlapping with communication media (do not include communication media or modulated data signals). Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media. Embodiments are also directed to such communication media.


As noted above, computer programs and modules (including application programs 2032 and other program modules 2034) may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 2050, serial port interface 2042, or any other interface type. Such computer programs, when executed or loaded by an application, enable system 2000 to implement features of embodiments discussed herein. Accordingly, such computer programs represent controllers of the system 2000. Embodiments are also directed to computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a data processing device(s) to operate as described herein. Embodiments may employ any computer-useable or computer-readable medium, known now or in the future. Examples of computer-readable mediums include, but are not limited to memory devices and storage structures such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, tapes, magnetic storage devices, optical storage devices, MEMs, nanotechnology-based storage devices, and the like.


In alternative implementations, system 2000 may be implemented as hardware logic/electrical circuitry or firmware. In accordance with further embodiments, one or more of these components may be implemented in a system-on-chip (SoC). The SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.


IV. FURTHER EXAMPLE EMBODIMENTS

A method is described herein. The method includes: generating a dynamic threshold based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource; adjusting the generated dynamic threshold based on a confidence level of the detected seasonal pattern; monitoring the metric associated with the computing resource to determine whether the metric exceeds the adjusted dynamic threshold; and providing an indication based at least on determining that the metric exceeds the adjusted dynamic threshold.


In one implementation of the foregoing method, the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern.


In one implementation of the foregoing method, the generated dynamic threshold is adjusted a first amount based on the confidence level being relatively high, and the generated dynamic threshold is adjusted a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount.


In one implementation of the foregoing method, said adjusting comprises: determining a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected, the generated dynamic threshold being determined based on the first statistical feature; and estimating a second statistical feature for a subsequent time series of data values to be received after the training phase completes, the second statistical feature being determined based on the first statistical feature and the confidence level, the adjusted dynamic threshold being based on the second statistical feature.


In one implementation of the foregoing method, the first statistical feature is a first interpercentile range associated with the time series of data values received during the training phase, and the second statistical feature is a second interpercentile range that is estimated for a subsequent time series of data values received after the training phase.


In one implementation of the foregoing method, said providing the indication includes issuing an alert.


In one implementation of the foregoing method, the indication comprises at least one of: an e-mail message; a telephone call; or a short messaging service message.


In one implementation of the foregoing method, the indication causes an automatic allocation of additional computing resources.


A system in accordance with any of the embodiments described herein is also disclosed. The system includes: at least one processor circuit; and at least one memory that stores program code configured to be executed by the at least one processor circuit, the program code comprising: a modeler configured to generate a dynamic threshold based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource; a dynamic threshold adjuster configured to adjust the generated dynamic threshold based on a confidence level of the detected seasonal pattern; and a monitor configured to: monitor the metric associated with the computing resource to determine whether the metric exceeds the adjusted dynamic threshold; and provide an indication based at least on determining that the metric exceeds the adjusted dynamic threshold.


In one implementation of the foregoing system, the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern.


In one implementation of the foregoing system, the dynamic threshold adjuster is configured to adjust the generated dynamic threshold a first amount based on the confidence level being relatively high, and the dynamic threshold adjuster is configured to adjust the generated dynamic threshold a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount.


In one implementation of the foregoing system, the modeler is configured to determine a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected, the generated dynamic threshold being determined based on the first statistical feature; and the dynamic threshold adjuster comprises a statistical feature determiner that is configured to estimate a second statistical feature for a subsequent time series of data values to be received after the training phase completes, the second statistical feature being based on the first statistical feature and the confidence level, the adjusted dynamic threshold being determined based on the second statistical feature.


In one implementation of the foregoing system, the first statistical feature is a first interpercentile range associated with the time series of data values received during the training phase, and the second statistical feature is a second interpercentile range that is estimated for a subsequent time series of data values received after the training phase.
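
By way of illustration only, the following is a minimal sketch of determining a first interpercentile range from training-phase data and estimating a second interpercentile range from it and the confidence level. The 5th and 95th percentiles, the `max_tightening` parameter, and the linear scaling rule are hypothetical assumptions; the sketch tightens the range more as confidence increases, which is one possible reading of a larger adjustment at relatively high confidence.

```python
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-th percentile (0 <= q <= 100) of non-empty data."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def interpercentile_range(values: Sequence[float],
                          low_q: float = 5.0, high_q: float = 95.0) -> float:
    """First statistical feature: spread between two percentiles of training data."""
    return percentile(values, high_q) - percentile(values, low_q)


def estimate_future_range(training_range: float, confidence: float,
                          max_tightening: float = 0.5) -> float:
    """Second statistical feature, estimated for data arriving after training.

    Assumption (not from the source): the range is scaled down linearly with
    confidence, by at most `max_tightening` when confidence equals 1.0.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return training_range * (1.0 - max_tightening * confidence)


training_values = [float(v) for v in range(101)]   # stand-in training data
first_range = interpercentile_range(training_values)
second_range = estimate_future_range(first_range, confidence=0.5)
```

An adjusted dynamic threshold could then be placed relative to the seasonal baseline using the second range in place of the first.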


In one implementation of the foregoing system, the monitor is configured to provide the indication by issuing an alert.


In one implementation of the foregoing system, the indication comprises at least one of: an e-mail message; a telephone call; or a short messaging service message.


In one implementation of the foregoing system, the indication causes an automatic allocation of additional computing resources.
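
By way of illustration only, the following is a minimal sketch of a monitor that compares each metric sample against a time-dependent adjusted threshold and provides an indication when the threshold is exceeded. The `monitor` function, the `notify` callback, and the hour-of-day threshold are hypothetical; in practice the callback might send an e-mail message or request additional computing resources.

```python
from typing import Callable, Iterable, List, Tuple


def monitor(samples: Iterable[Tuple[int, float]],
            upper_threshold: Callable[[int], float],
            notify: Callable[[str], None]) -> List[int]:
    """Check (timestamp, value) samples against a seasonal upper threshold.

    Calls `notify` once per breaching sample and returns the breaching
    timestamps. Only an upper bound is checked in this sketch.
    """
    breaches: List[int] = []
    for timestamp, value in samples:
        limit = upper_threshold(timestamp)
        if value > limit:
            notify(f"metric value {value} exceeded threshold {limit} at t={timestamp}")
            breaches.append(timestamp)
    return breaches


def hourly_threshold(hour: int) -> float:
    """Hypothetical adjusted threshold: higher during working hours (9 to 17)."""
    return 80.0 if 9 <= hour % 24 < 17 else 40.0


messages: List[str] = []
breaching_hours = monitor(
    samples=[(8, 35.0), (10, 70.0), (12, 85.0), (20, 45.0)],
    upper_threshold=hourly_threshold,
    notify=messages.append,
)
```

Because the threshold follows the seasonal pattern, a value of 45.0 is flagged at hour 20 while a value of 70.0 at hour 10 is not.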


A computer-readable storage medium having program instructions recorded thereon that, when executed by at least one processor, perform a method is also described herein. The method includes: generating a dynamic threshold based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource; adjusting the generated dynamic threshold based on a confidence level of the detected seasonal pattern; monitoring the metric associated with the computing resource to determine whether the metric exceeds the adjusted dynamic threshold; and providing an indication based at least on determining that the metric exceeds the adjusted dynamic threshold.


In one implementation of the foregoing computer-readable storage medium, the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern.


In one implementation of the foregoing computer-readable storage medium, the generated dynamic threshold is adjusted a first amount based on the confidence level being relatively high, and the generated dynamic threshold is adjusted a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount.


In one implementation of the foregoing computer-readable storage medium, said adjusting comprises: determining a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected, the generated dynamic threshold being determined based on the first statistical feature; and estimating a second statistical feature for a subsequent time series of data values to be received after the training phase completes, the second statistical feature being based on the first statistical feature and the confidence level, the adjusted dynamic threshold being determined based on the second statistical feature.


V. CONCLUSION

While various example embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made therein without departing from the spirit and scope of the embodiments as defined in the appended claims. Accordingly, the breadth and scope of the disclosure should not be limited by any of the above-described example embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims
  • 1. A method, comprising: generating a dynamic threshold based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource; adjusting the generated dynamic threshold based on a confidence level of the detected seasonal pattern; monitoring the metric associated with the computing resource to determine whether the metric exceeds the adjusted dynamic threshold; and providing an indication based at least on determining that the metric exceeds the adjusted dynamic threshold.
  • 2. The method of claim 1, wherein the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern.
  • 3. The method of claim 1, wherein the generated dynamic threshold is adjusted a first amount based on the confidence level being relatively high, and wherein the generated dynamic threshold is adjusted a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount.
  • 4. The method of claim 1, wherein said adjusting comprises: determining a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected, the generated dynamic threshold being determined based on the first statistical feature; and estimating a second statistical feature for a subsequent time series of data values to be received after the training phase completes, the second statistical feature being based on the first statistical feature and the confidence level, the adjusted dynamic threshold being determined based on the second statistical feature.
  • 5. The method of claim 4, wherein the first statistical feature is a first interpercentile range associated with the time series of data values received during the training phase, and the second statistical feature is a second interpercentile range that is estimated for a subsequent time series of data values received after the training phase.
  • 6. The method of claim 1, wherein providing the indication includes issuing an alert.
  • 7. The method of claim 6, wherein the indication comprises at least one of: an e-mail message; a telephone call; or a short messaging service message.
  • 8. The method of claim 1, wherein the indication causes an automatic allocation of additional computing resources.
  • 9. A system, comprising: at least one processor circuit; and at least one memory that stores program code configured to be executed by the at least one processor circuit, the program code comprising: a modeler configured to generate a dynamic threshold based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource; a dynamic threshold adjuster configured to adjust the generated dynamic threshold based on a confidence level of the detected seasonal pattern; and a monitor configured to: monitor the metric associated with the computing resource to determine whether the metric exceeds the adjusted dynamic threshold; and provide an indication based at least on determining that the metric exceeds the adjusted dynamic threshold.
  • 10. The system of claim 9, wherein the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern.
  • 11. The system of claim 9, wherein the dynamic threshold adjuster is configured to adjust the generated dynamic threshold a first amount based on the confidence level being relatively high, and wherein the dynamic threshold adjuster is configured to adjust the generated dynamic threshold a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount.
  • 12. The system of claim 9, wherein the modeler is configured to determine a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected, the generated dynamic threshold being determined based on the first statistical feature; and wherein the dynamic threshold adjuster comprises a statistical feature determiner that is configured to estimate a second statistical feature for a subsequent time series of data values to be received after the training phase completes, the second statistical feature being determined based on the first statistical feature and the confidence level, the adjusted dynamic threshold being based on the second statistical feature.
  • 13. The system of claim 12, wherein the first statistical feature is a first interpercentile range associated with the time series of data values received during the training phase, and the second statistical feature is a second interpercentile range that is estimated for a subsequent time series of data values received after the training phase.
  • 14. The system of claim 9, wherein the monitor is configured to provide the indication by issuing an alert.
  • 15. The system of claim 14, wherein the indication comprises at least one of: an e-mail message; a telephone call; or a short messaging service message.
  • 16. The system of claim 9, wherein the indication causes an automatic allocation of additional computing resources.
  • 17. A computer-readable storage medium having program instructions recorded thereon that, when executed by at least one processor, perform a method, the method comprising: generating a dynamic threshold based on a seasonal pattern detected in a time series of data values corresponding to a metric associated with a computing resource; adjusting the generated dynamic threshold based on a confidence level of the detected seasonal pattern; monitoring the metric associated with the computing resource to determine whether the metric exceeds the adjusted dynamic threshold; and providing an indication based at least on determining that the metric exceeds the adjusted dynamic threshold.
  • 18. The computer-readable storage medium of claim 17, wherein the confidence level is based on one or more of a number of data values in the time series or a period associated with the detected seasonal pattern.
  • 19. The computer-readable storage medium of claim 17, wherein the generated dynamic threshold is adjusted a first amount based on the confidence level being relatively high, and wherein the generated dynamic threshold is adjusted a second amount based on the confidence level being relatively low, wherein the first amount is greater than the second amount.
  • 20. The computer-readable storage medium of claim 18, wherein said adjusting comprises: determining a first statistical feature associated with the time series of data values received during a training phase in which the seasonal pattern is detected, the generated dynamic threshold being determined based on the first statistical feature; and estimating a second statistical feature for a subsequent time series of data values to be received after the training phase completes, the second statistical feature being based on the first statistical feature and the confidence level, the adjusted dynamic threshold being determined based on the second statistical feature.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 62/878,997, filed Jul. 26, 2019, entitled “Confidence Approximation-based Dynamic Thresholds for Anomalous Computing Resource Usage Detection,” the entirety of which is incorporated by reference herein.

Provisional Applications (1)
Number Date Country
62878997 Jul 2019 US