The disclosure generally relates to the field of data processing, and more particularly to flap detection.
In various areas of computing, the rapid change in state of a system or system component, either software or hardware, typically corresponds to a problem. This rapid change in state is referred to as “flapping.” In addition to the problem causing the flapping, flapping itself can cause a high volume of notifications or alarms that may exacerbate the problem's impact on the system, perhaps further degrading system performance. Detecting flapping can lead to investigation of the cause of the flapping rather than investigating the individual state changes.
Embodiments of the disclosure may be better understood by referencing the accompanying drawings.
The description that follows includes example systems, methods, techniques, and program flows that embody embodiments of the disclosure. However, it is understood that this disclosure may be practiced without these specific details. For instance, this disclosure refers to arrays in multiple examples. Embodiments are not limited to using arrays and can use a different data structure to store values that allows the values to be accessed in forward and/or reverse order. In other instances, well-known instruction instances, protocols, structures and techniques have not been shown in detail in order not to obfuscate the description.
Although flapping typically relates to rapid change in state of a system, flapping can also occur in measurements of various resources. Although the rapid change in resource measurements can be considered a state change, this description refers to state changes and resource measurements separately to help explain possible differences in handling the detection of flapping in state or resource measurements. A system or system component state change typically relates to operability (e.g., device failure, connection lost, restart, sleep, etc.). A system/component often measures resources for determinations about performance, quality of service (“QoS”), etc. A change in state or resource measurements can relate to a condition or a threshold. Example changes include a component failure, installation of a component, a change in resource consumption with respect to a threshold or condition, and a change in a performance measurement.
A change in state or resource measurement can be accompanied by an alarm. An alarm can be a notification of a change and/or quantify the change with an alarm level. Since flapping in either state or measurements can result in a series of alarms with different alarm levels, flapping can also occur in alarm levels.
Sensors and/or components detect occurrence of changes and indicate the changes. The sensors and/or components can indicate the changes as events with any one of a variety of techniques: interrupt driven messaging, inter-process communication, publisher-subscriber messaging, and a posting mechanism (e.g., recording an event indication into a buffer). An event manager (e.g., an operating system process or executing application) can be programmed to process event indications differently. An event manager may present event indications (e.g., display event based information in a graphical user interface dashboard), implement corrective actions based on event indications, notify a component to take corrective action based on event indications, etc.
Overview
A flap detector can detect significant flapping using magnitudes of deltas. A delta is a value that represents a change. The delta is determined by computing a difference between values representing a system attribute being monitored (e.g., system/component states or resource measurements). As changes in a monitored system attribute (“monitored attribute”) occur, a series of deltas can be generated in different directions (e.g., increasing changes followed by decreasing changes). Consecutive deltas in a same direction are monotonic deltas. The flap detector aggregates monotonic deltas (e.g., adds the deltas). Aggregating monotonic deltas and disregarding direction yields a magnitude of monotonic deltas. A magnitude of a series of same direction deltas can be considered the magnitude of a flap because the end of the series corresponds to a beginning of a delta series in a different direction (“directional transition”). When directional transition occurs (i.e., flapping occurs), the flap detector generates multiple monotonic delta magnitudes. The determined magnitudes can be used to filter out insignificant flapping that could be considered noise. The flap detector uses a first configurable threshold to identify the flaps that are significant. The flap detector can then use a second configurable threshold to determine whether a count of the significant flaps is significant. Although flaps may be significant in magnitude, the count of significant flaps may be too few to be considered significant. The flap detector can also aggregate the significant flap magnitudes to derive an event indication for the flapping in a given time window.
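The quantities described in this overview can be written compactly. The notation below (v_i for monitored values, d_i for deltas, R_k for a maximal run of same-direction deltas, T_m and T_c for the two thresholds) is introduced here only for illustration and does not appear elsewhere in this description. Whether each comparison is strict or inclusive is an assumption that an implementation could configure either way.

```latex
\begin{align*}
d_i &= v_i - v_{i-1} && \text{(delta between successive monitored values)} \\
S_k &= \sum_{i \in R_k} d_i && \text{(sum over the $k$-th run $R_k$ of monotonic deltas)} \\
m_k &= \lvert S_k \rvert && \text{(flap magnitude, direction disregarded)} \\
c &= \bigl\lvert \{\, k : m_k > T_m \,\} \bigr\rvert && \text{(count of significant flaps)} \\
\text{significant flapping} &\iff c \ge T_c
\end{align*}
```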
Example Illustrations
At operational stage A, the flap detector 103 determines deltas between resource measurements within a time window. The resource measurements 105 indicate throughput over a time range spanning from t1 to t16. For this example, the throughput is based on measurements taken at a particular network element for connections traversing the network element. Detecting flapping in throughput at a particular network element can help identify a problematic device in a network and help avoid violating a service level agreement. Table 1 indicates the throughput measurements depicted in the graph 107.
The flap detector 103 processes throughput measurements indicated in notifications for a defined time window. This illustration assumes the defined time window is 11 time instants, and the current time window encompasses time instants t3 to t13. A current throughput notification or most recent throughput notification corresponds to time instant t13. The time instants after t13 in
At operational stage B, the flap detector 103 determines magnitudes of monotonic throughput measurement deltas to detect throughput flaps. Assuming the history of deltas is available in an array of deltas, the flap detector 103 can traverse the array of deltas from the most recently computed delta (i.e., the delta between throughput measurements at t12 and t13) backwards in time until a directional transition is encountered (i.e., a change in delta sign). The flap detector 103 encounters a directional transition from negative for the delta between throughput measurements at t10 and t11 to positive for the delta between throughput measurements at t11 and t12. While traversing the entries, the flap detector 103 can accumulate a sum. The flap detector 103 determines that the delta between throughput measurements at t11 and t12 has the same sign as the delta between throughput measurements at t12 and t13 and computes a sum of 21.9 Mb/s. The flap detector 103 then determines that the delta between resource measurements at t10 and t11 has a negative sign, and terminates the sum accumulation. The sum represents the flap, which is 21.9 Mb/s in this case. The flap from t10 to t11 was −24.8 Mb/s. The largest decreasing flap was from t6 to t7 (−31.6 Mb/s), while the largest increasing flap was from t12 to t13. The flap detector can start computing sums of each series of monotonic deltas from t3 to t13. The first series of monotonic deltas (i.e., increasing series of deltas) corresponds to the throughput measurements from t3 to t6 in which the throughput increases from 14.9 to 18 to 25.5 to 33.4. The next series of monotonic deltas includes decreases in throughput measurements at t7 and t8 from 33.4 to 1.8 to 1.4. The throughput deltas in the time window from t3 to t13 include 5 monotonic series of deltas, which result in 5 flap magnitudes.
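The backward traversal described for stage B can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the function name `most_recent_flap` is hypothetical, and treating a zero delta as its own direction is an assumption, since the description does not address unchanged measurements. Only the throughput values for t3 through t8 are stated in the text above, so the sketch stops at t8 rather than guessing the remaining values.

```python
def _sign(x):
    """Return 1, -1, or 0 for positive, negative, or zero deltas."""
    return (x > 0) - (x < 0)


def most_recent_flap(deltas):
    """Sum the trailing run of same-direction deltas, walking backwards in time.

    `deltas` is ordered oldest to newest. Accumulation stops at the first
    directional transition (a change in delta sign).
    """
    if not deltas:
        return 0.0
    direction = _sign(deltas[-1])
    total = 0.0
    for delta in reversed(deltas):
        if _sign(delta) != direction:
            break  # directional transition: the current flap starts here
        total += delta
    return total


# Throughput (Mb/s) at t3..t8 as stated in the description.
throughput = [14.9, 18.0, 25.5, 33.4, 1.8, 1.4]
deltas = [later - earlier for earlier, later in zip(throughput, throughput[1:])]
print(round(most_recent_flap(deltas), 1))  # -32.0 (the decrease from t6 to t8)
```

Traversing from the newest delta means only the current flap is touched on each trigger, which may be preferable when earlier monotonic sums are already cached.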
At operational stage C, the flap detector 103 filters throughput flaps. A flap magnitude threshold can be configured to filter out flaps considered to be insignificant by an administrator, for instance. Assuming a throughput flap magnitude threshold of 5 Mb/s, the flap detector 103 will disregard the flaps having a magnitude that does not exceed 5 Mb/s. Another threshold can be configured based on a number of significant flaps considered to be insignificant. An administrator may consider fewer than 2 flaps exceeding the flap magnitude threshold to be insignificant. The flap detector 103 counts the number of flap magnitudes that satisfy the flap magnitude threshold, and then determines whether that count satisfies a flap count threshold of 2. In this example, 5 of the computed flap magnitudes satisfy the flap magnitude threshold and this count exceeds the flap count threshold. Thus, the flap detector 103 determines that significant flapping has occurred in the time window from t3 to t13.
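The two-threshold filtering of stage C might look like the sketch below. The function name `evaluate_flaps` is hypothetical. The comparisons are assumptions drawn from the wording above: a flap is kept when its magnitude exceeds the magnitude threshold, and flapping is reported when at least the count threshold of flaps remain. The example uses only the four flap sums that can be derived from figures stated in the text (+18.5 for t3 to t6, −32.0 for t6 to t8, −24.8, and +21.9); the fifth flap in the window is omitted because its value is not stated.

```python
def evaluate_flaps(monotonic_sums, magnitude_threshold, count_threshold):
    """Return (is_flapping, significant_sums) for a window of monotonic sums."""
    # Direction is disregarded when comparing against the magnitude threshold.
    significant = [s for s in monotonic_sums if abs(s) > magnitude_threshold]
    # Assumption: the count threshold is satisfied when it is met or exceeded.
    return len(significant) >= count_threshold, significant


# Flap sums (Mb/s) derivable from the description; the fifth flap is unknown.
flaps = [18.5, -32.0, -24.8, 21.9]
flapping, significant = evaluate_flaps(
    flaps, magnitude_threshold=5.0, count_threshold=2
)
print(flapping)  # True
```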
At operational stage D, the flap detector 103 generates a flapping notification based on throughput flap magnitudes. The flap detector 103 can communicate the flapping with a variety of information about the throughput flapping. For example, the flap detector 103 could generate the flapping notification to identify the network element and a flag or message that indicates flapping is occurring in throughput at the identified network element. The flap detector 103 could include the monotonic sums to show the direction and magnitude of flaps.
In this example illustration, the flap detector 103 would have started generating flapping notifications when the example flap count threshold (2 flaps) was exceeded at t8. The flap detector 103 or input parameters can be configured to avoid repeating flap notifications for a number of events and/or time period. For example, the flap detector 103 could be configured to discard or suppress a flapping notification if 2 flapping notifications for a particular type of event (e.g., throughput measurements) have been generated in the last 10 minutes.
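One possible way to suppress repeated flapping notifications is sketched below, using the example limits from this paragraph (2 notifications per event type within 10 minutes). The class name, method name, and the use of plain seconds for timestamps are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class FlapNotificationSuppressor:
    """Hypothetical helper that limits flapping notifications per event type."""

    def __init__(self, max_notifications=2, period_seconds=600):
        self.max_notifications = max_notifications
        self.period_seconds = period_seconds
        # Timestamps (seconds) of notifications already sent, per event type.
        self._sent = defaultdict(deque)

    def should_send(self, event_type, now):
        """Return True and record the notification, or False to suppress it."""
        sent = self._sent[event_type]
        # Forget notifications that are older than the suppression period.
        while sent and now - sent[0] >= self.period_seconds:
            sent.popleft()
        if len(sent) >= self.max_notifications:
            return False  # limit already reached within the period
        sent.append(now)
        return True
```

A caller would check `should_send("throughput", now)` before emitting each flapping notification and discard the notification when the result is false.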
The example illustration of
A flap detector detects a resource measurement (201). The flap detector may receive resource measurements or notifications of resource measurements. The flap detector may monitor a location at which resource measurements are stored or subscribe to receiving resource measurements for a particular resource.
The flap detector determines whether there are sufficient previous resource measurements relative to the detected resource measurement for flap detection (203). Since flapping occurs over a number of resource measurements generated over time, the flap detector determines whether there are sufficient historical resource measurements within a relative time window to evaluate for flap detection. For instance, a sufficiency threshold may be configured to be 3 previous resource measurements within a 24 hour window preceding the detected resource measurement. If there are sufficient historical resource measurements within the time window, then the flap detector determines resource measurement deltas based on the detected resource measurement and historical resource measurements (205). If not, then the flap detector terminates or waits until a next resource measurement is detected. The flap detector may enter a sleep state or wait until a next resource measurement is detected. In some cases, the flap detector may not be an ongoing process and may be invoked by another process when a resource measurement is detected.
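The sufficiency check (203) could be expressed as in the following sketch, which uses the example values of 3 previous measurements within a 24 hour window. The function and parameter names are hypothetical, and counting only measurements strictly earlier than the detected one is an assumption.

```python
from datetime import datetime, timedelta


def has_sufficient_history(
    history_times, detected_time, min_previous=3, window=timedelta(hours=24)
):
    """Return True if enough earlier measurements fall in the preceding window."""
    window_start = detected_time - window
    previous = [t for t in history_times if window_start <= t < detected_time]
    return len(previous) >= min_previous


# Hypothetical timestamps, for illustration only.
detected = datetime(2024, 1, 2, 12, 0)
history = [
    detected - timedelta(hours=30),  # outside the 24 hour window
    detected - timedelta(hours=20),
    detected - timedelta(hours=5),
    detected - timedelta(hours=1),
]
print(has_sufficient_history(history, detected))  # True
```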
The flap detector determines resource measurement deltas between successive resource measurements within the time window (205). The flap detector computes a delta between the detected resource measurement and a last detected resource measurement. The flap detector can then store this computed delta in a data structure of resource measurement deltas (e.g., array, linked list, table, etc.) and read the historical resource measurement deltas from the data structure. If previous deltas have not been computed because sufficient resource measurements had not yet been generated, then the flap detector can compute the deltas for the previous resource measurements that fall within the time window.
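When deltas have not been stored previously, they can be computed for the measurements that fall within the time window, as in this sketch. The function name, the `(time, value)` tuple layout, and the inclusive window bounds are assumptions; the sample data is hypothetical.

```python
def deltas_in_window(samples, window_start, window_end):
    """Compute deltas between successive measurements inside a time window.

    `samples` is a list of (time, value) pairs sorted by time. The window
    bounds are treated as inclusive.
    """
    values = [value for t, value in samples if window_start <= t <= window_end]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical samples: (time instant, measurement).
samples = [(1, 10), (2, 14), (3, 9), (4, 11), (5, 20)]
print(deltas_in_window(samples, 2, 4))  # [-5, 2]
```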
The flap detector determines sums of monotonic resource measurement delta series (“monotonic sums”) (207). The flap detector can begin with the earliest resource measurement delta in the time window and traverse the resource measurement deltas in temporal order. The flap detector accumulates a sum of deltas until it encounters a directional transition. At each directional transition, the flap detector begins to accumulate a new sum.
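The forward traversal for determining monotonic sums (207) can be sketched as below. The function name is hypothetical, and a zero delta is treated as its own direction, which is an assumption. The example reuses the throughput values for t3 through t8 given earlier in this description.

```python
def monotonic_sums(deltas):
    """Split time-ordered deltas into same-direction runs and sum each run."""
    sums = []
    previous_sign = None
    for delta in deltas:
        sign = (delta > 0) - (delta < 0)
        if sums and sign == previous_sign:
            sums[-1] += delta  # the monotonic series continues
        else:
            sums.append(delta)  # directional transition: start a new sum
        previous_sign = sign
    return sums


# Throughput (Mb/s) at t3..t8 as stated in the description.
throughput = [14.9, 18.0, 25.5, 33.4, 1.8, 1.4]
deltas = [later - earlier for earlier, later in zip(throughput, throughput[1:])]
print([round(s, 1) for s in monotonic_sums(deltas)])  # [18.5, -32.0]
```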
The flap detector determines whether any detected flapping in resource measurements is significant based on the monotonic sums (209). Since the monotonic sums are direction based, each monotonic sum corresponds to a flap. Due to the possibility of flaps that are not problematic, parameters can be set to filter out flaps. For example, an administrator may deem a flap magnitude less than 2 dropped packets or in a bottom quartile of possible processor frequency as insignificant for detecting resource flapping. In that case, the administrator can set a magnitude threshold accordingly. As previously mentioned, a count threshold can also be set to disregard a small number of flaps within a time window. If the flap detector determines that the flapping as represented by the monotonic sums does not satisfy conditions or exceed thresholds that define significance (i.e., there are no significant flaps), then the flap detector exits, sleeps or waits until a next resource measurement, or returns to a calling process.
If the flap detector determines that the monotonic sums indicate significant flapping (209), then the flap detector generates a flapping notification based on the significant resource measurement flaps (211). The flap detector determines the magnitudes of the monotonic sums (i.e., absolute values of the monotonic sums) and can generate a value, flag, or message that indicates the resource measurement flapping. The particular technique for generating a value that represents an extent of resource measurement flapping can vary with the type of resource and/or component corresponding to the resource measurement. In addition, the notification of resource measurement flapping can be communicated with an alarm level. For instance, generation of a resource flapping alarm can be biased towards a higher alarm level for components or systems that are more sensitive to flapping of a particular resource.
The above examples refer to throughput flapping. As previously mentioned, magnitude-based flap detection can also be used to detect flapping in other resource measurements and/or performance measurements of a system. Table 2 indicates example latency measurements in milliseconds (ms) and corresponding values computed for magnitude based flap detection for a time window of t2 to t11.
For this example, a flap magnitude threshold has been configured to be 250 milliseconds. A flap detector would compute 3 flaps with magnitudes of 484 ms, 420 ms, and 220 ms. With the example flap magnitude threshold, the flap detector detects 2 significant latency flaps. Assuming that flap filtering does not employ a flap count threshold, the flap detector will generate a notification of the 2 significant flaps. The flap detector can generate a message with information about the 2 significant flaps. The flap detector could also generate a flap notification with a single representative value of the latency flapping. For example, the flap detector can compute an average of the significant flap magnitudes, which would be (484 ms+420 ms)/2=452 ms.
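The single representative value from this latency example can be reproduced with the short sketch below. The function name is hypothetical, and averaging is only one of the possible aggregations; the three magnitudes and the 250 ms threshold are taken from the example above.

```python
def representative_flap_value(magnitudes, magnitude_threshold):
    """Average the flap magnitudes that exceed the threshold, or None if none do."""
    significant = [m for m in magnitudes if m > magnitude_threshold]
    if not significant:
        return None
    return sum(significant) / len(significant)


# Latency flap magnitudes (ms) and threshold from the example.
print(representative_flap_value([484, 420, 220], 250))  # 452.0
```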
A flap detector detects a value for an event (301). Since an event can vary, notifications of events will use different metrics to indicate the event. For example, an event notification for resource consumption exceeding a threshold may indicate a value in terms of the amount of the resource consumed beyond the threshold at a time corresponding to the event or the amount of the resource consumed at the time of the event. As another example, an event notification may indicate a value in terms of a performance measurement at a time of an event (e.g., processor frequency at the time). The flap detector may receive an event notification with the value, may read the value from a preconfigured location, etc.
The flap detector computes a delta between the detected value and a preceding value and inserts the computed delta into a delta array (303). The flap detector may read the preceding value (e.g., a last detected value) from a time-ordered array of values. The flap detector can also insert the detected value into the time-ordered values array.
The flap detector determines whether the computed delta is in the same direction as the preceding delta (304). Since deltas have both magnitude and direction to indicate whether an attribute has been increasing or decreasing, the flap detector determines whether the computed delta has a same sign as the preceding delta in the delta array. A same direction indicates continuation of a monotonic series of deltas.
If the direction of the computed delta is the same as the previous delta (304), then the flap detector adds the computed delta to a monotonic sum that includes the previous delta (305). Since the monotonic series continues with the computed delta, then the computed delta can be added to the previously computed monotonic sum.
If the direction of the computed delta is not the same as the previous delta (304), then the flap detector uses the computed delta as a new monotonic sum (307). The flap detector could maintain a persistent data structure of monotonic sums and revise the sums that incorporate deltas at the beginning and the ending of a time window. The sums affected by the edges of the time window are revised to account for the deltas that fall outside of the time window and are newly introduced into the time window. The flap detector could, instead, compute the monotonic sums across the time window upon each flap detection trigger and maintain those for use for the particular trigger (“on-the-fly” monotonic sums).
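A persistent structure of monotonic sums that is revised at both edges of the window could be maintained as in the sketch below. This is one possible design under stated assumptions, not the disclosed implementation: the class name is hypothetical, the window is expressed as a maximum number of deltas rather than a time span, and a zero delta is treated as its own direction.

```python
from collections import deque


def _sign(x):
    return (x > 0) - (x < 0)


class WindowedMonotonicSums:
    """Hypothetical persistent store of monotonic sums over a sliding window."""

    def __init__(self, max_deltas):
        if max_deltas < 1:
            raise ValueError("max_deltas must be at least 1")
        self.max_deltas = max_deltas
        self._deltas = deque()
        self._runs = deque()  # each entry: [sum, number of deltas in the series]

    def push(self, delta):
        # Leading edge: extend the newest sum or start a new one (304/305/307).
        if self._deltas and _sign(self._deltas[-1]) == _sign(delta):
            self._runs[-1][0] += delta
            self._runs[-1][1] += 1
        else:
            self._runs.append([delta, 1])
        self._deltas.append(delta)
        # Trailing edge: remove the delta that fell outside the window.
        if len(self._deltas) > self.max_deltas:
            oldest = self._deltas.popleft()
            self._runs[0][0] -= oldest
            self._runs[0][1] -= 1
            if self._runs[0][1] == 0:
                self._runs.popleft()

    def sums(self):
        return [total for total, _ in self._runs]
```

Compared with recomputing the sums on each trigger, this approach touches only the oldest and newest sums per update, at the cost of keeping state between triggers.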
After determination of a monotonic sum with the computed delta (307 or 305), the flap detector determines a number of monotonic sums that satisfy a flap magnitude threshold. As earlier mentioned, a threshold or condition can be set to filter out a flap with a magnitude that does not satisfy the threshold or the condition. The flap detector can traverse the determined monotonic sums and evaluate the magnitude of each monotonic sum against the condition or threshold. The flap detector can increment a counter for each magnitude that satisfies the magnitude threshold (“significant flap counter”).
The flap detector determines whether the number of monotonic sums that satisfy the flap magnitude threshold satisfies a flap count threshold (311). If the significant flap counter satisfies the flap count threshold, then the flap detector generates a notification of the significant flapping (313). The flap detector can generate the notification with information about the contributing events. The contributing events are those events that correspond to the flaps with a magnitude that satisfied the magnitude threshold. The information may identify the events and/or the values of the events. If the flap count threshold was not satisfied (311), then the flap detector terminates/exits or waits until a next event.
The above examples presume that deltas are stored for later retrieval and use after initial computation. However, a flap detector can compute deltas on-the-fly. An event management system, or similar system, will likely maintain the values from events and/or the event notifications in a database, archive, or other type of persistent store. When triggered, the flap detector can retrieve the event values within a time window and compute the deltas across those event values.
The above example illustrations also presume that event notifications indicate a numerical value. In some cases, a notification may have a non-numeric value. As an example, a resource measurement notification may indicate “critical,” “high,” “test,” or “normal.” The flap detector can map these non-numeric resource measurements to numeric values. The flap detector can be configured with the mapping, can read data that informs the mapping, can be programmed with the mapping, etc. After mapping the non-numeric event values to numeric values, the flap detector can perform the flap detection.
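A configured mapping from non-numeric values to numeric values might be as simple as the sketch below. The labels come from the example above, but the numeric values and their ordering are hypothetical; the description does not specify them.

```python
# Hypothetical mapping: the numbers and their ordering are assumptions.
LEVEL_TO_NUMBER = {"normal": 0, "test": 1, "high": 2, "critical": 3}


def to_numeric(event_values, mapping=LEVEL_TO_NUMBER):
    """Map non-numeric event values to numbers so deltas can be computed.

    An unmapped label raises KeyError rather than being silently skipped.
    """
    return [mapping[value] for value in event_values]


levels = ["normal", "critical", "normal", "high"]
numeric = to_numeric(levels)
print(numeric)  # [0, 3, 0, 2]
print([b - a for a, b in zip(numeric, numeric[1:])])  # [3, -3, 2]
```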
The examples often refer to a “flap detector.” The flap detector is a construct used to refer to implementation of functionality for the disclosed magnitude based flap detection. This construct is utilized since numerous implementations are possible. A flap detector may be a standalone program, plug-in, extension, component of an event management system, etc.
The flowcharts are provided to aid in understanding the illustrations and are not to be used to limit scope of the claims. The flowcharts depict example operations that can vary within the scope of the claims. Additional operations may be performed; fewer operations may be performed; the operations may be performed in parallel; and the operations may be performed in a different order. For example,
As will be appreciated, aspects of the disclosure may be embodied as a system, method or program code/instructions stored in one or more machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.
Any combination of one or more machine readable medium(s) may be utilized. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable storage medium may be, for example, but not limited to, a system, apparatus, or device, that employs any one of or combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine readable storage medium is not a machine readable signal medium. A machine readable storage medium does not include transitory, propagating signals.
A machine readable signal medium may include a propagated data signal with machine readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A machine readable signal medium may be any machine readable medium that is not a machine readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a machine readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as the Java® programming language, C++ or the like; a dynamic programming language such as Python; a scripting language such as Perl programming language or PowerShell script language; and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a standalone machine, may execute in a distributed manner across multiple machines, and may execute on one machine while providing results and/or accepting input on another machine.
The program code/instructions may also be stored in a machine readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
While the aspects of the disclosure are described with reference to various implementations and exploitations, it will be understood that these aspects are illustrative and that the scope of the claims is not limited to them. In general, techniques for magnitude based flap detection as described herein may be implemented with facilities consistent with any hardware system or hardware systems. Many variations, modifications, additions, and improvements are possible.
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure. In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure.