Maintaining and improving application performance is an integral part of success for many of today's institutions. Businesses and other entities increasingly rely on growing numbers of software applications for day-to-day operations. Consider a business having a presence on the World Wide Web. Typically, such a business will provide one or more web sites that run one or more web-based applications. A disadvantage of conducting business via the Internet in this manner is the reliance on software and hardware infrastructures for handling business transactions. If a web site goes down, becomes unresponsive, or otherwise fails to properly serve customers, the business may lose potential sales and/or customers. Intranets and extranets pose similar concerns for these businesses. Thus, there exists a need to monitor web-based and other applications to ensure they are performing properly or according to expectation.
When an application or transaction is performing poorly, developers seek to debug the software to determine what part of the code is causing the performance problem. Even if a developer successfully determines which method, function, routine, process, etc. is executing when an issue occurs, it is often difficult to determine whether the problem lies with the identified method or with another method, function, routine, process, etc. that is called by the identified method. Furthermore, it is often not apparent what a typical or normal execution time is for a portion of an application or transaction. Production applications can demonstrate a wide variety of what may be termed normal behavior depending on the nature of the application and its business requirements. In many enterprise systems, it may take weeks or months for a person monitoring an application to determine the normal range of performance metrics. Standard statistical techniques, such as those using standard deviation or interquartile ranges, may be used to determine whether a current metric value is normal compared to previously measured values. In the context of many systems, such as web-application monitoring, standard statistical techniques may be insufficient to distinguish statistical anomalies that do not significantly affect end-user experience from those that do. Thus, even with timing information for a piece of code, a developer may not be able to determine whether the execution time is indicative of a performance problem.
An application monitoring system monitors one or more applications to generate and report application performance data for transactions. Actual performance data for one or more metrics is compared with corresponding baseline metric value(s) to detect anomalous transactions or components thereof. Automatic baselining for a selected metric is provided using variability, based on a distribution range and arithmetic mean of actual performance data, to determine an appropriate sensitivity for boundaries between comparison levels. A user-defined sensitivity parameter allows adjustment of baselines to increase or decrease comparison sensitivity for a selected metric. The system identifies anomalies in transactions, or components of transactions, based on a comparison of actual performance data with the automatically determined baseline for a corresponding metric. The system reports performance data and other transactional data for identified anomalies.
In one embodiment, a computer-implemented method of determining a normal range of behavior for an application is provided that includes accessing performance data associated with a metric for a plurality of transactions of an application, accessing an initial range multiple for the metric, calculating a variability measure for the metric based on a maximum value, minimum value and arithmetic mean of the performance data, modifying the initial range multiple based on the calculated variability measure for the metric, and automatically establishing a baseline for the metric based on the modified range multiple.
A computer-implemented method in accordance with another embodiment includes monitoring a plurality of transactions associated with an application, generating performance data for the plurality of transactions of the application, the performance data corresponding to a selected metric, establishing a default deviation threshold for the selected metric, modifying the default deviation threshold using a calculated variability measure for the selected metric based on the performance data, automatically establishing a baseline for the selected metric using the modified deviation threshold, comparing the generated performance data for the plurality of transactions to the baseline for the metric, and reporting one or more transactions having performance data outside of the baseline for the selected metric.
In one embodiment, a computer-implemented method is provided that includes accessing performance data associated with a metric of an application, establishing an initial baseline for the metric, modifying the initial baseline based on a calculated variability of the performance data associated with the metric, determining at least one comparison threshold for the metric using the modified baseline for the metric, generating additional performance data associated with the metric of the application, comparing the additional performance data with the at least one comparison threshold, and reporting one or more anomalies associated with the application responsive to the comparing.
Embodiments in accordance with the present disclosure can be accomplished using hardware, software or a combination of both hardware and software. The software can be stored on one or more processor readable storage devices such as hard disk drives, CD-ROMs, DVDs, optical disks, floppy disks, tape drives, RAM, ROM, flash memory or other suitable storage device(s). In alternative embodiments, some or all of the software can be replaced by dedicated hardware including custom integrated circuits, gate arrays, FPGAs, PLDs, and special purpose processors. In one embodiment, software (stored on a storage device) implementing one or more embodiments is used to program one or more processors. The one or more processors can be in communication with one or more storage devices, peripherals and/or communication interfaces.
An application monitoring system monitors one or more applications to generate and report application performance data for transactions. Actual performance data for a metric is compared with a corresponding baseline metric value to detect anomalous transactions and components thereof. Automatic baselining for a selected metric is provided using variability based on a distribution range and arithmetic mean of actual performance data to determine an appropriate sensitivity for boundaries between comparison levels. A user-defined sensitivity parameter allows adjustment of baselines to increase or decrease comparison sensitivity for a selected metric. The system identifies anomalies in transactions and components of transactions based on a comparison of actual performance data with the automatically determined baseline for a corresponding metric. The system reports performance data and other transactional data for identified anomalies.
Anomalous transactions can be automatically determined using the baseline metrics. In one embodiment, an agent is installed on an application server or other machine which performs a transaction. The agent receives monitoring data from monitoring code within an application that performs the transaction and determines a baseline for the transaction. The actual transaction performance is then compared to the baseline metric values for each transaction. The agent can identify anomalous transactions based on the comparison and on configuration data received from an application monitoring system. After the agent identifies anomalous transactions, information for the identified transactions is automatically reported to a user. The reported information may include rich application transaction information, including the performance and structure of the components that comprise the application, for each anomalous transaction. One or more of the foregoing operations can be performed by a centralized or distributed enterprise manager in combination with the agents.
In one embodiment, the performance data is processed and reported as deviation information based on a deviation range for actual data point values. A number of deviation ranges can be generated based on a baseline metric value. The actual data point will be contained in one of the ranges. The deviation associated with the range is proportional to how far the range is from the predicted value. An indication of which range contains the actual data point value may be presented to a user through an interface and updated as different data points in the time series are processed.
A baseline for a selected metric is established automatically using actual performance data. The baseline can be dynamically updated based on data received over time. Absolute notions of metric variability are included in baseline determinations in addition to standard measurements of distribution spread. Considerations of metric variability allow more meaningful definitions of normal metric performance or behavior to be established. For example, incorporating variability allows the definition of normal behavior to include or focus on real-world human sensitivity to delays and variation. The inclusion of measured variability combines absolute deviation and relative deviation to dynamically determine normal values for application diagnostic metrics. These normal values can be established as baseline metrics such as a comparison threshold around a calculated average or mean in one example.
In one embodiment, an initial range multiple is defined for a selected metric. By way of non-limiting example, the range multiple may be a number of standard deviations from a calculated average or mean. The initial range multiple may be a default value or may be a value determined from past performance data for the corresponding metric. More than one range multiple can be defined to establish different comparison intervals for classifying application or transaction performance. For example, a first range multiple may define a first z-score or number of deviations above and/or below an average value and a second range multiple may define a second z-score or number of deviations further above and/or below the average value than the first z-score. Transactions falling outside the first range multiple may be considered abnormal and transactions falling outside the second range multiple may be considered very abnormal. Other designations may be used.
Using actual performance data, a variability of the selected metric is calculated, for example, by combining the range of the metric's distribution with its arithmetic mean. Generally, a fairly constant distribution having a narrow range will have a low variability if its mean is relatively large. If the metric is distributed widely compared to its average value, it will have a large variability. The calculated variability can be combined with the initial range multiples such that the comparison sensitivity is increased for more variable distributions and decreased for more constant distributions. The adjusted range multiple is combined with the standard deviation of the metric distribution to determine baseline metrics, such as comparison thresholds.
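By way of non-limiting illustration, a minimal sketch of this variability measure, computed as the distribution range divided by the arithmetic mean, is shown below in Java. The class, method and variable names are hypothetical and are not part of any particular embodiment.

    class VariabilitySketch {
        // Variability of a metric distribution: range divided by arithmetic mean.
        // Units are whatever the metric uses (e.g., milliseconds for response time).
        static double variability(double max, double min, double mean) {
            return (max - min) / mean;
        }
        // Example: a narrow distribution around a large mean has low variability,
        //   variability(1100, 900, 1000) = 0.2
        // while a distribution that is wide compared to its mean has high variability,
        //   variability(1800, 200, 1000) = 1.6
    }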
Response time, error rate, throughput, and stalls are examples of the many metrics that can be monitored, processed and reported using the present technology. Other examples of performance metrics that can be monitored, processed and reported include, but are not limited to, method timers, remote invocation method timers, thread counters, network bandwidth, servlet timers, Java Server Pages timers, systems logs, file system input and output bandwidth meters, available and used memory, Enterprise JavaBean timers, and other measurements of other activities. Other metrics and data may be monitored, processed and reported as well, including connection pools, thread pools, CPU utilization, user roundtrip response time, user visible errors, user visible stalls, and others. In various embodiments, performance metrics for which normality is generally accepted to be a combination of relative and absolute measures undergo automatic baselining using variability of the metric distribution.
Network server 140 may provide a network service to client device 110 over network 115. Application server 150 is in communication with network server 140; the connection is shown locally but can also be made over one or more networks. When network server 140 receives a request from client device 110, network server 140 may relay the request to application server 150 for processing. Client device 110 can be a laptop, PC, workstation, cell phone, PDA, or other computing device which is operated by an end user. The client device may also be an automated computing device such as a server. Application server 150 processes the request received from network server 140 and sends a corresponding response to the client device 110 via the network server 140. In some embodiments, application server 150 may send a request to database server 160 as part of processing a request received from network server 140. Database server 160 may provide a database or some other backend service and process requests from application server 150.
The monitoring system of
Performance data, such as time series data corresponding to one or more metrics, may be generated by monitoring an application using bytecode instrumentation. An application management tool, not shown but part of application monitoring system 190 in one example, may instrument the application's object code (also called bytecode).
Probe Builder 4 instruments or modifies the bytecode for Application 2 to add probes and additional code to create Application 6. The probes may measure specific pieces of information about the application without changing the application's business or other underlying logic. Probe Builder 4 may also generate one or more Agents 8. Agents 8 may be installed on the same machine as Application 6 or a separate machine. Once the probes have been installed in the application bytecode, the application may be referred to as a managed application. More information about instrumenting byte code can be found in U.S. Pat. No. 6,260,187 “System For Modifying Object Oriented Code” by Lewis K. Cirne, incorporated herein by reference in its entirety.
One embodiment instruments bytecode by adding new code. The added code activates a tracing mechanism when a method starts and terminates the tracing mechanism when the method completes. To better explain this concept, consider the following example pseudo code for a method called “exampleMethod.” This method receives an integer parameter, adds 1 to the integer parameter, and returns the sum:
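The listing itself does not appear in this text; a minimal Java-like sketch consistent with the description (the method name and behavior are taken from the description above) might be:

    public int exampleMethod(int x) {
        // Add 1 to the integer parameter and return the sum.
        return x + 1;
    }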
In some embodiments, instrumenting the existing code conceptually includes calling a tracer method, grouping the original instructions from the method in a “try” block and adding a “finally” block with a code that stops the tracer. An example is below which uses the pseudo code for the method above.
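The instrumented listing is likewise not reproduced here. Based on the description in the following paragraph, the instrumented pseudo code might look something like the sketch below. The abbreviated string values are taken as-is from that description, and the placement of loadTracer on AMethodTracer is an assumption for illustration only.

    public int exampleMethod(int x) {
        IMethodTracer tracer = AMethodTracer.loadTracer(
            "com.introscope...",     // class to instantiate that implements the tracer (abbreviated in the description)
            this,                    // the object being traced
            "com.wily.example...",   // class that the current instruction is inside of (abbreviated)
            "exampleMethod",         // method that the current instruction is inside of
            "name=...");             // name to record the statistics under (abbreviated)
        try {
            // The original instructions are grouped in the "try" block.
            return x + 1;
        } finally {
            // The added "finally" block stops the tracer.
            tracer.finishTrace();
        }
    }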
IMethodTracer is an interface that defines a tracer for profiling. AMethodTracer is an abstract class that implements IMethodTracer. IMethodTracer includes the methods startTrace and finishTrace. AMethodTracer includes the methods startTrace, finishTrace, doStartTrace and doFinishTrace. The method startTrace is called to start a tracer, perform error handling and perform setup for starting the tracer. The actual tracer is started by the method doStartTrace, which is called by startTrace. The method finishTrace is called to stop the tracer and perform error handling. The method finishTrace calls doFinishTrace to actually stop the tracer. Within AMethodTracer, startTrace and finishTrace are final and void methods; doStartTrace and doFinishTrace are protected, abstract and void methods. Thus, the methods doStartTrace and doFinishTrace must be implemented in subclasses of AMethodTracer. Each of the subclasses of AMethodTracer implements an actual tracer. The method loadTracer is a static method that calls startTrace and includes five parameters. The first parameter, "com.introscope . . . ", is the name of the class that is intended to be instantiated and that implements the tracer. The second parameter, "this", is the object being traced. The third parameter, "com.wily.example . . . ", is the name of the class that the current instruction is inside of. The fourth parameter, "exampleMethod", is the name of the method the current instruction is inside of. The fifth parameter, "name= . . . ", is the name to record the statistics under. The original instruction (return x+1) is placed inside a "try" block. The code for stopping the tracer (a call to the static method tracer.finishTrace) is placed within the "finally" block.
The above example shows source code being instrumented. In some embodiments, the present technology does not actually modify source code but instead modifies object code. The source code examples above are used for illustration. The object code is conceptually modified in the same manner as the source code modifications explained above. That is, the object code is modified to add the functionality of the "try" block and "finally" block. More information about such object code modification can be found in U.S. patent application Ser. No. 09/795,901, "Adding Functionality To Existing Code At Exits," filed on Feb. 28, 2001, incorporated herein by reference in its entirety. In another embodiment, the source code can be modified as explained above.
In one embodiment of the system of
In some embodiments, a user of the system in
Comparison system logic 156 includes logic that compares actual performance data to baseline data. In particular, comparison system logic 156 includes logic that carries out processes as discussed below. Reporting engine 158 may identify flagged transactions, generate a report package, and transmit the report package having data for each flagged transaction. The report package provided by reporting engine 158 may include anomaly data 222.
The computer system of
Portable storage medium drive 262 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, to input and output data and code to and from the computer system of
User input device(s) 260 provides a portion of a user interface. User input device(s) 260 may include an alpha-numeric keypad for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. In order to display textual and graphical information, the computer system of
The components contained in the computer system of
A trace session is configured for one or more transactions at step 410. Configuring a trace may be performed at a workstation within application monitoring system 190. Trace configuration may involve identifying one or more transactions to monitor, identifying one or more components within an application to monitor, selecting a sensitivity parameter for a baseline to apply to transaction performance data, and providing other information. The transaction trace session is typically configured with user input but may be automated in other examples. Eventually, the configuration data is transmitted to an agent 152 within an application server by application monitoring system 190.
In some embodiments, a dialog box or other interface element is presented to the user to prompt for transaction trace configuration information. The configuration information is then received from the user through the dialog box or other interface element. Other means for entering the information can also be used within the spirit of the present invention.
Several configuration parameters may be received from or configured by a user, including a baseline. A user may enter a desired comparison threshold or range parameter time, which could be in seconds, milliseconds, microseconds, etc. When analyzing transactions for response time, the system will report those transactions that have an execution time that does not fall within the comparison threshold with respect to a baseline value. For example, if the comparison threshold is one second and the detected baseline is three seconds, the system will report transactions that are executing for shorter than two seconds or longer than four seconds, which are outside the range of the baseline plus or minus the threshold.
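By way of non-limiting illustration, a sketch of this check, using the example values above and hypothetical Java names, might be:

    class UserThresholdCheck {
        // Report a transaction whose response time falls outside baseline +/- threshold.
        // For the example above: baselineSec = 3.0, thresholdSec = 1.0.
        static boolean shouldReport(double responseTimeSec, double baselineSec, double thresholdSec) {
            return responseTimeSec < baselineSec - thresholdSec
                || responseTimeSec > baselineSec + thresholdSec;
        }
    }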
In some embodiments, other configuration data can also be provided. For example, the user can identify an agent, a set of agents, or all agents, and only the identified agents will perform the transaction tracing described herein. In some embodiments, enterprise manager 120 will determine which agents to use. Another configuration variable that can be provided is the session length. The session length indicates how long the system will perform the tracing. For example, if the session length is ten minutes, the system will only trace transactions for ten minutes. At the end of the ten minute period, new transactions that are started will not be traced; however, transactions that have already started during the ten minute period will continue to be traced. In other embodiments, at the end of the session length all tracing will cease regardless of when the transaction started. Other configuration data can also include one or more userIDs, a flag set by an external process, or other data of interest to the user. For example, a userID is used to specify that only transactions initiated by processes associated with one or more particular userIDs will be traced. The flag is used so that an external process can set a flag for certain transactions, and only those transactions that have the flag set will be traced. Other parameters can also be used to identify which transactions to trace. In one embodiment, a user does not provide a threshold, deviation, or trace period for transactions being traced. Rather, the application performance management tool intelligently determines the threshold(s).
At step 415, the workstation adds the new filter to a list of filters on the workstation. In step 420, the workstation requests enterprise manager 120 to start the trace using the new filter. In step 425, enterprise manager 120 adds the filter received from the workstation to a list of filters. For each filter in its list, enterprise manager 120 stores an identification of the workstation that requested the filter, the details of the filter (described above), and the agents to which the filter applies. In one embodiment, if the workstation does not specify the agents to which the filter applies, then the filter will apply to all agents. In step 430, enterprise manager 120 requests the appropriate agents to perform the trace. In step 435, the appropriate agents perform the trace and send data to enterprise manager 120. More information about steps 430 and 435 will be provided below. In step 440, enterprise manager 120 matches the received data to the appropriate workstation/filter/agent entry. In step 445, enterprise manager 120 forwards the data to the appropriate workstation(s) based on the matching in step 440. In step 450, the appropriate workstations report the data. In one embodiment, the workstation can report the data by writing information to a text file, to a relational database, or other data container. In another embodiment, a workstation can report the data by displaying the data in a GUI. More information about how data is reported is provided below.
When performing a trace of a transaction in one example, one or more Agents 8 perform transaction tracing using Blame technology. Blame technology works in a managed Java application to enable the identification of component interactions and component resource usage. Blame technology tracks components that are specified to it using the concepts of consumers and resources. A consumer requests an activity while a resource performs the activity. A component can be both a consumer and a resource, depending on the context in which it is used.
An exemplary hierarchy of transaction components is now discussed. An Agent may build a hierarchical tree of transaction components from information received from trace code within the application performing the transaction. When reporting about transactions, the word Called designates a resource. This resource is a resource (or a sub-resource) of the parent component, which is the consumer. For example, under the consumer Servlet A (see below), there may be a sub-resource Called EJB. Consumers and resources can be reported in a tree-like manner. Data for a transaction can also be stored according to the tree. For example, if a Servlet (e.g. Servlet A) is a consumer of a network socket (e.g. Socket C) and is also a consumer of an EJB (e.g. EJB B), which is a consumer of a JDBC (e.g. JDBC D), the tree might look something like the following:
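The tree listing itself does not appear in this text; a reconstruction based on the consumer/resource relationships described above (with the sub-element ordering noted in the following paragraph) might look something like:

    Servlet A
       Called EJB B
          Called JDBC D
       Called Socket C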
In one embodiment, the above tree is stored by the Agent in a stack called the Blame Stack. When transactions are started, they are added to or "pushed onto" the stack. When transactions are completed, they are removed or "popped off" the stack. In some embodiments, each transaction on the stack has the following information stored: type of transaction, a name used by the system for that transaction, a hash map of parameters, a timestamp for when the transaction was pushed onto the stack, and sub-elements. Sub-elements are Blame Stack entries for other components (e.g. methods, process, procedure, function, thread, set of instructions, etc.) that are started from within the transaction of interest. Using the tree above as an example, the Blame Stack entry for Servlet A would have two sub-elements. The first sub-element would be an entry for EJB B and the second sub-element would be an entry for Socket C. Even though a sub-element is part of an entry for a particular transaction, the sub-element will also have its own Blame Stack entry. As the tree above notes, EJB B is a sub-element of Servlet A and also has its own entry. The top (or initial) entry (e.g., Servlet A) for a transaction is called the root component. Each of the entries on the stack is an object. While the embodiment described herein includes the use of Blame technology and a stack, other embodiments of the present invention can use different types of stacks, different types of data structures, or other means for storing information about transactions. More information about Blame technology and transaction tracing can be found in U.S. patent application Ser. No. 10/318,272, "Transaction Tracer," filed on Dec. 12, 2002, incorporated herein by reference in its entirety.
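By way of non-limiting illustration, a minimal sketch of such a stack entry, with hypothetical Java class, field and type names chosen for this example, might be:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative only; class, field and type names are hypothetical.
    public class BlameStackEntry {
        String transactionType;                                  // type of transaction
        String transactionName;                                  // name used by the system for the transaction
        Map<String, String> parameters = new HashMap<>();        // hash map of acquired parameters
        long pushTimestamp;                                      // time the entry was pushed onto the stack
        List<BlameStackEntry> subElements = new ArrayList<>();   // entries for components started within this transaction
    }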
In step 504, the agent acquires the desired parameter information. In one embodiment, a user can configure which parameter information is to be acquired via a configuration file or the GUI. The acquired parameters are stored in a hash map, which is part of the object pushed onto the Blame Stack. In other embodiments, the identification of parameters is pre-configured. There are many different parameters that can be stored. In some embodiments, the actual list of parameters used is dependent on the application being monitored. Some parameters that may be obtained and stored include UserID, URL, URL Query, Dynamic SQL, method, object, class name, and others. The present disclosure is not limited to any particular set of parameters.
In step 506, the system acquires a timestamp indicating the current time. In step 508, a stack entry is created. In step 510, the stack entry is pushed onto the Blame Stack. In one embodiment, the timestamp is added as part of step 510. The process of
A timestamp is retrieved or acquired at step 506. The timestamp indicates the time at which the transaction or particular component was pushed onto the stack. After retrieving the timestamp, a stack entry is created at step 508. In some embodiments, the stack entry is created to include the parameter information acquired at step 504 as well as the timestamp retrieved at step 506. The stack entry is then added or "pushed onto" the Blame Stack at step 510. Once the transaction completes, a process similar to that of
Performance data for one or more traced transactions is accessed at step 560. In one possible approach, initial transaction data and metrics are received from agents at the hosts. For example, this information may be received by the enterprise manager over a period of time which is used to establish the baseline metrics. In another possible approach, initial baseline metrics are set, e.g., based on a prior value of the metric or an administrator input, and subsequently periodically updated automatically.
The performance data may be accessed from agent 105 by enterprise manager 120. Performance data associated with a desired metric is identified. In one embodiment, enterprise manager 120 parses the received performance data and identifies a portion of the performance data to be processed.
The performance data may be a time series of past performance data associated with a recently completed transaction or component of a transaction. The time series may be received as a first group of data in a set of groups that are received periodically. For example, the process of identifying anomalous transactions may be performed periodically, such as every five, ten or fifteen seconds. The time series of data may be stored by the agents, representing past performance of one or more transactions being analyzed. For example, the time series of past performance data may represent response times for the last 50 invocations, the invocations in the last fifteen seconds, or some other set of invocations for the particular transaction.
In some embodiments, if there are multiple data points for a given data type, the data is aggregated as shown at step 565. The particular aggregation function may differ according to the data type being aggregated. For example, multiple response time data points are averaged together while multiple error rate data points are summed. In some embodiments, there is one data set per application. Thus, if there is aggregated data for four different applications, there will be four data sets. The data set may comprise a time series of data, such as a series of response times that take place over time. In some embodiments, the data sets may be aggregated by URL rather than application, with one dataset per URL.
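By way of non-limiting illustration, a sketch of this type-dependent aggregation, using hypothetical Java names and an illustrative metric-type label, might be:

    class MetricAggregator {
        // Aggregate data points according to the data type being aggregated:
        // response time data points are averaged, while error counts are summed.
        static double aggregate(String metricType, double[] dataPoints) {
            double sum = 0;
            for (double d : dataPoints) {
                sum += d;
            }
            if ("RESPONSE_TIME".equals(metricType)) {
                return dataPoints.length == 0 ? 0 : sum / dataPoints.length;  // average
            }
            return sum;  // e.g., error rate / error count data points are summed
        }
    }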
The metrics can be correlated with transactions, although this is not always necessary. After selecting a first metric, a baseline is calculated at step 570 using a calculated variability of the performance data corresponding to the selected first metric. Different baselines for metrics can be used in accordance with different embodiments. In one embodiment, standard deviations can be used to establish comparison intervals for determining whether performance data is outside one or more normal ranges. For instance, a transaction having a metric a specified number of standard deviations away from the average for the metric may be considered anomalous. Multiple numbers of standard deviations (also referred to as z-scores) may be established to further refine the degree of reporting for transactions. By way of example, a first number of standard deviations from average may be used to classify a transaction as abnormal while a second number may be used to classify a transaction as highly abnormal. Initial baseline measures can be established by a user or automatically determined after a number of transactions.
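By way of non-limiting illustration, a sketch of such a two-level classification, with hypothetical Java names and illustrative z-score boundaries of 2 and 3, might be:

    class DeviationClassifier {
        // Classify a data point by how many standard deviations it lies from the average.
        // The z-score boundaries used here (2 and 3) are illustrative defaults only.
        static String classify(double value, double average, double stdDev) {
            double zScore = Math.abs(value - average) / stdDev;
            if (zScore > 3.0) {
                return "highly abnormal";
            } else if (zScore > 2.0) {
                return "abnormal";
            }
            return "normal";
        }
    }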
The baseline metrics can be deviation ranges set as a function of the response time, error count or CPU load, for instance, as a percentage, a standard deviation, or so forth. Further, the deviation range can extend above and/or below the baseline level. As an example, a baseline response time for a transaction may be 1 sec. and the deviation range may be +/−0.2 sec. Thus, a response time in the range of 0.8-1.2 sec. would be considered normal, while a response time outside the range would be considered anomalous.
The calculated variability used to determine a baseline metric facilitates smoothing or tempering of deviations (e.g., a number of standard deviations) used to define sensitivity boundaries for normality. In one embodiment, the range of the distribution is combined with its arithmetic mean to determine the appropriate sensitivity to boundaries between comparison intervals as further explained in
A metric having a fairly constant distribution (i.e., having a narrow range) will have a low variability if its mean is relatively large. By contrast, a metric having a larger distribution (i.e., having a wider range) compared with its average value will have a large variability. By introducing the variability of a metric into the determination of baseline values, more valuable indications of normality can be achieved. Using the variability in defining a baseline value increases the comparison sensitivity for metrics having more variable distributions and decreases the comparison sensitivity for metrics having more constant distributions.
After calculating the baseline for the metric, the transaction performance data is compared to the baseline metric at step 575. At this step, performance data generated from information received from the transaction trace is compared to the baseline dynamically determined at step 570.
After comparing the data, an anomaly event may be generated based on the comparison if needed at step 580. Thus, if the comparison of the actual performance data and baseline metric value indicates that the transaction performance was an anomaly, an anomaly event may be generated. In some embodiments, generating an anomaly event includes setting a flag for the particular transaction. Thus, if the actual performance of a transaction was slower or faster than expected beyond a particular range, a flag may be set which identifies the transaction instance. The flag for the transaction may be set by comparison logic 156 within agent 152.
At step 585, the enterprise manager determines if there are additional metrics against which the performance data should be compared. If there are additional metrics to be evaluated, the next metric is selected at step 590 and the method returns to step 570 to calculate its baseline. If there are no additional metrics to be evaluated, anomaly events may be reported at step 490. In some embodiments, anomaly events are reported based on a triggering event, such as the expiration of an internal timer, a request received from enterprise manager 120 or some other system, or some other event. Reporting may include generating a package of data and transmitting the data to enterprise manager 120. Reporting an anomaly event is discussed in more detail below with respect to
Performance data for one or more new trace sessions is combined at step 605 with any available data sets of past performance data for the selected metric. Various aggregation techniques as described earlier can be used. At step 610, the current range multiple for the metric is accessed. The range multiple is a number of standard deviations used as a baseline metric in one implementation. If a current range multiple for the metric is not available, an initial value can be established. Default values can be used in one embodiment.
At step 615, the variability of the metric is calculated based on the aggregated performance data. The variability is based on the maximum and minimum values in the distribution of data for the selected metric. A more detailed example is described with respect to
modified_range_multiple = initial_multiple − variability   Equation 1
At step 625, the Enterprise Manager determines whether a user-provided desired sensitivity parameter is available. A user can indicate a desired level of sensitivity to fine tune the deviation comparisons that are made. By increasing the sensitivity, more transactions or less deviating behavior will be considered abnormal. By lowering the sensitivity, fewer transactions or more deviating behavior will be considered abnormal. If a user has provided a desired sensitivity, a sensitivity multiple is calculated at step 630. Equation 2 sets forth one technique for calculating a sensitivity multiple. A maximum sensitivity and a default sensitivity are first established. Various values can be used. For instance, consider an example using a maximum sensitivity of 5 and a default sensitivity of 3 (the mean possible value). The sensitivity multiple can be calculated by determining the difference between the maximum sensitivity and the desired sensitivity, adding 1 to that difference, and then determining the quotient of this value and the default sensitivity, so that higher desired sensitivities yield smaller multiples and therefore tighter comparison thresholds.
At step 635, one or more comparison thresholds are established based on the modified range multiple and the sensitivity multiple if a user-defined sensitivity parameter was provided. More details regarding establishing comparison thresholds are provided with respect to
At step 650, a distribution of values for the selected metric is accessed. The distribution of values is based on monitored transaction data that can be aggregated as described. At step 655, the range of the distribution of values for the metric is determined. The range is calculated using the maximum and minimum values in the distribution, for example, by determining their difference. The arithmetic mean of the distribution of values is determined at step 660. At step 665, the arithmetic mean is combined with the distribution range to determine a final variability value. In one example, step 665 includes determining the quotient of the distribution range and arithmetic mean as shown in Equation 3. In one embodiment, the variability is capped at 1, although this is not required. If the calculated variability is greater than 1, then the variability is set to 1.
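Based on this description, the relationship referenced as Equation 3 may be written in the same form as the other equations, with the result capped at 1 in the embodiment noted above:

variability = (maximum value − minimum value) / arithmetic mean   Equation 3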
thresholds = avg ± (sens mult * modified range mult * standard dev)   Equation 4
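Putting Equations 1, 3 and 4 together, a minimal sketch of the threshold computation is shown below. The Java class, method and variable names are hypothetical, and the sensitivity multiple is assumed to have already been computed per Equation 2.

    // Illustrative sketch of baseline threshold computation; names are hypothetical.
    public class BaselineCalculator {

        // Equation 3: variability is the distribution range divided by the
        // arithmetic mean, capped at 1 in this embodiment.
        static double variability(double max, double min, double mean) {
            return Math.min((max - min) / mean, 1.0);
        }

        // Equation 1: the initial range multiple is reduced by the variability.
        static double modifiedRangeMultiple(double initialMultiple,
                                            double max, double min, double mean) {
            return initialMultiple - variability(max, min, mean);
        }

        // Equation 4: upper and lower comparison thresholds around the average.
        // sensitivityMultiple is assumed to have been computed per Equation 2.
        static double[] thresholds(double avg, double stdDev, double sensitivityMultiple,
                                   double initialMultiple, double max, double min) {
            double offset = sensitivityMultiple
                    * modifiedRangeMultiple(initialMultiple, max, min, avg)
                    * stdDev;
            return new double[] { avg - offset, avg + offset };  // {lower, upper}
        }
    }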
At step 710, the system determines if the actual performance data, such as a data point in the metric distribution, is within the upper comparison threshold(s) for the selected metric. If the actual data is within the upper limits, the system determines if the actual data is within the lower comparison threshold(s) for the selected metric at step 720. If the actual data is within the lower limits, the process completes at step 730 for the selected metric without flagging any anomalies. If the actual data is not within the upper comparison threshold(s) at step 710, the corresponding transaction is flagged at step 715 with an indication that the deviation is high for that transaction. If the actual data is within the upper comparison threshold(s) but not the lower comparison threshold(s), the transaction is flagged at step 725 with an indication that the deviation is low for that transaction.
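By way of non-limiting illustration, a sketch of this comparison and flagging, with hypothetical Java names, might be:

    class DeviationFlagger {
        // Flag a transaction whose data point falls outside the comparison thresholds.
        // Returns "high" if above the upper threshold, "low" if below the lower
        // threshold, or null if the data point is within the normal range.
        static String flagDeviation(double dataPoint, double lowerThreshold, double upperThreshold) {
            if (dataPoint > upperThreshold) {
                return "high";   // deviation is high for this transaction
            }
            if (dataPoint < lowerThreshold) {
                return "low";    // deviation is low for this transaction
            }
            return null;         // within baseline; no anomaly flagged
        }
    }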
The method of
After accessing the first transaction trace data set, a determination is made as to whether the accessed data set is flagged to be reported at step 830. A transaction may be flagged at step 715 or 725 in the method of
A determination is made as to whether more transaction data sets exist to be analyzed at step 870. If more transaction data sets are to be analyzed to determine if a corresponding transaction is flagged, the next transaction data set is accessed at step 880 and the method returns to step 830. If no further transaction data sets exist to be analyzed, the report package containing the flagged data sets and component data is transmitted to enterprise manager 120 at step 890.
The foregoing detailed description has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.