Service monitoring interface reflecting by-service mode

Information

  • Patent Grant
  • 10305758
  • Patent Number
    10,305,758
  • Date Filed
    Friday, September 22, 2017
    7 years ago
  • Date Issued
    Tuesday, May 28, 2019
    5 years ago
Abstract
Network connected devices are controlled via the transmission of action messages to prevent or correct conditions that impair the operation of the networked information technology (IT) assets. The service monitoring system (SMS) monitoring the IT environment groups together related notable events that are received during system operation. Automatic processes dynamically determine grouping operations that automatically correlate the events without requiring, for example, a set of declarative grouping rules. Events grouping may be performed on a by-service basis to facilitate the complex processing of predicting undesirable system conditions that may be prevented or reduced by transmission of the action messages to the appropriate assets.
Description
TECHNICAL FIELD

The present disclosure relates to system monitoring including, more particularly, monitoring a technology environment using machine data.


BACKGROUND

Modern data centers often comprise thousands of hosts that operate collectively to service requests from even larger numbers of remote clients. During operation, components of these data centers can produce significant volumes of machine-generated data. The unstructured nature of much of this data has made it challenging to perform indexing and searching operations because of the difficulty of applying semantic meaning to unstructured data. As the number of hosts and clients associated with a data center continues to grow, processing large volumes of machine-generated data in an intelligent manner and effectively presenting the results of such processing continues to be a priority.





BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various implementations of the disclosure.



FIG. 1 illustrates a block diagram of an example of entities providing a service, in accordance with one or more implementations of the present disclosure.



FIG. 2 is a block diagram of one implementation of a service monitoring system, in accordance with one or more implementations of the present disclosure.



FIG. 3 is a block diagram illustrating an entity definition for an entity, in accordance with one or more implementations of the present disclosure.



FIG. 4 is a block diagram illustrating a service definition that relates one or more entities with a service, in accordance with one or more implementations of the present disclosure.



FIG. 5 is a flow diagram of an implementation of a method for creating one or more key performance indicators for a service, in accordance with one or more implementations of the present disclosure.



FIG. 6 is a flow diagram of an implementation of a method for creating an entity definition for an entity, in accordance with one or more implementations of the present disclosure.



FIG. 7 illustrates an example of a graphical user interface (GUI) for creating and/or editing entity definition(s) and/or service definition(s), in accordance with one or more implementations of the present disclosure.



FIG. 8 illustrates an example of a GUI for creating and/or editing entity definitions, in accordance with one or more implementations of the present disclosure.



FIG. 9A illustrates an example of a GUI for creating an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 9B illustrates an example of input received via GUI for creating an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 9C illustrates an example of a GUI of a service monitoring system for creating an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 10A illustrates an example of a GUI for creating and/or editing entity definitions, in accordance with one or more implementations of the present disclosure.



FIG. 10B illustrates an example of the structure of an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 10C illustrates an example of an instance of an entity definition record for an entity, in accordance with one or more implementations of the present disclosure.



FIG. 10D is a flow diagram of an implementation of a method for creating entity definition(s) using a file, in accordance with one or more implementations of the present disclosure.



FIG. 10E is a block diagram of an example of creating entity definition(s) using a file, in accordance with one or more implementations of the present disclosure.



FIG. 10F illustrates an example of a GUI of a service monitoring system for creating entity definition(s) using a file or using a set of search results, in accordance with one or more implementations of the present disclosure.



FIG. 10G illustrates an example of a GUI of a service monitoring system for selecting a file for creating entity definitions, in accordance with one or more implementations of the present disclosure.



FIG. 10H illustrates an example of a GUI of a service monitoring system that displays a table for facilitating user input for creating entity definition(s) using a file, in accordance with one or more implementations of the present disclosure.



FIG. 10I illustrates an example of a GUI of a service monitoring system for displaying a list of entity definition component types, in accordance with one or more implementations of the present disclosure.



FIG. 10J illustrates an example of a GUI of a service monitoring system for specifying the type of entity definition records to create, in accordance with one or more implementations of the present disclosure.



FIG. 10K illustrates an example of a GUI of a service monitoring system for merging entity definition records, in accordance with one or more implementations of the present disclosure.



FIG. 10L illustrates an example of a GUI of a service monitoring system for providing information for newly created and/or updated entity definition records, in accordance with one or more implementations of the present disclosure.



FIG. 10M illustrates an example of a GUI of a service monitoring system for saving configurations settings of an import, in accordance with one or more implementations of the present disclosure.



FIGS. 10N-10O illustrates an example of GUIs of a service monitoring system for setting the parameters for monitoring a file, in accordance with one or more implementations of the present disclosure.



FIG. 10P illustrates an example of a GUI of a service monitoring system for creating and/or editing entity definition record(s), in accordance with one or more implementations of the present disclosure.



FIG. 10Q is a flow diagram of an implementation of a method for creating entity definition(s) using a search result set, in accordance with one or more implementations of the present disclosure.



FIG. 10R is a block diagram of an example of creating entity definition(s) using a search result set, in accordance with one or more implementations of the present disclosure.



FIG. 10S illustrates an example of a GUI of a service monitoring system for defining search criteria for a search query for creating entity definition(s), in accordance with one or more implementations of the present disclosure.



FIG. 10T illustrates an example of a GUI of a service monitoring system for defining a search query using a saved search, in accordance with one or more implementations of the present disclosure.



FIG. 10U illustrates an example of a GUI of a service monitoring system that displays a search result set for creating entity definition(s), in accordance with one or more implementations of the present disclosure.



FIG. 10V illustrates an example of a of a service monitoring system that displays a table for facilitating user input for creating entity definition(s) using a search result set, in accordance with one or more implementations of the present disclosure.



FIG. 10W illustrates an example of a GUI of a service monitoring system for merging entity definition records, in accordance with one or more implementations of the present disclosure.



FIG. 10X illustrates an example of a GUI of a service monitoring system for providing information for newly created and/or updated entity definition records, in accordance with one or more implementations of the present disclosure.



FIG. 10Y illustrates an example of a GUI of a service monitoring system for saving configurations settings of an import, in accordance with one or more implementations of the present disclosure.



FIG. 10Z illustrates and example GUI of a service monitoring system for setting the parameters for a saved search, in accordance with one or more implementations of the present disclosure.



FIG. 10AA is a flow diagram of an implementation of a method for creating an informational field and adding the informational field to an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 10AB illustrates an example of a GUI facilitating user input for creating an informational field and adding the informational field to an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 10AC is a flow diagram of an implementation of a method for filtering entity definitions using informational field-value data, in accordance with one or more implementations of the present disclosure.



FIG. 10AD-10AE illustrate examples of GUIs facilitating user input for filtering entity definitions using informational field-value data, in accordance with one or more implementations of the present disclosure.



FIG. 10AF is a flow diagram of a method addressing the automatic updating of a set of stored entity definitions, including depictions of certain components in the computing environment.



FIG. 11 is a flow diagram of an implementation of a method for creating a service definition for a service, in accordance with one or more implementations of the present disclosure.



FIG. 12 illustrates an example of a GUI for creating and/or editing service definitions, in accordance with one or more implementations of the present disclosure.



FIG. 13 illustrates an example of a GUI for identifying a service for a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 14 illustrates an example of a GUI for creating a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 15 illustrates an example of a GUI for associating one or more entities with a service by associating one or more entity definitions with a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 16 illustrates an example of a GUI facilitating user input for creating an entity definition, in accordance with one or more implementations of the present disclosure.



FIG. 17A illustrates an example of a GUI indicating one or more entities associated with a service based on input, in accordance with one or more implementations of the present disclosure.



FIG. 17B illustrates an example of the structure for storing a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 17C is a block diagram of an example of using filter criteria to dynamically identify one or more entities and to associate the entities with a service, in accordance with one or more implementations of the present disclosure.



FIG. 17D is a flow diagram of an implementation of a method for using filter criteria to associate entity definition(s) with a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 17E illustrates an example of a GUI of a service monitoring system for using filter criteria to identify one or more entity definitions to associate with a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 17F illustrates an example of a GUI of a service monitoring system for specifying filter criteria for a rule, in accordance with one or more implementations of the present disclosure.



FIG. 17G illustrates an example of a GUI of a service monitoring system for specifying one or more values for a rule, in accordance with one or more implementations of the present disclosure.



FIG. 17H illustrates an example of a GUI of a service monitoring system for specifying multiple rules for associating one or more entity definitions with a service definition, in accordance with one or more implementations of the present disclosure.



FIG. 17I illustrates an example of a GUI of a service monitoring system for displaying entity definitions that satisfy filter criteria, in accordance with one or more implementations of the present disclosure.



FIG. 17J is a system diagram including a process flow for implementing service discovery in one embodiment.



FIG. 17K depicts a user interface display related to service and entity discovery processing in one embodiment.



FIG. 17L depicts a user interface display related to service and entity discovery processing in one embodiment including a presentation of discovered items.



FIG. 17M depicts a user interface display related to editing and confirmation of discovered items.



FIG. 17N depicts a user interface display related to editing and confirmation of discovered items with filtering and bulk edit aspects.



FIG. 17O depicts a user interface display related to bulk editing, particularly bulk editing of the service association.



FIG. 17P depicts a user interface display related to graphically visualizing discovered items.



FIG. 17Q depicts a user interface display aspect related to graphically visualizing discovered items.



FIG. 17R depicts a user interface display related to automated configuration updates of service discovery.



FIG. 18 illustrates an example of a GUI for specifying dependencies for the service, in accordance with one or more implementations of the present disclosure.



FIG. 19 is a flow diagram of an implementation of a method for creating one or more key performance indicators (KPIs) for a service, in accordance with one or more implementations of the present disclosure.



FIG. 20 is a flow diagram of an implementation of a method for creating a search query, in accordance with one or more implementations of the present disclosure.



FIG. 21 illustrates an example of a GUI for creating a KPI for a service, in accordance with one or more implementations of the present disclosure.



FIG. 22 illustrates an example of a GUI for creating a KPI for a service, in accordance with one or more implementations of the present disclosure.



FIG. 23 illustrates an example of a GUI for receiving input of search processing language for defining a search query for a KPI for a service, in accordance with one or more implementations of the present disclosure.



FIG. 24 illustrates an example of a GUI for defining a search query for a KPI using a data model, in accordance with one or more implementations of the present disclosure.



FIG. 25 illustrates an example of a GUI for facilitating user input for selecting a data model and an object of the data model to use for the search query, in accordance with one or more implementations of the present disclosure.



FIG. 26 illustrates an example of a GUI for displaying a selected statistic, in accordance with one or more implementations of the present disclosure.



FIG. 27 illustrates an example of a GUI for editing which entity definitions to use for the KPI, in accordance with one or more implementations of the present disclosure.


FIG. 27A1 illustrates a process for the production of multiple KPIs using a common shared base search in one embodiment.


FIG. 27A2 illustrates a user interface as may be used for the creation and maintenance of shared base search definition information for controlling an SMS in one embodiment.


FIG. 27A3 illustrates a user interface as may be used for the creation of metric definition information of shared base search in one embodiment.


FIG. 27A4 illustrates a user interface as may be used in one embodiment to establish an association between a KPI and a defined shared base search.



FIG. 28 is a flow diagram of an implementation of a method for defining one or more thresholds for a KPI, in accordance with one or more implementations of the present disclosure.



FIGS. 29A-B, illustrate examples of a graphical interface enabling a user to set a threshold for the KPI, in accordance with one or more implementations of the present disclosure.



FIG. 29C illustrates an example GUI 2960 for configuring KPI monitoring in accordance with one or more implementations of the present disclosure.



FIG. 30 illustrates an example GUI for enabling a user to set one or more thresholds for the KPI, in accordance with one or more implementations of the present disclosure.



FIG. 31A-C illustrate example GUIs for defining thresholds for a KPI, in accordance with one or more implementations of the present disclosure.



FIGS. 31D-31F illustrate example GUIs for defining threshold settings for a KPI, in accordance with alternative implementations of the present disclosure.



FIG. 31G is a flow diagram of an implementation of a method for defining one or more thresholds for a KPI on a per entity basis, in accordance with one or more implementations of the present disclosure.



FIG. 32 is a flow diagram of an implementation of a method for calculating an aggregate KPI score for a service based on the KPIs for the service, in accordance with one or more implementations of the present disclosure.



FIG. 33A illustrates an example GUI 3300 for assigning a frequency of monitoring to a KPI based on user input, in accordance with one or more implementations of the present disclosure.



FIG. 33B illustrates an example GUI for defining threshold settings, including state ratings, for a KPI, in accordance with one or more implementations of the present disclosure.



FIG. 34A is a flow diagram of an implementation of a method for calculating a value for an aggregate KPI for the service, in accordance with one or more implementations of the present disclosure.



FIG. 34AB is a flow diagram of an implementation of a method for automatically defining one or more thresholds for a KPI, in accordance with one or more implementations of the present disclosure.



FIG. 34AC-AO illustrate example GUIs for configuring automatic thresholds for a KPI, in accordance with one or more implementations of the present disclosure.



FIG. 34AP is a flow diagram of an exemplary method for defining multiple sets of KPI thresholds that apply to different time frames, in accordance with one or more implementations of the present disclosure.



FIG. 34AQ is a flow diagram of an exemplary method for determining KPI states based on multiple sets of KPI thresholds that correspond to different times frames, in accordance with one or more implementations of the present disclosure.



FIG. 34AR is an exemplary GUI for defining threshold settings that apply to different time frames, in accordance with one or more implementations of the present disclosure.



FIG. 34AS is an exemplary GUI for displaying multiple KPI states according to sets of KPI thresholds with different time frames, in accordance with one or more implementations of the present disclosure.



FIG. 34AT is an exemplary GUI for displaying threshold information of one or more time policies using a presentation schedule having a time grid arrangement, in accordance with one or more implementations of the present disclosure.



FIG. 34AU is an exemplary GUI for displaying a presentation schedule having time slots in a graph arrangement and a depiction illustrating KPI values, in accordance with one or more implementations of the present disclosure.



FIG. 34AV is an exemplary GUI for displaying a presentation schedule having multiple depictions representing different portions of training data, in accordance with one or more implementations of the present disclosure.



FIG. 34AW is an exemplary GUI for displaying multiple presentation schedules and multiple graphical control elements for creating one or more time policies and configuring threshold information, in accordance with one or more implementations of the present disclosure.



FIG. 34AX is a flow diagram of an exemplary method for displaying a graphical user interface including a presentation schedule with one or more time slots, in accordance with one or more implementations of the present disclosure.



FIG. 34AV is a flow diagram of an exemplary method for utilizing adaptive thresholding to determine thresholds based on training data, in accordance with one or more implementations of the present disclosure.


FIG. 34AZ1 is an exemplary GUI, in accordance with one or more implementations of the present disclosure.


FIG. 34AZ2 is an exemplary GUI, in accordance with one or more implementations of the present disclosure.


FIG. 34AZ3 is an exemplary GUI, in accordance with one or more implementations of the present disclosure.


FIG. 34AZ4 is a flow diagram of an exemplary method for anomaly detection, in accordance with one or more implementations of the present disclosure.



FIG. 34B illustrates a block diagram of an example of monitoring one or more services using key performance indicator(s), in accordance with one or more implementations of the present disclosure.



FIG. 34C illustrates an example of monitoring one or more services using a KPI correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34D illustrates an example of the structure for storing a KPI correlation search definition, in accordance with one or more implementations of the present disclosure.



FIG. 34E is a flow diagram of an implementation of a method for monitoring service performance using a KPI correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34F illustrates an example of a GUI of a service monitoring system for initiating creation of a KPI correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34G illustrates an example of a GUI of a service monitoring system for defining a KPI correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34H illustrates an example GUI for facilitating user input specifying a duration to use for a KPI correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34I illustrates an example of a GUI of a service monitoring system for presenting detailed performance data for a KPI for a time range, in accordance with one or more implementations of the present disclosure.



FIG. 34J illustrates an example of a GUI of a service monitoring system for specifying trigger criteria for a KPI for a KPI correlation search definition, in accordance with one or more implementations of the present disclosure.



FIG. 34K illustrates an example of a GUI of a service monitoring system for specifying trigger criteria for a KPI for a KPI correlation search definition, in accordance with one or more implementations of the present disclosure.



FIG. 34L illustrates an example of a GUI of a service monitoring system for creating a KPI correlation search based on a KPI correlation search definition, in accordance with one or more implementations of the present disclosure.



FIG. 34M illustrates an example of a GUI of a service monitoring system for creating the KPI correlation search as a saved search based on the KPI correlation search definition that has been specified, in accordance with one or more implementations of the present disclosure.



FIG. 34NA illustrates an example of a graphical user interface for selecting KPIs from one or more services and for adjusting the weights of the KPIs, in accordance with one or more implementations of the present disclosure



FIG. 34NB illustrates an exemplary weight adjustment display component, in accordance with one or more implementations of the present disclosure.



FIG. 34NC presents a flow diagram of an exemplary method for displaying a graphical user interface that enables a user to adjust KPI weights for an aggregate KPI that spans one or more IT services, in accordance with one or more implementations of the present disclosure



FIG. 34ND presents a flow diagram of an exemplary method for creating an aggregate KPI that characterizes the performance of multiple services, in accordance with one or more implementations of the present disclosure.



FIG. 34O is a flow diagram of an implementation of a method of causing display of a GUI presenting information pertaining to notable events produced as a result of correlation searches, in accordance with one or more implementations of the present disclosure.



FIG. 34PA illustrates an example of a GUI presenting information pertaining to notable events produced as a result of correlation searches, in accordance with one or more implementations of the present disclosure.



FIG. 34PB illustrates an example of a GUI for filtering the presentation of notable events produced as a result of correlation searches, in accordance with one or more implementations of the present disclosure.



FIG. 34Q illustrates an example of a GUI editing information pertaining to a notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34R illustrates an example of a GUI presenting options for actions that may be taken for a corresponding notable event produced as a result of a KPI correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34S illustrates an example of a GUI presenting options for actions that may be taken for a corresponding notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34T illustrates an example of a GUI presenting detailed information pertaining to a notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34U illustrates an example of a GUI for configuring a ServiceNow™ incident ticket produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34V illustrates an example of a GUI for configuring a ServiceNow™ event ticket produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34 W illustrates an example of a GUI presenting options for actions that may be taken for a corresponding notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 34X illustrates an example of a GUI for configuring an incident ticket for a notable event, in accordance with one or more implementations of the present disclosure.



FIG. 34Y illustrates an example of a GUI for configuring an event ticket for a notable event, in accordance with one or more implementations of the present disclosure.



FIG. 34Z illustrates an example of a GUI presenting detailed information pertaining to a notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure.


FIG. 34ZA1 illustrates a process embodiment for conducting a user interface for service monitoring based on service detail.


FIG. 34ZA2 illustrates a user interface as may be employed to enable of user to view and interact with service detail information in one embodiment.


FIG. 34ZA3 illustrates a KPI portion of a service detail user interface in one embodiment.


FIG. 34ZA4 illustrates an entity portion of a service detail user interface in one embodiment.


FIG. 34ZA5 illustrates an embodiment of a service selection interface aspect.


FIG. 34ZA6 illustrates a timeframe selection interface display in one embodiment.


FIG. 34ZB1 illustrates a process embodiment for conducting a user interface for service monitoring based on entity detail.


FIG. 34ZB2 illustrates an entity lister interface in one embodiment.


FIG. 34ZB3 illustrates a user interface as may be employed to enable of user to view and interact with entity detail information in one embodiment.


FIG. 34ZB4 illustrates a service portion of an entity detail user interface in one embodiment.


FIG. 34ZB5 illustrates a KPI portion of an entity detail user interface in one embodiment.


FIG. 34ZB6 illustrates a timeframe selection interface display in one embodiment.


FIG. 34ZC1 illustrates methods and certain related components of a system implementation permitting maintenance periods.


FIG. 34ZC2 illustrates one embodiment of a user interface for displaying and creating maintenance period definitions.


FIGS. 34ZC3 and 34ZC4 illustrate an example of a possible user interface embodiment for creating a maintenance period definition.


FIG. 34ZC5 illustrates a maintenance period definition detail user interface in one embodiment.


FIG. 34ZC6 illustrates the interface of FIG. 34ZC5 modified by the selection of an alternate tab control.


FIG. 34ZC7 illustrates examples of different content as may be useful to populate an information display area.


FIG. 34ZC8 illustrates examples of user interface elements for implementing output presentation related to maintenance periods in an embodiment.


FIG. 34ZD1 is a system diagram with methods for implementing automated event group processing in one embodiment.


FIG. 34ZD2 depicts a user interface related to group membership criteria for an event group in one embodiment.


FIG. 34ZD3 depicts user interface matter related to group membership criteria for an event group in one embodiment.


FIG. 34ZD4 depicts user interface matter related to group membership termination criteria for an event group in one embodiment.


FIG. 34ZD5 depicts user interface matter related to event group information in one embodiment.


FIG. 34ZD6 depicts a user interface related to automated actions for an event group in one embodiment.


FIG. 34ZD7 depicts a user interface related to event group policy information in one embodiment.


FIG. 34ZD8 depicts a user interface related to event group policies.


FIG. 34ZD9 depicts a user interface example including aspects related to automated event group processing.


FIG. 34ZD10 depicts a user interface or portion for a grouped events information display in one embodiment.


FIG. 34ZE1 is a system diagram including automated event correlation (AEC) processing in one embodiment.


FIG. 34ZE2 depicts a method for creating seed group control information to direct automated event correlation processing in one embodiment.


FIG. 34ZE3 depicts example embodiments of AEC group definition entries.


FIG. 34ZE4 depicts a method for performing AEC processing against notable events in or near realtime in one embodiment.


FIG. 34ZE5 depicts a method for matching an event to seed groups in one embodiment.


FIG. 34ZE6 depicts a method for matching an event to active (non-seed) groups in one embodiment.


FIG. 34ZE7 depicts a user interface for AEC control functions in one embodiment.


FIG. 34ZE8 depicts a user interface for AEC control functions related to seed group determinations in one embodiment.


FIG. 34ZE9 depicts a user interface for AEC control functions in one embodiment that includes example seed group creation controls and information.


FIG. 34ZE10 depicts a user interface for AEC control functions in one embodiment that includes an augmented information display.


FIG. 34ZF1 is a flow diagram of an example process for associating a notable/alert event with a service in one embodiment.


FIG. 34ZF2 illustrates an embodiment of notable/alert events datastore indexed by service.


FIG. 34ZF3 illustrates an example storage representation of a notable/alert event in one embodiment.


FIG. 34ZF4 depicts a method for creating by-service seed group control information to direct automated event correlation processing on a by-service basis in one embodiment.


FIG. 34ZF5 illustrates an example service monitoring interface GUI portion for effecting by-service automatic event correlation in one embodiment.


FIG. 34ZF6 depicts a method for performing by-service AEC processing against notable events in or near realtime in one embodiment.


FIG. 34ZF7 depicts a system diagram showing activation of a network attached action target device using causal determinations from by-service event groups in one embodiment.


FIG. 34ZF8 depicts a processing diagram for activating a network attached action target device using backtracking of predictor rules to make plausible causal determinations from by-service event groups, in one embodiment.



FIG. 35 is a flow diagram of an implementation of a method for creating a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 36A illustrates an example GUI for creating and/or editing a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 36B illustrates an example GUI for a dashboard-creation graphical interface for creating a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 37 illustrates an example GUI for a dashboard-creation graphical interface including a user selected background image, in accordance with one or more implementations of the present disclosure.



FIG. 38A illustrates an example GUI for displaying of a set of KPIs associated with a selected service, in accordance with one or more implementations of the present disclosure.



FIG. 38B illustrates an example GUI for displaying a set of KPIs associated with a selected service for which a user can select for a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 39A illustrates an example GUI facilitating user input for selecting a location in the dashboard template and style settings for a KPI widget, and displaying the KPI widget in the dashboard template, in accordance with one or more implementations of the present disclosure.



FIG. 39B illustrates example KPI widgets, in accordance with one or more implementations of the present disclosure.



FIG. 40 illustrates an example Noel gauge widget, in accordance with one or more implementations of the present disclosure.



FIG. 41 illustrates an example single value widget, in accordance with one or more implementations of the present disclosure.



FIG. 42 illustrates an example GUI illustrating a search query and a search result for a Noel gauge widget, a single value widget, and a trend indicator widget, in accordance with one or more implementations of the present disclosure.



FIG. 43A illustrates an example GUI portion of a service-monitoring dashboard for facilitating user input specifying a time range to use when executing a search query defining a KPI, in accordance with one or more implementations of the present disclosure.



FIG. 43B illustrates an example GUI for facilitating user input specifying an end date and time for a time range to use when executing a search query defining a KPI, in accordance with one or more implementations of the present disclosure.



FIG. 44 illustrates spark line widget, in accordance with one or more implementations of the present disclosure.



FIG. 45A illustrates an example GUI illustrating a search query and search results for a spark line widget, in accordance with one or more implementations of the present disclosure.



FIG. 45B illustrates spark line widget, in accordance with one or more implementations of the present disclosure.



FIG. 46A illustrates a trend indicator widget, in accordance with one or more implementations of the present disclosure.



FIG. 46B illustrates an example GUI for creating and/or editing a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 46BA illustrates an example GUI for specifying information for a new service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 46C illustrates an example GUI for editing a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 46D illustrates an example interface for using a data model to define an adhoc KPI, in accordance with one or more implementations of the present disclosure.



FIG. 46E illustrates an example interface for setting one or more thresholds for the adhoc KPI, in accordance with one or more implementations of the present disclosure.



FIG. 46F illustrates an example interface for a service-related KPI, in accordance with one or more implementations of the present disclosure.



FIG. 46GA illustrates exemplary interfaces for configuring the selection behavior (e.g., click-in behavior) of the service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 46GB illustrates an exemplary GUI for editing a service-monitoring dashboard to include customized selection behavior (e.g., click-in behavior), in accordance with one or more implementations of the present disclosure.



FIG. 46HA illustrates an example GUI for editing layers for items, in accordance with one or more implementations of the present disclosure.



FIG. 46HB illustrates an example GUI for editing layers for items, in accordance with one or more implementations of the present disclosure.



FIG. 46I illustrates an example GUI for moving a group of items, in accordance with one or more implementations of the present disclosure.



FIG. 46J illustrates an example GUI for connecting items, in accordance with one or more implementations of the present disclosure.



FIG. 46K illustrates a block diagram of an example for editing a line using the modifiable dashboard template, in accordance with one or more implementations of the present disclosure.



FIG. 47A is a flow diagram of an implementation of a method for creating and causing for display a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure.



FIG. 47B describes an example service-monitoring dashboard GUI, in accordance with one or more implementations of the present disclosure.



FIG. 47C illustrates an example service-monitoring dashboard GUI that is displayed in view mode based on the dashboard template, in accordance with one or more implementations of the present disclosure.


FIG. 47D1 illustrates a flow diagram for a method of dashboard template service swapping in one embodiment.


FIG. 47D2 illustrates a flowchart of a method for automatically determining comparable widget KPIs in one embodiment.


FIG. 47D3 illustrates a block diagram of a system for dashboard swapping in one embodiment.


FIG. 47D4 illustrates an example user interface for creating and/or updating a service monitoring dashboard.


FIG. 47D5 illustrates an example user interface used for service swapping in one embodiment.


FIG. 47D6 illustrates an example user interface displaying a base dashboard template with swapping enabled.


FIG. 47D7 illustrates an example user interface displaying a swapped dashboard template in one embodiment.


FIG. 47D8 illustrates an example user interface portion indicating a failed data source/KPI match for a dashboard widget in one embodiment.



FIG. 48 describes an example home page GUI for service-level monitoring, in accordance with one or more implementations of the present disclosure.



FIG. 49A describes an example home page GUI for service-level monitoring, in accordance with one or more implementations of the present disclosure.



FIG. 49B is a flow diagram of an implementation of a method for creating a home page GUI for service-level and KPI-level monitoring, in accordance with one or more implementations of the present disclosure.



FIG. 49C illustrates an example of a service-monitoring page 4920, in accordance with one or more implementations of the present disclosure.



FIG. 49D illustrates an example of a service-monitoring page 4920 including a notable events region, in accordance with one or more implementations of the present disclosure.



FIGS. 49E-F illustrate an example of a service-monitoring page, in accordance with one or more implementations of the present disclosure.



FIG. 50A is a flow diagram of an implementation of a method for creating a visual interface displaying graphical visualizations of KPI values along time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 50B is a flow diagram of an implementation of a method for generating a graphical visualization of KPI values along a time-based graph lane, in accordance with one or more implementations of the present disclosure.



FIG. 51 illustrates an example of a graphical user interface (GUI) for creating a visual interface displaying graphical visualizations of KPI values along time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 52 illustrates an example of a GUI for adding a graphical visualization of KPI values along a time-based graph lane to a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 53 illustrates an example of a visual interface with time-based graph lanes for displaying graphical visualizations, in accordance with one or more implementations of the present disclosure.



FIG. 54 illustrates an example of a visual interface displaying graphical visualizations of KPI values along time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 55A illustrates an example of a visual interface with a user manipulable visual indicator spanning across the time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 55B is a flow diagram of an implementation of a method for inspecting graphical visualizations of KPI values along a time-based graph lane, in accordance with one or more implementations of the present disclosure.



FIG. 55C illustrates an example of a visual interface with a user manipulable visual indicator spanning across multi-series time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 56 illustrates an example of a visual interface displaying graphical visualizations of KPI values along time-based graph lanes with options for editing the graphical visualizations, in accordance with one or more implementations of the present disclosure.



FIG. 57 illustrates an example of a GUI for editing a graphical visualization of KPI values along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 58 illustrates an example of a GUI for editing a graph style of a graphical visualization of KPI values along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 59 illustrates an example of a GUI for selecting the KPI corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 60 illustrates an example of a GUI for selecting a data model corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 61 illustrates an example of a GUI for selecting a data model corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 62A illustrates an example of a GUI for editing an aggregation operation for a data model corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 62B illustrates an example of a GUI for editing a graphical visualization of KPI values along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure.



FIG. 63 illustrates an example of a GUI for selecting a time range that graphical visualizations along a time-based graph lane in a visual interface should cover, in accordance with one or more implementations of the present disclosure.



FIG. 64A illustrates an example of a visual interface for selecting a subset of a time range that graphical visualizations along a time-based graph lane in a visual interface cover, in accordance with one or more implementations of the present disclosure.



FIG. 64B is a flow diagram of an implementation of a method for enhancing a view of a subset a subset of a time range for a time-based graph lane, in accordance with one or more implementations of the present disclosure.



FIG. 65 illustrates an example of a visual interface displaying graphical visualizations of KPI values along time-based graph lanes for a selected subset of a time range, in accordance with one or more implementations of the present disclosure.



FIG. 66 illustrates an example of a visual interface displaying twin graphical visualizations of KPI values along time-based graph lanes for different periods of time, in accordance with one or more implementations of the present disclosure.



FIG. 67 illustrates an example of a visual interface with a user manipulable visual indicator spanning across twin graphical visualizations of KPI values along time-based graph lanes for different periods of time, in accordance with one or more implementations of the present disclosure.



FIG. 68A illustrates an example of a visual interface displaying a graph lane with inventory information for a service or entities reflected by KPI values, in accordance with one or more implementations of the present disclosure.



FIG. 68B illustrates an example of a visual interface displaying an event graph lane with event information in an additional lane, in accordance with one or more implementations of the present disclosure.



FIG. 69 illustrates an example of a visual interface displaying a graph lane with notable events occurring during a timer period covered by graphical visualization of KPI values, in accordance with one or more implementations of the present disclosure.



FIG. 70 illustrates an example of a visual interface displaying a graph lane with notable events occurring during a timer period covered by graphical visualization of KPI values, in accordance with one or more implementations of the present disclosure.



FIG. 70A is a flow diagram of an implementation of a method addressing the production and use of KPI entity breakdown data.



FIGS. 70B-70C illustrate examples of a GUI for editing a graph style of a graphical visualization of KPI-related values along a time-based graph lane in a visual interface, including aspects related to KPI entity breakdown.



FIG. 70D-70F illustrate examples of a visual interface displaying graphical visualizations along time-based graph lanes, including aspects related to KPI entity breakdown.



FIG. 70G-H illustrate GUI examples for graph lane overlay options, including aspects of KPI entity breakdown.



FIG. 70I illustrates an example of a visual interface displaying twin graphical visualizations along time-based graph lanes for different periods of time, including aspects of KPI entity breakdown.



FIG. 70J illustrates an example of a visual interface displaying graphical visualizations along time-based graph lanes including threshold visualization and aspects of KPI entity breakdown.



FIG. 70K is a block diagram illustrating aspects of navigation options in one implementation.



FIG. 71 illustrates an exemplary GUI facilitating the creation of a correlation search based on a displayed set of graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 72A presents a flow diagram of a method for assisting a user in initiating a creation of a new correlation search, in accordance with one or more implementations of the present disclosure.



FIG. 72B presents a flow diagram of a method for creating a new correlations search definition based on a set of displayed graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 72C presents a flow diagram of a method for executing a new correlations search to identify a subsequent occurrence of a pattern of interest in the performance of one or more services, in accordance with one or more implementations of the present disclosure.



FIG. 73A-F illustrate exemplary GUIs for facilitating the creation of a new correlation search to monitor the performance of a web service, an application service and a database service, in accordance with one or more implementations of the present disclosure.



FIG. 74 illustrates an exemplary GUI for receiving identification information and configuration information for a new correlation search, in accordance with one or more implementations of the present disclosure.



FIGS. 75A and 75B illustrates exemplary GUIs providing a correlation search wizard that may be pre-populated with information from the new correlation search definition, in accordance with one or more implementations of the present disclosure.



FIG. 75C illustrates an example of a graphical user interface for a topology navigator that displays multiple services and information related to the services, in accordance with one or more implementations of the present disclosure.



FIG. 75D illustrates an exemplary topology graph component of the topology navigator that includes visual attributes to illustrate the aggregate KPI values (e.g., health scores) of the service nodes, in accordance with one or more implementations of the present disclosure.



FIG. 75E illustrates an exemplary details display component of the topology navigator, in accordance with one or more implementations of the present disclosure.



FIG. 75F illustrates an example of a graphical user interface with a topology navigator and multiple time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 75G presents a flow diagram of an exemplary method for creating and updating a topology navigator, in accordance with one or more implementations of the present disclosure



FIG. 75H presents a flow diagram of another exemplary method for using the topology navigator to investigate abnormal activity of a service and identify a KPI of a dependent service to be added to a list of time-based graph lanes, in accordance with one or more implementations of the present disclosure.



FIG. 75I illustrates an example of a data model in accordance with one or more implementations of the present disclosure.



FIG. 75J presents a flow diagram of an exemplary method for performing a search query in response to detecting a scheduled time for a KPI, in accordance with one or more implementations of the present disclosure.



FIG. 75K presents a flow diagram of an exemplary method for performing a search query in response to detecting a scheduled time for a KPI, in accordance with one or more implementations of the present disclosure.


FIG. 75L1 illustrates a block diagram of a system implementing control modules in one embodiment.


FIG. 75L2 is a diagram of methods and process flow for creation, use, and management of control modules and module packages in one embodiment.


FIG. 75L3 illustrates an example interface display listing control modules of an SMS and enabling navigation requests to further processing options.


FIG. 75L4 depicts a user interface related to control module information in one embodiment.


FIG. 75L5 depicts a user interface related to control module detail information in one embodiment.


FIG. 75L6 illustrates an example interface related to control module detail information options in one embodiment.


FIG. 75L7 illustrates an example interface for adding content to a control module.


FIG. 75L8 illustrates an example interface related to the creation of a control module after certain content has been added.


FIG. 75L9 illustrates packaging of a particular control module in one embodiment.



FIG. 76 presents a block diagram of an event-processing system in accordance with one or more implementations of the present disclosure.



FIG. 77 presents a flowchart illustrating how indexers process, index, and store data received from forwarders in accordance with one or more implementations of the present disclosure.



FIG. 78 presents a flowchart illustrating how a search head and indexers perform a search query in accordance with one or more implementations of the present disclosure.



FIG. 79A presents a block diagram of a system for processing search requests that uses extraction rules for field values in accordance with one or more implementations of the present disclosure.



FIG. 79B illustrates an example data model structure, in accordance with some implementations of the present disclosure.



FIG. 79C illustrates an example definition of a root object of a data model, in accordance with some implementations.



FIG. 79D illustrates example definitions and of child objects, in accordance with some implementations.



FIG. 80 illustrates an exemplary search query received from a client and executed by search peers in accordance with one or more implementations of the present disclosure.



FIG. 81A illustrates a search screen in accordance with one or more implementations of the present disclosure.



FIG. 81B illustrates a data summary dialog that enables a user to select various data sources in accordance with one or more implementations of the present disclosure.



FIG. 82A illustrates a key indicators view in accordance with one or more implementations of the present disclosure.



FIG. 82B illustrates an incident review dashboard in accordance with one or more implementations of the present disclosure.



FIG. 82C illustrates a proactive monitoring tree in accordance with one or more implementations of the present disclosure.



FIG. 82D illustrates a screen displaying both log data and performance data in accordance with one or more implementations of the present disclosure.



FIG. 83 depicts a block diagram of an example computing device operating in accordance with one or more implementations of the present disclosure.





DETAILED DESCRIPTION

Overview


The present disclosure is directed to monitoring performance of a system at a service level using key performance indicators derived from machine data. Implementations of the present disclosure provide users with insight to the performance of monitored services, such as, services pertaining to an information technology (IT) environment. For example, one or more users may wish to monitor the performance of a web hosting service, which provides hosted web content to end users via network.


A service can be provided by one or more entities. An entity that provides a service can be associated with machine data. As described in greater detail below, the machine data pertaining to a particular entity may use different formats and/or different aliases for the entity.


Implementations of the present disclosure are described for normalizing the different aliases and/or formats of machine data pertaining to the same entity. In particular, an entity definition can be created for a respective entity. The entity definition can normalize various machine data pertaining to a particular entity, thus simplifying the use of heterogeneous machine data for monitoring a service.


Implementations of the present disclosure are described for specifying which entities, and thus, which heterogeneous machine data, to use for monitoring a service. In one implementation, a service definition is created for a service that is to be monitored. The service definition specifies one or more entity definitions, where each entity definition corresponds to a respective entity providing the service. The service definition provides users with flexibility in associating entities with services. The service definition further provides users with the ability to define relationships between entities and services at the machine data level. Implementations of the present disclosure enable end-users to monitor services from a top-down perspective and can provide rich visualization to troubleshoot any service-related issues. Implementations of the present disclosure enable end-users to understand an environment (e.g., IT environment) and the services in the environment. For example, end-users can understand and monitor services at a business service level, application tier level, etc.


Implementations of the present disclosure provide users (e.g., business analysts) a tool for dynamically associating entities with a service. One or more entities can provide a service and/or be associated with a service. Implementations of the present disclosure provide a service monitoring system that captures the relationships between entities and services via entity definitions and/or service definitions. IT environments typically undergo changes. For example, new equipment may be added, configurations may change, systems may be upgraded and/or undergo maintenance, etc. The changes that are made to the entities in an IT environment may affect the monitoring of the services in the environment. Implementations of the present disclosure provide a tool that enable users to configure flexible relationships between entities and services to ensure that changes that are made to the entities in the IT environment are accurately captured in the entity definitions and/or service definitions. Implementations of the present disclosure can determine the relationships between the entities and services based on changes that are made to an environment without any user interaction, and can update, also without user interaction, the entity definitions and/or service definitions to reflect any adjustments made to the entities in the environment, as described below in conjunction with FIGS. 17B-17I.


Implementations of the present disclosure provide users (e.g., business analysts) an efficient tool for creating entity definitions in a timely manner. Data that describes an IT environment may exist, for example, for inventory purposes. For example, an inventory system can generate a file that contains information relating to physical machines, virtual machines, application interfaces, processes, etc. in an IT environment. Entity definitions for various components of the IT environment may be created. At times, hundreds of entity definitions are generated and maintained. Implementations of the present disclosure provide a GUI that utilizes existing data (e.g., inventory data) for creating entity definitions to reduce the amount of time and resources needed for creating the entity definitions.


Implementations of the present disclosure provide users (e.g., business analysts) an efficient tool for creating entity definitions in a timely manner. Data that describes an IT environment may be obtained, for example, by executing a search query. A user may run a search query that produces a search result set including information relating to physical machines, virtual machines, application interfaces, users, owners, and/or processes in an IT environment. The information in the search result set may be useful for creating entity definitions. Implementations of the present disclosure provide a GUI that utilizes existing data (e.g., search results sets) for creating entity definitions to reduce the amount of time and resources needed for creating the entity definitions.


In one implementation, one or more entity definitions are created from user input received via an entity definition creation GUI, as described in conjunction with FIGS. 6-10. In another implementation, one or more entity definitions are created from data in a file and user input received via a GUI, as described in conjunction with FIGS. 10B-10P. In yet another implementation, one or more entity definitions are created from data in a search result set and user input received via a GUI, as described in conjunction with FIGS. 10Q-10Z.


Implementations of the present disclosure are described for creating informational fields and including the informational fields to corresponding entity definitions. An informational field is an entity definition component for storing user-defined metadata for a corresponding entity, which includes information about the entity that may not be reliably present in, or may be absent altogether from, the machine data events. Informational fields are described in more detail below with respect to FIGS. 10AA-10AE.


Implementations of the present disclosure are described for performing an automated identification of services, the entities that provide them, and the associations among the discovered entities and services, starting from a corpus of disparate machine data. In one aspect, an implementation automatically performs the processing against the disparate machine data in accordance with discovery parameters to identify the relevant entities and their service associations. In one aspect, entities actually involved in service provision may be identified from a larger set of potential entities, not all of which provide services. In one aspect, the discovered services, entities, and their associations, are reflected in service and entity definition information that controls service monitoring system operation. In one aspect, one or more user interfaces may be implemented to establish discovery parameters, provide previews of results, interject user modifications to automated process results, and report outcomes. Other aspects will become apparent.


Implementations of the present disclosure are described for monitoring a service at a granular level. For example, one or more aspects of a service can be monitored using one or more key performance indicators for the service. A performance indicator or key performance indicator (KPI) is a type of performance measurement. For example, users may wish to monitor the CPU (central processing unit) usage of a web hosting service, the memory usage of the web hosting service, and the request response time for the web hosting service. In one implementation, a separate KPI can be created for each of these aspects of the service that indicates how the corresponding aspect is performing.


Implementations of the present disclosure give users freedom to decide which aspects to monitor for a service and which heterogeneous machine data to use for a particular KPI. In particular, one or more KPIs can be created for a service. Each KPI can be defined by a search query that produces a value derived from the machine data identified in the entity definitions specified in the service definition. Each value can be indicative of how a particular aspect of the service is performing at a point in time or during a period of time. Implementations of the present disclosure enable users to decide what value should be produced by the search query defining the KPI. For example, a user may wish that the request response time be monitored as the average response time over a period of time.


Implementations of the present disclosure are described for customizing various states that a KPI can be in. For example, a user may define a Normal state, a Warning state, and a Critical state for a KPI, and the value produced by the search query of the KPI can indicate the current state of the KPI. In one implementation, one or more thresholds are created for each KPI. Each threshold defines an end of a range of values that represent a particular state of the KPI. A graphical interface can be provided to facilitate user input for creating one or more thresholds for each KPI, naming the states for the KPI, and associating a visual indicator (e.g., color, pattern) to represent a respective state.


Implementations of the present disclosure are described for defining multiple time varying static thresholds using sets of KPI thresholds that correspond to different time frames. For example, a user may define a first set of KPI thresholds to apply during week-days and a different set of KPI thresholds to apply on weekends. Each set of KPI thresholds may include, for example, thresholds that correspond to a Normal state, a Warning state, and a Critical state, however the values of these thresholds may vary across different sets of KPI thresholds depending on the time frame.


Implementations of the present disclosure are described for monitoring a service at a more abstract level, as well. In particular, an aggregate KPI can be configured and calculated for a service to represent the overall health of a service. For example, a service may have 10 KPIs, each monitoring a various aspect of the service. The service may have 7 KPIs in a Normal state, 2 KPIs in a Warning state, and 1 KPI in a Critical state. The aggregate KPI can be a value representative of the overall performance of the service based on the values for the individual KPIs. Implementations of the present disclosure allow individual KPIs of a service to be weighted in terms of how important a particular KPI is to the service relative to the other KPIs in the service, thus giving users control of how to represent the overall performance of a service and control in providing a more accurate representation of the performance of the service. In addition, specific actions can be defined that are to be taken when the aggregate KPI indicating the overall health of a service, for example, exceeds a particular threshold.


Implementations of the present disclosure are described for creating notable events and/or alarms via distribution thresholding. In one implementation, a correlation search is created and used to generate notable event(s) and/or alarm(s). A correlation search can be created to determine the status of a set of KPIs for a service over a defined window of time. A correlation search represents a search query that has a triggering condition and one or more actions that correspond to the trigger condition. Thresholds can be set on the distribution of the state of each individual KPI and if the distribution thresholds are exceeded then an alert/alarm can be generated.


Implementations of the present disclosure are described for monitoring one or more services using a key performance indicator (KPI) correlation search. The performance of a service can be vital to the function of an IT environment. Certain services may be more essential than others. For example, one or more other services may be dependent on a particular service. The performance of the more crucial services may need to be monitored more aggressively. One or more states of one or more KPIs for one or more services can be proactively monitored periodically using a KPI correlation search. A defined action (e.g., creating an alarm, sending a notification, displaying information in an interface, etc.) can be taken on conditions specified by the KPI correlation search. Implementations of the present disclosure provide users (e.g., business analysts) a graphical user interface (GUI) for defining a KPI correlation search. Implementations of the present disclosure provide visualizations of current KPI state performance that can be used for specifying search information and information for a trigger determination for a KPI correlation search.


Implementations of the present disclosure are described for providing a GUI that presents notable events pertaining to one or more KPIs of one or more services. Such a notable event can be generated by a correlation search associated with a particular service. A correlation search associated with a service can include a search query, a triggering determination or triggering condition, and one or more actions to be performed based on the triggering determination (a determination as to whether the triggering condition is satisfied). In particular, a search query may include search criteria pertaining to one or more KIPs of the service, and may produce data using the search criteria. For example, a search query may produce KPI data for each occurrence of a KPI reaching a certain threshold over a specified period of time. A triggering condition can be applied to the data produced by the search query to determine whether the produced data satisfies the triggering condition. Using the above example, the triggering condition can be applied to the produced KPI data to determine whether the number of occurrences of a KPI reaching a certain threshold over a specified period of time exceeds a value in the triggering condition. If the produced data satisfies the triggering condition, a particular action can be performed. Specifically, if the data produced by the search query satisfies the triggering condition, a notable event can be generated. Additional details with respect to this “Incident Review” interface are provided below with respect to FIGS. 34O-34T.


Implementations of the present disclosure are described for providing a service-monitoring dashboard that displays one or more KPI widgets. Each KPI widget can provide a numerical or graphical representation of one or more values for a corresponding KPI or service health score (aggregate KPI for a service) indicating how a service or an aspect of a service is performing at one or more points in time. Users can be provided with the ability to design and draw the service-monitoring dashboard and to customize each of the KPI widgets. A dashboard-creation graphical interface can be provided to define a service-monitoring dashboard based on user input allowing different users to each create a customized service-monitoring dashboard. Users can select an image for the service-monitoring dashboard (e.g., image for the background of a service-monitoring dashboard, image for an entity and/or service for service-monitoring dashboard), draw a flow chart or a representation of an environment (e.g., IT environment), specify which KPIs to include in the service-monitoring dashboard, configure a KPI widget for each specified KPI, and add one or more ad hoc KPI searches to the service-monitoring dashboard. Implementations of the present disclosure provide users with service monitoring information that can be continuously and/or periodically updated. Each service-monitoring dashboard can provide a service-level perspective of how one or more services are performing to help users make operating decisions and/or further evaluate the performance of one or more services.


Implementations of the present disclosure are described for providing service swapping for a service-monitoring dashboard template to produce dashboard variants. In one embodiment, a new or existing dashboard template may be enabled for service swapping. A user may identify one or more services eligible to be swapped for a service associated with the base dashboard template. Comparable KPI's of a service to be swapped in are automatically identified for each KPI of the base service that provides data to dashboard elements (e.g., dashboard widgets). A variant dashboard template is actually or virtually created that produces a dashboard display with the same general layout and appearance as the base dashboard but reflecting the KPIs of a different service. The variant dashboard templates may be created dynamically, used transiently, or persisted as command/control/configuration information that determines the operation of the service monitoring system. In an embodiment, the implementation of dashboard swapping can reduce the storage burden associated with having multiple, largely duplicative dashboard definitions and other computing resource burdens associated therewith.


Implementations are described for a visual interface that displays time-based graphical visualizations that each corresponds to a different KPI reflecting how a service provided by one or more entities is performing. This visual interface may be referred to as a “deep dive.” As described herein, machine data pertaining to one or more entities that provide a given service can be presented and viewed in a number of ways. The deep dive visual interface allows an in-depth look at KPI data that reflects how a service or entity is performing over a certain period of time. By having multiple graphical visualizations, each representing a different service or a different aspect of the same service, the deep dive visual interface allows a user to visually correlate the respective KPIs over a defined period of time. In one implementation, the graphical visualizations are all calibrated to the same time scale, so that the values of different KPIs can be compared at any given point in time. In one implementation, the graphical visualizations are all calibrated to different time scales. Although each graphical visualization is displayed in the same visual interface, one or more of the graphical visualizations may have a different time scale than the other graphical visualizations. The different time scale may be more appropriate for the underlying KPI data associated with the one or more graphical visualizations. In one implementation, the graphical visualizations are displayed in parallel lanes, which simplifies visual correlation and allows a user to relate the performance of one service or one aspect of the service (as represented by the KPI values) to the performance of one or more additional services or one or more additional aspects of the same service.


Implementations are described for a visual interface that enables a user to create a new correlation search based on a set of displayed graph lanes. The set of graph lanes may assist a user in identifying a situation (e.g., problem or a pattern of interest) in the performance of one or more services by providing graphical visualizations that illustrate the performance of the one or more services. Once the user has identified the situation, the user may submit a request to create a new correlation search that can result in detecting a re-occurrence of the identified problem. The new correlation search may include a definition that is derived from the set of graph lanes. For example, the definition of the new correlation search may include an aggregate triggering condition with KPI criteria determined by iterating through the multiple graph lanes. As the system iterates through the multiple graph lanes, it may analyze the fluctuations in a corresponding KPI, such as for example, fluctuations in the state of the KPI or fluctuations of the values of the KPI to determine a KPI criterion associated with the corresponding KPI. For example, the fluctuation analysis may result in determining that a CPU utilization KPI was in a critical state for 25% of a four hour time period, and this determined condition may be included in the KPI criterion for the CPU utilization KPI. After creating the new correlation's search definition, the system may run the correlation search to monitor the services and when the correlation search identifies a re-occurrence of the problem, the correlation search may generate a notable event or alarm to notify the user who created the correlation search or some other users.


Implementations of the present disclosure are described for methods for the automatic creation of entity definitions in a service monitoring system. Machine data by or about an entity machine is received and made available before an entity definition exists for the machine. An identification criteria may be used to identify the entity machine from the machine data as a newly added machine for which an entity definition should be created. Information to populate an entity definition is then harvested from that and other machine data, and the new entity definition is stored. The entity definition is then available for general use and may be automatically associated with a service using an association rule of the service definition. Portions of the method may be performed automatically on a regular basis. Embodiments may perform the method in conjunction with content from a domain add-on that extends the features and capabilities of the service monitoring system with the addition of a form of codified expertise in a particular domain or field, such as load-balancing or high-volume web transaction processing, as particularly applied to related IT service monitoring. The method may be extended, modified, or adapted as necessary to implement automatic modification and/or deletion of entity definitions, the need for which is determined through machine data analysis.


Implementations of the present disclosure are described for methods for the production and utilization of KPI data on a per-entity basis beyond state determination with thresholds. A per-entity breakdown of KPI data may produce a set of per-entity time series for the KPI. Processing can transform the set into corresponding time series for one or more statistical metrics about the per-entity data. Visualization of the statistical metric time series data as a distribution flow graph provides an analyst with an unprecedented macro-level view for the KPI to facilitate system monitoring, incident prevention, and problem determination. Visualizations may optionally include a selected amount of per-entity detail as well as KPI threshold/state visualization. The visualization may operate with configurable navigation options that are context sensitive as well as able to carry context forward to a navigated destination.


Implementations of the present disclosure are described for methods for addressing adaptations of service monitoring during periods of maintenance downtime in the monitored system, or other instances where non-normal data is expected. User interfaces enable a user to create and maintain system control information that directs the recognition of maintenance periods so that tainted data may be prevented and/or identified with a maintenance state. Recognition of the maintenance state can further lead to adaptation of monitoring system reporting to, for example, suppress unhelpful alerts or surface warnings about tainted measurements.


Implementations of the present disclosure are described for methods of automatically identifying and grouping events, such as notable events, based on criteria as may be user-specified, and to automatically perform actions, possibly against the group and/or its members upon detection of a satisfied precondition, which action and precondition may also be user-specified, in an embodiment. Additionally, the multiple members of a group may be collectively represented under the singular rubric of the group for a variety of service monitoring functions, such as control console and reporting functions.


Implementations of the present disclosure are described for methods enabling the creation, management, and use of control modules. Information in the command/configuration/control (CCC) data of a service monitoring system (SMS) that is used to direct the operation of the SMS may be selectively encapsulated into one or more control modules. The creation and use of the control modules leverages the CCC data in a system and can thereby reduce the computing resources that would otherwise be required to effect operational control over the SMS. Control modules may be represented in the form of portable control module packages that may reside external to the CCC data store or the SMS, and be useful for conveyance to other systems or for backup or archiving.



FIG. 1 illustrates a block diagram of an example service provided by entities, in accordance with one or more implementations of the present disclosure. One or more entities 104A, 104B provide service 102. An entity 104A, 104B can be a component in an IT environment. Examples of an entity can include, and are not limited to a host machine, a virtual machine, a switch, a firewall, a router, a sensor, etc. For example, the service 102 may be a web hosting service, and the entities 104A, 104B may be web servers running on one or more host machines to provide the web hosting service. In another example, an entity could represent a single process on different (physical or virtual) machines. In another example, an entity could represent communication between two different machines.


The service 102 can be monitored using one or more KPIs 106 for the service. A KPI is a type of performance measurement. One or more KPIs can be defined for a service. In the illustrated example, three KPIs 106A-C are defined for service 102. KPI 106A may be a measurement of CPU (central processing unit) usage for the service 102. KPI 106B may be a measurement of memory usage for the service 102. KPI 106C may be a measurement of request response time for the service 102.


In one implementation, KPI 106A-C is derived based on machine data pertaining to entities 104A and 104B that provide the service 102 that is associated with the KPI 106A-C. In another implementation, KPI 106A-C is derived based on machine data pertaining to entities other than and/or in addition to entities 104A and 104B. In another implementation, input (e.g., user input) may be received that defines a custom query, which does not use entity filtering, and is treated as a KPI. Machine data pertaining to a specific entity can be machine data produced by that entity or machine data about that entity, which is produced by another entity. For example, machine data pertaining to entity 104A can be derived from different sources that may be hosted by entity 104A and/or some other entity or entities.


A source of machine data can include, for example, a software application, a module, an operating system, a script, an application programming interface, etc. For example, machine data 110B may be log data that is produced by the operating system of entity 104A. In another example, machine data 110C may be produced by a script that is executing on entity 104A. In yet another example, machine data 110A may be about an entity 104A and produced by a software application 120A that is hosted by another entity to monitor the performance of the entity 104A through an application programming interface (API).


For example, entity 104A may be a virtual machine and software application 120A may be executing outside of the virtual machine (e.g., on a hypervisor or a host operating system) to monitor the performance of the virtual machine via an API. The API can generate network packet data including performance measurements for the virtual machine, such as, memory utilization, CPU usage, etc.


Similarly, machine data pertaining to entity 104B may include, for example, machine data 110D, such as log data produced by the operating system of entity 104B, and machine data 110E, such as network packets including http responses generated by a web server hosted by entity 104B.


Implementations of the present disclosure provide for an association between an entity (e.g., a physical machine) and machine data pertaining to that entity (e.g., machine data produced by different sources hosted by the entity or machine data about the entity that may be produced by sources hosted by some other entity or entities). The association may be provided via an entity definition that identifies machine data from different sources and links the identified machine data with the actual entity to which the machine data pertains, as will be discussed in more detail below in conjunction with FIG. 3 and FIGS. 6-10. Entities that are part of a particular service can be further grouped via a service definition that specifies entity definitions of the entities providing the service, as will be discussed in more detail below in conjunction with FIGS. 11-31.


In the illustrated example, an entity definition for entity 104A can associate machine data 110A, 110B and 110C with entity 104A, an entity definition for entity 104B can associate machine data 110D and 110E with entity 104B, and a service definition for service 102 can group entities 104A and 104B together, thereby defining a pool of machine data that can be operated on to produce KPIs 106A, 106B and 106C for the service 102. In particular, each KPI 106A, 106B, 106C of the service 102 can be defined by a search query that produces a value 108A, 108B, 108C derived from the machine data 110A-E. As will be discussed in more detail below, according to one implementation, the machine data 110A-E is identified in entity definitions of entities 104A and 104B, and the entity definitions are specified in a service definition of service 102 for which values 108A-C are produced to indicate how the service 102 is performing at a point in time or during a period of time. For example, KPI 106A can be defined by a search query that produces value 108A indicating how the service 102 is performing with respect to CPU usage. KPI 106B can be defined by a different search query that produces value 108B indicating how the service 102 is performing with respect to memory usage. KPI 106C can be defined by yet another search query that produces value 108C indicating how the service 102 is performing with respect to request response time.


The values 108A-C for the KPIs can be produced by executing the search query of the respective KPI. In one example, the search query defining a KPI 106A-C can be executed upon receiving a request (e.g., user request). For example, a service-monitoring dashboard, which is described in greater detail below in conjunction with FIG. 35, can display KPI widgets providing a numerical or graphical representation of the value 108 for a respective KPI 106. A user may request the service-monitoring dashboard to be displayed at a point in time, and the search queries for the KPIs 106 can be executed in response to the request to produce the value 108 for the respective KPI 106. The produced values 108 can be displayed in the service-monitoring dashboard.


In another example, the search query defining a KPI 106A-C can be executed in real-time (continuous execution until interrupted). For example, a user may request the service-monitoring dashboard to be displayed, and the search queries for the KPIs 106 can be executed in response to the request to produce the value 108 for the respective KPI 106. The produced values 108 can be displayed in the service-monitoring dashboard. The search queries for the KPIs 106 can be continuously executed until interrupted and the values for the search queries can be refreshed in the service-monitoring dashboard with each execution. Examples of interruption can include changing graphical interfaces, stopping execution of a program, etc.


In another example, the search query defining a KPI 106 can be executed based on a schedule. For example, the search query for a KPI (e.g., KPI 106A) can be executed at one or more particular times (e.g., 6:00 am, 12:00 pm, 6:00 pm, etc.) and/or based on a period of time (e.g., every 5 minutes). In one example, the values (e.g., values 108A) produced by a search query for a KPI (e.g., KPI 106A) by executing the search query on a schedule are stored in a data store, and are used to calculate an aggregate KPI score for a service (e.g., service 102), as described in greater detail below in conjunction with FIGS. 32-33. An aggregate KPI score for the service 102 is indicative of an overall performance of the KPIs 106 of the service.


In one implementation, the machine data (e.g., machine data 110A-E) used by a search query defining a KPI (e.g., KPI 106A) to produce a value can be based on a time range. The time range can be a user-defined time range or a default time range. For example, in the service-monitoring dashboard example above, a user can select, via the service-monitoring dashboard, a time range to use to further specify, for example, based on time-stamps, which machine data should be used by a search query defining a KPI. For example, the time range can be defined as “Last 15 minutes,” which would represent an aggregation period for producing the value. In other words, if the query is executed periodically (e.g., every 5 minutes), the value resulting from each execution can be based on the last 15 minutes on a rolling basis, and the value resulting from each execution can be, for example, the maximum value during a corresponding 15-minute time range, the minimum value during the corresponding 15-minute time range, an average value for the corresponding 15-minute time range, etc.


In another implementation, the time range is a selected (e.g., user-selected) point in time and the definition of an individual KPI can specify the aggregation period for the respective KPI. By including the aggregation period for an individual KPI as part of the definition of the respective KPI, multiple KPIs can run on different aggregation periods, which can more accurately represent certain types of aggregations, such as, distinct counts and sums, improving the utility of defined thresholds. In this manner, the value of each KPI can be displayed at a given point in time. In one example, a user may also select “real time” as the point in time to produce the most up to date value for each KPI using its respective individually defined aggregation period.


An event-processing system can process a search query that defines a KPI of a service. An event-processing system can aggregate heterogeneous machine-generated data (machine data) received from various sources (e.g., servers, databases, applications, networks, etc.) and optionally provide filtering such that data is only represented where it pertains to the entities providing the service. In one example, a KPI may be defined by a user-defined custom query that does not use entity filtering. The aggregated machine data can be processed and represented as events. An event can be represented by a data structure that is associated with a certain point in time and comprises a portion of raw machine data (i.e., machine data). Events are described in greater detail below in conjunction with FIG. 72. The event-processing system can be configured to perform real-time indexing of the machine data and to execute real-time, scheduled, or historic searches on the source data. An exemplary event-processing system is described in greater detail below in conjunction with FIG. 71.


Example Service Monitoring System



FIG. 2 is a block diagram 200 of one implementation of a service monitoring system 210 for monitoring performance of one or more services using key performance indicators derived from machine data, in accordance with one or more implementations of the present disclosure. The service monitoring system 210 can be hosted by one or more computing machines and can include components for monitoring performance of one or more services. The components can include, for example, an entity module 220, a service module 230, a key performance indicator module 240, a user interface (UI) module 250, a dashboard module 260, a deep dive module 270, and a home page module 280. The components can be combined together or separated in further components, according to a particular embodiment. The components and/or combinations of components can be hosted on a single computing machine and/or multiple computing machines. The components and/or combinations of components can be hosted on one or more client computing machines and/or server computing machines.


The entity module 220 can create entity definitions. “Create” hereinafter includes “edit” throughout this document. An entity definition is a data structure that associates an entity (e.g., entity 104A in FIG. 1) with machine data (e.g., machine data 110A-C in FIG. 1). The entity module 220 can determine associations between machine data and entities, and can create an entity definition that associates an individual entity with machine data produced by different sources hosted by that entity and/or other entity(ies). In one implementation, the entity module 220 automatically identifies the entities in an environment (e.g., IT environment), automatically determines, for each entity, which machine data is associated with that particular entity, and automatically generates an entity definition for each entity. In another implementation, the entity module 220 receives input (e.g., user input) for creating an entity definition for an entity, as will be discussed in greater detail below in conjunction with FIGS. 5-10.



FIG. 3 is a block diagram 300 illustrating an entity definition for an entity, in accordance with one or more implementations of the present disclosure. The entity module 220 can create entity definition 350 that associates an entity 304 with machine data (e.g., machine data 310A, machine data 310B, machine data 310C) pertaining to that entity 304. Machine data that pertains to a particular entity can be produced by different sources 315 and may be produced in different data formats 330. For example, the entity 304 may be a host machine that is executing a server application 334 that produces machine data 310B (e.g., log data). The entity 304 may also host a script 336, which when executed, produces machine data 310C. A software application 330, which is hosted by a different entity (not shown), can monitor the entity 304 and use an API 333 to produce machine data 310A about the entity 304.


Each of the machine data 310A-C can include an alias that references the entity 304. At least some of the aliases for the particular entity 304 may be different from each other. For example, the alias for entity 304 in machine data 310A may be an identifier (ID) number 315, the alias for entity 304 in machine data 310B may be a hostname 317, and the alias for entity 304 in machine data 310C may be an IP (internet protocol) address 319.


The entity module 220 can receive input for an identifying name 360 for the entity 304 and can include the identifying name 360 in the entity definition 350. The identifying name 360 can be defined from input (e.g., user input). For example, the entity 304 may be a web server and the entity module 220 may receive input specifying webserver01.splunk.com as the identifying name 360. The identifying name 360 can be used to normalize the different aliases of the entity 304 from the machine data 310A-C to a single identifier.


A KPI, for example, for monitoring CPU usage for a service provided by the entity 304, can be defined by a search query directed to search machine data 310A-C based a service definition, which is described in greater detail below in conjunction with FIG. 4, associating the entity definition 350 with the KPI, the entity definition 350 associating the entity 304 with the identifying name 360, and associating the identifying name 360 (e.g., webserver01.splunk.com) with the various aliases (e.g., ID number 315, hostname 317, and IP address 319).


Referring to FIG. 2, the service module 230 can create service definitions for services. A service definition is a data structure that associates one or more entities with a service. The service module 230 can receive input (e.g., user input) of a title and/or description for a service definition. FIG. 4 is a block diagram illustrating a service definition that associates one or more entities with a service, in accordance with one or more implementations of the present disclosure. In another implementation, a service definition specifies one or more other services which a service depends upon and does not associate any entities with the service, as described in greater detail below in conjunction with FIG. 18. In another implementation, a service definition specifies a service as a collection of one or more other services and one or more entities.


In one example, a service 402 is provided by one or more entities 404A-N. For example, entities 404A-N may be web servers that provide the service 402 (e.g., web hosting service). In another example, a service 402 may be a database service that provides database data to other services (e.g., analytical services). The entities 404A-N, which provides the database service, may be database servers.


The service module 230 can include an entity definition 450A-450N, for a corresponding entity 404A-N that provides the service 402, in the service definition 460 for the service 402. The service module 230 can receive input (e.g., user input) identifying one or more entity definitions to include in a service definition.


The service module 230 can include dependencies 470 in the service definition 460. The dependencies 470 indicate one or more other services for which the service 402 is dependent upon. For example, another set of entities (e.g., host machines) may define a testing environment that provides a sandbox service for isolating and testing untested programming code changes. In another example, a specific set of entities (e.g., host machines) may define a revision control system that provides a revision control service to a development organization. In yet another example, a set of entities (e.g., switches, firewall systems, and routers) may define a network that provides a networking service. The sandbox service can depend on the revision control service and the networking service. The revision control service can depend on the networking service. If the service 402 is the sandbox service and the service definition 460 is for the sandbox service 402, the dependencies 470 can include the revision control service and the networking service. The service module 230 can receive input specifying the other service(s) for which the service 402 is dependent on and can include the dependencies 470 between the services in the service definition 460. In one implementation, the service associated defined by the service definition 460 may be designated as a dependency for another service, and the service definition 460 can include information indicating the other services which depend on the service described by the service definition 460.


Referring to FIG. 2, the KPI module 240 can create one or more KPIs for a service and include the KPIs in the service definition. For example, in FIG. 4, various aspects (e.g., CPU usage, memory usage, response time, etc.) of the service 402 can be monitored using respective KPIs. The KPI module 240 can receive input (e.g., user input) defining a KPI for each aspect of the service 402 to be monitored and include the KPIs (e.g., KPIs 406A-406N) in the service definition 460 for the service 402. Each KPI can be defined by a search query that can produce a value. For example, the KPI 406A can be defined by a search query that produces value 408A, and the KPI 406N can be defined by a search query that produces value 408N.


The KPI module 240 can receive input specifying the search processing language for the search query defining the KPI. The input can include a search string defining the search query and/or selection of a data model to define the search query. Data models are described in greater detail below in conjunction with FIGS. 74B-D. The search query can produce, for a corresponding KPI, value 408A-N derived from machine data that is identified in the entity definitions 450A-N that are identified in the service definition 460.


The KPI module 240 can receive input to define one or more thresholds for one or more KPIs. For example, the KPI module 240 can receive input defining one or more thresholds 410A for KPI 406A and input defining one or more thresholds 410N for KPI 406N. Each threshold defines an end of a range of values representing a certain state for the KPI. Multiple states can be defined for the KPI (e.g., unknown state, trivial state, informational state, normal state, warning state, error state, and critical state), and the current state of the KPI depends on which range the value, which is produced by the search query defining the KPI, falls into. The KPI module 240 can include the threshold definition(s) in the KPI definitions. The service module 230 can include the defined KPIs in the service definition for the service.


The KPI module 240 can calculate an aggregate KPI score 480 for the service for continuous monitoring of the service. The score 480 can be a calculated value 482 for the aggregate of the KPIs for the service to indicate an overall performance of the service. For example, if the service has 10 KPIs and if the values produced by the search queries for 9 of the 10 KPIs indicate that the corresponding KPI is in a normal state, then the value 482 for an aggregate KPI may indicate that the overall performance of the service is satisfactory. Some implementations of calculating a value for an aggregate KPI for the service are discussed in greater detail below in conjunction with FIGS. 32-33.


Referring to FIG. 2, the service monitoring system 210 can be coupled to one or more data stores 290. The entity definitions, the service definitions, and the KPI definitions can be stored in the data store(s) 290 that are coupled to the service monitoring system 210. The entity definitions, the service definitions, and the KPI definitions can be stored in a data store 290 in a key-value store, a configuration file, a lookup file, a database, or in metadata fields associated with events representing the machine data. A data store 290 can be a persistent storage that is capable of storing data. A persistent storage can be a local storage unit or a remote storage unit. Persistent storage can be a magnetic storage unit, optical storage unit, solid state storage unit, electronic storage units (main memory), or similar storage unit. Persistent storage can be a monolithic device or a distributed set of devices. A ‘set’, as used herein, refers to any positive whole number of items.


The user interface (UI) module 250 can generate graphical interfaces for creating and/or editing entity definitions for entities, creating and/or editing service definitions for services, defining key performance indicators (KPIs) for services, setting thresholds for the KPIs, and defining aggregate KPI scores for services. The graphical interfaces can be user interfaces and/or graphical user interfaces (GUIs).


The UI module 250 can cause the display of the graphical interfaces and can receive input via the graphical interfaces. The entity module 220, service module 230, KPI module 240, dashboard module 260, deep dive module 270, and home page module 280 can receive input via the graphical interfaces generated by the UI module 250. The entity module 220, service module 230, KPI module 240, dashboard module 260, deep dive module 270, and home page module 280 can provide data to be displayed in the graphical interfaces to the UI module 250, and the UI module 250 can cause the display of the data in the graphical interfaces.


The dashboard module 260 can create a service-monitoring dashboard. In one implementation, dashboard module 260 works in connection with UI module 250 to present a dashboard-creation graphical interface that includes a modifiable dashboard template, an interface containing drawing tools to customize a service-monitoring dashboard to define flow charts, text and connections between different elements on the service-monitoring dashboard, a KPI-selection interface and/or service selection interface, and a configuration interface for creating service-monitoring dashboard. The service-monitoring dashboard displays one or more KPI widgets. Each KPI widget can provide a numerical or graphical representation of one or more values for a corresponding KPI indicating how an aspect of a service is performing at one or more points in time. Dashboard module 260 can work in connection with UI module 250 to define the service-monitoring dashboard in response to user input, and to cause display of the service-monitoring dashboard including the one or more KPI widgets. The input can be used to customize the service-monitoring dashboard. The input can include for example, selection of one or more images for the service-monitoring dashboard (e.g., a background image for the service-monitoring dashboard, an image to represent an entity and/or service), creation and representation of adhoc search in the form of KPI widgets, selection of one or more KPIs to represent in the service-monitoring dashboard, selection of a KPI widget for each selected KPI. The input can be stored in the one or more data stores 290 that are coupled to the dashboard module 260. In other implementations, some other software or hardware module may perform the actions associated with generating and displaying the service-monitoring dashboard, although the general functionality and features of the service-monitoring dashboard should remain as described herein. Some implementations of creating the service-monitoring dashboard and causing display of the service-monitoring dashboard are discussed in greater detail below in conjunction with FIGS. 35-47.


In one implementation, deep dive module 270 works in connection with UI module 250 to present a wizard for creation and editing of the deep dive visual interface, to generate the deep dive visual interface in response to user input, and to cause display of the deep dive visual interface including the one or more graphical visualizations. The input can be stored in the one or more data stores 290 that are coupled to the deep dive module 270. In other implementations, some other software or hardware module may perform the actions associated with generating and displaying the deep dive visual interface, although the general functionality and features of deep dive should remain as described herein. Some implementations of creating the deep dive visual interface and causing display of the deep dive visual interface are discussed in greater detail below in conjunction with FIGS. 49-70.


The home page module 280 can create a home page graphical interface. The home page graphical interface can include one or more tiles, where each tile represents a service-related alarm, service-monitoring dashboard, a deep dive visual interface, or the value of a particular KPI. In one implementation home page module 280 works in connection with UI module 250. The UI module 250 can cause the display of the home page graphical interface. The home page module 280 can receive input (e.g., user input) to request a service-monitoring dashboard or a deep dive to be displayed. The input can include for example, selection of a tile representing a service-monitoring dashboard or a deep dive. In other implementations, some other software or hardware module may perform the actions associated with generating and displaying the home page graphical interface, although the general functionality and features of the home page graphical interface should remain as described herein. An example home page graphical interface is discussed in greater detail below in conjunction with FIG. 48.


Referring to FIG. 2, the service monitoring system 210 can be coupled to an event processing system 205 via one or more networks. The event processing system 205 can receive a request from the service monitoring system 210 to process a search query. For example, the dashboard module 260 may receive input request to display a service-monitoring dashboard with one or more KPI widgets. The dashboard module 260 can request the event processing system 205 to process a search query for each KPI represented by a KPI widget in the service-monitoring dashboard. Some implementations of an event processing system 205 are discussed in greater detail below in conjunction with FIG. 71.


The one or more networks can include one or more public networks (e.g., the Internet), one or more private networks (e.g., a local area network (LAN) or one or more wide area networks (WAN)), one or more wired networks (e.g., Ethernet network), one or more wireless networks (e.g., an 802.11 network or a Wi-Fi network), one or more cellular networks (e.g., a Long Term Evolution (LTE) network), routers, hubs, switches, server computers, and/or a combination thereof.


Key Performance Indicators



FIG. 5 is a flow diagram of an implementation of a method 500 for creating one or more key performance indicators for a service, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 502, the computing machine creates one or more entity definitions, each for a corresponding entity. Each entity definition associates an entity with machine data that pertains to that entity. As described above, various machine data may be associated with a particular entity, but may use different aliases for identifying the same entity. The entity definition for an entity normalizes the different aliases of that entity. In one implementation, the computing machine receives input for creating the entity definition. The input can be user input. Some implementations of creating an entity definition for an entity from input received via a graphical user interface are discussed in greater detail below in conjunction with FIGS. 6-10.


In another implementation, the computing machine imports a data file (e.g., CSV (comma-separated values) data file) that includes information identifying entities in an environment and uses the data file to automatically create entity definitions for the entities described in the data file. The data file may be stored in a data store (e.g., data store 290 in FIG. 2) that is coupled to the computing machine.


In another implementation, the computing machine automatically (without any user input) identifies one or more aliases for an entity in machine data, and automatically creates an entity definition in response to automatically identifying the aliases of the entity in the machine data. For example, the computing machine can execute a search query from a saved search to extract data to identify an alias for an entity in machine data from one or more sources, and automatically create an entity definition for the entity based on the identified aliases. Some implementations of creating an entity definition from importing a data file and/or from a saved search are discussed in greater detail below in conjunction with FIG. 16.


At block 504, the computing machine creates a service definition for a service using the entity definitions of the one or more entities that provide the service, according to one implementation. A service definition can relate one or more entities to a service. For example, the service definition can include an entity definition for each of the entities that provide the service. In one implementation, the computing machine receives input (e.g., user input) for creating the service definition. Some implementations of creating a service definition from input received via a graphical interface are discussed in more detail below in conjunction with FIGS. 11-18. In one implementation, the computing machine automatically creates a service definition for a service. In another example, a service may not directly be provided by one or more entities, and the service definition for the service may not directly relate one or more entities to the service. For example, a service definition for a service may not contain any entity definitions and may contain information indicating that the service is dependent on one or more other services. A service that is dependent on one or more other services is described in greater detail below in conjunction with FIG. 18. For example, a business service may not be directly provided by one or more entities and may be dependent on one or more other services. For example, an online store service may depend on an e-commerce service provided by an e-commerce system, a database service, and a network service. The online store service can be monitored via the entities of the other services (e.g., e-commerce service, database service, and network service) upon which the service depends on.


At block 506, the computing machine creates one or more key performance indicators (KPIs) corresponding to one or more aspects of the service. An aspect of a service may refer to a certain characteristic of the service that can be measured at various points in time during the operation of the service. For example, aspects of a web hosting service may include request response time, CPU usage, and memory usage. Each KPI for the service can be defined by a search query that produces a value derived from the machine data that is identified in the entity definitions included in the service definition for the service. Each value is indicative of how an aspect of the service is performing at a point in time or during a period of time. In one implementation, the computing machine receives input (e.g., user input) for creating the KPI(s) for the service. Some implementations of creating KPI(s) for a service from input received via a graphical interface will be discussed in greater detail below in conjunction with FIGS. 19-31. In one implementation, the computing machine automatically creates one or more key performance indicators (KPIs) corresponding to one or more aspects of the service.



FIG. 6 is a flow diagram of an implementation of a method 600 for creating an entity definition for an entity, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 602, the computing machine receives input of an identifying name for referencing the entity definition for an entity. The input can be user input. The user input can be received via a graphical interface. Some implementations of creating an entity definition via input received from a graphical interface are discussed in greater detail below in conjunction with FIGS. 7-10. The identifying name can be a unique name.


At block 604, the computing machine receives input (e.g., user input) specifying one or more search fields (“fields”) representing the entity in machine data from different sources, to be used to normalize different aliases of the entity. Machine data can be represented as events. As described above, the computing machine can be coupled to an event processing system (e.g., event processing system 205 in FIG. 2). The event processing system can process machine data to represent the machine data as events. Each of the events is raw data, and when a late-binding schema is applied to the events, values for fields defined by the schema are extracted from the events. A number of “default fields” that specify metadata about the events rather than data in the events themselves can be created automatically. For example, such default fields can specify: a timestamp for the event data; a host from which the event data originated; a source of the event data; and a source type for the event data. These default fields may be determined automatically when the events are created, indexed or stored. Each event has metadata associated with the respective event. Implementations of the event processing system processing the machine data to be represented as events are discussed in greater detail below in conjunction with FIG. 71.


At block 606, the computing machine receives input (e.g., user input) specifying one or more search values (“values”) for the fields to establish associations between the entity and machine data. The values can be used to search for the events that have matching values for the above fields. The entity can be associated with the machine data that is represented by the events that have fields that store values that match the received input.


The computing machine can optionally also receive input (e.g., user input) specifying a type of entity to which the entity definition applies. The computing machine can optionally also receive input (e.g., user input) associating the entity of the entity definition with one or more services. Some implementations of receiving input for an entity type for an entity definition and associating the entity with one or more services are discussed in greater detail below in conjunction with FIGS. 9A-B.



FIG. 7 illustrates an example of a GUI 700 of a service monitoring system for creating and/or editing entity definition(s) and/or service definition(s), in accordance with one or more implementations of the present disclosure. One or more GUIs of the service monitoring system can include GUI elements to receive input and to display data. The GUI elements can include, for example, and are not limited to, a text box, a button, a link, a selection button, a drop down menu, a sliding bar, a selection button, an input field, etc. In one implementation, GUI 700 includes a menu item, such as Configure 702, to facilitate the creation of entity definitions and service definitions.


Upon the selection of the Configure 702 menu item, a drop-down menu 704 listing configuration options can be displayed. If the user selects the entities option 706 from the drop-down menu 704, a GUI for creating an entity definition can be displayed, as discussed in more detail below in conjunction with FIG. 8. If the user selects the services option 708 from the drop-down menu 704, a GUI for creating a service definition can be displayed, as discussed in more detail below in conjunction with FIG. 11.



FIG. 8 illustrates an example of a GUI 800 of a service monitoring system for creating and/or editing entity definitions, in accordance with one or more implementations of the present disclosure. GUI 800 can display a list 802 of entity definitions that have already been created. Each entity definition in the list 802 can include a button 804 for requesting a drop-down menu 810 listing editing options to edit the corresponding entity definition. Editing can include editing the entity definition and/or deleting the entity definition. When an editing option is selected from the drop-down menu 810, one or more additional GUIs can be displayed for editing the entity definition. GUI 800 can include an import button 806 for importing a data file (e.g., CSV file) for auto-discovery of entities and automatic generation of entity definitions for the discovered entities. The data file can include a list of entities that exist in an environment (e.g., IT environment). The service monitoring system can use the data file to automatically create an entity definition for an entity in the list. In one implementation, the service monitoring system uses the data file to automatically create an entity definition for each entity in the list. GUI 800 can include a button 808 that a user can activate to proceed to the creation of an entity definition, which leads to GUI 900 of FIG. 9A. The automatic generation of entity definitions for entities is described in greater detail below in conjunction with FIG. 16.



FIG. 9A illustrates an example of a GUI 900 of a service monitoring system for creating an entity definition, in accordance with one or more implementations of the present disclosure. GUI 900 can facilitate user input specifying an identifying name 904 for the entity, an entity type 906 for the entity, field(s) 908 and value(s) 910 for the fields 908 to use during the search to find events pertaining to the entity, and any services 912 that the entity provides. The entity type 906 can describe the particular entity. For example, the entity may be a host machine that is executing a webserver application that produces machine data. FIG. 9B illustrates an example of input received via GUI 900 for creating an entity definition, in accordance with one or more implementations of the present disclosure.


For example, the identifying name 904 is webserver01.splunk.com and the entity type 906 is web server. Examples of entity type can include, and are not limited to, host machine, virtual machine, type of server (e.g., web server, email server, database server, etc.) switch, firewall, router, sensor, etc. The fields 908 that are part of the entity definition can be used to normalize the various aliases for the entity. For example, the entity definition specifies three fields 920,922,924 and four values 910 (e.g., values 930,932,934,936) to associate the entity with the events that include any of the four values in any of the three fields.


For example, the event processing system (e.g., event processing system 205 in FIG. 2) can apply a late-binding schema to the events to extract values for fields (e.g., host field, ip field, and dest field) defined by the schema and determine which events have values that are extracted for a host field that includes 10.11.12.13, webserver01.splunk.com, webserver01, or vm-0123, determine which events have values that are extracted for an ip field that includes 10.11.12.13, webserver01.splunk.com, webserver01, or vm-0123, or a dest field that includes 10.11.12.13, webserver01.splunk.com, webserver01, or vm-0123. The machine data that relates to the events that are produced from the search is the machine data that is associated with the entity webserver01.splunk.com.


In another implementation, the entity definition can specify one or more values 910 to use for a specific field 908. For example, the value 930 (10.11.12.13) may be used for extracting values for the ip field and determine which values match the value 930, and the value 932 (webserver01.splunk.com) and the value 936 (vm-0123) may be used for extracting values for the host 920 field and determining which values match the value 932 or value 936.


In another implementation, GUI 900 includes a list of identifying field/value pairs. A search term that is modeled after these entities can constructed, such that, when a late-binding schema is applied to events, values that match the identifiers associated with the fields defined by the schema will be extracted. For example, if identifier.fields=“X,Y” then the entity definition should include input specifying fields labeled “X” and “Y”. The entity definition should also include input mapping the fields. For example, the entity definition can include the mapping of the fields as “X”:“1”,“Y”:[“2”,“3”]. The event processing system (e.g., event processing system 205 in FIG. 2) can apply a late-binding schema to the events to extract values for fields (e.g., X and Y) defined by the schema and determine which events have values extracted for an X field that include “1”, or which events have values extracted for a Y field that include “2”, or which events have values extracted for a Y field that include “3”.


GUI 900 can facilitate user input specifying any services 912 that the entity provides. The input can specify one or more services that have corresponding service definitions. For example, if there is a service definition for a service named web hosting service that is provided by the entity corresponding to the entity definition, then a user can specify the web hosting service as a service 912 in the entity definition.


The save button 916 can be selected to save the entity definition in a data store (e.g., data store 290 in FIG. 2). The saved entity definition can be edited.



FIG. 9C illustrates an example of a GUI 950 of a service monitoring system for creating an entity definition, in accordance with one or more implementations of the present disclosure. GUI 950 can include text boxes 952A-B that enables a user to specify a field name —field value pair 951 to use during the search to find events pertaining to the entity. User input can be received via GUI 950 for specify one or more field name —field value pairs 951. In one implementation, the text boxes 952A-B are automatically populated with field name—field value pair 951 information that was previous specified for the entity definition. GUI 950 can include a button 955, which when selected, display additional text boxes 952A-B for specifying a field name—field value pair 951.


GUI 950 can include text boxes 953A-B that enables a user to specify a name—value pair for informational fields. Informational fields are described in greater detail below in conjunction with FIG. 10AA. GUI 950 can include a button, which when selected, display additional text boxes 953A-B for specifying a name—value pair for an informational field.


GUI 950 can include a text box 954 that enables a user to associate the entity being represented by the entity definition with one or more services. In one implementation, user input of one or more strings that identify the one or more service is received via text box 954. In one implementation, when text box 954 is selected (e.g., clicked) a list of service definition is displayed which a user can select from. The list can be populated using service definitions that are stored in a service monitoring data store, as described in greater detail below.



FIG. 10A illustrates an example of a GUI 1000 of a service monitoring system for creating and/or editing entity definitions, in accordance with one or more implementations of the present disclosure. GUI 1000 can display a list 1002 of entity definitions that have already been created. For example, list 1002 includes the entity definition webserver01.splunk.com that can be selected for editing.


Creating Entity Definition from a File



FIG. 10B illustrates an example of the structure 11000 for storing an entity definition, in accordance with one or more implementations of the present disclosure. Structure 11000 represents one logical structure or data organization that illustrates associations among various data items and groups to aid in understanding of the subject matter and is not intended to limit the variety of possible logical and physical representations for entity definition information. An entity definition can be stored in an entity definition data store as a record that contains information about one or more characteristics of an entity. Various characteristics of an entity include, for example, a name of the entity, one or more aliases for the entity, one or more informational fields for the entity, one or more services associated with the entity, and other information pertaining to the entity. Informational fields can be associated with an entity. An informational field is a field for storing user-defined metadata for a corresponding entity, which includes information about the entity that may not be reliably present in, or may be absent altogether from, the raw machine data. Implementations of informational fields are described in greater detail below in conjunction with FIGS. 10AA-10AE.


The entity definition structure 11000 includes one or more components. Each entity definition component relates to a characteristic of the entity. For example, there is an entity name 11001 component, one or more alias 11003 components, one or more informational (info) field 11005 components, one or more service association 11007 components, and one or more components for other information 11009. The characteristic of the entity being represented by a particular component is the particular entity definition component's type. For example, if a particular component represents an alias characteristic of the entity, the component is an alias-type component.


Each entity definition component stores information for an element. The information can include an element name and one or more element values for the element. In one implementation, the element name-value pair(s) within an entity definition component serves as a field name-field value pair for a search query. The search query can be directed to search machine data. As described above, the computing machine can be coupled to an event processing system (e.g., event processing system 205 in FIG. 2). Machine data can be represented as events. Each of the events includes raw data. The event processing system can apply a late-binding schema to the events to extract values for fields defined by the schema, and determine which events have values that are extracted for a field. A component in the entity definition includes (a) an element name that can be, in one implementation, a name of a field defined by the schema, and (b) one or more element values that can be, in one implementation, one or more extracted values for the field identified by the element name.


The element names for the entity definition components (e.g., name component 11051, the alias components 11053A-B, and the informational (info) field components 11055A-B) can be based on user input. In one implementation, the elements names correspond to data items that are imported from a file, as described in greater detail below in conjunction with FIGS. 10D, 10E and 1011. In another implementation, the element names correspond to data items that are imported from a search result set, as described in greater detail below in conjunction with FIGS. 10Q-10Z. In one implementation, element names for any additional service information that can be associated with the entities are received via user input.


The elements values for the entity definition components (e.g., name component 11051, the alias components 11053A-B, and the informational field components 11055A-B) can be based on user input. In one implementation, the values correspond to data items that are imported from a file, as described in greater detail below in conjunction with FIG. 10E and FIG. 10H. In another implementation, the values correspond to data items that are imported from a search result set, as described in greater detail below in conjunction with FIGS. 10Q-10Z.


In one implementation, an entity definition includes one entity component for each entity characteristic represented in the definition. Each entity component may have as many elements as required to adequately express the associated characteristic of the entity. Each element may be represented as a name-value pair (i.e., (element-name)-(element-value)) where the value of that name-value pair may be scalar or compound. Each component is a logical data collection.


In another implementation, an entity definition includes one or more entity components for each entity characteristic represented in the definition. Each entity component has a single element that may be represented as a name-value pair (i.e., (element-name)-(element-value)). The value of that name-value pair may be scalar or compound. The number of entity components of a particular type within the entity definition may be determined by the number needed to adequately express the associated characteristic of the entity. Each component is a logical data collection.


In another implementation, an entity definition includes one or more entity components for each entity characteristic represented in the definition. Each entity component may have one or more elements that may each be represented as a name-value pair (i.e., (element-name)-(element-value)). The value of that name-value pair may be scalar or compound. The number of elements for a particular entity component may be determined by some meaningful grouping factor, such as the day and time of entry into the entity definition. The number of entity components of a particular type within the entity definition may be determined by the number needed to adequately express the associated characteristic of the entity. Each component is a logical data collection. These and other implementations are possible including representations in RDBMS's and the like.



FIG. 10C illustrates an example of an instance of an entity definition record 11050 for an entity, in accordance with one or more implementations of the present disclosure. An entity definition component (e.g., alias component, informational field component, service association component, other component) can specify all, or only a part, of a characteristic of the entity. For example, in one implementation, an entity definition record includes a single entity name component that contains all of the identifying information (e.g., name, title, and/or identifier) for the entity. The value for the name component type in an entity definition record can be used as the entity identifier for the entity being represented by the record. For example, the entity definition record 11050 includes a single entity name component 11051 that has an element name of “name” and an element value of “foobar”. The value “foobar” becomes the entity identifier for the entity that is being represented by record 11050.


There can be one or multiple components having a particular entity definition component type. For example, the entity definition record 11050 has two components (e.g., informational field component 11055A and informational field component 11055B) having the informational field component type. In another example, the entity definition record 11050 has two components (e.g., alias component 11053A and alias component 11053B) having the alias component type. In one implementation, some combination of a single and multiple components of the same type are used to store information pertaining to a characteristic of an entity.


An entity definition component can store a single value for an element or multiple values for the element. For example, alias component 11053A stores an element name of “IP” and a single element value 11063 of “1.1.1.1”. Alias component 11053B stores an element name of “IP2” and multiple element values 11065 of “2.2.2.2” and “5.5.5.5”. In one implementation, when an entity definition component stores multiple values for the same element, and when the element name-element value pair is used for a search query, the search query uses the values disjunctively. For example, a search query may search for fields named “IP2” and having either a “2.2.2.2” value or a “5.5.5.5” value.


As described above, the element name—element value pair in an entity definition record can be used as a field-value pair for a search query. Various machine data may be associated with a particular entity, but may use different aliases for identifying the same entity. Record 11050 has an alias component 11053A that stores information for one alias, and has another alias component 11053B that stores another alias element (having two alias element values) for the entity. The alias components 11053A,B of the entity definition can be used to aggregate event data associated with different aliases for the entity represented by the entity definition. The element name—element value pairs for the alias components can be used as field-value pairs to search for the events that have matching values for fields specified by the elements' names. The entity can be associated with the machine data represented by the events having associated fields whose values match the element values in the alias components. For example, a search query may search for events with a “1.1.1.1” value in a field named “IP” and events with either a “2.2.2.2” value or a “5.5.5.5” value in a field named “IP2”.


Various implementations may use a variety of data representation and/or organization for the component information in an entity definition record based on such factors as performance, data density, site conventions, and available application infrastructure, for example. The structure (e.g., structure 11000 in FIG. 10B) of an entity definition can include rows, entries, or tuples to depict components of an entity definition. An entity definition component can be a normalized, tabular representation for the component, as can be used in an implementation, such as an implementation storing the entity definition within an RDBMS. Different implementations may use different representations for component information; for example, representations that are not normalized and/or not tabular. Different implementations may use various data storage and retrieval frameworks, a JSON-based database as one example, to facilitate storing entity definitions (entity definition records). Further, within an implementation, some information may be implied by, for example, the position within a defined data structure or schema where a value, such as “1.1.1.1” 11063 in FIG. 10C, is stored—rather than being stored explicitly. For example, in an implementation having a defined data structure for an entity definition where the first data item is defined to be the value of the name element for the name component of the entity, only the value need be explicitly stored as the entity component and the element name (name) are known from the data structure definition.



FIG. 10D is a flow diagram of an implementation of a method 12000 for creating entity definition(s) using a file, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 12002, the computing machine receives a file having multiple entries. The computing machine may receive the entire file or something less. The file can be stored in a data store. User input can be received, via a graphical user interface (GUI), requesting access to the file. One implementation of receiving the file via a GUI is described in greater detail below in conjunction with FIGS. 10F-10G. The file can be a file that is generated by a tool (e.g., inventory system) and includes information pertaining to an IT environment. For example, the file may include a list of entities (e.g., physical machines, virtual machines, APIs, processes, etc.) in an IT environment and various characteristics (e.g., name, aliases, user, role, operating system, etc.) for each entity. One or more entries in the file can correspond to a particular entity. Each entry can include one or more data items. Each data item can correspond to a characteristic of the particular entity. The file can be a delimited file, where multiple entries in the file are separated using entry delimiters, and the data items within a particular entry in the file are separated using data item delimiters.


A delimiter is a sequence of one or more characters (printable, or not) used to specify a boundary between separate, independent regions in plain text or other data streams. An entry delimiter is a sequence of one or more characters to separate entries in the file. An example of an entry delimiter is an end-of-line indicator. An end-of-line indicator can be a special character or a sequence of characters. Examples of an end-of-line indicator include, and are not limited to a line feed (LF) and a carriage return (CR). A data item delimiter is a sequence of one or more characters to separate data items in an entry. Examples of a data item delimiter can include, and are not limited to a comma character, a space character, a semicolon, quote(s), brace(s), pipe, slash(es), and a tab.


An example of a delimited file includes, and is not limited to a comma-separated values (CSV) file. Such a CSV file can have entries for different entities separated by line feeds or carriage returns, and an entry for each entity can include data items (e.g., entity name, entity alias, entity user, entity operating system, etc.), in proper sequence, separated by comma characters. Null data items can be represented by having nothing between sequential delimiters, i.e., one comma immediately followed by another. An example of a CSV file is described in greater detail below in conjunction with FIG. 10E.


Each entry in the delimited file has an ordinal position within the file, and each data item has an ordinal position within the corresponding entry in the file. An ordinal position is a specified position in a numbered series. Each entry in the file can have the same number of data items. Alternatively, the number of data items per entry can vary.


At block 12004, the computing machine creates a table having one or more rows, and one or more columns in each row. The number of rows in the table can be based on the number of entries in the file, and the number of columns in the table can be based on the number of data items in an entry of the file (e.g., the number of data items in an entry having the most data items). Each row has an ordinal position within the table, and each column has an ordinal position within the table. At block 12006, the computing machine associates the entries in the file with corresponding rows in the table based on the ordinal positions of the entries within the file and the ordinal positions of the rows within the table. For each entry, the computing machine matches the ordinal position of the entry with the ordinal position of one of the rows. The matched ordinal positions need not be equal in an implementation, and one may be calculated from the other using, for example, an offset value.


At block 12008, for each entry in the file, the computing machine imports each of the data items of the particular entry in the file into a respective column of the same row of the table. An example of importing the data items of a particular entry to populate a respective column of a same row of a table is described in greater detail below in conjunction with FIG. 10E.


At block 12010, the computing system causes display in a GUI of one or more rows of the table populated with data items imported from the file. An example GUI presenting a table with data items imported from a delimited file is described in greater detail below in conjunction with FIG. 10E and FIG. 10H.


At block 12012, the computing machine receives user input designating, for each of one or more respective columns, an element name and a type of entity definition component to which the respective column pertains. As discussed above, an entity definition component type represents a particular characteristic type (e.g., name, alias, information, service association, etc.) of an entity. An element name represents a name of an element associated with a corresponding characteristic of an entity. For example, the entity definition component type may be an alias component type, and an element associated with an alias of an entity may be an element name “IP”.


The user input designating, for each respective column, an element name and a type (e.g., name, alias, informational field, service association, and other) of entity definition component to which the respective column pertains can be received via the GUI. One implementation of user input designating, for each respective column, an element name and a type of entity definition component to which the respective column pertains is discussed in greater detail below in conjunction with FIGS. 10H-10I.


At block 12014, the computing machine stores, for each of one or more of the data items of the particular entry of the file, a value of an element of an entity definition. A data item will be stored if it appeared in a column for which a proper element name and entity definition component type were specified. An entity definition includes one or more components. Each component stores information pertaining to an element. The element of the entity definition has the element name designated for the respective column in which the data item appeared. The element of the entity definition is associated with an entity definition component having the type designated for the respective column in which the data item appeared. The element names and the values for the elements can be stored in an entity definition data store, which may be a relational database (e.g., SQL server) or a document-oriented database (e.g., MongoDB), for example.



FIG. 10E is a block diagram 13000 of an example of creating entity definition(s) using a file, in accordance with one or more implementations of the present disclosure. A file 13009 can be stored in a data store. The file 13009 can have a delimited data format that has one or more sequentially ordered data items (each corresponding to a tabular column) in one or more lines or entries (each corresponding to a tabular row). The file 13009 is a CSV file called “test.csv” and includes multiple entries 13007A-C. Each entry 13007A-C includes one or more data items. A CSV file stores tabular data in plain-text form and consists of any number of entries (e.g., entries 13007A-C).


The rows in the file 13009 can be defined by the delimiters that separate the entries 13007A-C. The entry delimiters can include, for example, line breaks, such as a line feed (not shown) or carriage return (not shown). In one implementation, one type of entry delimiter is used to separate the entries in the same file.


The nominal columns in the file 13009 can be defined by delimiters that separate the data items in the entries 13007A-C. The data item delimiter may be, for example, a comma character. For example, for entry 13007A, “IP” 13001 and “IP2” 13003 are separated by a comma character, “IP2” 13003 and “user” 13005 are also separated by a comma character, and “user” 13005 and “name” 13006 are also separated by a comma character. In one implementation, the same type of delimiter is used to separate the data items in the same file.


The first entry 13007A in the file 1309 may be a “header” entry. The data items (e.g. IP 13001, IP2 13003, user 13005, name 13006) in the “header” entry 13007A can be names defining the types of data items in the file 13009.


A table 13015 can be displayed in a GUI. The table 13015 can include one or more rows. In one implementation, a top row in the table 13015 is a column identifier row 13017, and each subsequent row 13019A,B is a data row. A column identifier row 13017 contains column identifiers, such as an element name 13011A-D and an entity definition component type 13013A-D, for each column 13021A-D in the table 13015. User input can be received via the GUI for designating the element names 13011A-D and component types 13013A-D for each column 13021A-D.


In one implementation, the data items of the first entry (e.g., entry 13007A) in the file 13009 are automatically imported as the element names 13011A-D into the column identifier row 13017 in the table 13015, and user input is received via the GUI that indicates acceptance of using the data items of the first entry 13007A in the file 13009 as the element names 13011A-D in the table 13015. In one implementation, user input designating the component types is also received via the GUI. For example, a user selection of a save button or a next button in a GUI can indicate acceptance. One implementation of a GUI facilitating user input for designating the element names and component types for each column is described in greater detail below in conjunction with FIG. 10H.


The determination of how to import a data item from the file 13009 to a particular location in the table 13015 is based on ordinal positions of the data items within a respective entry in the file 13009 and ordinal positions of columns within the table 13015. In one implementation, ordinal positions of the entries 13007A-D within the file 13009 and ordinal positions of the rows (e.g., rows 13017,13019A-B) within the table 13015 are used to determine how to import a data item from the file 13009 into the table 13015.


Each of the entries and data items in the file 13009 has an ordinal position. Each of the rows and columns in the table 13015 has an ordinal position. In one implementation, the first position in a numbered series is zero. In another implementation, the first position in a numbered series is one.


For example, each entry 13007A-C in the file 13009 has an ordinal position within the file 13009. In one implementation, the top entry in the file 13009 has a first position in a numbered series, and each subsequent entry has a corresponding position in the number series relative to the entry having the first position. For example, for file 13009, entry 13007A has an ordinal position of one, entry 13007B has an ordinal position of two, and entry 13007C has an ordinal position of three.


Each data item in an entry 13007A-C has an ordinal position within the respective entry. In one implementation, the left most data item in an entry has a first position in a numbered series, and each subsequent data item has a corresponding position in the number series relative to the data item having the first position. For example, for entry 13007A, “IP” 13001 has an ordinal position of one, “IP2” 13003 has an ordinal position of two, “user” 13005 has an ordinal position of three, and “name” 13006 has an ordinal position of four.


Each row in the table 13015 has an ordinal position within the table 13015. In one implementation, the top row in the table 13015 has a first position in a numbered series, and each subsequent row has a corresponding position in the number series relative to the row having the first position. For example, for table 13015, row 13017 has an ordinal position of one, row 13019A has an ordinal position of two, and row 13019B has an ordinal position of three.


Each column in the table 13015 has an ordinal position within the table 13015. In one implementation, the left most column in the table 13015 has a first position in a numbered series, and each subsequent column has a corresponding position in the number series relative to the column having the first position. For example, for table 13015, column 13021A has an ordinal position of one, column 13021B has an ordinal position of two, column 13021C has an ordinal position of three, and column 13021D has an ordinal position of four.


Each element name 13011A-C in the table 13015 has an ordinal position within the table 13015. In one implementation, the left most element name in the table 13015 has a first position in a numbered series, and each subsequent element name has a corresponding position in the numbered series relative to the element name having the first position. For example, for table 13015, element name 13011A has an ordinal position of one, element name 13011B has an ordinal position of two, element name 13011C has an ordinal position of three, and element name 13011D has an ordinal position of four.


The ordinal positions of the rows in the table 13015 and the ordinal positions of the entries 13007A-C in the file 13009A can correspond to each other. The ordinal positions of the columns in the table 1315 and the ordinal positions of the data items in the file 13009 can correspond to each other. The ordinal positions of the element names in the table 13015 and the ordinal positions of the data items in the file 13009 can correspond to each other.


The determination of an entity name 13011A-D in which to place a data item can be based on the ordinal position of the entity name 13011A-D that corresponds to the ordinal position of the data item. For example, “IP” 13001 has an ordinal position of one within entry 13007A in the file 13009. Element name 13011A has an ordinal position that matches the ordinal position of “IP” 13001. “IP” 13001 can be imported from the file 13009 and placed in row 13017 and in element name 13011A.


The data items for a particular entry in the file 13009 can appear in the same row in the table 13015. The determination of a row in which to place the data items for the particular entry can be based on the ordinal position of the row that corresponds to the ordinal position of the entry. For example, entry 13007B has an ordinal position of two. Row 13019A has an ordinal position that matches the ordinal position of entry 13007B. “1.1.1.1”, “2.2.2.2”, “j smith”, and “foobar” can be imported from the file 13009 and placed in row 13019A in the table 13015.


The determination of a column in which to place a particular data item can be based on the ordinal position of the column within the table 13015 that corresponds to the ordinal position of the data items within a particular entry in the file 13009. For example, “1.1.1.1” in entry 13007B has an ordinal position of one. Column 13021A has an ordinal position that matches the ordinal position of “1.1.1.1”. “1.1.1.1” can be imported from the file 13009 and placed in row 13019A and in column 13021A.


Corresponding ordinal positions need not be equal in an implementation, and one may be calculated from the other using, for example, an offset value.


User input designating the component types 13013A-D in the table 13015 is received via the GUI. For example, a selection of “Alias” is received for component type 13013A, a selection of “Alias” is received for component type 13013B, a selection of “Informational Field” is received for component type 13013C, and a selection of “Name” is received for component type 13013D. One implementation of a GUI facilitating user input for designating the component types for each column is described in greater detail below in conjunction with FIGS. 1011-10I.


User input can be received via the GUI for creating entity definitions records 13027A,B using the element names 13011A-D, component types 13013A-D, and data items displayed in the table 13015 and importing the entity definitions records 13027A,B in a data store, as described in greater detail below in conjunction with FIGS. 10H-10L.


When user input designating the entity definition component types 13013A-D for the table 13015 is received, and user input indicating acceptance of the display of the data items from file 13009 into the table 13015 is received, the entity definition records can be created and stored. For example, two entity definition records 13027A,B are created.


As described above, in one implementation, an entity definition stores no more than one component having a name component type. The entity definition can store zero or more components having an alias component type, and can store zero or more components having an informational field component type. In one implementation, user input is received via a GUI (e.g., entity definition editing GUI, service definition GUI) to add one or more service association components and/or one or more other information components to an entity definition record. While not explicitly shown in the illustrative example of FIG. 10E, the teachings regarding the importation of component information into entity definition records from file data can understandably be applied to service association component information, after the fashion illustrated for alias and informational field component information, for example.


In one implementation, the entity definition records 13027A,B store the component having a name component type as a first component, followed by any component having an alias component type, followed by any component having an informational field component type, followed by any component having a service component type, and followed by any component having a component type for other information.



FIG. 10F illustrates an example of a GUI 14000 of a service monitoring system for creating entity definition(s) using a file or using a set of search results, in accordance with one or more implementations of the present disclosure. GUI 14000 can include an import file icon 14005, which can be selected, for starting the creation of entity definition(s) using a file. GUI 14000 can include a search icon 14007, which can be selected, for starting the creation of entity definition(s) using search results.


GUI 14000 can include a creation status bar 14001 that displays the various stages for creating entity definition(s) using the GUI. For example, when the import file icon 14005 is selected, the stages that pertain to creating entity definition(s) using a file are displayed in the status bar 14001. The stages can include, for example, and are not limited to, an initial stage, an import file stage, a specify columns stage, a merge entities stage, and a completion stage. The status bar 14001 can be updated to display an indicator (e.g., shaded circle) corresponding to a current stage. When the search icon 14007 is selected, the stages that pertain to creating entity definition(s) using search results are displayed in the status bar 14001, as described in greater detail below in conjunction with FIGS. 10Q-10Z.


GUI 14000 includes a next button 14003, which when selected, displays the next GUI for creating the entity definition(s). GUI 14000 includes a previous button 14002, which when selected, displays the previous GUI for creating the entity definition(s). In one implementation, if no icon (e.g., icon 14005, icon 14007) is selected, a default selection is used and if the next button 14003 is activated, the GUI corresponding to the default selection is displayed. In one implementation, the import file icon is the default selection. The default selection can be configurable.



FIG. 10G illustrates an example of a GUI 15000 of a service monitoring system for selecting a file for creating entity definitions, in accordance with one or more implementations of the present disclosure. The data items from the selected file can be imported into a table in the GUI, as described in greater detail below.


GUI 15000 can include a status bar 15001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., import file stage). User input can be received specifying the selected file. For example, if the select file button 15009 is activated, a GUI that allows a user to select a file is displayed. The GUI can display a list of directories and/or files. In another example, the user input may be a file being dragged to the drag and drop portion 15011 of the GUI 15000.


The selected file can be a delimited file. GUI 15000 can facilitate user input identifying a quote character 15005 and a separator character 15007 that is being used for the selected file. The separator character 15007 is the character that is being used as a data item delimiter to separate data items in the selected file. For example, user input can be received identifying a comma character as the separator character being used in the selected file.


At times, the separator character 15007 (e.g., comma character) may be part of a data item. For example, if the separator character is a comma character and the data item in the file may be “joe,machine”. In such a case, the comma character in the “joe,machine” should not be treated as a separator character and should be treated as part of the data item itself. In the delimited file, such situations are addressed by using special characters (e.g., quotes around a data item that includes a comma character). Quote characters 15005 in GUI 15000 indicate that a separator character inside a data item surrounded by those quote characters 15005 should not be treated as a separator but rather part of the data item itself. Example quote characters 15005 can include, and are not limited to, single quote characters, double quote characters, slash characters, and asterisk characters. The quote characters 15005 to be used can be specified via user input. For example, user input may be received designating single quote characters to be used as quote characters 15005 in the delimited file. If a file has been selected, and if the next button 15003 has been activated, the data items from the selected file can be imported to a table. The table containing the imported data items can be displayed in a GUI, as described in greater detail below in conjunction with FIG. 10H.



FIG. 10H illustrates an example of a GUI 17000 of a service monitoring system that displays a table 17015 for facilitating user input for creating entity definition(s) using a file, in accordance with one or more implementations of the present disclosure. GUI 17000 can include a status bar 17001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., specify column stage).


GUI 17000 can facilitate user input for creating one or more entity definition records using the data items from a file. Entity definition records are stored in a data store. The entity definition records that are created as a result of user input that is received via GUI 17000 can replace any existing entity definition records in the data store, can be added as new entity definition records to the data store, and/or can be combined with any existing entity definition records in the data store. The type of entity definition records that are to be created can be based on user input. GUI 17000 can include a button 17005, which when selected, can display a list of record type options, as described in greater detail below in conjunction with FIG. 10J.


Referring to FIG. 10H, GUI 17000 can display a table 17015 that has automatically been populated with data items that have been imported from a selected file (e.g., file 13009 in FIG. 10E). Table 170015 includes columns 17021A-D, a column identifier row 17012A containing element names 17011A-D for the columns 17021A-D, and another column identifier row 17012B containing component types 17013A-D for the columns 17021A-D.


The data items (e.g., “IP” 13001, “IP2” 13003, “user” 13005, and “name” 13006 in FIG. 10E), of the first entry (e.g., first entry 13007A in FIG. 10E) can automatically be imported as the element names 17011 A-D into the column identifier row 17012A in the table 17015. The placement of the data items (e.g., “IP”, “IP2”, “user”, and “name”) within the column identifier row 17012A is based on the matching of ordinal positions of the element names 17011A-D within the column identifier row 17012A to the ordinal positions of the data items within the first entry (e.g., entry 13007A of FIG. 10E) of the selected file.


GUI 17000 includes input text boxes 17014A-D to receive user input of user selected element names for the columns 17021A-D. In one implementation, user input of an element name that is received via a text box 17014A-D overrides the element names (e.g., “IP”, “IP2”, “user”, and “name”) that that are imported from the data items in the first header row in the file. As discussed above, an element name—element value pair that is defined for an entity definition component via GUI 17000 can be used as a field-value pair for a search query. An element name in the file may not correspond to an existing field name. A user (e.g., business analyst) can change the element name, via a text box 17014A-D, to a name that maps to an existing or desired field name. The mapping of an element name to an existing field name is not limited to a one-to-one mapping. For example, a user may rename “IP” to “dest” via text box 17014A and may also rename “IP2” to “dest” via text box 17014B.


The data items of the subsequent entries in the file can automatically be imported into the table 17015. The placement of the data items of the subsequent entries into a particular row in the table 17015 can be based on the matching of ordinal positions of the data rows 17019A,B within the table 17015 to the ordinal positions of the entries within the file. The placement of the data items into a particular column within the table 17015 can be based on the matching of the ordinal positions of the columns 17021A-D within the table 17015 to the ordinal positions of the data items within a particular entry in the file.


User input designating the entity definition component types 17013A-D in the table 17015 is received via the GUI. In one implementation, a button 17016 for each column 17021A-D can be selected to display a list of component types to select from. FIG. 10I illustrates an example of a GUI 18000 of a service monitoring system for displaying a list 18050 of entity definition component types, in accordance with one or more implementations of the present disclosure. List 18050 can include an alias component type 18001, a name component type 18003, an informational field component type 18005, and an import option 18007 indicating that the data items in a file that correspond to a particular column in the table 18015 should not be imported for creating an entity definition record. In one implementation, GUI 18000 includes buttons, which when selected, displays service and description drop down columns.



FIG. 10J illustrates an example of a GUI 19000 of a service monitoring system for specifying the type of entity definition records to create, in accordance with one or more implementations of the present disclosure. GUI 19000 can include a button 19001, which when selected, can display a list 19050 of record type options from which a user may select.


As discussed above, entity definition records are stored in a data store. The entity definition records that are created as a result of user input that is received via GUI 19000 can be added as new entity definition records to the data store, can replace any existing entity definition records in the data store, and/or can be combined with any existing entity definition records in the data store. The list 19050 can include an option for to append 19003 the created entity definition records to the data store, to replace 19005 existing entity definition records in the data store with the created entity definition records, and to combine 19007 the created entity definition records with existing entity definition records in the data store. In one implementation, the record type is set to a default type. In one implementation, the default record type is set to the replacement type. The default record type is configurable.


When the append 19003 option is selected, the entity definition records (e.g., records 13027A,B in FIG. 10E) that are created as a result of using the GUI 19000 are added as new entity definition records to the data store.


When the replace 19005 option is selected, one or more of the entity definition records that are created as a result of using the GUI 19000 replace existing entity definition records in the data store that match one or more element values in the newly created records. In one implementation, an entire entity definition record that exists in the data store is replaced with a new entity definition record. In another implementation, one or more components of an entity definition record that exist in the data store are replaced with corresponding components of a new entity definition record.


In one implementation, the match is based on the element value for the name component in the entity definition records. A search of the data store can be executed to search for existing entity definition records that have an element value for a name component that matches the element value for the name component of a newly created entity definition record. For example, two entity definition records are created via GUI 19000. A first record has an element value of “foobar” for the name component of the record. The first record also includes an alias component having the element name “IP2” and element value of “2.2.2.2”, and another alias component having the element name “IP” and element value of “1.1.1.1”. There may be an existing entity definition record in the data store that has a matching element value of “foobar” for the name component. The existing entity definition record in the data store may have an alias component having the element name “IP2,” but may have an element value of “5.5.5.5”. The element value of “2.2.2.2” for the element name “IP2” in the new entity definition record can replace the element value of “5.5.5.5” in the existing entity definition record.


When the combine 19007 option is selected, one or more of the entity definition records that are created as a result of using the GUI 19000 can be combined with a corresponding entity definition record, which exists in the data store and has a matching element value for a name component. For example, a new entity definition record has an element value of “foobar” for the name component of the record. The first record also includes an alias component having the element name “IP2” and element value of “2.2.2.2”, and another alias component having the element name “IP” and element value of “1.1.1.1”. There may be an existing entity definition record in the data store that has a matching element value of “foobar” for the name component. The existing entity definition record in the data store may have an alias component having the element name “IP2,” but may have an element value of “5.5.5.5”. The element value of “2.2.2.2” for the element name “IP2” in the new entity definition record can be added as another element value in the existing entity definition record for the alias component having the element name “IP2,” as described above in conjunction with alias component 12053B in FIG. 10C. In one implementation, if an alias component stores an element name of “IP2” and multiple element values “2.2.2.2” and “5.5.5.5,” and when the element name-element value pair is used for a search query, the search query uses the values disjunctively. For example, a search query may search for fields named “IP2” and having either a “2.2.2.2” value or a “5.5.5.5” value.


If input of the selected file has been received, and if the next button 19003 has been selected, a GUI for merging entity definition records is displayed, as described in greater detail below in conjunction with FIG. 10K.



FIG. 10K illustrates an example of a GUI 20000 of a service monitoring system for merging entity definition records, in accordance with one or more implementations of the present disclosure. GUI 20000 can include a status bar 20001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., merge entities stage). During the merge entity definition records stage, a determination of whether there would be duplicate entity definition records in the data store is made, and the results 20015 of the determination are displayed in the GUI 20000. For example, if the append option (e.g., append 19003 option if FIG. 10J) was selected to add any the newly created entity definition records to the data store, the results 20015 may be that multiple entity definition records that have the same element value for the name component would exists in the data store. For example, the results 20015 include an indicator 20014 indicating that there would be one duplicated entity definition record having the element name “foobar” as the name component in the records. A user (e.g., business analyst) can decide whether or not to allow the multiple entity definition records in the data store that have the same value (e.g., foobar) for the name component. If the user does not wish to allow the multiple records to have the same name in the data store, the previous 20002 button can be selected to display the previous GUI (e.g., GUI 19000 in FIG. 10J) and the user may select another record type (e.g., replace, combine). If the user wishes to allows the multiple records to have the same name, the submit 20003 button can be selected to create the new entity definition records and to add the new entity definition records to the data store. If the submit 20003 button is selected, GUI 21000 in FIG. 10L can be displayed.



FIG. 10L illustrates an example of a GUI 21000 of a service monitoring system for providing information for newly created and/or updated entity definition records, in accordance with one or more implementations of the present disclosure. GUI 21000 can include a status bar 21001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., completion stage).


GUI 21000 can include information 21003 pertaining to the entity definition records that have been imported into the data store. The information 21003 can include the number of records that have been imported. In one implementation, the information 21003 includes the type (e.g., replace, append, combine) of import that has been made. If button 21005 is selected, GUI 24000 for editing the entity definition records can be displayed. FIG. 10P illustrates an example of a GUI 24000 of a service monitoring system for creating and/or editing entity definition record(s), in accordance with one or more implementations of the present disclosure. GUI 24000 displays a portion 24001 of a list of the entity definition records that are stored in the data store. A button 24003 for an entity definition record in the list can be selected, and a GUI for editing the selected entity definition record can be displayed.


Referring to FIG. 10L, as described above, the selected file (e.g., file 13000 in FIG. 10E) that was used to import entity definition records in to the data store may be a file that is generated by a source (e.g., inventory system). The file may be periodically output by the source (e.g., inventory system), and a user (e.g., business analyst) may wish to execute another import using the newly outputted file from the source. The configuration (e.g., selected component types, selected type of import, etc.) of the current import that was executed using the file can be saved for future execution using an updated file.


If button 21007 is selected, GUI 22000 in FIG. 10M can be displayed to save the configuration of the current import that was executed using the file as a new modular input that can be used for future imports using new versions of the file.



FIG. 10M illustrates an example of a GUI 22000 of a service monitoring system for saving configurations settings of an import, in accordance with one or more implementations of the present disclosure. The configuration of a current import that was executed using a file (e.g., file 13000 in FIG. 10E) can be saved as a new modular input that can be used for future imports using new versions of the file. When a new modular input is created for the file, the file (e.g., file 13000 in FIG. 10E) will be monitored for updates. If the file is updated, an import can be automatically executed using the configuration (e.g., selected component types, selected type of import, etc.) of the modular input that was saved for the file.


A user (e.g., business analyst) can provide a name 22001 for modular input and metadata information for the modular input, such as an entity type 22003 for the modular input. When the create 22005 button is selected, a modular input GUI is displayed for setting the parameters for monitoring the file.



FIGS. 10N-10O illustrates an example of GUIs of a service monitoring system for setting the parameters for monitoring a file, in accordance with one or more implementations of the present disclosure. GUI 23000 can automatically be populated with the configuration of the current import that is to be saved. For example, GUI 23000 in FIG. 10N displays parameters from the current import, such as the file location 23002, the entity type 23004, the column identifier 23006 to be used to identify rows in the file, the file column headers 23008 in the file, and the record type 23010.


The monitoring of a file (e.g., file 13009 in FIG. 10E) to determine whether the file has changed can run at a particular interval. A user can provide input of the interval 23051 via GUI 23050 in FIG. 10O. In one implementation, a change is when new data is found in the file. In another implementation, a change is when data has been removed from the file. In one implementation, a change includes data being added to the file and data being removed from the file. In one implementation, when a change is identified in the file, new entity definition records that reflect the change can be imported into the data store. Depending on the import type that has been saved in the modular input, the new entity definition records can automatically replace, append, or be combined with existing entity definition records in the data store. For example, the append 23010 option has been saved in the modular input settings and will be used for imports that occur when the file has changed. When a change has been detected in the file, new entity definition records will automatically be appended (e.g., added) to the data store. In one implementation, when a change has been detected in the file that pertains to data being removed from the file, the import of the new entity definition records, which reflect the removed data, into the data store does not occur automatically.


Creating Entity Definition from a Search Result List



FIG. 10Q is a flow diagram of an implementation of a method 25000 for creating entity definition(s) using a search result set, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 25002, the computing machine performs a search query to produce a search result set. The search query can be performed in response to user input. The user input can include a user selection of the type of search query to use for creating entity definitions. The search query can be an ad-hoc search or a saved search. A saved search is a search query that has search criteria, which has been previously defined and is stored in a data store. An ad-hoc search is a new search query, where the search criteria are specified from user input that is received via a graphical user interface (GUI). Implementations for receiving user input for the search query via a GUI are described in greater detail below in conjunction with FIGS. 10S-10T.


In one implementation, the search query is directed to searching machine data. As described above, the computing machine can be coupled to an event processing system (e.g., event processing system 205 in FIG. 2). Machine data can be represented as events. Each of the events can include raw data. The event processing system can apply a late-binding schema to the events to extract values for fields defined by the schema, and determine which events have values that are extracted for a field. The search criteria for the search query can specify a name of one or more fields defined by the schema and a corresponding value for the field name. The field-value pairs in the search query can be used to search the machine data for the events that have matching values for the fields named in search criteria. For example, the search criteria may include the field name “role” and the value “indexer.” The computing machine can execute the search query and return a search result set that includes events with the value “indexer” in the associated field named “role.”


In one implementation, the search query is directed to search a data store storing service monitoring data pertaining to the service monitoring system. The service monitoring data, can include, and is not limited to, entity definition records, service definition records, key performance indicator (KPI) specifications, and KPI thresholding information. The data in the data store can be based on one or more schemas, and the search criteria for the search query can include identifiers (e.g., field names, element names, etc.) for searching the data based on the one or more schemas. For example, the search criteria can include a name of one or more elements defined by the schema for entity definition records, and a corresponding value for the element name. The element name element value pair in the search query can be used to search the entity definition records for the records that have matching values for the elements named in search criteria.


The search result set can be in a tabular format, and can include one or more entries. Each entry includes one or more data items. The search query can search for information pertaining to an IT environment. For example, the search query may return a search result set that includes information for various entities (e.g., physical machines, virtual machines, APIs, processes, etc.) in an IT environment and various characteristics (e.g., name, aliases, user, role, owner, operating system, etc.) for each entity. One or more entries in the search result set can correspond to entities. Each entry can include one or more data items. As discussed above, an entity has one or more characteristics (e.g., name, alias, informational field, service association, and/or other information). Each data item in an entry in the search result set can correspond to a characteristic of a particular entity.


Each entry in the search result set has an ordinal position within the search result set, and each data item has an ordinal position within the corresponding entry in the search result set. An ordinal position is a specified position in a numbered series. Each entry in the search result set can have the same number of data items. Alternatively, the number of data items per entry can vary.


At block 25004, the computing machine creates a table having one or more rows, and one or more columns in each row. The number of rows in the table can be based on the number of entries in the search result set, and the number of columns in the table can be based on the number of data items within an entry in the search result set (e.g., the number of data items in an entry having the most data items). Each row has an ordinal position within the table, and each column has an ordinal position within the table.


At block 25006, the computing machine associates the entries in the search result set with corresponding rows in the table based on the ordinal positions of the entries within the search result set and the ordinal positions of the rows within the table. For each entry, the computing machine matches the ordinal position of the entry with the ordinal position of one of the rows. The matched ordinal positions need not be equal in an implementation, and one may be calculated from the other using, for example, an offset value.


At block 25008, for each entry in the search result set, the computing machine imports each of the data items of a particular entry in the search result set into a respective column of the same row of the table. An example of importing the data items of a particular entry to populate a respective column of a same row of a table is described in greater detail below in conjunction with FIG. 10R.


At block 25010, the computing system causes display in a GUI of one or more rows of the table populated with data items imported from the search result set. An example GUI presenting a table with data items imported from a search result set is described in greater detail below in conjunction with FIG. 10R and FIG. 10V.


At block 25012, the computing machine receives user input designating, for each of one or more respective columns, an element name and a type of entity definition component to which the respective column pertains. As discussed above, an entity definition component type represents a particular characteristic type (e.g., name, alias, information, service association, etc.) of an entity. An element name represents a name of an element associated with a corresponding characteristic of an entity. For example, the entity definition component type may be an alias component type, and an element associated with an alias of an entity may be an element name “role”.


The user input designating, for each respective column, an element name and a type (e.g., name, alias, informational field, service association, and other) of entity definition component to which the respective column pertains can be received via the GUI. One implementation of user input designating, for each respective column, an element name and a type of entity definition component to which the respective column pertains is discussed in greater detail below in conjunction with FIG. 10V.


At block 25014, the computing machine stores, for each of one or more of the data items of the particular entry of the search result set, a value of an element of an entity definition. I data item will be stored if it appeared in a column for which a proper element name and entity definition component type were specified. As discussed above, an entity definition includes one or more components. Each component stores information pertaining to an element. The element of the entity definition has the element name designated for the respective column in which the data item appeared. The element of the entity definition is associated with an entity definition component having the type designated for the respective column in which the data item appeared. The element names and the values for the elements can be stored in an entity definition data store, which may be a relational database (e.g., SQL server) or a document-oriented database (e.g., MongoDB), for example.



FIG. 10R is a block diagram 26000 of an example of creating entity definition(s) using a search result set, in accordance with one or more implementations of the present disclosure. A search result set 26009 can be produced from the execution of a search query. The search result set 26009 can have a tabular format that has one or more columns of data items and one or more rows of entries. The search result set 26009 includes multiple entries 26007A-B. Each entry 26007A-B includes one or more data items.


The first entry 26007A in the search result set 26009 may be a “header” entry. The data items (e.g. serverName 26001, role 26003, and owner 26005) in the “header” entry 26007A can be names defining the types of data items in the search result set 26009.


A table 26015 can be displayed in a GUI. The table 26015 can include one or more rows. In one implementation, a top row in the table 26015 is a column identifier row 26017, and each subsequent row 26019 is a data row. A column identifier row 26017 contains column identifiers, such as an element name 26011A-C and an entity definition component type 26013A-C, for each column 26021A-C in the table 26015. User input can be received via the GUI for designating the element names 26011A-C and component types 26013A-C for each column 26021A-C.


In one implementation, the data items of the first entry (e.g., entry 26007A) in the search result set 26009 are automatically imported as the element names 26011A-C into the column identifier row 26017 in the table 26015, and user input is received via the GUI that indicates acceptance of using the data items of the first entry 26007A in the search result set 26009 as the element names 26011A-C in the table 26015. For example, a user selection of a save button or a next button in a GUI can indicate acceptance. In one implementation, user input designating the component types is also received via the GUI. One implementation of a GUI facilitating user input for designating the element names and component types for each column is described in greater detail below in conjunction with FIG. 10V.


The determination of how to import a data item from the search result set 26009 to a particular location in the table 26015 is based on ordinal positions of the data items within a respective entry in the search result set 26009 and ordinal positions of columns within the table 26015. In one implementation, ordinal positions of the entries 26007A-B within the search result set 26009 and ordinal positions of the rows (e.g., row 26017, row 26019) within the table 26015 are used to determine how to import a data item from the search result set 26009 into the table 26015.


Each of the entries and data items in the search result set 26009 has an ordinal position. Each of the rows and columns in the table 26015 has an ordinal position. In one implementation, the first position in a numbered series is zero. In another implementation, the first position in a numbered series is one.


For example, each entry 26007A-B in the search result set 26009 has an ordinal position within the search result set 26009. In one implementation, the top entry in the search result set 26009 has a first position in a numbered series, and each subsequent entry has a corresponding position in the number series relative to the entry having the first position. For example, for search result set 26009, entry 26007A has an ordinal position of one, and entry 26007B has an ordinal position of two.


Each data item in an entry 26007A-B has an ordinal position within the respective entry. In one implementation, the left most data item in an entry has a first position in a numbered series, and each subsequent data item has a corresponding position in the number series relative to the data item having the first position. For example, for entry 26007A, “serverName” 26001 has an ordinal position of one, “role” 26003 has an ordinal position of two, and “owner” 26005 has an ordinal position of three.


Each row in the table 26015 has an ordinal position within the table 26015. In one implementation, the top row in the table 26015 has a first position in a numbered series, and each subsequent row has a corresponding position in the number series relative to the row having the first position. For example, for table 26015, row 26017 has an ordinal position of one, and row 26019 has an ordinal position of two.


Each column in the table 26015 has an ordinal position within the table 26015. In one implementation, the left most column in the table 26015 has a first position in a numbered series, and each subsequent column has a corresponding position in the number series relative to the column having the first position. For example, for table 26015, column 26021A has an ordinal position of one, column 26021B has an ordinal position of two, and column 26021C has an ordinal position of three.


Each element name 26011A-C in the table 26015 has an ordinal position within the table 26015. In one implementation, the left most element name in the table 26015 has a first position in a numbered series, and each subsequent element name has a corresponding position in the numbered series relative to the element name having the first position. For example, for table 26015, element name 26011A has an ordinal position of one, element name 26011B has an ordinal position of two, and element name 26011C has an ordinal position of three.


The ordinal positions of the rows in the table 26015 and the ordinal positions of the entries 26007A-B in the search result set 26009 can correspond to each other. The ordinal positions of the columns in the table 26015 and the ordinal positions of the data items in the search result set 26009 can correspond to each other. The ordinal positions of the element names in the table 26015 and the ordinal positions of the data items in the search result set 26009 can correspond to each other.


The determination of an element name GUI element 26011A-C in which to place a data item (when importing a search results entry that contains the element (column) names) can be based on the ordinal position of the entity name 26011A-C that corresponds to the ordinal position of the data item. For example, “serverName” 26001 has an ordinal position of one within entry 26007A in the search result set 26009. Element name 26011A has an ordinal position that matches the ordinal position of “serverName” 26001. “serverName” 26001 can be imported from the search result set 26009 and placed in element name 26011A in row 26017.


The data items for a particular entry in the search result set 26009 can appear in the same row in the table 26015. The determination of a row in which to place the data items for the particular entry can be based on the ordinal position of the row that corresponds to the ordinal position of the entry. For example, entry 26007B has an ordinal position of two. Row 26019 has an ordinal position that matches the ordinal position of entry 26007B. The data items “jdoe-mbp15r.splunk.com”, “search_head, indexer”, and “jdoe” can be imported from entry 26007B in the search result set 26009 and placed in row 26019 in the table 26015.


The determination of a column in which to place a particular data item can be based on the ordinal position of the column within the table 26015 that corresponds to the ordinal position of the data items within a particular entry in the search result set 26009. For example, the data item “jdoe-mbp15r.splunk.com” in entry 26007B has an ordinal position of one. Column 26021A has an ordinal position that matches the ordinal position of “jdoe-mbp15r.splunk.com”. The data item “jdoe-mbp15r.splunk.com” can be imported from the search result set 26009 and placed in row 26019 and in column 26021A.


User input designating the component types 26013A-C in the table 26015 is received via the GUI. For example, a selection of “Name” is received for component type 26013A, a selection of “Alias” is received for component type 26013B, and a selection of “Informational Field” is received for component type 26013C. One implementation of a GUI facilitating user input for designating the component types for each column is described in greater detail below in conjunction with FIG. 10V.


Corresponding ordinal positions need not be equal in an implementation, and one may be calculated from the other using, for example, an offset value.


User input can be received via the GUI for creating entity definitions records, such as 26027, using the element names 26011A-C, component types 26013A-C, and data items displayed in the table 26015, and importing the entity definitions records, such as 26027, in a data store, as described in greater detail below in conjunction with FIGS. 10V-10X.


When user input designating the entity definition component types 26013A-C for the table 26015 is received, and user input indicating acceptance of the display of the data items from search result set 26009 into the table 26015 is received, the entity definition record(s) can be created and stored. For example, the entity definition record 26027 is created.


As described above, in one implementation, an entity definition stores no more than one component having a name component type. The entity definition can store zero or more components having an alias component type, and can store zero or more components having an informational field component type. In one implementation, user input is received via a GUI (e.g., entity definition editing GUI, service definition GUI) to add one or more service association components and/or one or more other information components to an entity definition record. While not explicitly shown in the illustrative example of FIG. 10R, the teachings regarding the importation of component information into entity definition records from search query results can understandably be applied to service association component information, after the fashion illustrated for alias and informational field component information, for example.


In one implementation, an entity definition record (e.g., entity definition record 26027) stores the component having a name component type as a first component, followed by any component having an alias component type, followed by any component having an informational field component type, followed by any component having a service component type, and followed by any component having a component type for other information.



FIG. 10S illustrates an example of a GUI 28000 of a service monitoring system for defining search criteria for a search query for creating entity definition(s), in accordance with one or more implementations of the present disclosure.


GUI 28000 can be displayed, for example, if search icon 14007 in FIG. 10F is selected, as described above. GUI 28000 can include a status bar 28001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., search stage). The stages can include, for example, and are not limited to, an initial stage, a search stage, a specify columns stage, a merge entities stage, and a completion stage. GUI 28000 includes a next button 28003, which when selected, displays the next GUI for creating the entity definition(s). GUI 28000 includes a previous button 28002, which when selected, displays the previous GUI for creating the entity definition(s).


The search query can be an ad-hoc search or a saved search. As described above, a saved search is a search query that has search criteria, which has been previously defined and is stored in a data store. An ad-hoc search is a new search query, where the search criteria are specified from user input that is received via a graphical user interface (GUI).


If the ad-hoc search button 2807 is selected, user input can be received via text box 28009 indicating search language that defines the search criteria for the ad-hoc search query. If the saved search button 28005 is selected, GUI 29000 in FIG. 10T is displayed.



FIG. 10T illustrates an example of a GUI 29000 of a service monitoring system for defining a search query using a saved search, in accordance with one or more implementations of the present disclosure. GUI 29000 includes a GUI element (e.g., a button) 29005, which when selected, displays a list 29007 of saved searches to select from. The list 29007 of saved searches corresponds to searches that are stored in a data store. In one implementation, the list 29007 of saved searches includes default saved searches. In one implementation, when a new search is saved to the data store, the list 29007 is updated to include the newly saved search—that is to say, the content of list 29007 is populated dynamically, in whole or in part.


Referring to FIG. 10S, the search query can be directed to search machine data that is stored in a data store and/or service monitoring data (e.g., entity definition records, service definition records, etc.) that is stored in a data store. The data (e.g., machine data, service monitoring data) used by a search query to produce a search result set can be based on a time range. The time range can be a user-defined time range or a default time range. The default time range can be configurable. GUI 28000 can include a button 28011, which when selected, displays a list of time ranges to select from. For example, a user may select, via the button 28011, the time range “Last 1 day” and when the search query is executed, the search query will search data (e.g., machine data, service monitoring data) from the last one day.


When a search query has been defined, for example, as user input received for an ad-hoc search via text box 28009, or from a selection of a saved search, and when a time range has been selected, the search query can be executed in response to the activation of button 28013. The search result set produced by performing the search query can be displayed in a results portion 28050 of the GUI 2800, as described in greater detail below in conjunction with FIG. 10U.



FIG. 10U illustrates an example of a GUI 30000 of a service monitoring system that displays a search result set 30050 for creating entity definition(s), in accordance with one or more implementations of the present disclosure. The saved search button 30005 has been selected, and the saved search “Get indexer entities” has been selected from the list of 30008 (not shown).


In one implementation, when a saved search is selected from the list of 30008, the search language defining the search criteria for the selected save search is displayed in the text box 30009. For example, the search language that defines the “Get indexer entities” saved search is shown displayed in text box 30009. In one implementation, user input can be received via text box 30009 to edit the saved search.


The search language that defines the search query can include a command to output the search result set in a tabular format having one or more rows (row 30012, row 30019) and one or more columns (e.g., columns 30021A-C) for each row. The search language defining the “Get indexer entities” search query can include commands and values that specify the number of columns and the column identifiers for the search result set. For example, the search language in text box 30009 may include “table serverName,role,owner”. In one implementation, if the search query definition does not output a table, an error message is displayed.


The “Get indexer entities” saved search searches for events that have the value “indexer” in the field named “role.” For example, the search language in text box 30009 may include “search role=indexer”. When the “Get indexer entities” search query is performed, GUI 30000 displays a search result set 30050 that is a table having a first entry as the column identifier row 30012, and a second entry as a data row 30019, which represents the one event that has the value “indexer” in the field named “role.”


The second entry shown as a data row 30019 has data items “jdoe-mbp15r.sv.splulnk.com”, “search_head indexer”, and “jdoe” that correspond to the columns. As described above, the command in the search query definition may include “table serverName,role,owner” and the column identifier row 30012 can include serverName 30010A, role 30010B, and owner 30010C as column identifiers. The entries and data items in the search result set 30050 can be imported into a user-interactive table for creating entity definitions, as described below. GUI 3000 includes a next button 30003, which when selected, displays GUI 31000 in FIG. 10V that translates the entries and data items in the search result set 30050 into a table for creating entity definitions.



FIG. 10V illustrates an example of a GUI 31000 of a service monitoring system that displays a table 31015 for facilitating user input for creating entity definition(s) using a search result set, in accordance with one or more implementations of the present disclosure. GUI 31000 can include a status bar 31001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., specify column stage).


GUI 31000 can facilitate user input for creating one or more entity definition records using the data items from a search result set (e.g., search result set 30050 in FIG. 10U). Entity definition records are stored in a data store. The entity definition records that are created as a result of user input that is received via GUI 31000 can replace any existing entity definition records in the data store, can be added as new entity definition records to the data store, and/or can be combined with any existing entity definition records in the data store. The type of entity definition records that are to be created can be based on user input. GUI 31000 can include a button 31040, which when selected, can display a list of record type options, as described above in conjunction with button 19001 in FIG. 10J.


Referring to FIG. 10V, GUI 31000 can display a table 31015 that has automatically been populated with data items that have been imported from a search result set (e.g., search result set 30050 in FIG. 10U). Table 310015 includes columns 31021A-C, a column identifier row 31012A containing element names 31011A-C for the columns 31021A-C, and another column identifier row 31012B containing component types 31013A-C for the columns 31021A-C.


The data items (e.g., “serverName” 30010A, “role” 30010B, “user” 26005, and “owner” 30010C in FIG. 10U) of the first entry (e.g., first entry in row 30012 in FIG. 10U) can automatically be imported as the element names 31011 A-C into the column identifier row 31012A in the table 31015. The placement of the data items (e.g., “serverName”, “role”, and “owner”) within the column identifier row 31012A is based on the matching of ordinal positions of the element names 31011A-C within the column identifier row 31012A to the ordinal positions of the data items within the first entry (e.g., first entry in row 30012 in FIG. 10U) of the search result set.


The data items of the subsequent entries (e.g., second entry in row 30019 in FIG. 10U) in the search result set can automatically be imported into the table 31015. The placement of the data items of the subsequent entries into a particular row in the table 31015 can be based on the matching of ordinal positions of the data rows 31019 within the table 31015 to the ordinal positions of the entries within the search result set. The placement of the data items into a particular column within the table 31015 can be based on the matching of the ordinal positions of the columns 31021A-D within the table 31015 to the ordinal positions of the data items within a particular entry in the search result set.


User input designating the entity definition component types 31013A-C in the table 31015 is received via the GUI. In one implementation, a button 31016 for each column 31021A-C can be selected to display a list of component types to select from, as described above in conjunction with FIG. 10I. The list of component types can include an alias component type, a name component type, an informational field component type, and an import option indicating that the data items in a search result set that correspond to a particular column in the table 18015 should not be imported for creating an entity definition record.


If the next button 31003 has been selected, a GUI for merging entity definition records is displayed, as described in greater detail below in conjunction with FIG. 10W.



FIG. 10W illustrates an example of a GUI 32000 of a service monitoring system for merging entity definition records, in accordance with one or more implementations of the present disclosure. GUI 32000 can include a status bar 32001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., merge entities stage). During the merge entity definition records stage, a determination of whether there would be duplicate entity definition records in the data store is made, and the information related to the determination 32015, including an indicator 32017 of the determination result, are displayed in the GUI 32000. For example, if the append option via a button (e.g., button 31040 in FIG. 10V) was selected to add any newly created entity definition records to the data store, the result of the prospective addition may or may not be that multiple entity definition records by the same name would exist in the data store (i.e., multiple entity definition records would have the same element value for the name component). For example, the displayed information related to the determination 32015 includes an indicator 32017 indicating that there would be no duplicated entity definition records having the element name “jdoe-mbp15r.splunk.com” 32013 as the name component in the records.


If a user does not wish to import the entity definition records into the data store, the previous 32002 button can be selected to display the previous GUI (e.g., GUI 31000 in FIG. 10V) and the user may edit the configuration (e.g., record type, component type, etc.) of the import. If a user wishes to import the entity definition records into the data store, the submit 32003 button can be selected to import the entity definition records into the data store. If the submit 32003 button is selected, GUI 33000 in FIG. 10X can be displayed.



FIG. 10X illustrates an example of a GUI 33000 of a service monitoring system for providing information for newly created and/or updated entity definition records, in accordance with one or more implementations of the present disclosure. GUI 33000 can include a status bar 33001 that is updated to display an indicator (e.g., shaded circle) corresponding to the current stage (e.g., completion stage).


GUI 33000 can include information 33003 pertaining to the entity definition records that have been imported into the data store. The information 33003 can include the number of records that have been imported. In one implementation, the information 33003 includes the type (e.g., replace, append, combine) of import that has been made. If button 33005 is selected, GUI 33000 for editing the entity definition records can be displayed, as described above in conjunction with FIG. 10P.


Referring to FIG. 10X, the search query (e.g., search query defined in GUI 30000 in FIG. 10U) that was used to produce the search result set for importing entity definition record(s) in to the data store may be executed periodically. The search result set may differ from when the search query was previously run. A user (e.g., business analyst) may wish to execute another import using the new search result set that is produced from another execution of the search query. The configuration (e.g., selected component types, selected type of import, etc.) of the current import that was executed using the search query can be saved for future execution.


If button 33007 is selected, GUI 34000 in FIG. 10Y can be displayed to save the configuration of the current import that was executed using a search query as a saved search. The saved search can be used for future imports using contemporaneous versions of the search result set that is produced by the saved search.



FIG. 10Y illustrates an example of a GUI 34000 of a service monitoring system for saving configurations settings of an import, in accordance with one or more implementations of the present disclosure. The configuration of a current import that was executed using a search query (e.g., search query defined in GUI 30000 in FIG. 10U) can be saved as a saved search that can be used for future imports using new versions of the search result set that may be produced by executing the saved search. When a saved search is created for a search query, the search query will be executed periodically and the search result set that is produced can be monitored for changes. If the search result set has changes, an import can be automatically executed using the configuration (e.g., selected component types, selected type of import, etc.) of the saved search that was saved for the search query.


A user (e.g., business analyst) can provide a name 34001 for the saved search. When the create 34005 button is selected, a saved search GUI is displayed for setting the parameters for the saved search, as described in greater detail below in conjunction with FIG. 10Z.



FIG. 10Z illustrates and example GUI 35000 of a service monitoring system for setting the parameters of a saved search, in accordance with one or more implementations of the present disclosure. GUI 35000 can automatically be populated with the configuration of the current import that is to be saved. For example, GUI 35000 displays parameters from the current import, such as the definition of the search query 35001. The search query definition 35001 can include the (1) search language for the search query (e.g., search language in text box 30009 in FIG. 10U) and (2) and commands for creating entity definition records and storing the entity definition records. The commands can automatically be generated based on the user input received via the GUIs in FIGS. 10S-10W and included in the search query definition 35001. In one implementation, the commands are appended to the search language for the search query. For example, the commands “store_entities title_field=serverName identifier_fields=serverName informational_fields=owner insertion_mode=APPEND” can be automatically generated based on the user input received via the GUIs in FIGS. 10S-10W and included in the search query definition 35001.


User input can be received via text box 35003 for a description of the saved search that is being created. User input can be received via a list 35005 for the type of schedule to use for executing the search query. The list 35005 can include a Cron schedule type and a basic schedule type. For example, if the basic schedule type is selected, user input may be received specifying that the search query should be performed every day, or, if the Cron schedule type is selected, user input may be received specifying scheduling information in a format compatible with an operating system job scheduler.


The search result set that is produced by executing the search query can be monitored for changes. In one implementation, a change is when new data is found in the search result set. In another implementation, a change is when data has been removed from the search result set. In one implementation, a change includes data being added to the search result set or data being removed from the search result set.


In one implementation, when a change is identified in the search result set, new entity definition records that reflect the change can be imported into the data store. Depending on the import type that has been saved in the search query definition 35001, the new entity definition records can automatically replace, append, or be combined with existing entity definition records in the data store. For example, the append option may have been saved in the search query definition 35001 and will be used for imports that occur when the search result set has changed. In one implementation, when a change has been detected in the search result set, new entity definition records will automatically be appended (e.g., added) to the data store. In one implementation, when a change has been detected in the search result set that pertains to data being removed from the search result set, the import of the new entity definition records, which reflect the removed data, into the data store does not occur automatically.


Informational Fields


As discussed above, an event processing system (e.g., event processing system 205 in FIG. 2) may include a machine data store that stores machine data represented as machine data events. An entity definition of an entity providing one or more services may include information for associating a subset of the machine data events in the machine data store with that entity. An entity definition of an entity specifies one or more characteristics of the entity such as a name, one or more aliases for the entity, one or more informational fields for the entity, one or more services associated with the entity, and other information pertaining to the entity. An informational field is an entity definition component for storing user-defined metadata for a corresponding entity, which includes information about the entity that may not be reliably present in, or may be absent altogether from, the machine data events.



FIG. 10AA is a flow diagram of an implementation of a method for creating an informational field and adding the informational field to an entity definition, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 35100 is performed by a client computing machine. In another implementation, the method 35100 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 35101, the computing machine creates an associated pair of data items. In one embodiment, the associated pair of data items may include a key representing a metadata field name and a value representing a metadata value for the metadata field. At block 35103, the computing machine adds the associated pair of data items to an entity definition for a corresponding entity. In one embodiment, the entity definition is stored in a service monitoring data store, separate from a machine data store. The associated pair of the metadata field name and value can be added to the entity definition as an entity definition component type “informational field.” The metadata data field name can represent an element name of the informational field (also referred to as “info field”), and the metadata field value can represent an element value of the informational field. Some other components of the entity definition may include the entity name, one or more aliases of the entity, and one or more services provided by the entity, as shown in FIG. 10B. The metadata field and metadata value may be added to the informational field component of the entity definition based on user input to provide additional information about the entity that may be useful in searches of an event store including machine data events pertaining to the entity, in searches for entities or entity definitions, in information visualizations or other actions. For example, the entity definition may be created for a particular server machine, and the informational field may be added to specify an operating system of that server machine (e.g., the metadata field name of “operating system,” and the metadata field value of “Linux”), which may not be part of machine data events pertaining to the entity represented by the entity definition.


At block 35105, the computing machine exposes the added informational field for use by a search query. In one embodiment, entity aliases may be exposed for use by a search query as part of the same process. S In one embodiment, exposing the added informational field (or alias) for use by a search query includes modifying an API to, for example, support a behavior for specifically retrieving the field name, the field value, or both of the information field (or alias). In one embodiment, exposing the added informational field (or alias) for use by a search query includes storing the informational field (or alias) information at a particular logical location within an entity definition, such as an information field (or alias) component. In such a case, certain processing of blocks 35103 and 35105 may be accomplished by a single action.


In one implementation, an alias can include a key-value pair comprised of an alias name and an alias value. Some examples of the alias name can include an identifier (ID) number, a hostname an IP (internet protocol) address, etc. A service definition of a service provided by the entity specifies an entity definition of the entity, and when a search of the machine data store is performed, for example, to obtain information pertaining to performance characteristics of the service, an exposed alias from the entity definition can be used by the search to arrive at those machine data events in the machine data store that are associated with the entity providing the service. Furthermore, storing the informational field in the entity definition together with the aliases can expose the pair of data items that make up the informational field for use by the search to attribute the metadata field and metadata value to each machine data event associated with the entity providing the service. In one example, a search for information pertaining to performance characteristics of a service provided by multiple entities (e.g., multiple virtual machines), may use the information field name and value to further filter the search result. For example, by including an additional criterion of “os=linux” (where “os” is the metadata field name and “linux” is the metadata value of the information field) in a search query, a search result may only include performance characteristics of those virtual machines of the service that run the Linux® guest operating system.


In one implementation, the informational field can be used to search for specific entities or entity definitions. For example, a user can submit a search query including a criterion of “os=linux” to find entity definitions of entities running the Linux operating system, as will be discussed in more detail below in conjunction with FIGS. 10AD and 10AE.



FIG. 10AB illustrates an example of a GUI 35200 facilitating user input for creating an informational field and adding the informational field to an entity definition, in accordance with one or more implementations of the present disclosure. For example, GUI 35200 can include multiple GUI fields 35201-35205 for creating an entity definition, as discussed above in conjunction with FIG. 6. In one implementation, name GUI field 35201 may receive user input of an identifying name for referencing the entity definition for an entity (e.g., “foobar.splunk.com”). Description GUI field 35202 may receive user input of information that describes the entity, such as what type of machine it is, what the purpose of the machine is, etc. In the illustrated example, the description of “webserver” has been entered into description GUI field 35202 to indicate that the entity named “foobar.splunk.com” is a webserver. Service GUI field 35203 may receive user input of one or more services of which the entity is a part. In one implementation, service GUI field 35203 is optional and may be left black if the user does not which to assign the entity to a service. Additional details related to the association of entities with services are provided below with respect to FIG. 11. Aliases GUI fields 35204 may receive user input of an alias name-value pair. Each machine data event pertaining to the entity can include one or more aliases that denote additional ways to reference the entity, aside from the entity name. In one implementation, the alias can include a key-value pair comprised of an alias name and an alias value. GUI 35200 may allow a user to provide multiple aliases for the entity.


Info Fields GUI fields 35205 may receive user input of an information field name-value pair. The informational field name-value pair may be added to the entity definition to store user-defined metadata for the entity, which includes information about the entity that may not be reliably not present in, or may be absent altogether from, the machine data events pertaining to that entity. The informational field name-value pair may include data about the entity that may be useful in searches of an event store including machine data events pertaining to the entity, in searches for entities or entity definitions, in information visualizations or other actions. GUI 35200 can allow a user to add multiple informational fields for the entity.


Upon entering the above characteristics of the entity, the user can request that the entity definition be created (e.g., by selecting the “Create Entity” button). In response, the entity definition is created using, for example, the structure described above in conjunction with FIG. 10B.



FIG. 10AC is a flow diagram of an implementation of a method for filtering events using informational field-value data, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 35300 is performed by a client computing machine. In another implementation, the method 35300 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 35301, the computing machine receives a search query for selecting events from the machine data store that satisfy one or more event selection criteria of the search query. The event selection criteria include a first field-value pair. The first field-value pair may include a name of a specific entity characteristic (e.g., “OS,” “owner,” etc.) and a value of a specific entity characteristic (e.g., “Linux,” “Brent,” etc.). In one implementation, the event selection criteria may be part of a search query entered by a user in a search field provided in a user interface.


At block 35303, the computing machine performs the search query to determine if events in a machine data store satisfy the event selection criteria in the search query including the first field-value pair. Determining whether one of the events satisfies the event selection criteria can involve comparing the first field-value pair of the event selection criteria with a second field-value pair from an entity definition associated with the event by using a third field-value pair from data corresponding to the event in the machine data store. In particular, in one implementation, an entity definition is located that has the second field-value pair matching the first field-value pair from the search criteria. The second field-value pair may include a metadata field name and metadata value that match the query field name and query value, respectively. In one implementation, the metadata field name and metadata value may be an informational field that was added to the entity definition as described above with respect to FIGS. 10AA-10AB. The identified entity definition may include a third field-value pair (e.g., an alias) that includes an alias name and alias value. This third field-value pair denotes an additional way to reference the entity, using data found in event records pertaining to the entity. Using this alias, the events in the machine data store that correspond to the entity definition can be identified, and the informational field (the second field-value pair) can be attributed to those events, indicating that those events satisfy at least a part of the event selection criteria that includes the first field-value pair. If the event selection criteria includes at least one other event selection criterion, a further determination can be made as to whether the above events satisfy the at least one other event selection criteria.


At block 35305, the computing machine returns a search query result pertaining to events that satisfy the event selection criteria received in the search query. For example, the search result can include at least portions of the events that satisfy the event selection, the number of the events that satisfy the event selection criteria (e.g., 0, 1, . . . 100, etc.), or any other pertinent data.


Referring again to FIG. 10AB, an entity definition includes an alias 35204 and info field 35205. Referring now again to FIG. 10AC, if a search query is submitted with an event selection criteria including “owner=brent” (a first field-value pair), a data store including various entity definitions is searched to find at least one entity definition having an information field (a second field-value pair) that matches the first field-value pair of “owner=brent.” As a result, entity definition 35201 is located and alias 35204 (a third field-value pair) is obtained and used to arrive at events in the machine data store that include a value matching “1.1.1.1” in the field named “ip.” Those events satisfy at least a part of the event selection criteria that includes the first field-value pair. Alternate orders for satisfying individual search criteria during a search are possible.


In some implementations, informational fields can also be used to filter entities or entity definitions. In particular, a service monitoring data store can be searched for entities or entity definitions having an informational field that matches one or more search criteria.



FIG. 10AD-10AE illustrate examples of GUIs facilitating user input for filtering entity definitions using informational field-value data, in accordance with one or more implementations of the present disclosure. In FIG. 10AD, GUI 35400 includes a search field 35410. Search field 35410 can receive user input including a search query command (e.g., “getentity” or “getentity generate”). In one implementation, execution of the command identifies one or more entity definitions. The specific “getentity” or “getentity generate” command may return all or a subset of all entity definitions that have been created, without using any specific filtering criteria. Additional filtering may be performed (e.g., using information fields), as shown in FIG. 10AE. A corresponding entry for each entity definition may be displayed in search results region 35420 of GUI 35400. In one implementation, various columns are displayed for each entry in search results region 35420, including for example, informational field names 35421, informational field values 35422, particular informational field names 35423 and 35424, alias names 35425, alias values 35426 and particular alias names 35427. The informational field names column 35421 may include a name or other identifier of the metadata field names associated with the corresponding entity definition (e.g., “os,” “utensil,” “site,” “entity_type”). The informational field values column 35422 may include the metadata values that correspond to the metadata field names associated with the corresponding entity definition (e.g., “linux,” “fork,” “Omaha,” “link_layer_all_traffic”). The particular informational field names columns 35423 and 35424 may include a name or other identifier of one of the metadata field names associated with the corresponding entity definition (e.g., “os” 35423 and “site” 35424). The values in these columns may include the corresponding metadata values (e.g., “linux” and “Omaha,” respectively). The alias names column 35425 may include a name or other identifier of the alias field names associated with the corresponding entity definition (e.g., “dest_mac,” “src_mac,” “dvc_mac”). The alias values column 35426 may include the alias values that correspond to the alias field names associated with the corresponding entity definition (e.g., “10:10:10:10:40:40”). The particular alias name column 35427 may include a name or other identifier of one of the alias field names associated with the corresponding entity definition (e.g., “src_mac”) and the values in this columns may include the corresponding alias values (e.g., “10:10:10:10:40:40”).


Referring to FIG. 10AE, GUI 35500 also includes a search field 35510. Search field 35510 can receive user input including a search query command (e.g., “getentity” or “getentity generate”) as well as selection criteria including a first-field value pair. As described above, execution of the “getentity” or “getentity generate” command” returns all or a subset of all entity definitions that have been created. The inclusion of the selection criteria (e.g., “search os=linux”) further filters the results of the “getentity” or “getentity generate” command to limit the returned entity definitions to those having an informational field-value pair that matches the selection criteria. A corresponding entry for each filtered entity definition may be displayed in search results region 35520 of GUI 35500. In one implementation, various columns are displayed for each entry in search results region 35520, including for example, informational field column 35521 and alias columns 35522 and 35523. In the illustrated example, there is only one entry in search results region 35520 indicating that only one entity definition included an informational field-value pair that matched the selection criteria entered in search field 35510. As shown, the entry includes an information field column 25521 named “os” which includes the value of “linux.” This metadata field name and metadata value match the query field name and query value (i.e., “os=linux”) from the event selection criteria. In the illustrated example, the entry also includes at least two alias columns 35522 and 35523. These alias columns “dest_mac” 35522 and “src_mac” 35523 include alias values (e.g., “10:10:10:10:40:40”) that can be used to locate events in a machine data store that satisfy the event selection criteria. By having the information field and aliases stored as part of the entity definition, the informational field values can be associated with the events that are determined to correspond to the entity using an alias. Upon having identified the entity definition, the computing machine can locate and return events from the machine data store that satisfy the event selection criteria. As such, the user can filter events using the information fields.


Embodiments are possible where the entity name (as represented in the entity name component of an entity definition) may be treated as a de facto entity alias. This is useful where the value of the entity name is likely to appear in event data and so, like an alias value, can be used to identify an event with the entity. Accordingly, one of skill recognizes that foregoing teachings about aliases can be sensibly expanded to include entity names.


A service monitoring system of some embodiments may include the capability to practice methods to automatically update information that defines the entities that perform services that the system is monitoring. Of the updates that can occur through the use of such methods, none may be more valuable than updating the information by creating a new entity definition for an entity newly added to the monitored environment. In some environments, machine data generated by or about a new entity may be received and collected before a corresponding entity definition was or could have been created through a more manual or administrative approach. In one example, machine data for an entity may be collected by an event processing system for purposes other than service monitoring well in advance of the service monitoring need. In another example, meeting service level agreements in a high-speed, high-volume, high-demand, hot-swappable IT environment requires technicians to frequently and without notice remove, add, replace, and reconfigure machinery in the IT environment faster than the changes can be accurately and reliably reflected in the service monitoring system. The methods now described enable an embodiment to take advantage of machine data collected for an undefined entity to discover the entity and to glean the information necessary to create a working entity definition in the service monitoring system.



FIG. 10AF is a flow diagram of a method addressing automatic updating of a set of stored entity definitions, including depictions of certain components in the computing environment. The processing performed in the illustrative method and environment 10100 of FIG. 10AF is principally discussed in relation to Receive and Store Machine Data block 10110, Identify Undefined Entity block 10112 and its associated timer 10112a, Derive Descriptive Content block 10114, Store Entity Definition block 10116, Utilize Entity Definition block 10118, Background block 10120, and relationships and control flow therebetween. Discussion of the method processing is enhanced by consideration of certain aspects of an example computing environment. Those aspects, as illustrated, include a configuration of machine entities that generate or otherwise supply machine data, and a selection of information available to the method from computer-readable storage. The configuration of machines includes machine A 10130, machine B 10132, machine C 10134, machine D 10136, considered collectively as the pre-existing entities 10102, and machine E 10138, considered for purposes of illustration as a newly added machine. The variety of information in computer-readable storage 10140 includes DA Content 10142, Machine Data 10144, a set of Entity Definitions 10148, and single Service Definition 10150. Service Definition 10150 further includes entity association rule 10156, and KPI definitional information 10152 that includes search query (SQ) 10154. Entity Definitions 10148 further includes a set of pre-existing entity definitions 10104 and a single entity definition 10170 that includes name information 10172, alias information 10174, and info field information 10176. For purposes of illustration entity definition 10170 is considered a newly added entity definition. Connection 10128 illustrates the connection between the processing blocks of the method and computer-readable storage 10140. Computer-readable storage 10140 should be understood as able to encompass storage apparatus and mechanisms at any level and any combination of levels in a storage hierarchy at one time, and able to encompass at one time transient and persistent, volatile and non-volatile, local and remote, host- and network-attached, and other computer-readable storage. Moreover, commonly identified collections of data such as DA Content 10142, Machine Data 10144, Service Definition 10150, and Entity Definitions 10148, should each be understood as able to have its constituent data stored in and/or across one or more storage mechanisms implementing storage 10140.


The method illustrated and discussed in relation to FIG. 10AF may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method may be performed by a client computing machine. In another implementation, the method may be performed by a server computing machine coupled to the client computing machine over one or more networks.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks). However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, the acts can be subdivided or combined. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.


Processing for the method illustrated by FIG. 10AF that supports, for example, automatic entity definition for a service monitoring system begins at block 10110. At block 10110, machine data is received from a number of machine entities, each a data source, and processed for storage in a machine data store 10144. The types of machines or entities from which block 10110 may receive machine data are wide and varied and may include computers of all kinds, network devices, storage devices, virtual machines, servers, embedded processors, intelligent machines, intelligent appliances, sensors, telemetry, and any other kind or category of data generating device as may be discussed within this document or appreciated by one of skill in the art. The machine data may be minimally processed before storage and may be organized and stored as a collection of timestamped events. The processing of block 10110 may be performed by an event processing system such as disclosed and discussed elsewhere in this detailed description including, for example, the discussion related to FIGS. 76-79A. The processing of block 10110 receives machine data from pre-existing machines 10102 as well as from newly added machine 10138. The heavy lines showing connections between the entity machines of FIG. 10AF illustrate operational connections as may exist between machines in a computing environment. The operational connections may be based on data transfer, processing flow, or some other connection. The operational connections may provide a basis for one machine to generate or supply machine data pertaining to a different machine.


As illustrated by way of example, FIG. 10AF depicts block 10110 receiving from entity machine A 10130 machine data pertaining to entity machines A, D, and E; receiving from entity machine B 10132 machine data pertaining to itself (i.e., machine B); receiving from entity machine C 10134 machine data pertaining to entity machines C, and D; and receiving from entity machine E 10138 machine data pertaining to itself (i.e., machine E). The variability shown permits one of skill in the art to appreciate the variability with which machine data pertaining to a particular machine entity may be received at block 10110, including receiving data from a single machine which is itself, a single machine which is a different machine, multiple machines including itself, and multiple machines apart from itself. Notably, the processing of block 10110 may be largely or completely agnostic to service monitoring processes or activities, or to any notion of entities or entity definitions in a service monitoring context.


After the processing and storage represented by block 10110 the machine data can be accessed from the machine data store 10144. The machine data may be stored in machine data store 10144 in accordance with a data model in an embodiment, and the data model may represent a portion of, be derived from, or have accordance with content of DA Content 10142. Where the processing of block 10110 is performed using the capabilities of an event processing system, the event processing system may provide an exclusive or best capability for accessing the data of the machine data store 10144. The event processing system of some embodiments may provide a robust search query processing capability to access and process the machine data of the machine data store 10144. The processing of Receive and Store Machine Data block 10110 may be continuously performed in an embodiment, collecting operational data on an ongoing basis and amassing a wealth of stored machine data. At some point after block 10110 has received and stored machine data pertaining to newly added entity E 10138, the processing of block 10112, Identify Undefined Entity, can begin.


At block 10112, machine data received and stored at block 10110 is processed to identify any undefined entities as possible. As the processing of block 10112 begins, entity definitions 10148 includes only pre-existing definitions 10104, as definition 10170 is yet to be created by the method now being discussed.


The identification process of block 10112 uses identification criteria in one embodiment. For the example now discussed, the identification criteria is maintained in storage 10140 as part of DA Content 10142. Other embodiments and examples may include identification criteria stored or reflected elsewhere.


DA Content 10142 may be introduced into storage by the installation of a Domain Add-on facility as part of or as an extension of a service monitoring system. A domain add-on facility may include computer program code or process specification information in another form such as control parameters. A domain add-on facility may include data components in an embodiment. Data components may include customization and tailoring information such as configuration parameters, option selections, and extensible menu options, for example. Data components may also include templates, models, definitions, patterns, and examples. Templates for a service or entity definition, and an operationally-ready KPI definition are illustrative examples of such data components. Some aspects included in DA Content 10142 may be a mixture of process specification and data component information or may be otherwise difficult to clearly categorize as being one or the other. DA content 10142 in an embodiment may represent the codification of expert knowledge for a specific domain of knowledge such as workload balancing or web services provision within the field of Information Technology, and specifically applying that expert knowledge to service monitoring.


The identification criteria of DA Content 10142 in the example 10100 illustrated in FIG. 10AF may specify data selection criteria for selecting or identifying data of machine data 10144 useful for discovering undefined entities (i.e., machines that perform a service but do not have an entity definition in existence when a discovery attempt begins). The data selection criteria may include regular expressions (REGEX) expressions and/or may be in the form of a complete or partial search query ready for processing by an event processing system, in some embodiments. Such data selection criteria may include aspects for selecting machine data from multiple sources possibly associated with multiple source types. Such data selection criteria may include conditional factors extending beyond the condition of matching certain data values to include conditions requiring, certain relationships to exist between multiple data items or requiring a certain data item location, for example. For example, a data selection criteria may specify that an IP address field is to be selected if its value matches the pattern “192.168.10.*” but only if it also appears in a log data event with a sourceID matching the sourceID in a network event of a particular type within a particular timeframe.


The identification criteria may include information specifying the process used to identify an undefined entity from machine data at block 10112, or some aspect of the process. The information specifying the process may be a module of computer program code written in a programming language such as Java or Python, or may be a set of control parameters used at block 10112 to determine the pattern or flow of processing it actually performs in order to identify an undefined entity, for example. The identification criteria may include these and any other criteria affecting, defining, determining, or specifying the process or algorithm(s) being effected or exercised to perform the identification.


Identification criteria may include criteria to prevent or minimize false positive and/or false-negative identifications. Identification criteria may include criteria for inclusion or exclusion based on the sources of machine data pertaining to an entity represented in machine data 10144. For example, identification criteria may include criteria that results in the identification of an undefined entity where the entity has machine data pertaining to itself in machine data 10144 produced only by itself, or by itself and another entity, or by only one other entity, or by multiple other entities and not itself. As another example, the criteria mentioned in the preceding example can be expanded to specify that the entity and/or one or more of the other entities produces machine data associated with a particular source type or types.


Identification criteria may include criteria limiting the identification of undefined entities to machine entities discovered or suspected to be performing an existing service or performing work relevant to a service type of interest. The service type of interest may be known because an existing service of that type is already being monitored or because of domain add-on content having been installed, selected, implemented, or otherwise activated by the user. These and other identification criteria are possible.


When any predefined, customized, or configured process for identifying one or more undefined entities using applicable identification criteria at block 10112 is wholly or partially complete and successful, processing can advance to block 10114. Machine entity E 10138 is assumed for purposes of illustration to have been successfully identified by the processing of block 10112, in this discussion.


In some embodiments the processing of block 10112 is automatically repeated on a regular basis as represented in FIG. 10AF by icon 10112a. The regular basis may be defined in terms of a repetition frequency or a schedule. The regular basis may also be defined in terms of a predictable execution in response to an event, for example, performing the processing of block 10112 every time block 10110 stores a 50 GB increment of machine data, or at sometime overnight whenever that event occurs. Other regular execution schemes are possible, and on-demand, user-initiated execution represents an alternative or supplementary implementation.


At block 10114, descriptive information about an entity identified at block 10112 is derived in whole or in part from machine data of 10144 pertaining to the entity. (A real-time or near real-time implementation may instead use machine data directly from block 10110 before it is added to machine data store 10144.) The descriptive information is used to populate the content of an entity definition such as entity definition 10170. The particular items or components of the entity definition populated with the derived descriptive information may be identified by DA Content 10142 in one embodiment. In one embodiment, DA content 10142 may provide procedural code or information specifying in whole or in part how to derive the descriptive information from machine data. These and other embodiments are possible.


As an illustrative example, the derivation of descriptive content for newly added machine E 10138 is now described. Based on an entity definition template included in DA Content 10142, processing block 10114 undertakes to derive descriptive content including a hostname field as name information, an IP address as alias information, and an operating system identification as info field information. (FIGS. 10B-10C and the related descriptions, for example, provide additional information on entity definition formats and contents in example embodiments.) Certain machine data pertaining to machine E 10138 that was encountered during the processing of block 10112 is available during the processing of block 10114 described here. Entity E provided machine data in the form of a security exceptions log file in which it identified itself using the hostname “WEBSF211.” The entity definition template of DA Content 10142 indicates that a hostname field is a valid source for name information and, accordingly, block 10114 harvests the hostname from the security exceptions log data and formats it for inclusion in new entity definition 10170 as block 10172. Entity A 10130 provided machine data in the form of an error log that included an entry having hostname “WEBSF211” appearing in conjunction with IP address 10.250.15.56. (The conjunction may have been determined by search criteria, extraction rules, late-binding schemas, and/or other information of an entity processing system storing the machine data in one embodiment, or by using DA Content 10142, or by some other means.) Accordingly, block 10114 harvests the IP address from the error log machine data and formats it for inclusion in new entity definition 10170 as block 10174. Entity A further provided machine data in the form of an inventory record having hostname “WEBSF211” appearing in conjunction with a software version field with the value “Apache_httpd_2.4.16_L.” DA Content 10142 was able to draw the correspondence between the software version and the use of the LINUX operating system. Accordingly, block 10114 formats the operating system information for inclusion in new entity definition 10170 as block 10176.


At block 10114, the derived descriptive content along with any additional information including, possibly, information from an entity definition template of DA Content 10142, is prepared for storage as an entity definition. Preparing information for storage as an entity definition may include organizing the information into a particular order or structure, in one embodiment. Preparing information for storage as an entity definition may include formatting the information into a request format, such as a function call, procedure call, RPC, HTTP request, or the like. These and other embodiments are possible. Processing may then proceed to block 10116.


At block 10116, the derived descriptive content of block 10114 is stored as an entity definition of the service monitoring system, such as entity definition 10170. In one embodiment the processing described in relation to blocks 10112 and 10114 is effected by a search query. The search query produces its results in a format compatible with a method for updating entity definitions as described or suggested by FIG. 10D or 10Q and the related discussion. The processing described in relation to block 10116 is then effected by executing an implementation of a method described or suggested by FIG. 10D or 10Q and the related discussion.


Once stored at block 10116, the new entity definition is available for use in the service monitoring system, and is shown in use in FIG. 10AF at block 10118. In one example use, information from the entity definition may be displayed in a GUI permitting a user to update the entity definition. See for example, FIG. 9C and the related discussion. In another example use, information from the entity definition may be displayed in a GUI permitting a user to select entities to associate with the service. See for example, FIG. 15 and the related discussion. In another example use, a KPI search query, such as search query 10154 of KPI 10152, may use information from entity definition 10170 such as alias information 10174, to identify machine data in the machine data store 10144 for use in determining a KPI value. In another example use, a search query based on a rule in a service definition, such as rule 10156, may be executed to identify entities that should be associated with a particular service definition such as 10150, and to make that association. See for example, FIG. 17D and the related discussion. In some embodiments, a rule-based search query to associate entities with a service may be executed on a regular time-based or event-driven basis as part of background processing. Such background processing is represented in FIG. 10AF by block 10120 and represents ongoing use of entity definitions 10148, including newly created entity definition 10170. Execution of KPI search queries that may rely on entity definition information to identify machine data also occur in background processing in some embodiments.


While the preceding discussion has focused on using machine data to identify new machine entities and to create entity definitions for them, one of skill will appreciate from this disclosure that the method of 10100 as disclosed and described may be adapted to achieve updates or deletions for entity definitions 10148 based on received and stored machine data and their patterns. For example, identification criteria for a deletion could specify that a machine not supplying data for 4 weeks or more is to be deleted. As another example, identification criteria for a modification could specify that where an old alias value is absent from machine data for at least 7 days, and where a new alias value is seen consistently for the same 7 days, then the old alias value should be replaced in the entity definition with the new alias value. These and other embodiments enabled to one of skill in the art by the disclosure of 10100 are possible.



FIG. 11 is a flow diagram of an implementation of a method 1100 for creating a service definition for a service, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 1102, the computing machine receives input of a title for referencing a service definition for a service. At block 1104, the computing machine receives input identifying one or more entities providing the service and associates the identified entities with the service definition of the service at block 1106.


At block 1108, the computing machine creates one or more key performance indicators for the service and associates the key performance indicators with the service definition of the service at block 1110. Some implementations of creating one or more key performance indicators are discussed in greater detail below in conjunction with FIGS. 19-31.


At block 1112, the computing machine receives input identifying one or more other services which the service is dependent upon and associates the identified other services with the service definition of the service at block 1114. The computing machine can include an indication in the service definition that the service is dependent on another service for which a service definition has been created.


At block 1116, the computing machine can optionally define an aggregate KPI score to be calculated for the service to indicate an overall performance of the service. The score can be a value for an aggregate of the KPIs for the service. The aggregate KPI score can be periodically calculated for continuous monitoring of the service. For example, the aggregate KPI score for a service can be updated in real-time (continuously updated until interrupted). In one implementation, the aggregate KPI score for a service is updated periodically (e.g., every second). Some implementations of determining an aggregate KPI score for the service are discussed in greater detail below in conjunction with FIGS. 32-34.



FIG. 12 illustrates an example of a GUI 1200 of a service monitoring system for creating and/or editing service definitions, in accordance with one or more implementations of the present disclosure. GUI 1200 can display a list 1202 of service definitions that have already been created. Each service definition in the list 1202 can include a button 1204 to proceed to a drop-down menu 1208 listing editing options related to the corresponding service definition. Editing options can include editing the service definition, editing one or more KPIs for the service, editing a title and/or description of the service description, and/or deleting the service definition. When an editing option is selected from the drop-down menu 1208, one or more other GUIs can be displayed for editing the service definition. GUI 1200 can include a button 1210 to proceed to the creation of a new service definition.



FIG. 13 illustrates an example of a GUI 1300 of a service monitoring system for creating a service definition, in accordance with one or more implementations of the present disclosure. GUI 1300 can facilitate user input specifying a title 1302 and optionally a description 1304 for the service definition for a service. GUI 1300 can include a button 1306 to proceed to GUI 1400 of FIG. 14, for associating entities with the service, creating KPIs for the service, and indicating dependencies for the service.



FIG. 14 illustrates an example of a GUI 1400 of a service monitoring system for defining elements of a service definition, in accordance with one or more implementations of the present disclosure. GUI 1400 can include an accordion pane (accordion section) 1402, which when selected, displays fields for facilitating input for creating and/or editing a title 1404 of a service definition, and input for a description 1406 of the service that corresponds to the service definition. If input for the title 1404 and/or description 1406 was previously received, for example, from GUI 1300 in FIG. 13, GUI 1400 can display the title 1404 and description 1406.


GUI 1400 can include a drop-down 1410 for receiving input for creating one or more KPIs for the service. If the drop-down 1410 is selected, GUI 1900 in FIG. 19 is displayed as described in greater detail below.


GUI 1400 can include a drop-down 1412 for receiving input for specifying dependencies for the service. If the drop-down 1412 is selected, GUI 1800 in FIG. 18 is displayed as described in greater detail below.


GUI 1400 can include one or more buttons 1408 to specify whether entities are associated with the service. A selection of “No” 1416 indicates that the service is not associated with any entities and the service definition is not associated with any entity definitions. For example, a service may not be associated with any entities if an end user intends to use the service and corresponding service definition for testing purposes and/or experimental purposes. In another example, a service may not be associated with any entities if the service is dependent one or more other services, and the service is being monitored via the entities of the one or more other services upon which the service depends upon. For example, an end user may wish to use a service without entities as a way to track a business service based on the services which the business service depends upon. If “Yes” 1414 is selected, GUI 1500 in FIG. 15 is displayed as described in greater detail below.



FIG. 15 illustrates an example of a GUI 1500 of a service monitoring system for associating one or more entities with a service by associating one or more entity definitions with a service definition, in accordance with one or more implementations of the present disclosure. GUI 1500 can include a button 1510 for creating a new entity definition. If button 1510 is selected, GUI 1600 in FIG. 16 is displayed facilitating user input for creating an entity definition.



FIG. 16 illustrates an example of a GUI 1600 facilitating user input for creating an entity definition, in accordance with one or more implementations of the present disclosure. For example, GUI 1600 can include multiple fields 1601 for creating an entity definition, as discussed above in conjunction with FIG. 6. GUI 1600 can include a button 1603, which when selected can display one or more UIs (e.g., GUIs or command line interface) for importing a data file for creating an entity definition. The data file can be a CSV (comma-separated values) data file that includes information identifying entities in an environment. The data file can be used to automatically create entity definitions for the entities described in the data file. GUI 1600 can include a button 1605, which when selected can display one or more UIs (e.g., GUIs or command line interface) for using a saved search for creating an entity definition. For example, the computing machine can execute a search query from a saved search to extract data to identify an alias for an entity in machine data from one or more sources, and automatically create an entity definition for the entity based on the identified aliases.


Referring to FIG. 15, GUI 1500 can include an availability list 1504 of entity definitions for entities, which can be selected to be associated with the service definition. The availability list 1504 can include one or more entity definitions. For example, the availability list 1504 may include thousands of entity definitions. GUI 1500 can include a filter box 1502 to receive input for filtering the availability list 1504 of entity definitions to display a portion of the entity definitions. Each entity definition in the availability list 1502 can include the entity definition name 1506 and the entity type 1508. GUI 1500 can facilitate user input for selecting an entity definition from the availability list 1504 and dragging the selected entity definition to a selected list 1512 to indicate that the entity for the selected entity definition is associated with service of the service definition. For example, entity definition 1514 (e.g., webserver01.splunk.com) can be selected and dragged to the selected list 1512.



FIG. 17A illustrates an example of a GUI 1700 indicating one or more entities associated with a service based on input, in accordance with one or more implementations of the present disclosure. The selected list 1712 can include the entity definition (e.g., webserver01.splunk.com) that was dragged from the availability list 1704. The availability list 1704 can remove any selected entity definitions (e.g., webserver01.splunk.com). The selected list 1712 indicates which entities are members of a service via the entity definitions of the entities and service definition for the service.



FIG. 17B illustrates an example of the structure 1720 for storing a service definition, in accordance with one or more implementations of the present disclosure. A service definition can be stored in a service monitoring data store as a record that contains information about one or more characteristics of a service. Various characteristics of a service include, for example, a name of the service, the entities that are associated with the service, the key performance indicators (KPIs) for the service, one or more other services that depend upon the service, one or more other services which the service depends upon, and other information pertaining to the service.


The service definition structure 1720 includes one or more components. Each service definition component relates to a characteristic of the service. For example, there is a service name component 1721, one or more entity filter criteria components 1723A-B, one or more entity association indicator components 1725, one or more KPI components 1727, one or more service dependencies components 1729, and one or more components for other information 1731. The characteristic of the service being represented by a particular component is the particular service definition component's type. In one implementation, the entity filter criteria components 1723A are stored in a service definition. In another implementation, the entity filter criteria components 1723B are stored in association with a service definition (e.g., separately from the service definition but linked to the service definition using, for example, identifiers of the entity filter criteria components 1723B and/or an identifier of the service definition).


The entity definitions that are associated with a service definition can change. In one implementation, as described above in conjunction with FIG. 15, users can manually and explicitly select entity definitions from a list (e.g., list 1504 in GUI 1500 in FIG. 15) of pre-defined entities to include in a service definition to reflect the environment changes. In another implementation, the entity filter criteria component(s) 1723A-B can include filter criteria that can be used for automatically identifying one or more entity definitions to be associated with the service definition without user interaction. The filter criteria in the entity filter criteria components 1723A-B can be processed to search the entity definitions that are stored in a service monitoring data store for any entity definitions that satisfy the filter criteria. The entity definitions that satisfy the filter criteria can be associated with the service definition. The entity association indicator component(s) 1725 can include information that identifies the one or more entity definitions that satisfy the filter criteria and associates those entity definitions with the service definition, thereby creating an association between a service and one or more entities. One implementation for using filter criteria and entity association indicators to identify entity definition(s) and to associate the identified entity definition(s) with a service definition is described in greater detail below in conjunction with FIGS. 17C-17D.


The KPI component(s) 1727 can include information that describes one or more KPIs for monitoring the service. As described above, a KPI is a type of performance measurement. For example, various aspects (e.g., CPU usage, memory usage, response time, etc.) of the service can be monitored using respective KPIs.


The service dependencies component(s) 1729 can include information describing one or more other services for which the service is dependent upon, and/or one or more other services which depend on the service being represented by the service definition. In one implementation, a service definition specifies one or more other services which a service depends upon and does not associate any entities with the service, as described in greater detail below in conjunction with FIG. 18. In another implementation, a service definition specifies a service as a collection of one or more other services and one or more entities. Each service definition component stores information for an element. The information can include an element name and one or more element values for the element.


In one implementation, the element name—element value pair(s) within a service definition component serves as a field name-field value pair for a search query. In one implementation, the search query is directed to search a service monitoring data store storing service monitoring data pertaining to the service monitoring system. The service monitoring data can include, and is not limited to, entity definition, service definitions, and key performance indicator (KPI) specifications.


In one example, an element name—element value pair in the entity filter criteria component 1723A-B in the service definition can be used to search the entity definitions in the service monitoring data store for the entity definitions that have matching values for the elements that are named in the entity filter criteria component 1723A-B.


Each entity filter criteria component 1723A-B corresponds to a rule for applying one or more filter criteria defined by the element name-element value pair to the entity definitions. A rule for applying filter criteria can include an execution type and an execution parameter. User input can be received specifying filter criteria, execution types, and execution parameters via a graphical user interface (GUI), as described in greater detail below. The execution type specifies whether the rule for applying the filter criteria to the entity definitions should be executed dynamically or statically. For example, the execution type can be static execution or dynamic execution. A rule having a static execution type can be executed to create associations between the service definition and the entity definitions on a single occurrence based on the content of the entity definitions in a service monitoring data store at the time the static rule is executed. A rule having a dynamic execution type can be initially executed to create current associations between the service definition and the entity definitions, and can then be re-executed to possibly modify those associations based on the then-current content of the entity definitions in a service monitoring data store at the time of re-execution. For example, if the execution type is static execution, the filter criteria can be applied to the entity definitions in the service monitoring data store only once. If the execution type is dynamic execution, the filter criteria can automatically be applied to the entity definitions in the service monitoring data store repeatedly.


The execution parameter specifies when the filter criteria should be applied to the entity definitions in the service monitoring data store. For example, for a static execution type, the execution parameter may specify that the filter criteria should be applied when the service definition is created or when a corresponding filter criteria component is added to (or modified in) the service definition. In another example, for a static execution type, the execution parameter may specify that the filter criteria should be applied when a corresponding KPI is first calculated for the service.


For a dynamic execution type, the execution parameter may specify that the filter criteria should be applied each time a change to the entity definitions in the service monitoring data store is detected. The change can include, for example, adding a new entity definition to the service monitoring data store, editing an existing entity definition, deleting an entity definition, etc. In another example, the execution parameter may specify that the filter criteria should be applied each time a corresponding KPI is calculated for the service.


In one implementation, for each entity definition that has been identified as satisfying any of the filter criteria in the entity filter criteria components 1723A-B for a service, an entity association indicator component 1725 is added to the service definition 1720.



FIG. 17C is a block diagram 1750 of an example of using filter criteria to dynamically identify one or more entities and to associate the entities with a service, in accordance with one or more implementations of the present disclosure.


A service monitoring data store can store any number of entity definitions 1751A-B. As described above, an entity definition 1751A-B can include an entity name component 1753A-B, one or more alias components 1755A-D, one or more informational field components, one or more service association components 1759A-B, and one or more other components for other information. A service definition 1760 can include one or more entity filter criteria components 1763A-B that can be used to associate one or more entity definitions 1751A-B with the service definition.


A service definition can include a single service name component that contains all of the identifying information (e.g., name, title, key, and/or identifier) for the service. The value for the name component type in a service definition can be used as the service identifier for the service being represented by the service definition. For example, the service definition 1760 includes a single entity name 1761 component that has an element name of “name” and an element value of “TestService”. The value “TestService” becomes the service identifier for the service that is being represented by service definition 1760.


There can be one or multiple components having the same service definition component type. For example, the service definition 1760 has two entity filter criteria component types (e.g., entity filter criteria components 1763A-B). In one implementation, some combination of a single and multiple components of the same type are used to store information pertaining to a service in a service definition.


Each entity filter criteria component 1763A-B can store a single filter criterion or multiple filter criteria for identifying one or more of the entity definitions (e.g., entity definitions 1751A-B). For example, the entity filter criteria component 1763A stores a single filter criterion that includes an element name “dest” and a single element value “192.*” A value can include one or more wildcard characters as described in greater detail below in conjunction with FIG. 17H. The entity filter criterion in component 1763A can be applied to the entity definitions 1753A-B to identify the entity definitions that satisfy the filter criterion “dest=192.*” Specifically, the element name-element value pair can be used for a search query. For example, a search query may search for fields named “dest” and containing a value that begins with the pattern “192.”.


An entity filter criteria component that stores multiple filter criteria can include an element name and multiple values. In one implementation, the multiple values are treated disjunctively. For example, the entity filter criteria 1763B include an element name “name” and multiple values “192.168.1.100” and “hope.mbp14.local”. The entity filter criteria in component 1763B can be applied to the entity definition records 1753A-B to identify the entity definitions that satisfy the filter criteria “name=192.168.1.100” or “name=hope.mbp14.local”. Specifically, the element name and element values can be used for a search query that uses the values disjunctively. For example, a search query may search for fields in the service monitoring data store named “name” and having either a “192.168.1.100” or a “hope.mbp14.local” value.


An element name in the filter criteria in an entity filter criteria component 1763A-B can correspond to an element name in an entity name component (e.g., entity name component 1753A-B), an element name in an alias component (e.g., alias component 1755A-D), or an element name in an informational field component (not shown) in at least one entity definition 1753A-B in a service monitoring data store. The filter criteria can be applied to the entity definitions in the service monitoring data store based on the execution type and execution parameter in the entity filter criteria component 1763A-B.


In one implementation, an entity association indicator component 1765A-B is added to the service definition 1760 for each entity definition that satisfies any of the filter criteria in the entity filter criteria component 1763A-B for the service. The entity association indicator component 1765A-B can include an element name-element value pair to associate the particular entity definition with the service definition. For example, the entity definition record 1751A satisfies the rule “dest=192.*” and the entity association indicator component 1765A is added to the service definition record 1760 to associate the entity definition record 1751A with the TestService specified in the service definition record 1760.


In one implementation, for each entity definition that has been identified as satisfying any of the filter criteria in the entity filter criteria components 1763A-B for a service, a service association component 1758A-B is added to the entity definition 1751A-B. The service association component 1758A-B can include an element name-element value pair to associate the particular service definition 1760 with the entity definition 1751A. For example, the entity definition 1751A satisfies the filter criterion “dest=192.*” associated with the service definition 1760, and the service association component 1758A is added to the entity definition 1751A to associate the TestService with the entity definition 1753A.


In one implementation, the entity definitions 1751A-B that satisfy any of the filter criteria in the service definition 1760 are associated with the service definition automatically. For example, an entity association indicator component 1765A-B can be automatically added to the service definition 1760. In one example, an entity association indicator component 1765A-B can be added to the service definition 1760 when the respective entity definition has been identified.


As described above, the entity definitions 1751A-B can include alias components 1755A-D for associating machine data (e.g., machine data 1-4) with a particular entity being represented by a respective entity definition 1751A-B. For example, entity definition 1753A includes alias component 1755A-B to associate machine data 1 and machine data 2 with the entity named “foobar”. When any of the entity definition components of an entity definition satisfy filter criteria in a service definition 1760, all of the machine data that is associated with the entity named “foobar” can be used for the service being represented by the service definition 1760. For example, the alias component 1755A in the entity definition 1751A satisfies the filter criteria in entity filter criteria 1763A. If a KPI is being determined for the service “TestService” that is represented by service definition 1760, the KPI can be determined using machine data 1 and machine data 2 that are associated with the entity represented by the entity definition 1751A, even though only machine data 1 (and not machine data 2) is associated with the entity represented by definition record 1751A via alias 1755A (the alias used to associate entity definition record 1751A with the service represented by definition record 1760 via filter criteria 1763 A).


When filter criteria in the entity filter criteria components 1763A-B are applied to the entity definitions dynamically, changes that are made to the entity definitions 1753A-B in the service monitoring data store can be automatically captured by the entity filter criteria components 1763A-B and reflected, for example, in KPI determinations for the service, even after the filter criteria have been defined. The entity definitions that satisfy filter criteria for a service can be associated with the respective service definition even if a new entity is created significantly after a rule has already been defined.


For example, a new machine may be added to an IT environment and a new entity definition for the new machine may be added to the service monitoring data store. The new machine has an IP address containing “192.” and may be associated with machine data X and machine data Y. The filter criteria in the entity filter criteria component 1763 can be applied to the service monitoring data store and the new machine can be identified as satisfying the filter criteria. The association of the new machine with the service definition 1760 for TestService is made without user interaction. An entity association indicator for the new machine can be added to the service definition 1760 and/or a service association can be added to the entity definition of the new machine. A KPI for the TestService can be calculated that also takes into account machine data X and machine data Y for the new machine.


As described above, in one implementation, a service definition 1760 stores no more than one component having a name component type. The service definition 1760 can store zero or more components having an entity filter criteria component type, and can store zero or more components having an informational field component type. In one implementation, user input is received via a GUI (e.g., service definition GUI) to add one or more other service definition components to a service definition record.


Various implementations may use a variety of data representation and/or organization for the component information in a service definition record based on such factors as performance, data density, site conventions, and available application infrastructure, for example. The structure (e.g., structure 1720 in FIG. 17B) of a service definition can include rows, entries, or tuples to depict components of an entity definition. A service definition component can be a normalized, tabular representation for the component, as can be used in an implementation, such as an implementation storing the entity definition within an RDBMS. Different implementations may use different representations for component information; for example, representations that are not normalized and/or not tabular. Different implementations may use various data storage and retrieval frameworks, a JSON-based database as one example, to facilitate storing entity definitions (entity definition records). Further, within an implementation, some information may be implied by, for example, the position within a defined data structure or schema where a value, such as “192.*” in FIG. 17C, is stored—rather than being stored explicitly. For example, in an implementation having a defined data structure for a service definition where the first data item is defined to be the value of the name element for the name component of the service, only the value need be explicitly stored as the service component and the element name (name) are known from the data structure definition.



FIG. 17D is a flow diagram of an implementation of a method 1740 for using filter criteria to associate entity definition(s) with a service definition, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 1741, the computing machine causes display of a graphical user interface (GUI) that enables a user to specify filter criteria for identifying one or more entity definitions. An example GUI that enables a user to specify filter criteria is described in greater detail below in conjunction with FIG. 17E.


At block 1743, the computing machine receives user input specifying one or more filter criteria corresponding to a rule. A rule with a single filter criterion can include an element name—element value pair where there is a single value. For example, the single filter criterion may be “name=192.168.1.100”. A rule with multiple filter criteria can include an element name and multiple values. The multiple values can be treated disjunctively. For example, the multiple criteria may be “name=192.168.1.100 or hope.mbp14.local”. In one example, an element name in the filter criteria corresponds to an element name of an alias component in at least one entity definition in a data store. In another example, an element name in the filter criteria corresponds to an element name of an informational field component in at least one entity definition in the data store.


At block 1744, the computing machine receives user input specifying an execution type and execution parameter for each rule. The execution type specifies how the filter criteria should be applied to the entity definitions. The execution type can be static execution or dynamic execution. The execution parameter specifies when the filter criteria should be applied to the entity definitions. User input can be received designating the execution type and execution parameter for a particular rule via a GUI, as described below in conjunction with FIG. 17H.


Referring to FIG. 17D, at block 1745, the computing machine stores the filter criteria in association with a service definition. The filter criteria can be stored in one or more entity filter criteria components. In one implementation, the entity filter criteria components (e.g., entity filter criteria components 1723B in FIG. 17B) are stored in association with a service definition. In another implementation, the entity filter criteria components (e.g., entity filter criteria components 1723A in FIG. 17B) are stored within a service definition.


At block 1746, the computing machine stores the execution type for each rule in association with the service definition. As described above, the execution type for each rule can be stored in a respective entity filter criteria component.


At block 1747, the computing machine applies the filter criteria to identify one or more entity definitions satisfying the filter criteria. The filter criteria can be applied to the entity definitions in the service monitoring data store based on the execution type and the execution parameter that has been specified for a rule to which the filter criteria pertains. For example, if the execution type is static execution, the computing machine can apply the filter criteria a single time. For a static execution type, the computing machine can apply the filter criteria a single time when user input, which accepts the filter criteria that are specified via the GUI, is received. In another example, the computing machine can apply the filter criteria a single time the first KPI is being calculated for the service.


If the execution type is dynamic execution, the computing machine can apply the filter criteria multiple times. For example, for a dynamic execution type, the computing machine can apply the filter criteria each time a change to the entity definitions in the service monitoring data store is detected. The computing machine can monitor the entity definitions in the service monitoring data store to detect any change that is made to the entity definitions. The change can include, for example, adding a new entity definition to the service monitoring data store, editing an existing entity definition, deleting an entity definition, etc. In another example, the computing machine can apply the filter criteria each time a KPI is calculated for the service.


At block 1749, the computing machine associates the identified entity definitions with the service definition. The computing machine stores an association indicator in a stored service definition or a stored entity definition.


A static filter criterion can be executed once (or on demand). Static execution of the filter criteria for a particular rule can produce one or more entity associations with the service definition. For example, a rule may have the static filter criterion “name=192.168.1.100”. The filter criterion “name=192.168.1.100” may be applied to the entity definitions in the service monitoring data store once, and a search query is performed to identify the entity definition records that satisfy “name=192.168.1.100”. The result may be a single entity definition, and the single entity definition is associated with the service definition. The association will not the static filter criterion “name=192.168.1.100” is applied another time (e.g., on demand).


Dynamic filter criterion can be run multiple times automatically, i.e., manual vs. automatic. Dynamic execution of the filter criteria for a particular rule can produce a dynamic entity association with the service definition. The filter criteria for the rule can be executed at multiple times, and the entity associations may be different from execution to execution. For example, a rule may have the dynamic filter criterion “name=192.*”. When the filter criterion “name=192.*” is applied to the entity definitions in the service monitoring data store at time X, a search query is performed to identify the entity definitions that satisfy “name=192.*”. The result may be one hundred entity definitions, and the one hundred entity definitions are associated with the service definition. One week later, a new data center may be added to the IT environment, and the filter criterion “name=192.*” may be again applied to the entity definitions in the service monitoring data store at time Y. A search query is performed to identify the entity definitions that satisfy “name=192.*”. The result may be four hundred entity definitions, and the four hundred entity definitions are associated with the service definition. The filter criterion “name=192.168.1.100” can be applied multiple times and the entity definitions that satisfy the filter criterion may differ from time to time.



FIG. 17E illustrates an example of a GUI 1770 of a service monitoring system for using filter criteria to identify one or more entity definitions to associate with a service definition, in accordance with one or more implementations of the present disclosure. In one implementation, GUI 1770 is displayed when button 1306 in FIG. 13 is activated.


GUI 1770 can include a service definition status bar 1771 that displays the various stages for creating a service definition using the GUIs of the service monitoring system. The stages can include, for example, and are not limited to, a service information stage, a key performance indicator (KPI) stage, and a service dependencies stage. The status bar 1771 can be updated to display an indicator (e.g., shaded circle) corresponding to a current stage.


GUI 1770 can include a save button 1789 and a save-and-next button 1773. For each stage, if the save button 1789 is activated, the settings that have been specified via the GUI 1770 for a particular stage (e.g., service information stage) can be stored in a data store, without having to progress to a next stage. For example, if user input for the service name, description, and entity filter criteria has been received, and the save button 1789 is selected, the specified service name, description, and entity filter criteria can be stored in a service definition record (e.g., service definition record 1760 in FIG. 17C) and stored in the service monitoring data store, without navigating to a subsequent GUI to specify any KPI or dependencies for the service. If the save and next button 1773 is activated, the settings that have been specified via the GUI 1770 for a particular stage can be stored in a data store, and a GUI for the next stage can be displayed. In one implementation, user interaction with the save button 1789 or the save-and-next button 1773 produces the same save operation that stores service definition information in the service monitoring data store. Unlike the save button 1789, save-and-next button 1773 has a further operation of navigating to a subsequent GUI. GUI 1770 includes a previous button 1772, which when selected, displays the previous GUI for creating the service definition.


GUI 1770 can facilitate user input specifying a name 1775 and optionally a description 1777 for the service definition for a service. For example, user input of the name “TestService” and the description “Service that contains entities” is received.


GUI 1770 can include one or more buttons (e.g., “Yes” button 1779,“No” button 1781) that can be selected to specify whether entities are associated with the service. A selection of the “No” button 1781 indicates that the service being defined will not be associated with any entities, and the resulting service definition has no associations with any entity definitions. For example, a service may not be associated with any entities if an end user intends to use the service and corresponding service definition for testing purposes and/or experimental purposes. In another example, a service may not be associated with any entities if the service is dependent on one or more other services, and the service is being monitored via the entities of the one or more other services upon which the service depends upon. For example, an end user may wish to use a service without entities as a way to track a business service based on the services which the business service depends upon.


If the “Yes” button 1779 is selected, an entity portion 1783 enabling a user to specify filter criteria for identifying one or more entity definitions to associate with the service definition is displayed. The filter criteria can correspond to a rule. The entity portion 1783 can include a button 1785, which when selected, displays a button and text box to receive user input specifying an element name and one or more corresponding element values for filter criteria corresponding to a rule, as described below in conjunction with FIG. 17F.


Referring to FIG. 17E, the entity portion 1783 can include preview information 1787 that displays information pertaining to any entity definitions in the service monitoring data store that satisfy the particular filter criteria for the rule. The preview information 1787 can be updated as the filter criteria are being specified, as described in greater detail below. GUI 1770 can include a link 1791, which when activated, can display a GUI that presents a list of the matching entity definitions, as described in greater detail below.



FIG. 17F illustrates an example of a GUI 17100 of a service monitoring system for specifying filter criteria for a rule, in accordance with one or more implementations of the present disclosure. GUI 17100 can display a button 17107 for selecting an element name for filter criteria of a rule, and a text box 17109 for specifying one or more values that correspond to the selected element name. If button 17107 is activated, a list 17105 of element names can be displayed, and a user can select an element name for the filter criteria from the list 17105.


In one implementation, the list 17105 is populated using the element names that are in the alias components that are in the entity definition records that are stored in the service monitoring data store. In one implementation, the list 17105 is populated using the element names from the informational field components in the entity definitions. In one implementation, the list 17105 is populated using field names that are specified by a late-binding schema that is applied to events. In one implementation, the list 17105 is populated using any combination of alias component element names, informational field component element names, and/or field names.


User input can be received that specifies one or more values for the specified element name. For example, a user can provide a string for specifying one or more values via text box 17109. In another example, a user can select text box 17109, and a list of values that correspond to the specified element name can be displayed as described below.



FIG. 17G illustrates an example of a GUI 17200 of a service monitoring system for specifying one or more values for filter criteria of a rule, in accordance with one or more implementations of the present disclosure. In this example, filter criteria for rule 17203 is being specified via GUI 17200. GUI 17200 displays a selection of an element name “name” 17201 for the filter criteria of rule 17203. When text box 17205 is activated (e.g., when a user selects text box 17205 by, for example, clicking or tapping on text box 17205, or moving the cursor to text box 17205), a list 17207 of values that correspond to the element name “name” 17201 is displayed. For example, various entity definitions may include a name component having the element name “name”, and the list 17207 can be populated with the values from the name components from those various entity definition records.


One or more values from the list 17207 can be specified for the filter criteria of a rule. For example, the filter criteria for rule 17203 can include the value “192.168.1.100” 17209 and the value “hope.mbp14.local” 17211. In one implementation, when multiple values are part of the filter criteria for a rule, the rule treats the values disjunctively. For example, when the rule 17203 is to be executed, the rule triggers a search query to be performed to search for entity definition records that have either an element name “name” and a corresponding “192.168.1.100” value, or have an element name “name” and a corresponding “hope.mbp14.local” value.


A service definition can include multiple sets of filter criteria corresponding to different rules. In one implementation, the different rules are treated disjunctively, as described below.



FIG. 17H illustrates an example of a GUI 17300 of a service monitoring system for specifying multiple sets of filter criteria for associating one or more entity definitions with a service definition, in accordance with one or more implementations of the present disclosure. As described above, a service definition can include multiple sets of filter criteria corresponding to different rules. For example, two sets of filter criteria for two rules 17303 and 17305 can be specified via GUI 17300.


Rule 17303 has multiple filter criteria that include an element name “name” 17301 and multiple element values (e.g., the value “192.168.100” 17309 and the value “hope.mbp14.local” 17391). In one implementation, the multiple filter criteria are processed disjunctively. For example, rule 17303 can be processed to search for entity definitions that satisfy “name=192.168.1.100” or “name=hope.mbp14.local”. Rule 17305 has a single filter criterion that includes element name “dest” 17307 and a single element value “192.*” 17313 for a single filter criterion of “dest=192.*”.


In one example, an element value for filter criteria of a rule can be expressed as an exact string (e.g., “192.168.1.100” and “hope.mbp14.local”) and the rule can be executed to perform a search query for an exact string match. In another example, an element value for filter criteria of a rule can be expressed as a combination of characters and one or more wildcard characters. For example, the value “192.*” for rule 17305 contains an asterisk as a wildcard character. A wildcard character in a value can denote that when the rule is executed, a wildcard search query is to be performed to identify entity definitions using pattern matching. In another example, an element value for a filter criteria rule can be expressed as a regular expression (regex) as another possible option to identify entity definitions using pattern matching.


In one implementation, when multiple sets of filter criteria for different rules are specified for a service definition, the multiple rules are processed disjunctively. The entity definitions that satisfy any of the rules are the entity definitions that are to be associated with the service definition. For example, any entity definitions that satisfy “name=192.168.1.100 or hope.mbp14.local” or “dest=192.*” are the entity definitions that are to be associated with the service definition.


GUI 17300 can display, for each rule being specified, a button 17327A-B for selecting the execution parameter for the particular rule. GUI 17300 can display, for each rule being specified, a button 17325A-B for selecting the execution type (e.g., static execution type, dynamic execution type) for the particular rule. For example, rule 17303 has a static execution type, and rule 17305 has a dynamic execution type.


A user may wish to select a static execution type for a rule, for example, if the user anticipates that one or more entity definitions may not satisfy a rule that has a wildcard-based filter criterion. For example, a service may already have the rule with filter criterion “dest=192.*”, but the user may wish to also associate a particular entity, which does not have “192” in its address, with the service. A static rule that searches for the particular entity by entity name, such as rule with filter criterion “name=hope.mbp14.local” can be added to the service definition.


In another example, a user may wish to select a static execution type for a rule, for example, if the user anticipates that only certain entities will ever be associated with the service. The user may not want any changes to be made inadvertently to the entities that are associated with the service by the dynamic execution of a rule.


GUI 17300 can display preview information for the entity definitions that satisfy the filter criteria for the rule(s). The preview information can include a number of the entity definitions that satisfy the filter criteria and/or the execution type of the rule that pertains to the particular entity definition. For example, preview information 17319 includes the type “static” and the number “2”. In one implementation, when the execution type is not displayed, the preview information represents a dynamic execution type. For example, preview information 17315 and preview information 17318 pertain to rules that have a dynamic execution type.


The preview information can represent execution of a particular rule. For example, preview information 17315 is for rule 17305. A combination of the preview information can represent execution of all of the rules for the service. For example, the combination of preview information 17318 and preview information 17319 is a summary of the execution of rule 17303 and rule 17305.


GUI 17300 can include one or more buttons 17317, 17321, which when selected, can re-apply the corresponding rule(s) to update the corresponding preview information. For example, the filter criteria for rule 17305 may be edited to “dest=192.168.*” and button 17317 can be selected to apply the edited filter criteria for rule 17305 to the entity definitions in the service monitoring data store. The corresponding preview information 17315 and the preview information 17318 in the summary may or may not change depending on the search results.


In one implementation, the preview information includes a link, which when selected, can display a list of the entity definitions that are being represented by the preview information. For example, preview information 17315 for rule 17307 indicates that there are 4 entity definitions that satisfy the rule “dest=192.*”. The preview information 17315 can include a link, which when activated can display a list of the 4 entity definition, as described in greater detail below in conjunction with FIG. 17I. Referring to FIG. 17H, GUI 17300 can include a link 17323, which when selected can display a list of all of the entity definitions that satisfy all of the rules (having both static and dynamic execution types such as rule 17303 and rule 17305) for the service definition.



FIG. 17I illustrates an example of a GUI 17400 of a service monitoring system for displaying entity definitions that satisfy filter criteria, in accordance with one or more implementations of the present disclosure. GUI 17400 can display list 17401 of the entity definitions that satisfy a particular rule “dest=192.*” (e.g., rule 17305 in FIG. 17H). The list 17401 can include, for each entity definition, the value (e.g., value 192.168.1.100 17403A, value 192.168.0.1 17403B, value 192.168.0.2 17403B, and value 192.168.0.3 17403B) that satisfies the filter criteria for the rule.


Service Discovery


A service monitoring system of the present disclosure uses service definitions to represent services to be monitored. A service definition may have associations with definitions for all of the entities involved in providing the service. Each entity definition represents an entity in the environment that provides the service, for example, a network device or a server machine. The entity definitions may play an important role of identifying machine data that pertains to the entity. Accordingly, the entity definitions can serve as a bridge between machine data and defined services, making it possible to perform a monitoring of services using machine data. Various modes and methods for creating service and entity definitions are described elsewhere herein. The service discovery processing now described teaches additional novel modes and methods for creating an interrelated set of service and entity definitions automatically through the processing of extant machine data. Companion user interfaces and related processes are also disclosed.



FIG. 17J is a system diagram including a process flow for implementing service discovery in one embodiment. Block 17550 of system 17500 includes processing blocks for service discovery in one embodiment. The processing of block 17550 may be principally performed by a service monitoring system (SMS) such as represented by block 17544, diverse and plentiful aspects of which are robustly described throughout this written description (see, for example, SMS 210 of FIG. 2). The ongoing operation of SMS 17544 may be managed, directed, controlled, configured, or otherwise influenced by information of a command/control/configuration (CCC) data store such as represented by block 17546. Information of the CCC data store 17546 may include, for example, one or more entity definitions as represented by block 17534, and one or more service definitions as represented by block 17536. Entity definitions are discussed more fully elsewhere and are not further elaborated here (see, e.g., FIGS. 4, 10B, 10C, 17C, and the related discussions). Service definitions are discussed more fully elsewhere and are not further elaborated here (see, e.g., FIGS. 4, 17B, 17C, and the related discussions). SMS 17544 is shown coupled to a data input and query system, such as an event processing system, 17542. The functional and communicative coupling of blocks 17542 and 17544 may be variously implemented and in one embodiment SMS 17544 may be implemented as an application running under the auspices of EPS 17542. An event processing system such as block 17542 may be better understood by consideration of additional material contained herein, such as FIG. 76, et seq, and the related discussion. EPS 17542 is shown coupled to event data store 17540. Event data store 17540, in one embodiment, may hold machine data from a variety of sources segmented into events, each of which may be time stamped. Information may be principally populated into event data store 17540 by EPS 17542, and may be principally accessed and utilized using functions provided by EPS 17542, perhaps a generalized search or search engine function accessible by an API or other means. When performing a search function, EPS 17542 may utilize information of a late binding schema, such as extraction rules, to identify, access, extract, or otherwise utilize needed field values from the machine data of events. Such field-searchable event databases are described in detail elsewhere in this written description (see, for example, FIG. 76, et seq, and the related discussions).



FIG. 17J illustrates that machine data ingested by EPS 17542 for representation in the event data store 17540 may come from many different sources and in many different formats. EPS 17542 is shown to be coupled for machine data transfer by 17532 to network device 17520, a Windows server running an Exchange email service 17522, a Linux server running a database service 17524, a UNIX server running both DNS and DHCP services 17526, and network cloud 17528. Machine data coupling 17532 represents a general logical coupling between EPS 17542 and its machine data sources, which coupling may be implemented between the EPS and any machine data source by any number and variety of logical and physical data communication modalities. Such data communication modalities may include network 17530 which is shown to couple network device 17520, servers 17522, 17524, and 17526, and network cloud 17528, and which may couple to computing apparatus of 17542, though not specifically shown. Network 17530 in an illustrative embodiment may include one or more TCP/IP networks using IP addressing and accessible via the Internet. Further, it is noted, that UNIX server 17526 is shown with a dual coupling to network 17530 which represents, for this example, two different IP ports, one associated with the DNS service and one associated with the DHCP service of server 17526. IP addresses, including the use of port numbers are well understood for TCP/IP networks. Other networks may have other addressing schemes to provide nodes and/or network interfaces with an address to identify the node/interface in the communication network. MAC addresses are another understood network address type. In one embodiment, a network address may also be implemented in abstraction layers well above the physical communication layer, such as an application level. Such an embodiment may utilize an identifier, (e.g., hostname), to identify a node of a network application, for example, that is in communication with other nodes to implement a system using a networked architecture.


Against this backdrop the processing of block 17550 is performed, sometimes accessing machine data using the facilities of EPS 17542, sometimes adding or otherwise modifying the contents of CCC data store 17546 (possibly directly, as shown, or possibly using an API or other functionality provided by SMS 17544, though not specifically shown), and sometimes interfacing with a computer user, such as a system administrator or analyst, via a human interface device such as 17568, in example system 17500.


The processing of block 17550 represents processing as may occur in one embodiment for a single session, run, occurrence, instance, execution, or the like of a process to examine a corpus of machine data and to therefrom derive (i) an identification of performed services (as may be monitored by an SMS) and (ii) an identification and association of entities (e.g., host computers or its processes) that perform those services.


At block 17552, parameters that define, control, direct, limit, bound, or otherwise influence processing performed during a service discovery run or session are determined. In one embodiment, one or more parameters may be determined automatically in consideration of current, extant conditions, such as the amount of time that has elapsed since the most recent service discovery run. In one embodiment, one or more parameters may be determined based at least in part on user input received from a user interface device such as 17568. Embodiments may vary as to the number and types of parameters that influence a service discovery session and may include, for example: a time range of machine data to include in the corpus; other selection criteria for the machine data to include in the corpus; recognition and identification criteria for entity properties, attributes, characteristics, descriptors, affinities, or the like; logic or rules to determine the same; data translation, normalization, look-ups, or the like; the name of a field indicative of a service identification or grouping; and others.


At block 17554, one or more entities that provide services are determined. The identification and determination of the entities is derived by processing the corpus of machine data, possibly as influenced by parameters determined by processing of block 17552. In the instant example embodiment, the processing of the corpus of machine data may principally result from submitting a properly formatted search query to EPS 17542. In one embodiment, the derived identification for an entity is its IP address. In one embodiment, the derived identification for an entity is its IP address including a post-fixed port number. In one embodiment, the derived identification for an entity is a hostname attribute. In one embodiment, multiple identification factors may be derived for each entity that may be usable alone or in combination to provide a useful identifier for the entity. In one such embodiment, an application name is included among the identification factors. As an application name in a large IT environment may not be useful by itself to identify a particular entity with any uniqueness, the application name may be properly considered to be entity attribute information. Embodiments may include other entity attribute information as part of the processing associated with block 17554. These and other embodiments are possible.


The processing of block 17554 may work to passively or affirmatively to include entities represented in the corpus that provide services (e.g., server machines/ports), and may work passively or affirmatively to exclude entities represented in the corpus that do not provide services (e.g., client machines/ports). In one embodiment, for example, machine data of a network traffic stream, such as may be provided by a network device such as 17520 to EPS 17542 for representation in event data store 17540 and such as may possibly include all or some subset of transmission-formatted data stream traffic flowing over a communication network or channel, may be processed to determine a list, set, group, collection, or the like, of entities that communicate using a particular communication category (e.g., protocol, application traffic type, etc.) and, for each entity, the number of other entities with which it communicates—its connectedness or degree. The entities in the list may then be ranked according to connectedness and the ranking used to differentiate server machines from client machines. For example, in one embodiment, an entity in the list is determined, designated, ascribed, identified, or attributed as a server machine if its rank position is less than its connectedness. Entities so determined to be server machines are included in the logical list, set, group, collection, or the like, of service entities resulting from the processing of block 17554, while the others are not. (In this context, service entities are entities that are involved in performing services, as distinguished from a grander category of, essentially, potential entities in the service discovery context that may include, for example, client machines before they are culled.)


In one embodiment, as another example, Linux machine data as may have been produced by an execution of the PS command (i.e., report a snapshot of current processes)—and such as may be provided by a Linux host such as 17524 to EPS 17542 for representation in event data store 17540—may be processed at block 17554 to locate a particular application or process name, such as “MySQL”. If found, an identification of the Linux host is included among the determined service entities of block 17554. Operating systems (OS'es) other than Linux, and other functionality of Linux, may offer similar production of data describing active units of work, such as processes, tasks, subtasks, or the like, in the system.


The processing of block 17554 may vary greatly in scope from embodiment to embodiment. In order to recognize entities represented in machine data that are related to the performance of services, an embodiment may variously make its determination in consideration of any number or combination of elements or factors in or about the machine data including, for example: sourcetype; the class, category, or type of machine data; the class, category, or type of host producing the machine data; known, recognized, or ascertainable attribute or field values within the machine data; evidence of protocols; evidence of standard data representation formats; machine data content and any representation formats utilized, especially as in compliance with known standards or specifications, particularly as being suggestive, highly indicative, definitive, or dispositive of the identification of a service; the communication direction; the number of communicating partners; attributes of communicating partners; and so on.


In order to recognize entities represented in machine data that are related to the performance of services, one embodiment may make its determination at least in part in consideration of a list of known or recognized services and/or their associated attributes (e.g, communication protocol or data formats). In one such embodiment, the list of known or recognized services includes those network applications that are widely known, recognized, used, or supported in the computing industry. Such network applications may run on host machines and expose their services via a network interface. Such network applications may include, for example, email (POP, SMTP, etc.), web server, instant messaging, remote login, authentication, file sharing, database, media streaming, IP telephony, and Infrastructure as a Service (Iaas). Such network applications may utilize client-server, peer-to-peer (P2P), hybrid, or other architecture paradigms.


In one embodiment, the processing of blocks 17552 and 17554 may be combined somewhat iteratively. In one such embodiment, the user is prompted by processing related to block 17552 to indicate certain session parameters. As those session parameters are indicated, the processing of block 17554 is conducted, as possible, and a list of determined service entities is displayed, providing a form of feedback to the user. In response to the displayed list of determined service entities, the user may decide to alter or correct session parameters by engaging processing of 17552 and, in turn, processing of block 17554 ensues to determine a new set of service entities based on the changed parameters. The cycle may continue until the user is satisfied with the displayed set of determined service entities.


At block 17556, each of the service entities determined at block 17554 is preliminarily associated with a particular service. The processing performed at block 17556 may depend upon certain session parameters determined at block 17552. In one embodiment, an entity identification factor or attribute from the processing of 17554 is used as the identification of a service to which the entity will be associated. In one embodiment, an entity identification factor or attribute from the processing of 17554 may match a pattern to determine its service association. Embodiments may vary as to the correlation (determination, resolution, derivation, identification, selection, or the like) of data extracted from or derived from machine data to a service association identified for a service entity. A service association indicates the association or relationship between a service and a entity. The service association may include an identifier for the service and the logical link to the entity so associated. The logical link may be represented explicitly, such as by a paired entity identifier and service identifier, or implicitly, such as by the colocation of a service identifier and an entity identifier (e.g. in the same row of a table, adjacently, in close proximity, in an informational grouping), or the storage of the service identifier in data representing the entity, or vice versa. These and other embodiments are possible. In one embodiment, the service identification comports with a list of known or recognized network applications.


At block 17558, a list of service-related entities determined at block 17554 and their respective service associations made at block 17556 are displayed to a user, perhaps via interface device 17568. The employed user interface enables a user to indicate edits to the list of entities and service associations. User input indicating the desired edits is received and processed at block 17560. At block 17562, the computing machine receives an input from the user, perhaps via user interface device 17568, confirming their acceptance or approval of the entity and service association list. The processing of block 17562 may include a preview presentation of discovered entities and services and/or related information for user information and assessment before signaling confirmation. At block 17564, the computing machine processes an entity and service association list, as may have been confirmed at 17562, and updates CCC datastore 17546 to reflect the contents of the entity and service association list. In one embodiment, the processing of block 17564 may entail creating a new service definition such as 17536 for each uniquely identified service in the entity and service association list, creating a new entity definition such as 17534 for each uniquely identified entity in the entity and service association list, and reflecting the association between each of the new entity definitions and the appropriate new service definition in accordance with the contents of the entity and service association list. In one embodiment, the processing of block 17564 may entail creating a new service definition such as 17536 for each uniquely identified service in the entity and service association list that does not have a pre-existing service definition in CCC data store 17546, creating a new entity definition such as 17534 for each uniquely identified entity in the entity and service association list that does not have a pre-existing entity definition in CCC data store 17546, and reflecting the association between each of the new entity definitions and the appropriate service definition in accordance with the contents of the entity and service association list. These and other embodiments are possible. By or at about the conclusion of processing of block 17564 in one embodiment, the processing of block 17566 causes a presentation to the user indicating the results of the service discovery session, i.e., the update to the command/configuration/control data store of the service monitoring system to reflect service entities determined from a corpus of machine data in association the services they provide.


One of skill will now appreciate the novelty of the bottom-up approach to entity and service definition illustrated by reference to the method and system of 17500, where a broad base of machine data produced by an actively operating IT environment is distilled up to representative entity and service definitions. This stands in contrast to top-down approaches whereby entities and services must be manually recognized and entered into a configuration system to which system data is subsequently subjected regardless of its accuracy.



FIG. 17K depicts a user interface display related to service and entity discovery processing in one embodiment. User interface display 17501 is such as might be used in relation to the processing of block 17552, and possibly 17554, of FIG. 17J. User interface display 17501 is such as might be caused for display to enable a user determine parameters for an automated service discovery session. User interface display 17501 of FIG. 17K is shown to include workflow header 17570, workflow segment header 17572, discovery options section 17574, and grouping options section 17576. Workflow header 17570 is shown to include workflow title, “Entity/Service Import”, workflow progress bar 17580, progress indicator 17582, and navigation action area 17584. Workflow header 17570 is such as might be displayed as part of multiple, successive user interface displays all related to a common workflow objective. As such, progress indicator 17582 may change positions along workflow progress bar 17580 on different interface displays in order to indicate the position of the currently displayed interface in the context of a greater workflow. Similarly, navigation action area 17584 may be variously populated with appropriate navigation indicators and controls that reflect the current position within the workflow context.


Workflow segment header 17572 is shown to include workflow segment title, “Welcome to Service and Entity Discovery”, user prompt, “Select a time range to begin the Discovery search”, timeframe component 17586, and action button 17588 entitled, “Run Entity Discovery Search”. Timeframe component 17586 is shown as a drop-down selection box containing the default or most-recently-selected timeframe value of “Last 15 minutes.” User interaction with timeframe component 17586 may result in the appearance of a drop-down selection list (not shown) of various timeframe specifications from which a user may make a selection, and may include options such as “Last 15 minutes”, “Last 7 days”, “Prior Month”, and others. The time frame indication selected by the user by interaction with timeframe component 17586 may be a control parameter for the current service discovery session that seemingly begins with the display of interface 17501. In one embodiment, the control parameter may be used in the formulation of a search query for execution by the event processing system. User interaction with action button 17588 may result in the computing machine causing the formulation and execution of such a search query in accordance with the control parameters of the current working context in order to identify service entities, such as contemplated by the processing of block 17554 of FIG. 17J.


Discovery options section 17574 of FIG. 17K is shown to include section header 17590 which may enable user interaction to selectively collapse or expand the presentation of the section 17574. Discovery options section 17574 is shown to further include the descriptive text “Provide additional Parameters for Discovery”, and a row of related text boxes including: text box 17592 showing the user prompt “Name the Discovered Applicati[on]” for a data item Application Name; text box 17594 showing the user prompt “user” for a data item Linux Process Name; text box 17596 showing the user prompt “Name” for a data item Windows Process Name (tasklist); text box 17598 showing the user prompt “sourcetype” for a data item sourcetype; and text box 17600 showing the user prompt “app” for a data item App Field (From Stream). In one embodiment, a user is enabled to enter and/or edit text in as many as necessary of the related text boxes to populate discovery parameters that provide lookup, translation, or normalization data for mapping a machine data value(s) to a chosen application name. The chosen application name is entered into text box 17592. Values corresponding to the chosen application name as would be found in machine data are entered into the other related text boxes. Each of the other related text boxes that may correspond to a field of a source, class, category, type, or the like, of the machine data where the value corresponding to the chosen application name would be found. For example, a user may enter or edit the text of box 17594 to indicate the value found in a Process Name field of a Linux category of machine data that corresponds to, should be translated to, should be mapped to, should be looked up as, or otherwise identified with the chosen application name as entered into text box 17592. Similarly, a user may enter or edit the text of box 17596 to indicate the value found in a Process Name field of a Windows category of machine data that corresponds to, should be translated to, should be mapped to, should be looked up as, or otherwise identified with the chosen application name as entered into text box 17592. Similarly, a user may enter or edit the text of box 17598 to indicate the value found in a sourcetype field of machine data that corresponds to, should be translated to, should be mapped to, should be looked up as, or otherwise identified with the chosen application name as entered into text box 17592. Similarly, a user may enter or edit the text of box 17600 to indicate the value found in an App Field field of a data stream category of machine data that corresponds to, should be translated to, should be mapped to, should be looked up as, or otherwise identified with the chosen application name as entered into text box 17592. The contents of the row of related text boxes can conceptually be considered as a row in a lookup table for translating category-field values to a normalized application name. User interaction with action element 17602, in one embodiment, adds another row of related text boxes as just described, enabling a user to provide normalization/lookup information for another application. In one embodiment, the additional discovery parameters provided by a user in such rows of related text boxes may supplement built-in normalization/lookup information that perhaps addresses a small or large number of known network applications or other applications. A similar lookup capability could be implemented to normalize fields other than the application name.


Discovery options section 17574 is shown to further include “Add Discovery Parameter” action element 17602 as already discussed, and a “Run Search With Additional Parameters” action button 17604. User interaction with action button 17604, in one embodiment, produces much the same effect as user interaction with button 17588 along with the certain inclusion of the additional parameters for discovery of section 17574 factored into the processing.


Grouping options section 17576 is shown to include section header 17610 which may enable user interaction to selectively collapse or expand the presentation of the section 17576. Grouping options section 17576 is shown to further include the descriptive text “Choose how Entities should be grouped into Services”, selection drop-down element 17612, generated search display area 17614, search action button 17616, and “Add Grouping” action element 17618. Selection drop-down element 17612 enables a user to select a mapping or correspondence to a preliminary service association for a service entity from a data component (e.g., a field, field combination, or calculated or derived value) determined for a service entity as the result of processing as described in relation to block 17554 of FIG. 17J, for example. Preliminary Service Association selection element 17612 is shown as a drop-down selection box containing the default or most-recently-selected value of “Application Name.” User interaction with Preliminary Service Association selection element 17612 may result in the appearance of a predefined or dynamically determined list of data components of service entities from which a user may make a selection, and may include options such as “Application Name”, “Port Number”, “IP Address”, “Host name”, and “Clustering Algorithm”, for example. Generated search display area 17614, in one embodiment, displays the text of a properly formatted search query for execution by an event processing system (EPS) based at least in part on any user interaction with interface 17501, including the mere interaction with an action button to indicate acceptance of default values and initiate a discovery search. The search query displayed at 17614 is generated by the computing machine, in one embodiment, on the basis of one or more of its fixed programming, information in a CCC data store of the SMS related to service discovery configuration and control, dynamically determined session parameters, and/or user supplied session parameters, among others. User interaction with search action button 17616 in one embodiment results in the display of the search query text in the context of a text editing user interface, or perhaps a search query IDE (integrated development environment) interface, enabling a user to modify the system generated search query. Interaction with “Add Grouping” action element 17618 enables a user to request the appearance of an additional instance of Preliminary Service Association selection element 17612 in order to establish compound grouping criteria.


In one embodiment, user interaction with a run-search interface components such as button 17588 or button 17604 will result in the computing machine executing a search query in accordance with any user specified search query text or processing parameters. In one embodiment the display of interface 17501 is essentially extended to include a list of service entities discovered, determined, and identified by the search. An example of such an interface display in such an embodiment follows.



FIG. 17L depicts a user interface display related to service and entity discovery processing in one embodiment including a presentation of discovered items. Interface 17502 may represent a scrolled down version of interface display 17501 of FIG. 17K after certain user interaction. Interface 17502 of FIG. 17L is shown to include grouping options section 17576, previously described, and discovered entities display area 17620. Discovered entities display area 17620 is shown to include header row 17622, and a service entities display table that includes column header row 17630 and service entity entry rows 17632a to 17632j. Header row 17622 is shown to include a count of the discovered entities 17624, a display options selection element 17626, and display table page navigation elements 17628. Column header row 17630 is shown to include a column title or heading of “IP” for column 17642, “Port” for column 17644, “Hostname” for column 17646, and “Application” for column 17648. Each of service entity entry rows 17632a to 17632j displays information for a respective service entity identified by an execution of the search query. The search query result may include information in fields with names corresponding to the column headings appearing in row 17630. Each of the service entry rows additionally includes a result number in column 17640. A user may advantageously utilize the interface display 17502 to get immediate feedback on the propriety or effectiveness of the currently established service discovery session parameters. If the feedback is favorable and acceptable to the user, the user may interact with an action button such as “Next” button 17584 of interface 17501 of FIG. 17K to proceed with service discovery processing workflow.



FIG. 17M depicts a user interface display related to editing and confirmation of discovered items. User interface display 17503 is such as might be useful during the processing of blocks 17558, 17560, and/or 17562 of FIG. 17J. User interface display 17503 of FIG. 17M is shown to include workflow header 17570, workflow segment header 17650, and service association results display table 17660. Workflow segment header 17650 is shown to include segment title “Edit Discovery Results”, user prompting information, “Select entity import type and Edit Discovered Entities and Services”, import-type element 17652, bulk action element 17654, and filter element 17656. Import-type element 17652 is shown as a drop-down selection box containing the default or most-recently-selected value of “Always Append to Data store.” User interaction with import-type element 17652 may result in the appearance of a drop-down selection list of import types from which a user may make a selection, and may include options for incorporating the service discovery results into the service and entity definitional data of the CCC data store of the SMS such as “Always Append to Data store”, “Update existing and Add New”, “Retain existing and Add New”, and “Replace all contents of Data store”, for example. User interaction with bulk action element 17654 is shown as a drop-down selection box. User interaction with bulk action element 17654 may result in the appearance of a drop-down selection list of available bulk actions from which a user may make a selection, and may include options such as “Delete Selected”, “Edit Selected Entities”, and “Edit Selected Services”, for example. A bulk action may be an action that is specified and/or initiated once for iteration over some set of one or more objects, such as the logical set of all of the service entities represented in the service entity entry rows of a results display table such as 17660 that are in the selected state. Filter element 17656 is shown as a text box. Filter element 17656 is interactive and enables a user to enter and/or edit text representing filter criteria by which to qualify discovered service entities appearing in service association results display table 17660.


Service association results display table 17660 as first presented to a user in a service discovery session, in one embodiment, displays identification information for discovered service entities as well as a preliminary service association. Service association results display table 17660 is shown to include column header row 17661 and service entity entry rows 17662a to 17662o. Column header row 17661 shows a column identification, such as a title or field name, for each column displayed including a check box for column 17670, “Entity” for column 17672, “IP: Port” for column 17674, and “Service” for column 17676. Each of service entity entry rows 17662a to 17662o includes a check box in column 17670 that is interactive enabling a user to toggle the selection state of the service entity represented in the corresponding row. The selected state of the service entity as indicated by the check box in column 17670 may, for example, determine whether a bulk action selected using 17654 is applied to the particular service entity. In one embodiment, the value displayed for an entity in “Entity” column 17672 is an identifier corresponding to the hostname identified for the entity from the machine data corpus. In one embodiment, the value displayed for an entity in “IP: Port” column 17674 is a concatenation of IP address and port number fields identified for the entity from the machine data corpus. In one embodiment, the “Entity” value and the “IP: Port” value for an entity is each able to uniquely identify the entity. In one embodiment, using an example where the application name was selected as the grouping option, perhaps by interaction with interface element 17612 of FIG. 17K, the value displayed for an entity in “Service” column 17676 of FIG. 17M is the application name identified for the entity from the machine data corpus, possibly after a lookup, translation, normalization, or such an operation, using built-in or user-supplied lookup data.


Upon viewing interface 17503, a user may determine that the computer-generated information in the service association results display table 17660 is proper and requires no changes. Such may be the case where network applications are the services the user desires to define for monitoring by the SMS, in the current example. In such a case, the user may interact with action button 17584 to indicate and confirm acceptance of the data, which may result in the computing machine proceeding to processing for another segment of the workflow and presenting a corresponding user interface such as that depicted in FIG. 17P. If interface 17503 of FIG. 17M does not reflect entity identifications and service associations desired by the user, the user may exercise the interface to perform edits. Such may be the case where, for example, a user wants to combine multiple network application servers across a mix of applications into a single, higher-level service. In such a case, the user may use the selection boxes of column 17670 to identify the entities belonging to the higher-level service and use bulk action element 17654 to initiate a bulk edit of the service name for all of the selected entities to the name of a higher-level service the user will then specify. As an alternative in such a case, particularly where a large number of entities are involved, a user may advantageously employ the filtering capability of interface 17503 in the process of properly identifying relevant entities with an association to the higher-level service. In the course of such a process, a user interface display may appear as that shown in FIG. 17N.



FIG. 17N depicts a user interface display related to editing and confirmation of discovered items with filtering and bulk edit aspects. User interface display 17504 shows a display as may result after certain user interactions with interface 17503 of FIG. 17M such as, in the first instance, the entry of the value “dhcp” into the filter text box 17656. Interface 17504 of FIG. 17N is shown to include workflow header 17570, workflow segment header 17650, and service association results display table 17660. Notably, service association results display table 17660 is reduced to showing only the contents from entry rows 17662a, 17662e, 17662k, 17662l, 17662m, and 17662n, out of the set 17662a-o, seen earlier, as the result of the application of the “dhcp” filter criteria. (Note that now displayed entity rows are those with “dhcp” in Service column 17676.) User interface display 17504 further evidences user interaction to place all of the entity rows in the selected state as indicated by the selected checkboxes in column 17670. User interface display 17504 further evidences user interaction with bulk action element 17654 as indicated by the appearance of drop-down selection list 17680. User interface display 17504 further evidences user interaction to select the last entry in drop-down selection list 17680, i.e., “Edit Selected Services”, as indicated by the pointer icon (a hand) pointing to that list entry in the drop-down selection list. With the pointer icon so positioned, a user activation action (e.g., a mouse click) for the indicated selection list option may result in the display of a user interface component associated with the selected bulk action option.



FIG. 17O depicts a user interface display related to bulk editing, particularly bulk editing of the service association. User interface 17505 is shown to include title area 17690, footer area 17694, and bulk operation parameter area 17692. Bulk operation parameter area 17692 is shown to include parameter name text, “New Service Name”, and text box 17696 enabling a user to enter and/or edit the service name identifying a service association that will be applied to all of the entities selected for the bulk edit, i.e., all of the entities appearing in 17660 of FIG. 17N. The text appearing in parameter text box 17696 of interface 17505 of FIG. 17O indicate user interaction to specify “uber” as the replacement service association. In one embodiment, user interaction with action button 17698 will signal the user's desire to change the service association for the selected entities to the newly specified service name which may result in the computing machine removing user interface display 17505, performing the indicated bulk edit, and causing the display of a user interface substantially as 17504 of FIG. 17N, albeit without selection list 17680, with filter text box 17656 returned to a blank state, and with Service column 17676 now indicating “uber” where it had formerly indicated “dhcp”. User interaction with the “Save & Next” action button of 17570 may advance workflow processing, resulting in the display of a user interface corresponding to a different segment of workflow processing.



FIG. 17P depicts a user interface display related to graphically visualizing discovered items. Interface 17506 is such as might be utilized during the processing of block 17562 of FIG. 17J. User interface 17506 of FIG. 17P is shown to include workflow header 17570, workflow segment header 17700, discovered services and entities area 17702, visualization control area 17704, and discovery visualization area 17706. Workflow segment header 17700 is shown to include the title “Preview Entities and Services”, and descriptive text “Preview entities and services.” Discovered services and entities area 17702 is shown to include the headers only for a number of collapsible display areas. Each header is shown to display the name of a discovered service and is interactive to toggle between the collapsed and expanded states of the display area. User interaction with one of the collapsed headers will result in the computing machine modifying the interface display to include an expanded version of the corresponding display area. An expanded version of a display area in one embodiment includes information about the relevant discovered service, including a list of discovered entities having a service association to the service. Other embodiments are possible.


Visualization control area 17704 is shown to include options and/or controls for regulating, controlling, configuring, or otherwise influencing, the content and/or appearance of discovery visualization area 17706; particularly, for example, zoom controls including a selectable zoom level control showing the default or most recently selected value of “Fit to area”, a zoom out button (i.e., “−”), and a zoom in button (i.e., “+”). Discovery visualization area 17706 is shown to include a graphical depiction of discovered entities and their associations to discovered services. Each discovered entity is represented, in this example, by a small circle icon of a first color (here, black) such as entity icon 17712. Each discovered services represented, in this example, by a circle icon of a second color (here, blue) large enough to contain the icons for entities having an association to the service, such as service icon 17710. Discovery visualization area 17706 is also shown to include a cursor/pointer icon of an arrow 17714. In one embodiment, the user interface enables user interaction such that when cursor/pointer icon 17714 is positioned over a service or entity icon, the interface display is modified to include a “hover-over” interface component displaying detailed information for the service or entity represented by the underlying icon. A portion of the interface 17506 as modified with such a hover-over display element is depicted in FIG. 17Q.



FIG. 17Q depicts a user interface display aspect related to graphically visualizing discovered items. Interface 17507 is such as might be utilized during the processing of block 17562 of FIG. 17J. Interface 17507 of FIG. 17Q depicts a detail portion of a display such as interface 17506 of FIG. 17P. Interface 17507 of FIG. 17Q is shown to include service icon 17710 and entity icon 17712, positioned as before. Cursor/pointer icon 17714 is shown moved to a position over entity icon 17712. As a result of the positioning of cursor/pointer icon 17714 over entity icon 17712, the appearance of icon 17712 is shown modified to include an outer border in a third, highlight color (here, orange). Interface 17507, also as a result of the positioning of cursor/pointer icon 17714 over entity icon 17712, is shown to include hover-over display box 17720 displaying detail information about the entity represented by icon 17712 that underlies cursor/pointer icon 17714. Specifically, hover-over display box 17720 is shown to include the fieldname “Name:” with the value “sv3dc02.sv.splunk.com:53”, the fieldname “Alias:” with the value “10.140.6.24:53”, and the service association value description “Service:” with the value “uber”. After possibly exploring the configuration of discovered entities and services using the facilities of user interface 17506, a user may confirm, approve, accede to, or otherwise accept, the configuration by interacting with the “Next” action button of 17570 of FIG. 17P. Such interaction may result in the conclusion of processing related to block 17562 of FIG. 17J, the performance of the processing of block 17564 to update the content of CCC data store 17546 with appropriate service and entity definitions, and associations therebetween, and the performance of the processing of block 17566. Such performance of the processing of block 17566 may result in the computing machine causing the display such as depicted in FIG. 17R.



FIG. 17R depicts a user interface display related to automated configuration updates of service discovery. Interface 17508 is shown to include workflow header 17570 and workflow segment display area 17730. Workflow segment display area 17730 is shown to include an indication of the number of entities imported into the SMS (reflected in entity definitions in the CCC data store of the SMS) 17732, an indication of the number of services imported into the SMS (i.e. reflected and service definitions in the CCC data store of the SMS) 17736, action button 17734 enabling a user to initiate navigation toward processing and a user interface related to entities (see, for example, FIG. 34ZB2), action button 17738 enabling a user to initiate navigation toward processing in a user interface related to services (see, for example, FIG. 34ZA2), and action button 17740 enabling the user to save parameters of the current service discovery session for future recall and reuse.



FIG. 18 illustrates an example of a GUI 1800 of a service monitoring system for specifying dependencies for the service, in accordance with one or more implementations of the present disclosure. GUI 1800 can include an availability list 1804 of services that each has a corresponding service definition. The availability list 1804 can include one or more services. For example, the availability list 1804 may include dozens of services. GUI 1800 can include a filter box 1802 to receive input for filtering the availability list 1804 of services to display a portion of the services. GUI 1800 can facilitate user input for selecting a service from the availability list 1804 and dragging the selected service to a dependent services list 1812 to indicate that the service is dependent on the services in the dependent services list 1812. For example, the service definition may be for a Sandbox service. For example, the drop-down 1801 can be selected to display a title “Sandbox” in the service information for the service definition. The availability list 1804 may initially include four other services: (1) Revision Control service, (2) Networking service, (3) Web Hosting service, and (4) Database service. The Sandbox service may depend on the Revision Control service and the Networking service. A user may select the Revision Control service and Networking service from the availability list 1804 and drag the Revision Control service and Networking service to the dependent services list 1812 to indicate that the Sandbox service is dependent on the Revision Control service and Networking service. In one implementation, GUI 1800 further displays a list of other services which depend on the service described by the service definition that is being created and/or edited.


Thresholds for Key Performance Indicators



FIG. 19 is a flow diagram of an implementation of a method 1900 for creating one or more key performance indicators for a service, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 1902, the computing machine receives input (e.g., user input) of a name for a KPI to monitor a service or an aspect of the service. For example, a user may wish to monitor the service's response time for requests, and the name of the KPI may be “Request Response Time.” In another example, a user may wish to monitor the load of CPU(s) for the service, and the name of the KPI may be “CPU Usage.”


At block 1904, the computing machine creates a search query to produce a value indicative of how the service or the aspect of the service is performing. For example, the value can indicate how the aspect (e.g., CPU usage, memory usage, request response time) is performing at point in time or during a period of time. Some implementations for creating a search query are discussed in greater detail below in conjunction with FIG. 20. In one implementation, the computing machine receives input (e.g., user input), via a graphical interface, of search processing language defining the search query. Some implementations for creating a search query from input of search processing language are discussed in greater detail below in conjunction with FIGS. 22-23. In one implementation, the computing machine receives input (e.g., user input) for defining the search query using a data model. Some implementations for creating a search query using a data model are discussed in greater detail below in conjunction with FIGS. 24-26.


At block 1906, the computing machine sets one or more thresholds for the KPI. Each threshold defines an end of a range of values. Each range of values represents a state for the KPI. The KPI can be in one of the states (e.g., normal state, warning state, critical state) depending on which range the value falls into. Some implementations for setting one or more thresholds for the KPI are discussed in greater detail below in conjunction with FIGS. 28-31.



FIG. 20 is a flow diagram of an implementation of a method 2000 for creating a search query, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 2002, the computing machine receives input (e.g., user input) specifying a field to use to derive a value indicative of the performance of a service or an aspect of the service to be monitored. As described above, machine data can be represented as events. Each of the events is raw data. A late-binding schema can be applied to each of the events to extract values for fields defined by the schema. The received input can include the name of the field from which to extract a value when executing the search query. For example, the received user input may be the field name “spent” that can be used to produce a value indicating the time spent to respond to a request.


At block 2004, the computing machine optionally receives input specifying a statistical function to calculate a statistic using the value in the field. In one implementation, a statistic is calculated using the value(s) from the field, and the calculated statistic is indicative of how the service or the aspect of the service is performing. As discussed above, the machine data used by a search query for a KPI to produce a value can be based on a time range. For example, the time range can be defined as “Last 15 minutes,” which would represent an aggregation period for producing the value. In other works, if the query is executed periodically (e.g., every 5 minutes), the value resulting from each execution can be based on the last 15 minutes on a rolling basis, and the value resulting from each execution can be based on the statistical function. Examples of statistical functions include, and are not limited to, average, count, count of distinct values, maximum, mean, minimum, sum, etc. For example, the value may be from the field “spent” the time range may be “Last 15 minutes,” and the input may specify a statistical function of average to define the search query that should produce the average of the values of field “spent” for the corresponding 15 minute time range as a statistic. In another example, the value may be a count of events satisfying the search criteria that include a constraint for the field (e.g., if the field is “response time,” and the KPI is focused on measuring the number of slow responses (e.g., “response time” below x) issued by the service).


At block 2006, the computing machine defines the search query based on the specified field and the statistical function. The computing machine may also optionally receive input of an alias to use for a result of the search query. The alias can be used to have the result of the search query to be compared to one or more thresholds assigned to the KPI.



FIG. 21 illustrates an example of a GUI 2100 of a service monitoring system for creating a KPI for a service, in accordance with one or more implementations of the present disclosure. GUI 2100 can display a list 2104 of KPIs that have already been created for the service and associated with the service via the service definition. For example, the service definition “Web Hosting” includes a KPI “Storage Capacity” and a KPI “Memory Usage”. GUI 2100 can include a button 2106 for editing a KPI. A KPI in the list 2104 can be selected and the button 2106 can be activated to edit the selected KPI. GUI 2100 can include a button 2102 for creating a new KPI. If button 2102 is activated, GUI 2200 in FIG. 22 is displayed facilitating user input for creating a KPI.



FIG. 22 illustrates an example of a GUI 2200 of a service monitoring system for creating a KPI for a service, in accordance with one or more implementations of the present disclosure. GUI 2200 can facilitate user input specifying a name 2202 and optionally a description 2204 for a KPI for a service. The name 2202 can indicate an aspect of the service that is to be monitored using the KPI. As described above, the KPI is defined by a search query that produces a value derived from machine data pertaining to one or more entities identified in a service definition for the service. The produced value is indicative of how an aspect of the service is performing. In one example, the produced value is the value extracted from a field when the search query is executed. In another example, the produced value is a result from calculating a statistic based on the value in the field.


In one implementation, the search query is defined from input (e.g., user input), received via a graphical interface, of search processing language defining the search query. GUI 2200 can include a button 2206 for facilitating user input of search processing language defining the search query. If button 2206 is selected, a GUI for facilitating user input of search processing language defining the search query can be displayed, as discussed in greater detail below in conjunction with FIG. 23.


Referring to FIG. 22, in another implementation, the search query is defined using a data model. GUI 2200 can include a button 2208 for facilitating user input of a data model for defining the search query. If button 2208 is selected, a GUI for facilitating user input for defining the search query using a data model can be displayed, as discussed in greater detail below in conjunction with FIG. 24.



FIG. 23 illustrates an example of a GUI 2300 of a service monitoring system for receiving input of search processing language for defining a search query for a KPI for a service, in accordance with one or more implementations of the present disclosure. GUI 2300 can facilitate user input specifying a KPI name 2301, which can optionally indicate an aspect of the service to monitor with the KPI, and optionally a description 2302 for a KPI for a service. For example, the aspect of the service to monitor can be response time for received requests, and the KPI name 2301 can be Request Response Time. GUI 2300 can facilitate user input specifying search processing language 2303 that defines the search query for the Request Response Time KPI. The input for the search processing language 2303 can specify a name of a field (e.g., spent 2313) to use to extract a value indicative of the performance of an aspect (e.g., response time) to be monitored for a service. The input of the field (e.g., spent 2313) designates which data to extract from an event when the search query is executed.


The input can optionally specify a statistical function (e.g., avg 2311) that should be used to calculate a statistic based on the value corresponding to a late-binding schema being applied to an event. The late-binding schema will extract a portion of event data corresponding to the field (e.g., spent 2313). For example, the value associated with the field “spent” can be extracted from an event by applying a late-binding schema to the event. The input may specify that the average of the values corresponding to the field “spent” should be produced by the search query. The input can optionally specify an alias (e.g., rsp_time 2315) to use (e.g., as a virtual field name) for a result of the search query (e.g., avg(spent) 2314). The alias 2315 can be used to have the result of the search query to be compared with one or more thresholds assigned to the KPI.


GUI 2300 can display a link 2304 to facilitate user input to request that the search criteria be tested by running the search query for the KPI. In one implementation, when input is received requesting to test the search criteria for the search query, a search GUI is displayed.


In some implementations, GUI 2300 can facilitate user input for creating one or more thresholds for the KPI. The KPI can be in one of multiple states (e.g., normal, warning, critical). Each state can be represented by a range of values. During a certain time, the KPI can be in one of the states depending on which range the value, which is produced at that time by the search query for the KPI, falls into. GUI 2300 can include a button 2307 for creating the threshold for the KPI. Each threshold for a KPI defines an end of a range of values, which represents one of the states. Some implementations for creating one or more thresholds for the KPI are discussed in greater detail below in conjunction with FIGS. 28-31.


GUI 2300 can include a button 2309 for editing which entity definitions to use for the KPI. Some implementations for editing which entity definitions to use for the KPI are discussed in greater detail below in conjunction with FIG. 27.


In some implementations, GUI 2300 can include a button 2320 to receive input assigning a weight to the KPI to indicate an importance of the KPI for the service relative to other KPIs defined for the service. The weight can be used for calculating an aggregate KPI score for the service to indicate an overall performance for the service, as discussed in greater detail below in conjunction with FIG. 32. GUI 2300 can include a button 2323 to receive input to define how often the KPI should be measured (e.g., how often the search query defining the KPI should be executed) for calculating an aggregate KPI score for the service to indicate an overall performance for the service, as discussed in greater detail below in conjunction with FIG. 32. The importance (e.g., weight) of the KPI and the frequency of monitoring (e.g., a schedule for executing the search query) of the KPI can be used to determine an aggregate KPI score for the service. The score can be a value of an aggregate of the KPIs of the service. Some implementations for using the importance and frequency of monitoring for each KPI to determine an aggregate KPI score for the service are discussed in greater detail below in conjunction with FIGS. 32-33.


GUI 2300 can display an input box 2305 for a field to which the threshold(s) can be applied. In particular, a threshold can be applied to the value produced by the search query defining the KPI. Applying a threshold to the value produced by the search query is described in greater detail below in conjunction with FIG. 29.



FIG. 24 illustrates an example of a GUI 2400 of a service monitoring system for defining a search query for a KPI using a data model, in accordance with one or more implementations of the present disclosure. GUI 2400 can facilitate user input specifying a name 2403 and optionally a description 2404 for a KPI for a service. For example, the aspect of the service to monitor can be CPU utilization, and the KPI name 2403 can be CPU Usage. If button 2402 is selected, GUI 2400 displays button 2406 and button 2408 for defining the search query for the KPI using a data model. A data model refers to one or more objects grouped in a hierarchical manner and can include a root object and, optionally, one or more child objects that can be linked to the root object. A root object can be defined by search criteria for a query to produce a certain set of events, and a set of fields that can be exposed to operate on those events. Each child object can inherit the search criteria of its parent object and can have additional search criteria to further filter out events represented by its parent object. Each child object may also include at least some of the fields of its parent object and optionally additional fields specific to the child object, as will be discussed in greater detail below in conjunction with FIGS. 74B-D.


If button 2402 is selected, GUI 2500 in FIG. 25 is displayed for facilitating user input for selecting a data model to assist with defining the search query. FIG. 25 illustrates an example of a GUI 2500 of a service monitoring system for facilitating user input for selecting a data model and an object of the data model to use for defining the search query, in accordance with one or more implementations of the present disclosure. GUI 2500 can include a drop-down menu 2503, which when expanded, displays a list of available data models. When a data model is selected, GUI 2500 can display a list 2505 of objects pertaining to the selected data model. For example, the data model Performance is selected and the objects pertaining to the Performance data model are included in the list 2505. Objects of a data model are described in greater detail below in conjunction with FIGS. 74B-D. When an object in the list 2505 is selected, GUI 2500 can display a list 2511 of fields pertaining to the selected object. For example, the CPU object 2509 is selected and the fields pertaining to the CPU object 2509 are included in the list 2511. GUI 2500 can facilitate user input of a selection of a field in the list 2511. The selected field (e.g., cpu_load_percent 2513) is the field to use for the search query to derive a value indicative of the performance of an aspect (e.g., CPU usage) of the service. The derived value can be, for example, the field's value extracted from an event when the search query is executed, a statistic calculated based on one or more values of the field in one or more events located when the search query is executed, a count of events satisfying the search criteria that include a constraint for the field (e.g., if the field is “response time” and the KPI is focused on measuring the number of slow responses (e.g., “response time” below x) issued by the service).


Referring to FIG. 24, GUI 2400 can display a button 2408 for optionally selecting a statistical function to calculate a statistic using the value(s) from the field (e.g., cpu_load_percent 2513). If a statistic is calculated, the result from calculating the statistic becomes the produced value from the search query, which indicates how an aspect of the service is performing. When button 2408 is selected, GUI 2400 can display a drop-down list of statistics. The list of statistics can include, and are not limited to, average, count, count of distinct values, maximum, mean, minimum, sum, etc. For example, a user may select “average” and the value produced by the search query may be the average of the values of field cpu_load_percent 2513 for a specified time range (e.g., “Last 15 minutes”). FIG. 26 illustrates an example of a GUI 2600 of a service monitoring system for displaying a selected statistic 2601 (e.g., average), in accordance with one or more implementations of the present disclosure.


Referring to FIG. 24, GUI 2400 can facilitate user input for creating one or more thresholds for the KPI. GUI 2400 can include a button 2410 for creating the threshold(s) for the KPI. Some implementations for creating one or more thresholds for the KPI are discussed in greater detail below in conjunction with FIGS. 28-31.


GUI 2400 can include a button 2412 for editing which entity definitions to use for the KPI. Some implementations for editing which entity definitions to use for the KPI are discussed in greater detail below in conjunction with FIG. 27.


GUI 2400 can include a button 2418 for saving a definition of a KPI and an association of the defined KPI with a service. The KPI definition and association with a service can be stored in a data store.


The value for the KPI can be produced by executing the search query of the KPI. In one example, the search query defining the KPI can be executed upon receiving a request (e.g., user request). For example, a service-monitoring dashboard, which is described in greater detail below in conjunction with FIG. 35, can display a KPI widget providing a numerical or graphical representation of the value for the KPI. A user may request the service-monitoring dashboard to be displayed, and the computing machine can cause the search query for the KPI to execute in response to the request to produce the value for the KPI. The produced value can be displayed in the service-monitoring dashboard


In another example, the search query defining the KPI can be executed based on a schedule. For example, the search query for a KPI can be executed at one or more particular times (e.g., 6:00 am, 12:00 pm, 6:00 pm, etc.) and/or based on a period of time (e.g., every 5 minutes). In one example, the values produced by a search query for a KPI by executing the search query on a schedule are stored in a data store, and are used to calculate an aggregate KPI score for a service, as described in greater detail below in conjunction with FIGS. 32-33. An aggregate KPI score for the service is indicative of an overall performance of the KPIs of the service.


Referring to FIG. 24, GUI 2400 can include a button 2416 to receive input specifying a frequency of monitoring (schedule) for determining the value produced by the search query of the KPI. The frequency of monitoring (e.g., schedule) of the KPI can be used to determine a resolution for an aggregate KPI score for the service. The aggregate KPI score for the service is indicative of an overall performance of the KPIs of the service. The accuracy of the aggregate KPI score for the service for a given point in time can be based on the frequency of monitoring of the KPI. For example, a higher frequency can provide higher resolution which can help produce a more accurate aggregate KPI score.


The machine data used by a search query defining a KPI to produce a value can be based on a time range. The time range can be a user-defined time range or a default time range. For example, in the service-monitoring dashboard example above, a user can select, via the service-monitoring dashboard, a time range to use (e.g., Last 15 minutes) to further specify, for example, based on time-stamps, which machine data should be used by a search query defining a KPI. In another example, the time range may be to use the machine data since the last time the value was produced by the search query. For example, if the KPI is assigned a frequency of monitoring of 5 minutes, then the search query can execute every 5 minutes, and for each execution use the machine data for the last 5 minutes relative to the execution time. In another implementation, the time range is a selected (e.g., user-selected) point in time and the definition of an individual KPI can specify the aggregation period for the respective KPI. By including the aggregation period for an individual KPI as part of the definition of the respective KPI, multiple KPIs can run on different aggregation periods, which can more accurately represent certain types of aggregations, such as, distinct counts and sums, improving the utility of defined thresholds. In this manner, the value of each KPI can be displayed at a given point in time. In one example, a user may also select “real time” as the point in time to produce the most up to date value for each KPI using its respective individually defined aggregation period.


GUI 2400 can include a button 2414 to receive input assigning a weight to the KPI to indicate an importance of the KPI for the service relative to other KPIs defined for the service. The importance (e.g., weight) of the KPI can be used to determine an aggregate KPI score for the service, which is indicative of an overall performance of the KPIs of the service. Some implementations for using the importance and frequency of monitoring for each KPI to determine an aggregate KPI score for the service are discussed in greater detail below in conjunction with FIGS. 32-33. FIG. 27 illustrates an example of a GUI 2700 of a service monitoring system for editing which entity definitions to use for a KPI, in accordance with one or more implementations of the present disclosure. GUI 2700 may be displayed in response to the user activation of button 2412 in GUI 2400 of FIG. 24. GUI 2700 can include a button 2710 for creating a new entity definition. If button 2710 is selected, GUI 1600 in FIG. 16 can be displayed and an entity definition can be created as described above in conjunction with FIG. 6 and FIG. 16.


Referring to FIG. 27, GUI 2700 can display buttons 2701, 2703 for receiving a selection of whether to include all of the entity definitions, which are associated with the service via the service definition, for the KPI. If the Yes button 2701 is selected, the search query for the KPI can produce a value derived from the machine data pertaining to all of the entities represented by the entity definitions that are included in the service definition for the service. If the No button 2703 is selected, a member list 2704 is displayed. The member list 2704 includes the entity definitions that are included in the service definition for the service. GUI 2700 can include a filter box 2702 to receive input for filtering the member list 2704 of entity definitions to display a subset of the entity definitions.


GUI 2700 can facilitate user input for selecting one or more entity definitions from the member list 2704 and dragging the selected entity definition(s) to an exclusion list 2712 to indicate that the entities identified in each selected entity definition should not be considered for the current KPI. This exclusion means that the search criteria of the search query defining the KPI is changed to no longer search for machine data pertaining to the entities identified in the entity definitions from the exclusion list 2712. For example, entity definition 2705 (e.g., webserver07.splunk.com) can be selected and dragged to the exclusion list 2712. When the search query for the KPI produces a value, the value will be derived from machine data, which does not include machine data pertaining to webserver07.splunk.com.


KPI Shared Base Search


The search queries that define and produce KPIs may be independently maintained and executed in an embodiment. Where different KPIs are derived from the same, or significantly overlapping, underlying machine event data, perhaps each KPI looking at different fields within those events, or perhaps looking at different statistics, calculations, analysis, or measures of a same field of those events, performance of the service monitoring system may be enhanced in an embodiment by accessing the event data once for use in the determination of multiple KPI values. Such an embodiment will now be described that enables control data for the service monitoring system (SMS) to be created and maintained that defines a common shared base search for the production of multiple KPIs.


FIG. 27A1 illustrates a process for the production of multiple KPIs using a common shared base search in one embodiment. As is apparent from the detailed description to this point, a service monitoring system (SMS) may be effectively controlled to perform the desired monitoring using definitional data for entities that provide services, definitional data for the services themselves, and some implied or explicit representation of the association between a defined service and the entities use to perform that service. Method 27000 is shown to begin at block 27010 where entity and service definitions are defined and related. Given the abundant disclosure related to those topics present elsewhere in this detailed description, no further discussion is made here other than to note that the processing of block 27010 results in the creation or maintenance of control data for an SMS 27022, i.e., entity and service definitions properly related. At block 27012 a base search query is defined. The base search query definition may specify (i) selection or filter criteria to identify the appropriate machine or event data from which KPI values are to be derived, (ii) various metrics, measures, calculations, statistics, or the like, to be produced in view of the identified data, (iii) other information as may be used to control the execution of an instance of the base search query such as timing information like a frequency or schedule, and (iv) other information related to the common shared base search query as may be useful in a particular embodiment. Illustrative embodiments of user interfaces useful to the processing of block 27012 are discussed below in relation to FIG. 27A2 and FIG. 27A3. The processing of block 27012 results in the creation or maintenance of additional SMS control data 27022.


At block 27014, KPIs are defined that rely on a shared base search. In one embodiment, such an individual KPI relies, for example, on the identification of the appropriate machine data and the determination of a metric over that data as provided by the shared base search. In such an embodiment the processing of block 27014 may include generating appropriately formatted SMS control data of a KPI definition that identifies a particular shared base search and a particular metric associated with that search. The processing of block 27014 of an embodiment may further include generating appropriately formatted SMS control data of the KPI definition that extends the KPI definition beyond what the shared base search provides. For example, threshold information specific to the KPI (embodiments for which are discussed elsewhere herein) may be received and incorporated with a shared base search identification and associated metric selection into a KPI definition of SMS control data 27022. One illustrative embodiment of a user interface useful to the processing of block 27014 is discussed below in relation to FIG. 27A4.


At block 27016, a search query based on the shared base search definition is executed, and values for multiple KPIs 27024 are derived from the machine data accessed during the single execution of the base search query. In one embodiment, processing of block 27016 is repeated automatically as indicated by cyclic arrow 27018. The service monitoring system of such an embodiment utilizes SMS control data 27022 to effect such automatic, repetitive production of KPI values 27024 relying on a common shared base search. In one embodiment, the SMS effects the automatic production of KPI values by repeatedly requesting an event processing system (EPS) to make a single execution of a search query based on the shared base search definition. In another embodiment, the SMS effects the automatic production of KPI values by making a single request to an EPS for the repeated execution of the search query, where the EPS supports such a request. In such an embodiment, the SMS may selectively use and reformat definitional data from the SMS control data 27022 as needed to make a properly formatted request to the EPS. One of skill appreciates that SMS control data 27022 can be implemented with a variety of data representations, structures, organizations, formatting, and the like, and that changes in those aspects may be made to the data or to copies of the data during its use. One of skill will also appreciate, in light of the illustrative formats for SMS control data illustrated and discussed elsewhere herein (for example, the entity definition of FIG. 10B, or the service definition of FIG. 17B, and their related discussions), how those examples might be extended or applied to SMS control data content not specifically illustrated.


Method 27000 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine. Many combinations of processing apparatus to perform the method are possible.


FIG. 27A2 illustrates a user interface as may be used for the creation and maintenance of shared base search definition information for controlling an SMS in one embodiment. User interface display 27100 depicts a visual display as may be presented to a user after user interaction to define a shared base search. Display 27100 includes both interactive and non-interactive elements. Display 27100 is shown to include system title bar area 27102, application menu/navigation bar area 27104, base search title component 27110, search text component 27112, search schedule component 27114, calculation window component 27116, monitoring lag component 27118, entity split component 27120, entity filter component 27122, entity lookup component 27124, entity alias filtering component 27126, metrics portion 27130, cancel button 27106, and save button 27108. System title bar component 27102 is further shown to include an editable text box 27106. Entity alias filtering component 27126 is further shown to include multiple field token components such as field token component 27128 representing a “host” field. Metrics portion 27130 is further shown to include metric count component 27136, metric filter component 27137, Add button 27138, and a tabular display of metric definition data. The tabular display of metric definition data is shown to include column heading portion 27132 and a table data portion shown to include metric definition entry components 27134a-d.


System title bar area 27102 of this illustrative embodiment is shown to include the name of an operating environment supporting service monitoring functionality (“splunk>®”), the name of the service monitoring system (“IT Service Intelligence”) which may be an application of the aforementioned operating environment, various menu and/or navigation options (“Administrator”, “Messages”, “Settings”, “Activity”, and “Help”) which may be pertinent to the aforementioned operating environment, and an editable text box 27106 which may be used to enter search text for an immediate search of operating environment and or application information possibly including help files. The name of the service monitoring system of one embodiment is interactive such that a click action results in the display of a list of applications available within the operating environment. The menu and/or navigation options of an embodiment may be similarly interactive.


Application menu/navigation bar area 27104 is shown to include various menu and/or navigation options (“Service Analyzer”, “Event Management”, “Glass Tables”, “Deep Dives”, “Multi KPI Alerts”, “Search”, and “Configure”) and the name of the service monitoring system (“IT Service Intelligence”). In this example, the menu and/or navigation options are options of the application of the operating environment having the focus currently, here, the service monitoring system application. Certain of the menu and/or navigation options of an embodiment may be interactive such that a click action results in the display of a list of further actions that may be invoked by the user. Certain of the menu and/or navigation options of an embodiment may be interactive such that a click action results in the replacement of interface 27100 with the display of an interface related to some other function than creating or maintaining a KPI shared base search definition.


Base search title component 27110 displays the name or title of a particular KPI shared base search definition, here, for example, “Shared Access Logs Data.” The base search title component 27110 may be interactive such that the user is enabled to edit the name of the shared base search definition that is being exposed to the user for creation or maintenance by interface 27100. Search text component 27112 of the presently described embodiment displays editable text of a search query for the shared base search and is comparable to search query text described elsewhere for KPIs (for example, 2902 of FIG. 29, and the related discussion). Search text component 27112, in an embodiment, may not be limited to search query text but may replace or augment it with one or more options, such as selecting from a list of predefined searches, or selecting or constructing a search based on a common information model. These and other embodiments are possible for a arriving at a search specification for a shared base search. Search schedule component 27114 is shown to include a drop-down button that indicates the currently selected search schedule (here, “Every Minute”), and that may be clicked by user interaction to present a list of selectable schedule options. Calculation window component 27116 is shown to include a drop-down button that indicates the currently selected calculation window (here, “Last Minute”), and that may be clicked by user interaction to present a list of selectable calculation window options. Monitoring lag component 27118 is shown as a text box that indicates the currently specified monitoring time lag. The text box is interactive to allow the user to directly edit its displayed contents in order to change the specification of the monitoring time lag. A monitoring time lag in an embodiment may introduce a delay in the execution of the base search query to allow for the late arrival of data relevant to the search. In one embodiment, the monitoring time lag is measured in seconds and less than a minute.


Entity split component 27120 is shown to include Yes and No option buttons selectable by the user to indicate whether search data is to be processed on a per-entity basis. Per-entity processing may be desired, for example, to utilize per-entity thresholds with a KPI associated with the shared base search. (Per-entity thresholds are discussed elsewhere, such as in relation to FIG. 31D.) A selection of Yes at 27120 may result in the enablement, activation, visibility or the like, of related interface components such as entity lookup component 27124. A selection of No at 27120 may produce an opposite result, in an embodiment.


Entity filter component 27122 is shown to include Yes and No option buttons selectable by the user to indicate whether search criteria for the shared base search should limit the search data on the basis of entities defined to have an association with the service to which the shared base search, itself, is associated. The Yes option may be desired, for example, to improve performance by avoiding unnecessary data accesses when monitoring a stable service environment having reliable service and entity definition associations. The No option may be desired, for example, to ensure complete data capture when monitoring a dynamic service environment where complete and accurate service and entity definition associations may be difficult to maintain in a timely fashion. A selection of Yes at 27122 may result in the enablement, activation, visibility or the like, of related interface components such as entity lookup component 27124 and entity alias filtering component 27126. A selection of No at 27122 may produce an opposite result.


Entity lookup component 27124 is shown to include an editable text box for indicating the identification of a field in the search data having entity identifier information. The example entity identifier field name is shown as “host” in 27124. Entity alias filtering component 27126 is shown to include an editable text box for indicating one or more entity definition aliases to be used for matching the entity lookup field. In an embodiment, an empty editable text box may indicate that all entity definition aliases are to be used for matching. In an embodiment, specifying fewer than all of the entity definition aliases in the editable text box of 27126 may result in a performance improvement by limiting the amount of machine data accessed or processed for the execution of the search. In an embodiment, each specified entity definition alias may be represented in editable text box 27126 by a field token such as 27128 that displays the alias name (such as “host”) along with one or more action icons (such as deletion icon “X”).


Components of interface 27100 already discussed, and their associated KPI shared base search definitional items, may be characterized as relating to different aspects of a KPI search generally, including data selection (e.g. 27112), search scheduling (e.g. 27114), and processing/output options (e.g. 27120). Parallels may be seen in embodiments described for KPI's using unshared search definitions including, for example, the search text of 2902 of FIG. 29C (a data selection aspect), the calculation frequency indicator of 2964 of FIG. 29C (a scheduling aspect), and per-entity processing during a KPI search query execution as may be invoked by the selection of per-entity threshold types using 3161 of FIG. 31D (a processing/output options aspect).


Embodiments described for KPI's using unshared search definitions may include further processing/output options aspects including, for example, the specification of a threshold field and related calculation (e.g., 2904 and 2966a of FIG. 29C). A parallel may be seen in the KPI shared base search context by consideration of metrics portion 27130 of interface 27100 of FIG. 27A2. Metrics portion 27130 relates to defining one or more metrics each to be determined by calculation, statistical analysis, or alternate means, over the field data produced by the search. Individual KPIs relying on the shared base search may make use of the metrics. In one embodiment, such an individual KPI uses a single metric from among those defined for the shared base search. Other embodiments are possible.


Metrics portion 27130 includes metric count component 27136. In one embodiment metric count component 27136 indicates the total number of metrics defined for the shared base search. In one embodiment, metric count component 27136 indicates the number of metrics defined for the shared base search that satisfy filter criteria entered by the user and displayed in metric filter component 27137. Add button 27138 enables the user to enter into an operational mode permitting the creation of a new metric definition for the shared base search. Entering such an operational mode may result in the display of a user interface component such as an Add Metric window, region, portion, or the like, enabling the display and user input of metric definition information. Such an Add Metric interface component is now illustrated and described in relation to FIG. 27A3.


FIG. 27A3 illustrates a user interface as may be used for the creation of metric definition information of shared base search in one embodiment. Illustrative user interface 27150 is shown in a state as might appear after user interaction. In an embodiment, on initial display, perhaps in response to receiving an indication of user input such as an indication of a user click on Add Metric button 27138 of interface 27100 of FIG. 27A2, user interface component of interface 27150 of FIG. 27A3 may appear without values, with SMS default or suggested values, with last-used values, with user profile default values, or the like. A Title interface component is shown to include editable text box 27162 for the display, entry, and modification of a metric name, here shown as “Avg Bytes Per Request.” A Threshold Field interface component is shown to include editable text box 27164 for the display, entry, and modification of an identifier for a field of the search data to be used as a threshold field. The threshold field name, here, shown as “bytes.” A Unit interface component is shown to include editable text box 27166 for the display, entry, and modification of a designation for a unit or measurement unit associated with the threshold field. Unit designation “byte” is shown.


An Entity Calculation interface component is shown to include a drop-down selection element 27172 for the display and selection of a per-entity calculation option associated with the threshold field and defining the metric. Drop down element 27172 is shown with the “Average” calculation option having been selected from a list of available options presented (not shown) because of a user interaction with element 27172, such as a mouse click or finger press. In an embodiment, drop-down selection element 27172 may have its visibility, enablement, or activation dependent on a user indication elsewhere, for example, on a user selection made at 27120 of FIG. 27A2. A Service/Aggregate Calculation interface component is shown in FIG. 27A3 to include a drop-down selection element 27174 for the display and selection of an overall service or aggregate calculation option associated with the threshold field and defining the metric. Drop down element 27174 is shown with the “Average” calculation option having been selected from a list of available options presented (not shown) because of a user interaction with element 27174, such as a mouse click or finger press.


“Add” button interface component 27153 may enable a user to provide the computing machine with an indication that the metric definition information appearing in interface 27150 is correct and should be included as a metric definition of the instant shared base search definition. In an embodiment, in response to a user activation of Add button 27153 the computing machine may store the metric definitional information indicated by interface 27150 and present it in a metric definition entry component such as shown by metric definition entry 27134a of FIG. 27A2. Metric definition entry 27134a of FIG. 27A2 is shown as the first of four metric definition entries 27134a-d appearing in interface 27100. The data values appearing in definition entry 27134a (“Avg Bytes Per Request”, “bytes”, “avg”, “avg”, and “byte”) correspond to definitional data item field names appearing in column heading portion 27132 (“Title”, “Threshold Field”, “Entity Calculation”, “Service Calculation”, and “Unit”, respectively). The “Actions” column of each metric definition entry does not contain a definitional data item but rather an interactive interface component enabling a user to select and engage an action to perform in relation to the metric definition represented by the entry. The interactive interface components shown for entries 27134a-d are each a drop-down selection component indicating “Edit” as the current selection. The definitional data items for each of metric entries 27134a-d may have been entered using an interface such as 27150 already described in relation to FIG. 27A3. In one embodiment the definitional data items for a metric entry may be entered or modified by direct interaction with a metric entry component such as 27134a of interface 27100 of FIG. 27A2. In an embodiment, a user may interact with an interface component, such as Save button 27108, to indicate acceptance of information presented by interface 27100 for storing or saving as KPI shared base search definitional information. Embodiments may variously store or save such KPI shared base search information as, for example, one or more collections, entries, structures, records, or the like, in an SMS control data store such as 27022 of FIG. 27A1.


In one embodiment, a search query may be derived from the information of the shared base search query definition as necessary, along with any other needed information as may be found in other SMS control data (such as definitions for services or entities), dynamically determined from the operating environment (such as the current time of day), and the like. In an embodiment, the search query may be passed to an EPS for execution, while in the same or different embodiment the search query may be performed against machine data by a search capability of the SMS itself.


FIG. 27A4 illustrates a user interface as may be used in one embodiment to establish an association between a KPI and a defined shared base search. In an embodiment, the illustrated interface, here as elsewhere, may be representative of an independent display image, a portion of a display image, a user interface component within a more comprehensive user interface, such as a pop-up window, or the like. The interface embodiment 27180 illustrates the display of an interactive interface as may be used in an embodiment to add a KPI definition. Interface 27180 represents a GUI portion that addresses data source information of a KPI definition, in one embodiment. KPI definitions, their creation and maintenance, a variety of options, and related user interfaces are illustrated and described in detail elsewhere, including, for example, FIG. 19, et seq., and the discussions related thereto.


Interface 27180 includes a header portion 27181 and a footer portion 27183. Header portion 27181 indicates the name or title of the KPI currently being defined, “Request Duration”, and that the interface 27180 relates to definitional information about a KPI data source which is the second step of a 6-step process for defining a KPI (“Step 2 of 6: Source”). Footer portion 27183 is shown to include Cancel, Back, Next, and Finish action buttons, with the Next button highlighted as the default action.


The main body of interface 27180 is shown to include a KPI Source component 27190, a Base Search component 27192, and a Metric component 27194. KPI Source component 27190 may be recognized for its similarity to the KPI Source component of the interface 2200 of FIG. 22 which includes selection buttons for “Data Model” 2208 and “Ad-hoc Search” 2206 options. Those selection buttons have counterparts here in KPI Source component 27190 of FIG. 27A4. Notably, KPI Source component 27190 further includes selection button 27190a for a “Base Search” option. A user may interact with selection button 27190a to indicate to the computing machine that the data source for the KPI being defined is a shared base search. In response to receiving such an indication from the user, the computing machine may activate, make visible, or otherwise enable Base Search component 27192. Base Search component 27192 is shown having a drop-down selection list component, with “Shared Access Logs Data” as the currently selected list option. In an embodiment, certain user interaction with Base Search component 27192 (e.g., a mouse click or finger press) results in the display of a list of identifiers for the shared base searches from which the user can indicate a selection. In an embodiment, the shared base searches from which the user can indicate a selection may include all base searches that are defined with at least one metric. In an embodiment, the shared base searches for selection may be filtered down based on some system or user criteria. In response to such a user selection indication in an embodiment, the computing machine may display an identifier for a shared base search, such as its name, in Base Search component 27192, and activate, make visible, or otherwise enable Metric component 27194. Metric component 27194 is shown having a drop-down selection list component with “Avg Request Duration” as the currently selected list option. In an embodiment, certain user interaction with Metric component 27194 may result in the display of a list of identifiers for the metrics included, directly or indirectly, in the definition for the KPI shared base search identified in 27192, enabling the user to make a selection from the list. In response to such a user selection in an embodiment, the computing machine may display an identifier for the metric, such as its name, in Metric component 27194. In an embodiment, a user may interact with an action button, such as the Next or Finish buttons of 27183, to indicate acceptance, agreement, approval, or the like, of the base search information shown on the interface 27180, and to thereby instruct the computing machine to store or otherwise use the information to represent the relationship between the KPI and the shared base search as part of SMS control data such as 27022 of FIG. 27A1.


One of skill appreciates that the foregoing examples related to KPI shared base searches are illustrative and the particular details shown, discussed, or implied are not intended to express limitations on the practice of inventive subject matter. For example, a method related to KPI shared base searches is not constrained by the details of the process shown or discussed in relation to FIG. 27A1. Such a method may, for example, perform all or only a limited number of the operations illustrated and discussed there, and may perform its operations in different combinations, orders, sequences, parallelisms, and the like, using different combinations, distributions, configurations, and the like of computing machinery. In a similar example, user interface apparatus related to KPI shared base searches is not constrained by the details shown and discussed in relation to FIGS. 27A2-27A4. Illustrative user interface apparatus may be selected, substituted, separated, combined, omitted, augmented, and the like in whole or in part, while not avoiding the inventive subject matter.



FIG. 28 is a flow diagram of an implementation of a method 2800 for defining one or more thresholds for a KPI, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 2802, the computing machine identifies a service definition for a service. In one implementation, the computing machine receives input (e.g., user input) selecting a service definition. The computing machine accesses the service definition for a service from memory.


At block 2804, the computing machine identifies a KPI for the service. In one implementation, the computing machine receives input (e.g., user input) selecting a KPI of the service. The computing machine accesses data representing the KPI from memory.


At block 2806, the computing machine causes display of one or more graphical interfaces enabling a user to set a threshold for the KPI. The KPI can be in one of multiple states. Example states can include, and are not limited to, unknown, trivial state, informational state, normal state, warning state, error state, and critical state. Each state can be represented by a range of values. At a certain time, the KPI can be in one of the states depending on which range the value, which is produced by the search query for the KPI, falls into. Each threshold defines an end of a range of values, which represents one of the states. Some examples of graphical interfaces for enabling a user to set a threshold for the KPI are discussed in greater detail below in conjunction with FIG. 29A to FIG. 31C.


At block 2808, the computing machine receives, through the graphical interfaces, an indication of how to set the threshold for the KPI. The computing machine can receive input (e.g., user input), via the graphical interfaces, specifying the field or alias that should be used for the threshold(s) for the KPI. The computing machine can also receive input (e.g., user input), via the graphical interfaces, of the parameters for each state. The parameters for each state can include, for example, and not limited to, a threshold that defines an end of a range of values for the state, a unique name, and one or more visual indicators to represent the state.


In one implementation, the computing machine receives input (e.g., user input), via the graphical interfaces, to set a threshold and to apply the threshold to the KPI as determined using the machine data from the aggregate of the entities associated with the KPI.


In another implementation, the computing machine receives input (e.g., user input), via the graphical interfaces, to set a threshold and to apply the threshold to a KPI as the KPI is determine using machine data on a per entity basis for the entities associated with the KPI. For example, the computing machine can receive a selection (e.g., user selection) to apply thresholds on a per entity basis, and the computing machine can apply the thresholds to the value of the KPI as the value is calculated per entity.


For example, the computing machine may receive input (e.g., user input), via the graphical interfaces, to set a threshold of being equal or greater than 80% for the KPI for Avg CPU Load, and the KPI is associated with three entities (e.g., Entity-1, Entity-2, and Entity-3). When the KPI is determined using data for Entity-1, the value for the KPI for Avg CPU Load may be at 50%. When the KPI is determined using data for Entity-2, the value for the KPI for Avg CPU Load may be at 50%. When the KPI is determined using data for Entity-3, the value for the KPI for Avg CPU Load may be at 80%. If the threshold is applied to the values of the aggregate of the entities (two at 50% and one at 80%), the aggregate value of the entities is 60%, and the KPI would not exceed the 80% threshold. If the threshold is applied using an entity basis for the thresholds (applied to the individual KPI values as calculated pertaining to each entity), the computing machine can determine that the KPI pertaining to one of the entities (e.g., Entity-3) satisfies the threshold by being equal to 80%.


At block 2810, the computing machine determines whether to set another threshold for the KPI. The computing machine can receive input, via the graphical interface, indicating there is another threshold to set for the KPI. If there is another threshold to set for the KPI, the computing machine returns to block 2808 to set the other threshold.


If there is not another threshold to set for the KPI (block 2810), the computing machine determines whether to set a threshold for another KPI for the service at block 2812. The computing machine can receive input, via the graphical interface, indicating there is a threshold to set for another KPI for the service. In one implementation, there are a maximum number of thresholds that can be set for a KPI. In one implementation, a same number of states are to be set for the KPIs of a service. In one implementation, a same number of states are to be set for the KPIs of all services. The service monitoring system can be coupled to a data store that stores configuration data that specifies whether there is a maximum number of thresholds for a KPI and the value for the maximum number, whether a same number of states is to be set for the KPIs of a service and the value for the number of states, and whether a same number of states is to be set for the KPIs of all of the service and the value for the number of states. If there is a threshold to set for another KPI, the computing machine returns to block 2804 to identity the other KPI.


At block 2814, the computing machine stores the one or more threshold settings for the one or more KPIs for the service. The computing machine associates the parameters for a state defined by a corresponding threshold in a data store that is coupled to the computing machine.


As will be discussed in more detail below, implementations of the present disclosure provide a service-monitoring dashboard that includes KPI widgets (“widgets”) to visually represent KPIs of the service. A widget can be a Noel gauge, a spark line, a single value, or a trend indicator. A Noel gauge is indicator of measurement as described in greater detail below in conjunction with FIG. 40. A widget of a KPI can present one or more values indicating how a respective service or an aspect of a service is performing at one or more points in time. The widget can also illustrate (e.g., using visual indicators such as color, shading, shape, pattern, trend compared to a different time range, etc.) the KPI's current state defined by one or more thresholds of the KPI.



FIGS. 29A-B illustrate examples of a graphical interface enabling a user to set one or more thresholds for the KPI, in accordance with one or more implementations of the present disclosure.



FIG. 29A illustrates an example GUI 2900 for receiving input for search processing language 2902 for defining a search query, in accordance with one or more implementations of the present disclosure. The KPI can be in one of multiple states (e.g., normal, warning, critical). Each state can be represented by a range of values. At a certain time, the KPI can be in one of the states depending on which range the value, which is produced by the search query for the KPI, falls into. GUI 2900 can display an input box 2904 for a field to which the threshold(s) can be applied. In particular, a threshold can be applied to the value produced by the search query defining the KPI. The value can be, for example, the field's value extracted from an event when the search query is executed, a statistic calculated based on one or more values of the field in one or more events located when the search query is executed, a count of events satisfying the search criteria that include a constraint for the field, etc. GUI 2900 may include the name 2904 of the actual field used in the search query or the name of an alias that defines a desired statistic or count to be produced by the search query. For example, the threshold may be applied to an average response time produced by the search query, and the average response time can be defined by the alias “rsp_time” in the input box 2904.



FIG. 29B illustrates an example GUI 2950 for receiving input for selecting a data model for defining a search query, in accordance with one or more implementations of the present disclosure. GUI 2950 can be displayed if a KPI is defined using a data model.


GUI 2950 in FIG. 29B can include a statistical function 2954 to be used for producing a value when executing the search query of the KPI. As shown, the statistical function 2954 is a count, and the resulting statistic (the count value) should be compared with one or more thresholds of the KPI. The GUI 2950 also includes a button 2956 for creating the threshold(s) for the KPI. When either button 2906 is selected from GUI 2900 or button 2956 is selected from GUI 2950, GUI 3000 of FIG. 30 is displayed.



FIG. 29C illustrates an example GUI 2960 for configuring KPI monitoring in accordance with one or more implementations of the present disclosure. GUI 2960 may present information specifying a service definition corresponding to a service provided by a plurality of entities, and a specification for determining a KPI for the service. The service definition refers to a data structure, organization, or representation that can include information that associates one or more entities with a service. The service definition can include information for identifying the service definition, such as, for example, a name or other identifier for the service or service definition as may be indicated using GUI element 2961. The specification for determining a KPI for the service refers to the KPI definitional information that can include source-related definitional information of a group of GUI elements 2963 and monitoring-related parameter information of a group of GUI elements 2965. The source-related definitional information of a group of GUI elements 2963 can include, as illustrated by FIG. 29C, a search defining the KPI as presented in a GUI element 2902, one or more entity identifiers for entities providing the service as presented in a GUI element 2906, one or more threshold field names for fields derived from the entities' machine data as presented in a GUI element 2904. (The named fields derived from the entities' machine data may be used to derive a value produced by the search of 2902.) The monitoring-related parameter information of a group of GUI elements 2963 can include, as illustrated in FIG. 29C, an importance indicator presented by GUI element 2962, a calculation frequency indicator presented by GUI element 2964, and a calculation period indicator presented by GUI element 2966. Once KPI definitional information (2963 and 2965) is adequately indicated using GUI 2960, a specification for determining a KPI can be stored as part of the service definition (e.g., in the same database or file, for example), or in association with the service definition (e.g., in a separate database or file, for example, where the service definition, the KPI specification, or both, include information for associating the other). The adequacy of KPI definitional information can be determined in response to a specific user interaction with the GUI, by an automatic analysis of one or more user interactions with the GUI, or by some combination, for example.


The search of 2902 is represented by search processing language for defining a search query that produces a value derived from machine data pertaining to the entities that provide the service and which are identified in the service definition. The value can indicate a current state of the KPI (e.g., normal, warning, critical). An entity identifier of 2906 specifies one or more fields (e.g., dest, ip_address) that can be used to identify one or more entities whose machine data should be used in the search of 2902. The threshold field GUI element 2904 enables specification of one or more fields from the entities' machine data that should be used to derive a value produced by the search of 2902. One or more thresholds can be applied to the value associated with the specified field(s) of 2904. In particular, the value can be produced by a search query using the search of 2902 and can be, for example, the value of threshold field 2904 associated with an event satisfying search criteria of the search query when the search query is executed, a statistic calculated based on values for the specified threshold field of 2904 associated with the one or more events satisfying the search criteria of the search query when the search query is executed, or a count of events satisfying the search criteria of the search query that include a constraint for the threshold field of 2904, etc. In the example illustrated in GUI 2960, the designated threshold field of 2904 is “cpu_load_percent,” which may represent the percentage of the maximum processor load currently being utilized on a particular machine. In other examples, the threshold(s) may be applied a field specified in 2904 which may represent other metrics such as total memory usage, remaining storage capacity, server response time, or network traffic, for example.


In one implementation, the search query includes a machine data selection component and a determination component. The machine data selection component is used to arrive at a set of machine data from which to calculate a KPI. The determination component is used to derive a representative value for an aggregate of the set of machine data. In one implementation, the machine data selection component is applied once to the machine data to gather the totality of the machine data for the KPI, and returns the machine data sorted by entity, to allow for repeated application of the determination component to the machine data pertaining to each entity on an individual basis. In one implementation, portions of the machine data selection component and the determination component may be intermixed within search language of the search query (the search language depicted in 2902, as an example of search language of a search query).


KPI monitoring parameters 2965 refer to parameters that indicate how to monitor the state of the KPI defined by the search of 2902. In one embodiment, KPI monitoring parameters 2965 include the importance indicator of 2962, the calculation frequency indicator of 2964, and the calculation period indicator of element 2966.


GUI element 2964 may include a drop-down menu with various interval options for the calculation frequency indicator. The interval options indicate how often the KPI search should run to calculate the KPI value. These options may include, for example, every minute, every 15 minutes, every hour, every 5 hours, every day, every week, etc. Each time the chosen interval is reached, the KPI is recalculated and the KPI value is populated into a summary index, allowing the system to maintain a record indicating the state of the KPI over time.


GUI element 2966 may include individual GUI elements for multiple calculation parameters, such as drop-down menus for various statistic options 2966a, periods of time options 2966b, and bucketing options 2966c. The statistic options drop-down 2966a indicates a selected one (i.e., “Average”) of the available methods in the drop-down (not shown) that can be applied to the value(s) associated with the threshold field of 2904. The expanded drop-down may display available methods such as average, maximum, minimum, median, etc. The periods of time options drop-down 2966b indicates a selected one (i.e., “Last Hour”) of the available options (not shown). The selected period of time option is used to identify events, by executing the search query, associated with a specific time range (i.e., the period of time) and each available option represents the period over which the KPI value is calculated, such as the last minute, last 15 minutes, last hour, last 4 hours, last day, last week, etc. Each time the KPI is recalculated (e.g., at the interval specified using 2964), the values are determined according to the statistic option specified using 2966a, over the period of time specified using 2966b. The bucketing options of drop-down 2966c each indicate a period of time from which the calculated values should be grouped together for purposes of determining the state of the KPI. The bucketing options may include by minute, by 15 minutes, by hour, by four hours, by day, by week, etc. For example, when looking at data over the last hour and when a bucketing option of 15 minutes is selected, the calculated values may be grouped every 15 minutes, and if the calculated values (e.g., the maximum or average) for the 15 minute bucket cross a threshold into a particular state, the state of the KPI for the whole hour may be set to that particular state.


Importance indicator of 2962 may include a drop-down menu with various weighting options. As discussed in more detail with respect to FIGS. 32 and 33, the weighting options indicate the importance of the associated KPI value to the overall health of the service. These weighting options may include, for example, values from 1 to 10, where the higher values indicate higher importance of the KPI relative to the other KPIs for the service. When determining the overall health of the service, the weighting values of each KPI may be used as a multiplier to normalize the KPIs, so that the values of KPIs having different weights may be combined together. In one implementation, a weighting option of 11 may be available as an overriding weight. The overriding weight is a weight that overrides the weights of all other KPIs of the service. For example, if the state of the KPI, which has the overriding weight, is “warning” but all other KPIs of the service have a “normal” state, then the service may only be considered in a warning state, and the normal state(s) for the other KPIs can be disregarded.



FIG. 30 illustrates an example GUI 3000 for enabling a user to set one or more thresholds for the KPI, in accordance with one or more implementations of the present disclosure. Each threshold for a KPI defines an end of a range of values, which represents one of the states. GUI 3000 can display a button 3002 for adding a threshold to the KPI. If button 3002 is selected, a GUI for facilitating user input for the parameters for the state associated with the threshold can be displayed, as discussed in greater detail below in conjunction with FIGS. 31A-C.


Referring to FIG. 30, if button 3002 is selected three times, there will be three thresholds for the KPI. Each threshold defines an end of a range of values, which represents one of the states. GUI 3000 can display a UI element (e.g., column 3006) that includes sections representing the defined states for the KPI, as described in greater detail below in conjunction with FIGS. 31A-C. GUI 3000 can facilitate user input to specify a maximum value 3004 and a minimum value 3008 for defining a scale for a widget that can be used to represent the KPI on the service-monitoring dashboard. Some implementations of widgets for representing KPIs are discussed in greater detail below in conjunction with FIGS. 40-42 and FIGS. 44-46.


Referring to FIG. 30, GUI 3000 can optionally include a button 3010 for receiving input indicating whether to apply the threshold(s) to the aggregate of the KPIs of the service or to the particular KPI. Some implementations for applying the threshold(s) to the aggregate of the KPIs of the service or to a particular KPI are discussed in greater detail below in conjunction with FIGS. 32-34.



FIG. 31A illustrates an example GUI 3100 for defining threshold settings for a KPI, in accordance with one or more implementations of the present disclosure. GUI 3100 is a modified view of GUI 3000, which is provided once the user has requested to add several thresholds for a KPI via button 3002 of GUI 3000. In particular, in response to the user request to add a threshold, GUI 3100 dynamically adds a GUI element in a designated area of GUI 3100. A GUI element can be in the form of an input box divided into several portions to receive various user input and visually illustrate the received input. The GUI element can represent a specific state of the KPI. When multiple states are defined for the KPI, several GUI elements can be presented in the GUI 3100. For example, the GUI elements can be presented as input boxes of the same size and with the same input fields, and those input boxes can be positioned horizontally, parallel to each other, and resemble individual records from the same table. Alternatively, other types of GUI elements can be provided to represent the states of the KPI.


Each state of the KPI can have a name, and can be represented by a range of values, and a visual indicator. The range of values is defined by one or more thresholds that can provide the minimum end and/or the maximum end of the range of values for the state. The characteristics of the state (e.g., the name, the range of values, and a visual indicator) can be edited via input fields of the respective GUI element.


In the example shown in FIG. 31A, GUI 3100 includes three GUI elements representing three different states of the KPI based on three added thresholds. These states include states 3102, 3104, and 3106.


For each state, GUI 3100 can include a GUI element that displays a name (e.g., a unique name for that KPI) 3109, a threshold 3110, and a visual indicator 3112 (e.g., an icon having a distinct color for each state). The unique name 3109, a threshold 3110, and a visual indicator 3112 can be displayed based on user input received via the input fields of the respective GUI element. For example, the name “Normal” can be specified for state 3106, the name “Warning” can be specified for state 3104, the name “Critical” can be specified for state 3102.


The visual indicator 3112 can be, for example, an icon having a distinct visual characteristic such as a color, a pattern, a shade, a shape, or any combination of color, pattern, shade and shape, as well as any other visual characteristics. For each state, the GUI element can display a drop-down menu 3114, which when selected, displays a list of available visual characteristics. A user selection of a specific visual characteristic (e.g., a distinct color) can be received for each state.


For each state, input of a threshold value representing the minimum end of the range of values for the corresponding state of the KPI can be received via the threshold portion 3110 of the GUI element. The maximum end of the range of values for the corresponding state can be either a preset value or can be defined by (or based on) the threshold associated with the succeeding state of the KPI, where the threshold associated with the succeeding state is higher than the threshold associated with the state before it.


For example, for Normal state 3106, the threshold value 0 may be received to represent the minimum end of the range of KPI values for that state. The maximum end of the range of KPI values for the Normal state 3106 can be defined based on the threshold associated with the succeeding state (e.g., Warning state 3104) of the KPI. For example, the threshold value 50 may be received for the Warning state 3104 of the KPI. Accordingly, the maximum end of the range of KPI values for the Normal state 3106 can be set to a number immediately preceding the threshold value of 50 (e.g., it can be set to 49 if the values used to indicate the KPI state are integers).


The maximum end of the range of KPI values for the Warning state 3104 is defined based on the threshold associated with the succeeding state (e.g., Critical state 3102) of the KPI. For example, the threshold value 75 may be received for the Critical state 3102 of the KPI, which may cause the maximum end of the range of values for the Warning state 3104 to be set to 74. The maximum end of the range of values for the highest state (e.g., Critical state 3102) can be a preset value or an indefinite value.


When input is received for a threshold value for a corresponding state of the KPI and/or a visual characteristic for an icon of the corresponding state of the KPI, GUI 3100 reflects this input by dynamically modifying a visual appearance of a vertical UI element (e.g., column 3118) that includes sections that represent the defined states for the KPI. Specifically, the sizes (e.g., heights) of the sections can be adjusted to visually illustrate ranges of KPI values for the states of the KPI, and the threshold values can be visually represented as marks on the column 3118. In addition, the appearance of each section is modified based on the visual characteristic (e.g., color, pattern) selected by the user for each state via a drop-down menu 3114. In some implementations, once the visual characteristic is selected for a specific state, it is also illustrated by modified appearance (e.g., modified color or pattern) of icon 3112 positioned next to a threshold value associated with that state.


For example, if the color green is selected for the Normal state 3106, a respective section of column 3118 can be displayed with the color green to represent the Normal state 3106. In another example, if the value 50 is received as input for the minimum end of a range of values for the Warning state 3104, a mark 3117 is placed on column 3118 to represent the value 50 in proportion to other marks and the overall height of the column 3118. As discussed above, the size (e.g., height) of each section of the UI element (e.g., column) 3118 is defined by the minimum end and the maximum end of the range of KPI values of the corresponding state.


In one implementation, GUI 3100 displays one or more pre-defined states for the KPI. Each predefined state is associated with at least one of a pre-defined unique name, a pre-defined value representing a minimum end of a range of values, or a predefined visual indicator. Each pre-defined state can be represented in GUI 3100 with corresponding GUI elements as described above.


GUI 3100 can facilitate user input to specify a maximum value 3116 and a minimum value 3120 for the combination of the KPI states to define a scale for a widget that represents the KPI. Some implementations of widgets for representing KPIs are discussed in greater detail below in conjunction with FIGS. 40-42 and FIGS. 44-46. GUI 3100 can display a button 3122 for receiving input indicating whether to apply the threshold(s) to the aggregate KPI of the service or to the particular KPI or both. The application of threshold(s) to the aggregate KPI of the service or to a particular KPI is discussed in more detail below in conjunction with FIG. 33.



FIGS. 31B-31C illustrate GUIs for defining threshold settings for a KPI, in accordance with an alternative implementation of the present disclosure. In GUI 3150 of FIG. 31B, adjacent to column 3118, a line chart 3152 is displayed. The line chart 3152 represents the KPI values for the current KPI over a period of time selected from drop down menu 3154. The KPI values are plotted over the period of time on a first horizontal axis and against a range of values set by the maximum value 3116 and minimum value 3120 on a second vertical axis. In one implementation when a mark 3156 is added to column 3118 indicating the end of a range of values for the a particular state a horizontal line 3158 is displayed along the length of line chart 3152. The horizontal line 3158 makes it easy to visually correlate the KPI values represented by line chart 3152 with the end of the range of values. For example, in FIG. 31B, with the “Critical” state having a range below 15 GB, the horizontal line 3158 indicates that the KPI values drop below the end of the range four different times. This may provide information to a user that the end of the range of values indicated by mark 3156 can be adjusted.


In GUI 3160 of FIG. 31C, the user has adjusted the position of mark 3156, thereby decreasing the end of the range of values for the “Critical” state to 10 GB. Horizontal line 3158 is also lowered to reflect the change. In one implementation, the user may click and drag mark 3156 down to the desired value. In another implementation, the user may type in the desired value. The user can tell that the KPI values now drop below the end of the only once, thereby limiting the number of alerts associated with the defined threshold.



FIGS. 31D-31F illustrate example GUIs for defining threshold settings for a KPI, in accordance with alternative implementations of the present disclosure. In one implementation, for services that have multiple entities, the method for determining the KPI value from data across the multiple entities is applied on a per entity basis. For example, if machine data pertaining to a first entity searched to produce a value relevant to the KPI (e.g., CPU load) every minute while machine data pertaining to a second entity is searched to produce the value relevant to the KPI every hour, simply averaging all the values together would give a skewed result, as the sheer number of values produced from the machine data pertaining to the first entity would mask any values produced from the machine data pertaining to the second entity in the average. Accordingly, in one implementation, the average value (e.g., cpu_load_percent) per entity is calculated over the selected time period and that average value for each entity is aggregated together to determine the KPI for the service. A per-entity average value that is calculated over the selected time period can represent a contribution of a respective KPI entity to the KPI of the service. Since the values are calculated on a per entity basis, thresholds can not only be applied to the KPI of the service (calculated based on contributions of all KPI entities of the service) but also to a KPI contribution of an individual entity. Different threshold types can be defined depending on threshold usage.


In GUI 3159 of FIG. 31D, different threshold types 3161 are presented. Threshold types 3161 include an aggregate threshold type, a per-entity threshold type and a combined threshold type. An aggregate threshold type represents thresholds applied to a KPI, which represents contributions of all KPI entities in the service. With an aggregate threshold type, a current KPI state can be determined by applying the determination component of the search query to an aggregate of machine data pertaining to all individual KPI entities to produce a KPI value and applying at least one aggregate threshold to the KPI value.


A per-entity threshold type represents thresholds applied separately to KPI contributions of individual KPI entities of the service. With a per-entity threshold type, a current KPI state can be determined by applying the determination component to an aggregate of machine data pertaining to an individual KPI entity to determine a KPI contribution of the individual KPI entity, comparing at least one per-entity threshold with a KPI contribution separately for each individual KPI entity, and selecting the KPI state based on a threshold comparison with a KPI contribution of a single entity. In other words, a contribution of an individual KPI entity can define the current state of the KPI of the service. For example, if the KPI of the service is below a critical threshold corresponding to the start of a critical state but a contribution of one of the KPI entities is above the critical threshold, the state of the KPI can be determined as critical.


A combined threshold type represents discrete thresholds applied separately to the KPI values for the service and to the KPI contributions of individual entities in the service. With a combined threshold type, a current KPI state can be determined twice—first by comparing at least one aggregate threshold with the KPI of the service, and second by comparing at least one per-entity threshold with a KPI contribution separately for each individual KPI entity.


In the example of FIG. 31D, the aggregate threshold type has been selected using a respective GUI element (e.g., one of buttons 3161), and thresholds have been provided to define different states for the KPI of the service. In response to the selection of the aggregate threshold type, GUI 3159 presents an interface component including line chart 3163 that visualizes predefined KPI states and how a current state of the KPI changes over a period of time selected from the monitoring GUI 2960. In one implementation, the interface component includes a horizontal axis representing the selected period of time (e.g., last 60 minutes) and a vertical axis representing the range of possible KPI values. The various states of the KPI are represented by horizontal bands, such as 3164, 3165, 3166, displayed along the horizontal length of the interface component. In one implementation, when a mark is added to column 3162 indicating the start or end of a range of values for a particular state, a corresponding horizontal band is also displayed. The marks in column 3162 can be dragged up and down to vary the KPI thresholds, and correspondingly, the ranges of values that correspond to each different state. Line chart 3163 represents KPI values for the current KPI over a period of time selected from the monitoring GUI 2960 and determined by the determination component of the search query, as described above. The KPI values are plotted over the period of time on a horizontal axis and against a range of values set by the maximum value and minimum value on a vertical axis. The horizontal bands 3164-3166 make it easy to visually correlate the KPI values represented by line chart 3163 with the start and end of the range of values of a particular state. For example, in FIG. 31D, with the “Critical” state having a range above 69.34%, the horizontal band 3164 indicates that the KPI value exceeds the start of the range one time. Since line chart 3163 represents the KPI of the service, the values plotted by line chart 3163 may include the average of the average cpu_load_percent of all KPI entities in the service, calculated over the selected period of time. Accordingly, the state of the KPI may only change when the aggregate contribution of all KPI entities crosses the threshold from one band 3164 to another 3165.


In GUI 3170 of FIG. 31E, adjacent to column 3162, an interface component with two line charts 3173 and 3177 is displayed. In this implementation, the per entity threshold type has been selected using a respective GUI element (e.g., one of buttons 3161). Accordingly, the line charts 3173 and 3177 represent the KPI contributions of individual entities in the service over the period of time selected from the monitoring GUI 2960. The per-entity contributions are plotted over the period of time on a first horizontal axis and against a range of values set by the maximum value and minimum value on a second vertical axis. Since line charts 3173 and 3177 represent per entity KPI contributions, the values plotted by line chart 3173 may include the average cpu_load_percent of a first entity over the selected period of time, while the values plotted by line chart 3177 may include the average cpu_load_percent of a second entity over the same period of time. In one implementation, the determination component of the search query determines a contribution of an individual KPI entity from an aggregate of machine data corresponding to the individual KPI entity, applies at least one entity threshold to the contribution of the individual KPI entity, and selects a KPI state based at least in part on the determined contribution of the individual KPI entity in view of the applied threshold. Accordingly, the state of the KPI may change when any of the per entity contributions cross the threshold from one band 3166 to another 3165.


In GUI 3180 of FIG. 31F, the combined threshold type has been selected using a respective GUI element (e.g., one of buttons 3161). Accordingly GUI 3180 includes two separate interface components with one line chart 3183 on a first set of axes that represents the KPI of the service in the first interface component, and two additional line charts 3187 and 3188 on a second set of axes that represent the per entity KPI contributions in the second interface component. Both sets of axes represent the same period of time on the horizontal axes, however, the range of values on the vertical axes may differ. Similarly, separate thresholds may be applied to the service KPI represented by line chart 3183 and to the per entity KPI contributions represented by line charts 3187 and 3188. Since line chart 3183 represents the service KPI, the values plotted by line chart 3183 may include the average of the average cpu_load_percent of all entities in the service, calculated over the selected period of time. Accordingly, the state of the KPI may only change when the aggregate value crosses the thresholds that separate any of bands 3184, 3185, 3186 or 3189. Since line charts 3187 and 3188 represent per entity contributions for the KPI, the values plotted by line chart 3187 may include the average cpu_load_percent of a first entity over the selected period of time, while the values plotted by line chart 3188 may include the average cpu_load_percent of a second entity over the same period of time. Accordingly, the state of the KPI may change when any of the per entity values cross the thresholds that separate any of bands 3164, 3165 or 3166. In cases where the aggregate thresholds and per entity thresholds result in different states for the KPI, in one implementation, the more severe state may take precedence and be set as the state of the KPI. For example, if the aggregate threshold indicates a state of “Medium” but one of the per entity thresholds indicates a state of “High,” the more severe “High” state may be used as the overall state of the KPI.


In one implementation, a visual indicator, also referred to herein as a “lane inspector,” may be present in any of the GUIs 3150-3180. The lane inspector includes, for example, a line or other indicator that spans vertically across the bands at a given point in time along the horizontal time axis. The lane inspector may be user manipulable such that it may be moved along the time axis to different points. In one implementation, the lane inspector includes a display of the point in time at which it is currently located. In one implementation, the lane inspector further includes a display of a KPI value reflected in each of the line charts at the current point in time illustrated by the lane inspector. Additional details of the lane inspector are described below, but are equally applicable to this implementation.



FIG. 31G is a flow diagram of an implementation of a method for defining one or more thresholds for a KPI on a per entity basis, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 3422 is performed by the client computing machine. In another implementation, the method 3422 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 3191, the computing machine causes display of a GUI that presents information specifying a service definition for a service and a specification for determining a KPI for the service. In one implementation, the service definition identifies a service provided by a plurality of entities each having corresponding machine data. The specification for determining the KPI refers to the KPI definitional information (e.g., which entities, which records/fields from machine data, what time frame, etc.) that is being defined and is stored as part of the service definition or in association with the service definition. In one implementation, the KPI is defined by a search query that produces a value derived from the machine data pertaining to one or more KPI entities selected from among the plurality of entities. The KPI entities may include a set of entities of the service (i.e., service entities) whose relevant machine data is used in the calculation of the KPI. Thus, the KPI entities may include either whole set or a subset of the service entities. The value produced by the search query may be indicative of a performance assessment for the service at a point in time or during a period of time. In one implementation, the search query includes a machine data selection component that is used to arrive at a set of data from which to calculate a KPI and a determination component to derive a representative value for an aggregate of machine data. The determination component is applied to the identified set of data to produce a value on a per-entity basis (a KPI contribution of an individual entity). In one alternative, the machine data selection component is applied once to the machine data to gather the totality of the machine data for the KPI, and returns the machine data sorted by entity, to allow for repeated application of the determination component to the machine data pertaining to each entity on an individual basis.


At block 3192, the computing machine receives user input specifying one or more entity thresholds for each of the KPI entities. The entity thresholds each represent an end of a range of values corresponding to a particular KPI state from among a set of KPI states, as described above.


At block 3193, the computing machine stores the entity thresholds in association with the specification for determining the KPI for the service. In one implementation, the entity thresholds are added to the service definition.


At block 3194, the computing machine makes the stored entity thresholds available for determining a state of the KPI. In one implementation, determining the state of the KPI includes determining a contribution of an individual KPI entity by applying the determination component to an aggregate of machine data corresponding to the individual KPI entity, and then applying at least one entity threshold to a KPI contribution of the individual KPI entity. Further, the computing machine selects a KPI state based at least in part on the determined contribution of the individual KPI entity in view of the applied entity threshold. In one implementation, the entity thresholds are made available by exposing them through an API. In one implementation, the entity thresholds are made available by storing information for referencing them in an index of definitional components. In one implementation, the entity thresholds are made available as an integral part of storing them in a particular logical or physical location, such as logically storing them as part of a KPI definitional information collection associated with a particular service definition. In such an implementation, a single action or process, then, may accomplish both the storing of the entity thresholds, and the making available of the entity thresholds.


Aggregate Key Performance Indicators



FIG. 32 is a flow diagram of an implementation of a method 3200 for calculating an aggregate KPI score for a service based on the KPIs for the service, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 3201, the computing machine identifies a service to evaluate. The service is provided by one or more entities. The computing system can receive user input, via one or more graphical interfaces, selecting a service to evaluate. The service can be represented by a service definition that associates the service with the entities as discussed in more detail above.


At block 3203, the computing machine identifies key performance indicators (KPIs) for the service. The service definition representing the service can specify KPIs available for the service, and the computing machine can determine the KPIs for the service from the service definition of the service. Each KPI can pertain to a different aspect of the service. Each KPI can be defined by a search query that derives a value for that KPI from machine data pertaining to entities providing the service. As discussed above, the entities providing the service are identified in the service definition of the service. According to a search query, a KPI value can be derived from machine data of all or some entities providing the service.


In some implementations, not all of the KPIs for a service are used to calculate the aggregate KPI score for the service. For example, a KPI may solely be used for troubleshooting and/or experimental purposes and may not necessarily contribute to providing the service or impacting the performance of the service. The troubleshooting/experimental KPI can be excluded from the calculation of the aggregate KPI score for the service.


In one implementation, the computing machine uses a frequency of monitoring that is assigned to a KPI to determine whether to include a KPI in the calculation of the aggregate KPI score. The frequency of monitoring is a schedule for executing the search query that defines a respective KPI. As discussed above, the individual KPIs can represent saved searches. These saved searches can be scheduled for execution based on the frequency of monitoring of the respective KPIs. In one example, the frequency of monitoring specifies a time period (e.g., 1 second, 2 minutes, 10 minutes, 30 minutes, etc.) for executing the search query that defines a respective KPI, which then produces a value for the respective KPI with each execution of the search query. In another example, the frequency of monitoring specifies particular times (e.g., 6:00 am, 12:00 pm, 6:00 pm, etc.) for executing the search query. The values produced for the KPIs of the service, based on the frequency of monitoring for the KPIs, can be considered when calculating a score for an aggregate KPI of the service, as discussed in greater detail below in conjunction with FIG. 34A.


Alternatively, the frequency of monitoring can specify that the KPI is not to be measured (that the search query for a KPI is not to be executed). For example, a troubleshooting KPI may be assigned a frequency of monitoring of zero.


In one implementation, if a frequency of monitoring is unassigned for a KPI, the KPI is automatically excluded in the calculation for the aggregate KPI score. In one implementation, if a frequency of monitoring is unassigned for a KPI, the KPI is automatically included in the calculation for the aggregate KPI score.


The frequency of monitoring can be assigned to a KPI automatically (without any user input) based on default settings or based on specific characteristics of the KPI such as a service aspect associated with the KPI, a statistical function used to derive a KPI value (e.g., maximum versus average), etc. For example, different aspects of the service can be associated with different frequencies of monitoring, and KPIs can inherit frequencies of monitoring of corresponding aspects of the service.


Values for KPIs can be derived from machine data that is produced by different sources. The sources may produce the machine data at various frequencies (e.g., every minute, every 10 minutes, every 30 minutes, etc.) and/or the machine data may be collected at various frequencies (e.g., every minute, every 10 minutes, every 30 minutes, etc.). In another example, the frequency of monitoring can be assigned to a KPI automatically (without any user input) based on the accessibility of machine data associated with the KPI (associated through entities providing the service). For example, an entity may be associated with machine data that is generated at a medium frequency (e.g., every 10 minutes), and the KPI for which a value is being produced using this particular machine data can be automatically assigned a medium frequency for its frequency of monitoring.


Alternatively, frequency of monitoring can be assigned to KPIs based on user input. FIG. 33A illustrates an example GUI 3300 for creating and/or editing a KPI, including assigning a frequency of monitoring to a KPI, based on user input, in accordance with one or more implementations of the present disclosure. GUI 3300 for can include a button 3311 to receive a user request to assign a frequency of monitoring to the KPI being created or modified. Upon activating button 3311, a user can enter (e.g., via another GUI or a command line interface) a frequency (e.g., a user defined value) for the KPI, or select a frequency from a list presented to the user. In one example, the list may include various frequency types, where each frequency type is mapped to a pre-defined and/or user-defined time period. For example, the frequency types may include Real Time (e.g., 1 second), High Frequency (e.g., 2 minutes), Medium Frequency (e.g., 10 minutes), Low Frequency (e.g., 30 minutes), Do Not Measure (e.g., no frequency).


The assigned frequency of monitoring of KPIs can be included in the service definition specifying the KPIs, or in a separate data structure together with other settings of a KPI.


Referring to FIG. 32, at block 3205, the computing machine derives one or more values for each of the identified KPIs. The computing machine can cause the search query for each KPI to execute to produce a corresponding value. In one implementation, as discussed above, the search query for a particular KPI is executed based on a frequency of monitoring assigned to the particular KPI. When the frequency of monitoring for a KPI is set to a time period, for example, High Frequency (e.g., 2 minutes), a value for the KPI is derived each time the search query defining the KPI is executed every 2 minutes. The derived value(s) for each KPI can be stored in an index. In one implementation, when a KPI is assigned a frequency of monitoring of Do Not Measure or is assigned a zero frequency (no frequency), no value is produced (the search query for the KPI is not executed) for the respective KPI and no values for the respective KPI are stored in the data store.


At block 3207, the computing machine calculates a value for an aggregate KPI score for the service using the value(s) from each of the KPIs of the service. The value for the aggregate KPI score indicates an overall performance of the service. For example, a Web Hosting service may have 10 KPIs and one of the 10 KPIs may have a frequency of monitoring set to Do Not Monitor. The other nine KPIs may be assigned various frequencies of monitoring. The computing machine can access the values produced for the nine KPIs in the data store to calculate the value for the aggregate KPI score for the service, as discussed in greater detail below in conjunction with FIG. 34A. Based on the values obtained from the data store, if the values produced by the search queries for 8 of the 9 KPIs indicate that the corresponding KPI is in a normal state, then the value for an aggregate KPI score may indicate that the overall performance of the service is normal.


An aggregate KPI score can be calculated by adding the values of all KPIs of the same service together. Alternatively, an importance of each individual KPI relative to other KPIs of the service is considered when calculating the aggregate KPI score for the service. For example, a KPI can be considered more important than other KPIs of the service if it has a higher importance weight than the other KPIs of the service.


In some implementations, importance weights can be assigned to KPIs automatically (without any user input) based on characteristics of individual KPIs. For example, different aspects of the service can be associated with different weights, and KPIs can inherit weights of corresponding aspects of the service. In another example, a KPI deriving its value from machine data pertaining to a single entity can be automatically assigned a lower weight than a KPI deriving its value from machine data pertaining to multiple entities, etc.


Alternatively, importance weights can be assigned to KPIs based on user input. Referring again to FIG. 33A, GUI 3300 can include a button 3309 to receive a user request to assign a weight to the KPI being created or modified. Upon selecting button 3309, a user can enter (e.g., via another GUI or a command line interface) a weight (e.g., a user defined value) for the KPI, or select a weight from a list presented to the user. In one implementation, a greater value indicates that a greater importance is placed on a KPI. For example, the set of values may be 1-10, where the value 10 indicates high importance of the KPI relative to the other KPIs for the service. For example, a Web Hosting service may have three KPIs: (1) CPU Usage, (2) Memory Usage, and (3) Request Response Time. A user may provide input indicating that the Request Response Time KPI is the most important KPI and may assign a weight of 10 to the Request Response Time KPI. The user may provide input indicating that the CPU Usage KPI is the next most important KPI and may assign a weight of 5 to the CPU Usage KPI. The user may provide input indicating that the Memory Usage KPI is the least important KPI and may assign a weight of 1 to the Memory Usage KPI.


In one implementation, a KPI is assigned an overriding weight. The overriding weight is a weight that overrides the importance weights of the other KPIs of the service. Input (e.g., user input) can be received for assigning an overriding weight to a KPI. The overriding weight indicates that the status (state) of KPI should be used a minimum overall state of the service. For example, if the state of the KPI, which has the overriding weight, is warning, and one or more other KPIs of the service have a normal state, then the service may only be considered in either a warning or critical state, and the normal state(s) for the other KPIs can be disregarded.


In another example, a user can provide input that ranks the KPIs of a service from least important to most important, and the ranking of a KPI specifies the user selected weight for the respective KPI. For example, a user may assign a weight of 1 to the Memory Usage KPI, assign a weight of 2 to the CPU Usage KPI, and assign a weight of 3 to the Request Response Time KPI. The assigned weight of each KPI may be included in the service definition specifying the KPIs, or in a separate data structure together with other settings of a KPI.


Alternatively or in addition, a KPI can be considered more important than other KPIs of the service if it is measured more frequently than the other KPIs of the service. In other words, search queries of different KPIs of the service can be executed with different frequency (as specified by a respective frequency of monitoring) and queries of more important KPIs can be executed more frequently than queries of less important KPIs.


As will be discussed in more detail below in conjunction with FIG. 34A, the calculation of a score for an aggregate KPI may be based on ratings assigned to different states of an individual KPI. Referring again to FIG. 33A, a user can select button 3313 for defining threshold settings, including state ratings, for a KPI to display GUI 3350 in FIG. 33B. FIG. 33B illustrates an example GUI 3350 for defining threshold settings, including state ratings, for a KPI, in accordance with one or more implementations of the present disclosure. Similarly to GUI 3100 of FIG. 31A, GUI 3350 includes horizontal GUI elements (e.g., in the form of input boxes) 3352, 3354 and 3356 that represent specific states of the KPI. For each state, a corresponding GUI element can display a name 3359, a threshold 3360, and a visual indicator 3362 (e.g., an icon having a distinct color for each state). The name 3359, a threshold 3360, and a visual indicator 3362 can be displayed based on user input received via the input fields of the respective GUI element. GUI 3350 can include a vertical GUI element (e.g., a column) 3368 that changes appearance (e.g., the size and color of its sectors) based on input received for a threshold value for a corresponding state of the KPI and/or a visual characteristic for an icon of the corresponding state of the KPI. In some implementations, once the visual characteristic is selected for a specific state via the menu 3364, it is also illustrated by the modified appearance (e.g., modified color or pattern) of icon 3362 positioned next to a threshold value associated with that state.


In addition, GUI 3350 provides for configuring a rating for each state of the KPI. The ratings indicate which KPIs should be given more or less consideration in view of their current states. When calculating an aggregate KPI, a score of each individual KPI reflects the rating of that KPI's current state, as will be discussed in more detail below in conjunction with FIG. 34A. Ratings for different KPI states can be assigned automatically (e.g., based on a range of KPI values for a state) or specified by a user. GUI 3350 can include a field 3380 that displays an automatically generated rating or a rating entered or selected by a user. Field 3380 may be located next to (or in the same row as) a horizontal GUI element representing a corresponding state. Alternatively, field 3380 can be part of the horizontal GUI element. In one example, a user may provide input assigning a rating of 1 to the Normal State, a rating of 2 to the Warning State, and a rating of 3 to the Critical State.


In one implementation, GUI 3350 displays a button 3372 for receiving input indicating whether to apply the threshold(s) to the aggregate KPI of the service or to the particular KPI or both. If a threshold is configured to be applied to a certain individual KPI, then a specified action (e.g., generate alert, add to report) will be triggered when a value of that KPI reaches (or exceeds) the individual KPI threshold. If a threshold is configured to be applied to the aggregate KPI of the service, then a specified action (e.g., create notable event, generate alert, add to incident report) will be triggered when a value (e.g., a score) of the aggregate KPI reaches (or exceeds) the aggregate KPI threshold. In some implementations, a threshold can be applied to both or either the individual or aggregate KPI, and different actions or the same action can be triggered depending on the KPI to which the threshold is applied. The actions to be triggered can be pre-defined or specified by the user via a user interface (e.g., a GUI or a command line interface) while the user is defining thresholds or after the thresholds have been defined. The action to be triggered in view of thresholds can be included in the service definition identifying the respective KPI(s) or can be stored in a data structure dedicated to store various KPI settings of a relevant KPI.



FIG. 34A is a flow diagram of an implementation of a method 3400 for calculating a score for an aggregate KPI for the service, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 3402, the computing machine identifies a service to be evaluated. The service is provided by one or more entities. The computing system can receive user input, via one or more graphical interfaces, selecting a service to evaluate.


At block 3404, the computing machine identifies key performance indicators (KPIs) for the service. The computing machine can determine the KPIs for the service from the service definition of the service. Each KPI indicates how a specific aspect of the service is performing at a point in time.


As discussed above, in some implementations, a KPI pertaining to a specific aspect of the service (also referred to herein as an aspect KPI) can be defined by a search query that derives a value for that KPI from machine data pertaining to entities providing the service. Alternatively, an aspect KPI may be a sub-service aggregate KPI. Such a KPI is sub-service in the sense that it characterizes something less than the service as a whole. Such a KPI is an aspect KPI in the almost definitional sense that something less than the service as a whole is an aspect of the service. Such a KPI is an aggregate KPI in the sense that the search which defines it produces its value using a selection of accumulated KPI values in the data store (or of contemporaneously produced KPI values, or a combination), rather than producing its value using a selection of event data directly. The selection of accumulated KPI values for such a sub-service aggregate KPI includes values for as few as two different KPI's defined for a service, which stands in varying degrees of contrast to a selection including values for all, or substantially all, of the active KPI's defined for service as is the case with a service-level KPI. (A KPI is an active KPI when its definitional search query is enabled to execute on a scheduled basis in the service monitoring system. See the related discussion in regards to FIG. 32. Unless otherwise indicated, discussion herein related to KPI's associated with a service, or the like, may presume the reference is to active KPI definitions, particularly where the context relates to available KPI values, such that the notion of “all” may reasonably be understood to represent something corresponding to technically less than “all” of the relevant, extant KPI definitions.) A method for determining (e.g., by calculating) a service-level aggregate KPI is discussed in relation to the flow diagram of FIG. 32. A person of ordinary skill in the art now will understand how the teachings surrounding FIG. 32 may be adapted to determine or produce an aggregate KPI that is a sub-service aggregate KPI. Similarly, a person of skill in the art now will understand how teachings herein regarding GUIs for creating, establishing, modifying, viewing, or otherwise processing KPI definitions (such as GUIs discussed in relation to FIGS. 22-27) may be adapted to accommodate a KPI having a defining search query that produces its value using a selection of accumulated KPI values in the data store (or of contemporaneously produced KPI values, or a combination), rather than producing its value using a selection of event data directly.


At block 3406, the computing machine optionally identifies a weighting (e.g., user selected weighting or automatically assigned weighting) for each of the KPIs of the service. As discussed above, the weighting of each KPI can be determined from the service definition of the service or a KPI definition storing various setting of the KPI.


At block 3408, the computing machine derives one or more values for each KPI for the service by executing a search query associated with the KPI. As discussed above, each KPI is defined by a search query that derives the value for a corresponding KPI from the machine data that is associated with the one or more entities that provide the service.


As discussed above, the machine data associated with the one or more entities that provide the same service is identified using a user-created service definition that identifies the one or more entities that provide the service. The user-created service definition also identifies, for each entity, identifying information for locating the machine data pertaining to that entity. In another example, the user-created service definition also identifies, for each entity, identifying information for a user-created entity definition that indicates how to locate the machine data pertaining to that entity. The machine data can include for example, and is not limited to, unstructured data, log data, and wire data. The machine data associated with an entity can be produced by that entity. In addition or alternatively, the machine data associated with an entity can include data about the entity, which can be collected through an API for software that monitors that entity.


The computing machine can cause the search query for each KPI to execute to produce a corresponding value for a respective KPI. The search query defining a KPI can derive the value for that KPI in part by applying a late-binding schema to machine data or, more specifically, to events containing raw portions of the machine data. The search query can derive the value for the KPI by using a late-binding schema to extract an initial value and then performing a calculation on (e.g., applying a statistical function to) the initial value.


The values of each of the KPIs can differ at different points in time. As discussed above, the search query for a KPI can be executed based on a frequency of monitoring assigned to the particular KPI. When the frequency of monitoring for a KPI is set to a time period, for example, Medium Frequency (e.g., 10 minutes), a value for the KPI is derived each time the search query defining the KPI is executed every 10 minutes. The derived value(s) for each KPI can be stored in a data store. When a KPI is assigned a zero frequency (no frequency), no value is produced (the search query for the KPI is not executed) for the respective KPI.


The derived value(s) of a KPI is indicative of how an aspect of the service is performing. In one example, the search query can derive the value for the KPI by applying a late-binding schema to machine data pertaining to events to extract values for a specific fields defined by the schema. In another example, the search query can derive the value for that KPI by applying a late-binding schema to machine data pertaining to events to extract an initial value for a specific field defined by the schema and then performing a calculation on (e.g., applying a statistical function to) the initial value to produce the calculation result as the KPI value. In yet another example, the search query can derive the value for the KPI by applying a late-binding schema to machine data pertaining to events to extract an initial value for specific fields defined by the late-binding schema to find events that have certain values corresponding to the specific fields, and counting the number of found events to produce the resulting number as the KPI value.


At block 3410, the computing machine optionally maps the value produced by a search query for each KPI to a state. As discussed above, each KPI can have one or more states defined by one or more thresholds. In particular, each threshold can define an end of a range of values. Each range of values represents a state for the KPI. At a certain point in time or a period of time, the KPI can be in one of the states (e.g., normal state, warning state, critical state) depending on which range the value, which is produced by the search query of the KPI, falls into. For example, the value produced by the Memory Usage KPI may be in the range representing a Warning State. The value produced by the CPU Usage KPI may be in the range representing a Warning State. The value produced by the Request Response Time KPI may be in the range representing a Critical State.


At block 3412, the computing machine optionally maps the state for each KPI to a rating assigned to that particular state for a respective KPI (e.g., automatically or based on user input). For example, for a particular KPI, a user may provide input assigning a rating of 1 to the Normal State, a rating of 2 to the Warning State, and a rating of 3 to the Critical State. In some implementations, the same ratings are assigned to the same states across the KPIs for a service. For example, the Memory Usage KPI, CPU Usage KPI, and Request Response Time KPI for a Web Hosting service may each have Normal State with a rating of 1, a Warning State with a rating of 2, and a Critical State with a rating of 3. The computing machine can map the current state for each KPI, as defined by the KPI value produced by the search query, to the appropriate rating. For example, the Memory Usage KPI in the Warning State can be mapped to 2. The CPU Usage KPI in the Warning State can be mapped to 2. The Request Response Time KPI in the Critical State can be mapped to 3. In some implementations, different ratings are assigned to the same states across the KPIs for a service. For example, the Memory Usage KPI may each have Critical State with a rating of 3, and the Request Response Time KPI may have Critical State with a rating of 5.


At block 3414, the computing machine calculates an impact score for each KPI. In some implementations, the impact score of each KPI can be based on the importance weight of a corresponding KPI (e.g., weight×KPI value). In other implementations, the impact score of each KPI can be based on the rating associated with a current state of a corresponding KPI (e.g., rating x KPI value). In yet other implementations, the impact score of each KPI can be based on both the importance weight of a corresponding KPI and the rating associated with a current state of the corresponding KPI. For example, the computing machine can apply the weight of the KPI to the rating for the state of the KPI. The impact of a particular KPI at a particular point in time on the aggregate KPI can be the product of the rating of the state of the KPI and the importance (weight) assigned to the KPI. In one implementation, the impact score of a KPI can be calculated as follows:

Impact Score of KPI=(weight)×(rating of state)


For example, when the weight assigned to the Memory Usage KPI is 1 and the Memory Usage KPI is in a Warning State, the impact score of the Memory Usage KPI=1×2. When the weight assigned to the CPU Usage KPI is 2 and the CPU Usage KPI is in a Warning State, the impact score of the CPU Usage KPI=2×2. When the weight assigned to the Request Response Time KPI is 3 and the Request Response Time KPI is in a Critical State, the impact score of the Request Response Time KPI=3×3.


In another implementation, the impact score of a KPI can be calculated as follows:

Impact Score of KPI=(weight)×(rating of state)×(value)


In yet some implementations, the impact score of a KPI can be calculated as follows:

Impact Score of KPI=(weight)×(value)


At block 3416, the computing machine calculates an aggregate KPI score (“score”) for the service based on the impact scores of individual KPIs of the service. The score for the aggregate KPI indicates an overall performance of the service. The score of the aggregate KPI can be calculated periodically (as configured by a user or based on a default time interval) and can change over time based on the performance of different aspects of the service at different points in time. For example, the aggregate KPI score may be calculated in real time (continuously calculated until interrupted). The aggregate KPI score may be calculated may be calculated periodically (e.g., every second).


In some implementations, the score for the aggregate KPI can be determined as the sum of the individual impact scores for the KPIs of the service. In one example, the aggregate KPI score for the Web Hosting service can be as follows:

Aggregate KPIWeb Hosting=(weight×rating of state)Memory Usage KPI+(weight×rating of state)CPU Usage KPI+(weight×rating of state)Request Response Time KPI=(1×2)+(2×2)+(3×3)=15.


In another example, the aggregate KPI score for the Web Hosting service can be as follows:

Aggregate KPIWeb Hosting=(weight×rating of state×value)Memory Usage KPI+(weight×rating of state×value)CPU Usage KPI+(weight×rating of state×value)Request Response Time KPI=(1×2×60)+(2×2×55)+(3×3×80)=1060.


In yet some other implementations, the impact score of an aggregate KPI can be calculated as a weighted average as follows:

Aggregate KPIWeb Hosting=[(weight×rating of state)Memory Usage KPI+(weight×rating of state)CPU Usage KPI+(weight×rating of state)Request Response Time KPI)]/(weightMemory Usage KPI+weightCPU Usage KPI+weightRequest Response Time KPI)


A KPI can have multiple values produced for the particular KPI for different points in time, for example, as specified by a frequency of monitoring for the particular KPI. The multiple values for a KPI can be that in a data store. In one implementation, the latest value that is produced for the KPI is used for calculating the aggregate KPI score for the service, and the individual impact scores used in the calculation of the aggregate KPI score can be the most recent impact scores of the individual KPIs based on the most recent values for the particular KPI stored in a data store. Alternatively, a statistical function (e.g., average, maximum, minimum, etc.) is performed on the set of the values that is produced for the KPI is used for calculating the aggregate KPI score for the service. The set of values can include the values over a time period between the last calculation of the aggregate KPI score and the present calculation of the aggregate KPI score. The individual impact scores used in the calculation of the aggregate KPI score can be average impact scores, maximum impact score, minimum impact scores, etc. over a time period between the last calculation of the aggregate KPI score and the present calculation of the aggregate KPI score.


The individual impact scores for the KPIs can be calculated over a time range (since the last time the KPI was calculated for the aggregate KPI score). For example, for a Web Hosting service, the Request Response Time KPI may have a high frequency (e.g., every 2 minutes), the CPU Usage KPI may have a medium frequency (e.g., every 10 minutes), and the Memory Usage KPI may have a low frequency (e.g., every 30 minutes). That is, the value for the Memory Usage KPI can be produced every 30 minutes using machine data received by the system over the last 30 minutes, the value for the CPU Usage KPI can be produced every 10 minutes using machine data received by the system over the last 10 minutes, and the value for the Request Response Time KPI can be produced every 2 minutes using machine data received by the system over the last 2 minutes. Depending on the point in time for when the aggregate KPI score is being calculated, the value (e.g., and thus state) of the Memory Usage KPI may not have been refreshed (the value is stale) because the Memory Usage KPI has a low frequency (e.g., every 30 minutes). Whereas, the value (e.g., and thus state) of the Request Response Time KPI used to calculate the aggregate KPI score is more likely to be refreshed (reflect a more current state) because the Request Response Time KPI has a high frequency (e.g., every 2 minutes). Accordingly, some KPIs may have more impact on how the score of the aggregate KPI changes overtime than other KPIs, depending on the frequency of monitoring of each KPI.


In one implementation, the computing machine causes the display of the calculated aggregate KPI score in one or more graphical interfaces and the aggregate KPI score is updated in the one or more graphical interfaces each time the aggregate KPI score is calculated. In one implementation, the configuration for displaying the calculated aggregate KPI in one or more graphical interfaces is received as input (e.g., user input), stored in a data store coupled to the computing machine, and accessed by the computing machine.


At block 3418, the computing machine compares the score for the aggregate KPI to one or more thresholds. As discussed above with respect to FIG. 33B, one or more thresholds can be defined and can be configured to apply to a specific individual KPI and/or an aggregate KPI including the specific individual KPI. The thresholds can be stored in a data store that is coupled to the computing machine. If the thresholds are configured to be applied to the aggregate KPI, the computing machine compares the score of the aggregate KPI to the thresholds. If the computing machine determines that the aggregate KPI score exceeds or reaches any of the thresholds, the computing machine determines what action should be triggered in response to this comparison.


Referring to FIG. 34A, at block 3420, the computing machine causes an action be performed based on the comparison of the aggregate KPI score with the one or more thresholds. For example, the computing machine can generate an alert if the aggregate KPI score exceeds or reaches a particular threshold (e.g., the highest threshold). In another example, the computing machine can generate a notable event if the aggregate KPI score exceeds or reaches a particular threshold (e.g., the second highest threshold). In one implementation, the KPIs of multiple services is aggregated and used to create a notable event. In one implementation, the configuration for which of one or more actions to be performed is received as input (e.g., user input), stored in a data store coupled to the computing machine, and accessed by the computing machine.



FIG. 34AB is a flow diagram of an implementation of a method 3422 for automatically defining one or more thresholds for a KPI, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 3422 is performed by the client computing machine. In another implementation, the method 3422 is performed by a server computing machine coupled to the client computing machine over one or more networks.


In one implementation, rather than having the user manually configure thresholds by adjusting the sliders or inputting numeric values, as described above, the system may be configured to generate suggested thresholds, whether for aggregate, per entity or both. In one implementation, the suggested thresholds may be recommendations that can be applied to the data or that can serve as a starting point for further adjustment by the system user. The suggestions may be referred to as “automatic” thresholds or “auto-thresholds” in various implementations.


At block 3423, the computing machine receives user input requesting generation of threshold suggestions. In one implementation, a user may select a generate suggestions button that, when selected, initiates an auto-threshold determination process. Rather than having the user manually configure thresholds by adjusting the sliders or inputting numeric values, as described above, the system may be configured to generate suggested thresholds, whether for aggregate, per entity or both.


At block 3424, the computing machine receives user input indicating a method of threshold generation. For example, upon selection of the generate suggestions button, a threshold configuration GUI may be displayed. The threshold configuration GUI may have a number of selectable tabs that allow the user to select the method of auto-threshold determination. In one implementation, the methods include even splits, percentiles and standard deviation. The even splits method takes the range of values displayed in a graph and divides that range into a number of threshold ranges that each correspond to a KPI state for the selected service. In one implementation the threshold ranges are all evenly sized. In another implementation, the threshold ranges may vary in size. In one implementation, the threshold ranges may be referred to as “Fixed Intervals,” such that the size of the range does not change, but that one range may be of a different size than another range. The percentiles method takes the calculated KPI values and shows the distribution of those values divided into some number of percentile groups that each correspond to a KPI state for the selected service. The standard deviation method takes the calculated KPI values and shows the distribution of those values divided into some number of groups, based on standard deviation from the mean value, that each correspond to a KPI state for the selected service.


At block 3425, the computing machine receives user input indicating the severity ordering of the thresholds. The severity ordering refers to whether higher or lower values correspond to a more severe KPI state. In one implementation, a drop down menu may be provided that allows the user to select a severity ordering from among three options including: higher values are more critical, lower values are more critical, and higher and lower values are more critical. When the higher values are more critical option is selected, the state names are ordered such that they proceed in descending order from higher threshold values to lower threshold values. (The descending order of state names refers to a progression from most severe to least severe. The ascending order of state names refers to the a progression from least severe to most severe.) When the lower values are more critical option is selected, the state names are ordered such that they proceed in ascending order from lower threshold values to higher threshold values. When the higher and lower values are more critical option is selected, the state names are ordered such that they proceed in descending order from higher threshold values to some lower threshold values and then back up again on the severity scale as the threshold values continue to decrease. In such a case, the state names may appear as though they are reflected in order about a center point, with state names associated with greater severity ordered farther from the center.


At block 3426, depending on the selected method of threshold generation, the computing machine optionally receives user input indicating the time range of data for calculating threshold suggestions. The computing machine may analyze data from the selected time range in order to generate the threshold suggestions, rather than analyzing all available data, at least some of which may be stale or not relevant. The actual values that correspond to the boundaries of the threshold groups may not be determined until a period of time over which the values are to be calculated is selected from a pull down menu. Examples of the period of time may include, the last 60 minutes, the last day, the last week, etc. In one implementation, a period of time over which the values are to be calculated is selected when the method of auto-thresholding includes percentiles or standard deviation. In one implementation, no period of time is required when the even splits method is suggested.


At block 3427, the computing machine generates threshold suggestions based on the received user input. Upon selection of the period of time, the actual values that correspond to the boundaries of the threshold groups are calculated and displayed in the GUI. The user may be able to adjust, edit, add or delete thresholds from this GUI, as described above.



FIG. 34AC-AO illustrate example GUIs for configuring automatic thresholds for a KPI, in accordance with one or more implementations of the present disclosure. In GUI 3430 of FIG. 34AC, a generate suggestions button 3432 may be provided that, when selected, initiates the auto-threshold determination process. Once generated, indications of the thresholds may be displayed with reference to graph 3431. Graph 3431 includes a line chart the represents values, such as KPI values, over a period of time. The values are plotted over the period of time on a first horizontal axis and against a range of values set by the maximum value and minimum value on a second vertical axis. Upon selection of button 3432, a threshold configuration GUI 3434 may be displayed, as shown in FIG. 34AD.


In GUI 3434 of FIG. 34AD, a number of tabs may be provided that allow the user to select the method of auto-threshold determination. In one implementation, the even splits tab 3436 may be selected. The even splits method takes the range of values from the second vertical axis displayed in the graph 3431 and divides that range into a number of even threshold ranges that each correspond to a state of the selected service. In one embodiment, there may be a default number of threshold ranges (e.g., 5) each corresponding to a different state (i.e., critical, high, medium, low, normal). In one implementation, the threshold ranges 3438 are displayed in GUI 3434 along with the state corresponding to each range and what percentage of the total range of values from graph 3431 are represented by each threshold range. The actual values 3440 that correspond to the boundaries of the threshold ranges 3438 may also be displayed in GUI 3434. According to the example illustrated in FIGS. 34AC-AD, the range of values for the access latency on disks of a storage appliance from graph 3431 include 101.14 to 915.74 milliseconds. GUI 3434 shows that the critical state includes values above 83.3%, which corresponds to values above 745.921 milliseconds. Similarly, the high state includes values between 66.7% and 83.3%, which corresponds to values between 577.119 milliseconds and 745.921 milliseconds, and so on. GUI 3434 provides the ability for the user to rename the states, adjust the associated percentages that correspond to each state, and to add or remove displayed states as well. When the even splits tab 3436 is selected, upon the addition or removal of a state, GUI 3434 may display recalculated values 3440 so that the range of values corresponding to each state remains equal in size.


Once configuration of thresholds in the even splits tab 3436 is completed, horizontal bands 3444 corresponding to each state may be displayed on chart 3431, as illustrated in FIG. 34AE. As shown, the range of values represented by each band 3444 is equal since the thresholds were set using the even splits method. In one implementation, the names of the states and corresponding values 3446 representing the end of the threshold ranges are also displayed adjacent to chart 3431. The user may similarly be able to adjust, edit, add or delete thresholds from this GUI, as described above.


In GUI 3434 of FIG. 34AF, a drop down menu 3448 may be provided that allows the user to select a severity ordering. In one implementation, there are three options for severity ordering including: higher values are more critical, lower values are more critical, and higher and lower values are more critical. When the higher values are more critical option is selected, the state names 3438 are ordered such that they proceed in descending order from higher threshold values to lower threshold values (e.g., high is above 661.52, medium is between 661.52 and 407.3, normal is between 407.3 and 153.08, and so on). The severity ordering may be selected depending on the underlying KPI values. For example, a user may desire to set thresholds that warn them when certain values are getting too high (e.g., processor load) but when other values are getting too low (e.g., memory space remaining). In GUI 3434 of FIG. 34AG, the user has selected the option for lower values are more critical 3449. When the lower values are more critical option 3449 is selected, the state names 3452 are ordered such that they proceed in descending order from lower threshold values to higher threshold values 2454 (e.g., high is below 68.679, medium is between 68.679 and 237.481, low is between 237.481 and 407.3, and so on). The corresponding order of states would also be reflected in chart 3431.


In GUI 3434 of FIG. 34AH, the user has selected the option for higher and lower values are more critical. When the higher and lower values are more critical option is selected, the state names 3456 are ordered such that they proceed in descending order from higher threshold values to lower threshold values 3458 and then back up again on the severity scale as the threshold values continue to decrease (e.g., high is above 704.229 or between 110.371 and 25.97, medium is between 704.229 and 618.811 or between 195.789 and 110.371, low is between 618.811 and 534.41 or between 280.19 and 195.789, and so on). The higher and lower values are more critical option could be applicable to any KPI where the user wants to be warned if the value differs from an expected value by a certain amount in either direction (e.g., temperature). The corresponding order of states would also be reflected in chart 3431 as shown in FIG. 34AI. Once configuration of thresholds is completed, horizontal bands 3462 corresponding to each state may be displayed on chart 3431. As shown, the range of values represented by each band 3462 is equal since the thresholds were set using the even splits method. In one implementation, the names of the states and corresponding values 3464 representing the end of the threshold ranges are also displayed adjacent to chart 3431. The user may similarly be able to adjust, edit, add or delete thresholds from this GUI, as described above.


In GUI 3434 of FIG. 34AJ, the method of auto-threshold determination is selected using the percentiles tab 3466. The percentiles method takes the calculated KPI values and shows the distribution of those values divided into some number of percentile groups that each correspond to a state of the selected service. In one embodiment, there may be a default number of threshold groups (e.g., 5) each corresponding to a different state (i.e., critical, high, medium, low, normal). In one implementation, the threshold groups 3468 are displayed in GUI 3434 along with the state and percentile corresponding to each. The actual values that correspond to the boundaries of the threshold groups 3468 are not displayed until a period of time over which the values are to be calculated is selected from pull down menu 3470. Examples of the period of time may include the last 60 minutes, the last day, the last week, etc.


Upon selection of the period of time, the actual values 3471 that correspond to the boundaries of the threshold groups 3468 are displayed in GUI 3434, as shown in FIG. 34AK. According to the example illustrated in FIG. 34AK, the critical state includes values above the 90th percentile (indicating that 90% of the calculated values are below this state), which corresponds to an actual value of 401.158 milliseconds. Similarly, the high state includes values between the 90th and 75th percentiles, which correspond to values between 401.158 milliseconds and 341.737 milliseconds, and so on. GUI 3434 provides the ability for the user to rename the states, adjust the associated percentages that correspond to each state, and to add or remove displayed states as well. Once configuration of thresholds in the percentiles tab 3466 is completed, horizontal bands 3476 corresponding to each state may be displayed on chart 3431, as illustrated in FIG. 34AL. As shown, the range of values represented by each band 3476 varies according to the distribution of the data since the thresholds were set using the percentiles method. In one implementation, the names of the states and corresponding values 3478 representing the end of the threshold ranges are also displayed adjacent to chart 3431. The user may similarly be able to adjust, edit, add or delete thresholds from this GUI, as described above.


In GUI 3434 of FIG. 34AM, the method of auto-threshold determination is selected using the standard deviation tab 3480. The standard deviation method takes the calculated KPI values and shows the distribution of those values divided into some number of groups, based on standard deviation from the mean value, that each correspond to a state of the selected service. In one embodiment, there may be a default number of threshold groups (e.g., 5) each corresponding to a different state (i.e., critical, high, medium, low, normal). In one implementation, the threshold groups 3482 are displayed in GUI 3434 along with the state and number of standard deviations corresponding to each. The actual values that correspond to the boundaries of the threshold groups 3482 are not displayed until a period of time over which the values are to be calculated is selected from pull down menu 3484.


Upon selection of the period of time, the actual values 3486 that correspond to the boundaries of the threshold groups 3482 are displayed in GUI 3434, as shown in FIG. 34AN. According to the example illustrated in FIG. 34AN, the critical state includes values above the 2 standard deviations from the mean, which corresponds to an actual value of 582.825 milliseconds. Similarly, the high state includes values between 1 and 2 standard deviations from the mean, which corresponds to values between 582.825 milliseconds and 436.704 milliseconds, and so on. GUI 3434 provides the ability for the user to rename the states, adjust the associated percentages that correspond to each state, and to add or remove displayed states as well. Once configuration of thresholds in the standard deviation tab 3480 is completed, horizontal bands 3490 corresponding to each state may be displayed on chart 3431, as illustrated in FIG. 34A0. As shown, the range of values represented by each band 3490 varies according to the distribution of the data since the thresholds were set using the standard deviation method. In one implementation, the names of the states and corresponding values 3492 representing the end of the threshold ranges are also displayed adjacent to chart 3431. The user may similarly be able to adjust, edit, add or delete thresholds from this GUI, as described above.


Time Varying Static Thresholds


Time varying static thresholds may be an enhancement to the thresholds discussed above and may enable a user to customize a specific threshold or set of thresholds to vary over time. Thresholds may enable a user (e.g., IT managers) to indicate values that when exceeded may initiate an alert or some other action. One or more thresholds may apply to the same metric or metrics. For example, a CPU utilization metric may have a first threshold to indicate that a utilization less than 20% is good, a second threshold at 50% to indicate that a range from 20% to 50% is normal, and a third threshold at 100% to indicate that a range of 50% to 100% is critical. In some implementations, the thresholds may be set to specific values and the same values may apply at all times, for example, the same threshold may apply to both working hours and non-working hours.


In other implementations, threshold values may differ for different time frames. For example, computing resources may vary over time and what may be considered critical during one time frame may not be considered critical during another time frame. To address such a situation, time varying static thresholds can be provided to enable a user to generate different sets of KPI thresholds that apply to different time frames. In one example, a user may define a threshold scheme that includes multiple sets of thresholds that vary depending on time to account for expected variations in the metric. For instance, sets of thresholds may be defined to address variations in the utilization (e.g., variations in load or performance) of an email service to distinguish between an expected decrease in performance and a problematic decrease in performance. An expected decrease in performance may occur between 8 am and 10 am Monday-Friday because the email clients may synchronize when the client machines are first activated in the morning. A problematic decrease in performance may seem similar to the expected performance but may occur at different times and as a result of, for example, the server behaving erratically and may be a prelude to email service malfunction (e.g., email server crash). With a time varying static thresholds, a user may configure the thresholds based on time frames so that alarms would be avoided when the behavior is expected and alarms would be activated for abnormal behavior.


The time frames may be based on any unit of time, such as for example, time of the day, days of the week, certain months, holiday seasons or other duration of time. The time frames may apply in a cyclical manner, such that each of the multiple sets of KPI thresholds may apply sequentially over and over, for example, a first set of KPI thresholds may apply during weekdays and a second set of KPI thresholds may apply during weekends and the sets may be repeated for each consecutive week. The cyclical application of KPI thresholds may enable a user to have more granular control of KPI states and enhance the user's ability to discover abnormal behavior when behavior cycles. A user may use time varying static thresholds to better ensure alarms are triggered when appropriate and to avoid false positives such as triggering alarms when unnecessary.


As will be discussed in more detail below in conjunction with FIGS. 34AP through 34AS, a user may configure time varying static thresholds by defining multiple sets of KPI thresholds that correspond to different time frames. Each set of KPI thresholds may be defined by a user and may include one or more KPI thresholds. The KPI thresholds may be compared with KPI values to determine a state of a KPI at a point in time or during a period of time. Multiple GUIs may be used in conjunction with time varying static thresholds, for example, one GUI may allow the user to define the sets of KPI thresholds and another GUI may display the resulting states of a KPI that are determined based on the sets of KPI thresholds.



FIG. 34AP is a flow diagram of an implementation of a method 34110 for defining one or more sets of KPI thresholds that span multiple time frames, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 34110 is performed by a client computing machine. In another implementation, the method 34110 is performed by a server computing machine coupled to the client computing machine over one or more networks.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks). However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.


Method 34110 may begin at block 34102 when the computing machine may cause display of a GUI to identify a KPI for a service. For example, the GUI may display the name of the KPI (e.g., KPI name 2961 in FIG. 29C), or some other information that identifies the KPI. As discussed above, the KPI may be defined by a search query that produces a KPI value derived from machine data pertaining to one or more entities providing the service. The KPI value may be indicative of a performance assessment for the service at a point in time or during a period of time. The GUI may also display one or more threshold fields (e.g., threshold field 2904 in FIG. 29C) for fields from the entities' machine data that are used to derive a value produced by the KPI search query. One or more thresholds can be applied to the value associated with the threshold field. In particular, the value can be produced by the KPI search query and can be, for example, the value of the threshold field in an event satisfying search criteria of the search query when the search query is executed, a statistic calculated based on one or more values of the threshold field in one or more events satisfying the search criteria of the search query when the search query is executed, a count of events satisfying the search criteria of the search query that include a constraint for the threshold field, etc. For example, the threshold field can be “cpu_load_percent,” which may represent the percentage of the maximum processor load currently being utilized on a particular machine. In other examples, the threshold may be applied to some other fields, such as total memory usage, remaining storage capacity, server response time, network traffic, etc.


At block 34104, the computing machine may receive, via the GUI a user input specifying different sets of KPI thresholds to apply to a KPI value to determine the state of the KPI. The GUI for receiving user input specifying different sets of KPI thresholds may be the same as the GUI that identifies the KPI, or it may be a separate GUI, which may be presented when a user selects, in the GUI identifying the KPI, a button (or any similar UI element) for adding thresholds to the KPI.


Each set of KPI thresholds specified by the user may correspond to a distinct time frame. In one example, there may be three different sets of KPI thresholds. The first set may correspond to a time frame including one or more weekdays or all weekdays. The second set may correspond to a time frame including days of a weekend or a span of time from Friday evening to Monday morning. The third set may include one or more holidays. In another example, one time frame may include working hours (e.g., 9 am-5 pm) and another time frame may include non-working hours (5:01 pm-8:59 am). In yet another example, there may be six different sets of KPI thresholds. The first set may correspond to a time frame including working hours (e.g., 9 am-5 pm) for Monday through Thursday. The second set may correspond to a time frame including non-working hours (5:01 pm-8:59 am) for Monday through Thursday. The third set may correspond to a time frame including working hours for Fridays. The fourth set may correspond to a time frame including non-working hours for Fridays. The fifth set may include weekends, and the sixth set may include holidays.


Each set of KPI thresholds may include multiple thresholds that define multiple states (e.g., critical, non-critical). Each KPI threshold may represent an end of a range of values corresponding to a particular KPI state. Each range may have one or more ends, for example, one end may be based on the minimum value of the range and another end may be based on the maximum value of the range. The range of values corresponding to a particular state may have a specific KPI threshold at each end or may have a KPI at only one end and be open-ended on the other end. For example, a critical state may be defined by a single KPI threshold that identifies one end of the range (i.e., the minimum value) and the other end may not be specified and can extend to cover any value greater than or less than the KPI threshold. In one example, a KPI threshold may define an end that functions as a boundary between KPI states such that a set of three KPI thresholds may define three states. The boundary may define a mutual end between two separate but adjacent ranges that correspond to two different states. In another example, each KPI state may be defined by two KPI thresholds where a first KPI threshold defining the minimum value of the range and the second KPI threshold defining the maximum value of the range. In this case, the KPI ranges may not need to be adjacent and instead may include gaps between states, for example there may be a critically low state and a critically high state with no state therebetween or there may be a default state therebetween (e.g., non-critical).


The GUI for receiving user input may include marks corresponding to one or more KPI thresholds of the sets of KPI thresholds. Each mark may be a graphical representation of a specific KPI threshold from each of the sets of KPI thresholds. The marks may be the same or similar to the marks discussed in regards to FIG. 31A, 34AR or 34AS (e.g., 3717, 3156, 34132A-F) and may be displayed on columns that correspond to each time frame. The GUI may enable a user to manually change existing KPI thresholds by adjusting the marks. The marks and columns will be discussed in more detail in regards to FIG. 34AR.


In some implementations, the user may specify thresholds for the first time frame (e.g., working hours), and then the computing machine may automatically predict, based on prior history, how KPI values during the second time frame (e.g., non-working hours) would differ from KPI values during the first time frame, and suggest thresholds for the second time frame based on the predicted difference. In one example, if average KPI values during the first time frame are 80 percent higher than average KPI values during the second time frame, the computing machine may suggest KPI thresholds for the second time frame that are 80 percent lower than the KPI thresholds specified for the first time frame. The user may then either accept suggested KPI thresholds or modify them as needed. In another example, a suggestion of a KPI threshold for the second time frame may be based on the KPI values within the second time frame without relying on the values within other time frames. In this example, the computing machine may suggest a KPI threshold at a particular percentile of the values in the second time frame (e.g., 75th percentile). In either example, the suggestion may be based on a statistical method such as, percentile, average, median, standard deviation or other statistical technique.


At block 34106, the computing machine may cause the different sets of KPI thresholds to be available for determining a KPI state (e.g., at a later time). This may involve storing the sets of KPI thresholds in a data structure or data store that may be accessible by the machine determining the states of the KPIs. In one example, a client device may be used to set the KPI threshold values and another machine (e.g., server machine) may evaluate the KPI values to determine the state of the KPI. In other examples, any device may be used to define the sets of KPI thresholds. In some implementations, the different sets of KPI thresholds are stored as part of the service definition (e.g., in the same database or file), or in association with the service definition (e.g., in a separate database or file). Using the example illustrated in FIG. 17B, different sets of KPI thresholds can be stored in a service definition structure 1720 as part of a KPI component 1727.



FIG. 34AQ is a flow diagram of an implementation of a method 34112 for determining the states of a KPI based on different sets of KPI thresholds defined for multiple time frames. As discussed above in regards to FIG. 34AP, performance of a service can be assessed using a KPI's values that may change over time. As the KPI values change, they may exceed a specific threshold or fall below a specific threshold, which may cause the state of the KPI to change over time, for example, a KPI may be in a high state for a few hours and then enter a critical state for an hour before entering a low state.


Method 34112 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 34112 is performed by a client computing machine. In another implementation, the method 34112 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 34114, the computing machine may execute a search query against machine data to produce a KPI value indicative of a performance assessment for a service at a point in time or during a period of time. The machine data may be derived from one or more of web access logs, email logs, DNS logs or authentication logs that can be produced by one or more entities providing the service. In one example, executing the search query may involve applying a late-binding schema to a plurality of events having machine data produced by the entities. The late-binding schema may be associated with one or more extraction rules defining one or more fields in the plurality of events.


Next, the computing machine determines the state of the KPI based on the produced KPI value. In order to determine the state of the KPI, the computing machine needs to determine which set of the KPI thresholds should be applied to the produced KPI value. Such a determination involves comparing the point in time or the period of time used for the calculation of the KPI value with different time frames of multiple sets of KPI thresholds. In particular, at block 34116, the computing machine may identify one of the sets of KPI thresholds that correspond to a time frame that covers the point in time or the period of time associated with the KPI value. In one example, the KPI thresholds may have a time frame that corresponds to days of the week (e.g., weekdays, weekends) and the comparison may involve identifying the day of the week associated with the KPI value and comparing the day of the week with the time frames of the sets of KPI values to determine a set whose time frame covers the identified day of the week. In another example, the KPI thresholds may have a time frame that correspond to a specific date (e.g., holiday) and the comparison may involve identifying the date associated with the KPI value and comparing the date with the time frames of the sets of KPI thresholds to determine a set whose time frame matches the identified date. In yet another example, the KPI thresholds may have a time frame that corresponds to times of the day (e.g., 9 am, 5 pm, midnight, afternoon, night) and the comparison may involve identifying the time of the day associated with the KPI value and comparing the time of the day with the time frames of the sets of KPI thresholds to determine a set whose time frame covers the identified time.


In some situations, there may be multiple overlapping sets of KPI thresholds, for example, there may be different sets of thresholds for weekdays, weekends and holidays and the sets may have overlapping time frames. This may occur when there is a weekday set of thresholds and a holiday set of thresholds and a holiday occurs on a weekday. As a result, the time associated with a single KPI value may correspond to two separate sets of KPI thresholds. When this occurs, the computing machine may include a set of rules or an algorithm for selecting a set of KPI thresholds to apply. In one example, the computing machine may defer to the set of KPI thresholds that has the smallest time frame (e.g., most specific time frame). This may involve calculating the total duration of time associated with each of the overlapping sets of thresholds. For example, if one set included each weekday and the other set included each holiday, the computing machine may calculate the total duration covered by the weekday set of thresholds (e.g., 52 weeks×5 days a week equals approximately 260 days) and the holiday set of thresholds (e.g., 10 federal holidays) and determine the holiday set is the set that has the smaller total duration. The computing machine may then select the set of thresholds associated with the smaller duration of time and use the KPI thresholds in the selected set to determine the states corresponding to the KPI values. In other examples, the computing machine may select a set of KPI thresholds based on creation time or modification time of the sets, in which case the newest or oldest set of thresholds may be selected.


At block 34118, the computing machine may select a KPI state for the KPI value from the KPI states that correspond to the set of KPI thresholds identified at block 34116. As discussed above, the KPI thresholds of a set may define multiple ranges and each of the ranges may correspond to a KPI state. Once the appropriate set of thresholds has been identified, the computing machine may compare a specific KPI value with the thresholds of the set to determine which range the value corresponds to (e.g., falls within). For example, a set of KPI thresholds may pertain to web server response delay during a weekday time frame. The set of KPI thresholds may include three threshold values that correspond respectively to an end of a range (e.g., minimum or maximum value) of each of the three KPI states (e.g., low, medium, high). The computing machine may select the KPI state by performing a comparison between ranges of the KPI thresholds and the KPI value produced at block 345114 to determine where the value lies within the multiple ranges. Once a range is identified, the computing device may select the state associated with the range and assign that state to the KPI during the time associated with the KPI value.


At block 34119, the computing machine causes display of a GUI that visually illustrates the selected state of the KPI. The GUI may be, for example, a service-monitoring dashboard GUI or a deep dive KPI visualization GUI that are discussed in more detail below.



FIG. 34AR illustrates an exemplary GUI 34140 for defining sets of KPI thresholds with different time frames, in accordance with one or more implementations of the present disclosure. GUI 34140 may display multiple sets of KPI thresholds, a first set may correspond to a first time frame (e.g., working hours) and a second set may correspond to a second time frame (e.g., non-working hours). Each set may include multiple KPI thresholds that define the ranges of KPI values that correspond to respective states (e.g., critical, warning, normal). GUI 34140 may include a time frame region 34142, a threshold display region 34143, and a visualization region 34144 and multiple buttons 34152A and 34152B. Each of the regions may include multiple GUI elements that may be interrelated in such a manner that a user may select a KPI set in either the time frame region 34143 or visualization region 34144 and the thresholds region 34143 is then updated to display the thresholds that correspond to the selected set. The GUI elements may include input fields divided into several regions to receive various user input and visually illustrate the received input. When multiple sets of KPI thresholds are defined, each set may correspond to a specific row (e.g., 34145A) within time frame region 34142 and may be visually illustrated by a specific column (e.g., 34130A) within visualization region 34144.


Time frame display region 34142 may display multiple rows 34145A and 34145B that correspond to time frames for different sets of KPI thresholds. Each row may include a time frame description field 34146, end time fields 34147A and 34147B and time unit selection 34148. Time frame description field 34146 may provide a field for a user to enter a textual description (e.g., working hours) that may describe the time frame during which the set of KPI thresholds applies. End time fields 34147A and 34147B may indicate the respective start time (e.g., 9 am) and end time (e.g., 5 pm) of the time frame. Time unit selection 34148 may provide a drop down box, which when selected, allows a user to select a unit of time. As shown, a user may select a unit from three options (e.g., times, days, holidays), however in other examples there may be any number of options including any time unit or combination of time units.


Threshold display region 34143 may display the thresholds and corresponding states for the selected time frame (e.g., working hours). As shown, the time frame for working hours may include three states 34149A-C and each state of the KPI may have a name (e.g., critical, warning and normal), and can be represented by a range of values, and a visual indicator. The range of values may be defined by one or more thresholds (e.g., 75, 50, 0) that can provide the minimum value and/or the maximum value of the range of values for the state. The visual indicator uniquely identifies a corresponding state using a visual effect (e.g., distinct color). The characteristics of the state (e.g., the name, the range of values, and a visual indicator) can be edited via input fields of the respective GUI element.


Visualization region 34144 may include one or more columns 34130A and 34130B and one or more markers 34132A-F. Each of columns 34130A and 34130B may correspond respectively to the set displayed in threshold display region 34143 and a row (e.g., 34145A) within time frame region 34142. Selecting a different column (e.g., column 34130B) may update the threshold display region 34143 to show a different set of thresholds and update time frame region 34142 to highlight a different row (e.g., 34145B). As illustrated, column 34130A represents the time frame corresponding to working hours and includes three markers 34132A-C that correspond respectively to states 34149A-C. The space between each marker represents the range of KPI values that correspond to the state. The space between columns 34130A and 34130B illustrates the duration of the time frame for the set of KPI thresholds, namely an eight-hour block that spans from 9 am to 5 pm. The space between column 34130B and the end of the visualization region illustrates the duration of the time frame for another set of KPI thresholds and may be a block (approximately 16 hours) that spans from 5:01 pm to 8:59 am. Although not displayed in the figure, column 34130A may also be displayed at the far right portion of visualization region 34144. This is because the time frames are cyclical and the current duration of time displayed is a full cycle (e.g., 24 hours). Therefore, the end of the cycle is 9 am, which is when the time frame of the first set of KPI thresholds (e.g., working hours) begins.


Addition buttons 34152A and 34152B may be used to initiate a user request to add additional time frames or additional thresholds. In response to a user selecting additional button 34152A, a new row (e.g., 34145B) may be created within time frame region 34142 and a new column (e.g., 34130B) may be created in visualization region 34144. In addition, threshold display region 34143 may be cleared to allow a user to add thresholds using addition button 34152B.


Addition button 34152B may enable a user to add multiple thresholds to the set of KPI thresholds. For example, in response to a user selecting addition button 34152A, a new threshold (e.g., 34149A) may be added to threshold display region 34143. In addition, a new mark may be created on column 34130B in visualization region 34144. The user may then have multiple ways to set the threshold value. One option may involve the user typing a value into the threshold value field 34136. Another option would be for the user to adjust the corresponding marker to slide it up or down on the column. Dragging the marker up the column would increase the threshold value and dragging the marker down the column may decrease the threshold value.


When the user has finished defining the sets of KPI thresholds, the user may exit the GUI. This may add the sets of KPI thresholds to a data store to be accessed when determining the states of KPI values, as discussed in regards to FIG. 34AS.



FIG. 34AS is an exemplary GUI 34240 for displaying the states a KPI over time in view of sets of KPI thresholds. As discussed above, a user may define a set of KPI thresholds for a first time frame (e.g., work hours) and a second set of KPI thresholds for a second time frame (e.g., non-working hours). The system may then use the sets of KPI thresholds to determine which KPI values correspond to which states. GUI 34240 may graphically illustrate the state of each KPI value using a visual indicator (e.g., bar chart overlay).


GUI 34240 may include a graph 34231, states 34249A-C, state indicators 34238A-C, and multiple KPI points 34238A-F that span a time duration. The time duration may be adjusted by the user and may include a portion of a time cycle or one or more time cycles. A cycle may be based on a day, week, month, year or other repeatable duration of time. As shown in GUI 34231, the cycle may be based on a 24-hour period and within the 24 hour period there may be multiple time frames corresponding to the sets of KPI thresholds.


Graph 34231 may be a line chart or line graph or other graphical visualization that displays multiple data points (e.g., KPI values) over time. Graph 34231 may include columns 34230A and 34230B that may each correspond to a set of KPI thresholds and may include markers 34239A-C as discussed in regards to FIG. 34AR.


States 34249A-C may correspond to ranges of KPI values that are separated by KPI thresholds represented in the figure as markers 34239A-C. Each threshold may correspond to a threshold indicator line (e.g., horizontal dotted line 34236A) that indicates the end of a state or a boundary between states. Threshold indicator lines 34236A and 34236B help illustrate time varying static thresholds because threshold indicator lines 34236A and 34236B each correspond to the same state, namely third state 34249C (e.g., critical) and during different time frames the same state may correspond to different threshold values and therefore different ranges. For example, during first time frame 34234A the threshold for the thirds state 34249C corresponds to threshold indicator 34236A (e.g., at 75) and at second time frame 34234B the threshold for the third state 34249C corresponds to threshold indicator 34236B (e.g., at 40).


KPI points 34238A-F may represent KPI values at a point in time or during a period of time. Each of the KPI points 34238A-F may be determined by a search query and may correspond to a KPI state. As discussed above with respect to FIG. 34AQ, method 34240 may be used to determine the KPI value and to determine which state the KPI value corresponds to. Once the state is determined, it may be displayed on graph 34231 using state indicators 34237A-C (e.g., bars of bar chart).


State indicators 34237A-C may visually represent the state of the KPI over time. Each state indicator 34237A-C may correspond to one or more KPI points and may be determined in view of the sets of KPI thresholds and respective time frames. As shown, state indicator 34237A indicates that KPI point 34238A is within a first state (e.g., normal), state indicator 34237B indicates that KPI point 34238B is within a second state (e.g., warning) and state indicator 34237C indicates that KPI point 34238C is within a third state (e.g., critical). The state indicators may include colors, patterns or other visual effects capable of distinguishing the state indicators. The location of the state indicator with respect to the KPI point may vary. In one example the state indicator may overlap the KPI point with the KPI point being in the middle of the upper end of the state indicator, in other examples the KPI point may be the left most point, right most point or other variation.


As discussed herein, the disclosure describes various mechanisms for defining and using time varying static thresholds to determine states of a KPI over different durations of time. The disclosure describes graphical user interfaces that enable a user to define multiple sets of KPI thresholds for different time frames as well as graphical user interfaces for displaying the states of multiple KPI values in view of the multiple sets of KPI thresholds.


Adaptive Thresholding


Adaptive thresholding may be an enhancement to the thresholds discussed above and may enable a user to configure the system to automatically adjust one or more thresholds. As discussed above, thresholds may enable users (e.g., IT managers) to indicate a range of values corresponding to a state and when the KPI value falls within the range, an alert or some other action may be initiated. One or more thresholds may apply to the same KPI or KPIs. For example, a CPU utilization KPI may be associated with a first threshold to indicate that a utilization less than 20% is good, a second threshold at 50% to indicate that a range from 20% to 50% is normal, and a third threshold at 100% to indicate that a range of 50% to 100% is critical. In some implementations, the thresholds may be static thresholds with specific values for the thresholds provided by user input and where the threshold value may remain at that specified value until a different threshold value is provided by user input. In other implementations, the thresholds may be adaptive thresholds and the threshold values may be provided by training processes (e.g., using machine learning techniques) that analyze training data (e.g., historic data of most recent four weeks).


Adaptive thresholding may be used to establish one or more thresholds of one or more time policies. A time policy may identify a time frame and one or more thresholds associated with the time frame. The time frame may be specified by a user, may include one or more separate time blocks and may be based on any unit of time, such as for example, time of the day, days of the week, certain months, holiday, seasons or other duration of time. The time frame may identify continuous blocks of time that occur multiple separate times within a time cycle. Each threshold may be based on a specific KPI value (e.g., numeric value) or a statistical metric related to one or more KPI values (e.g., mean, median, standard deviation, quantile, range, etc.). Adaptive thresholding may involve accessing threshold information of one or more time policies that identify one or more time frames and training data for the one or more time frames. The training data may include KPI values or machine data used for deriving KPI values and may be based on historical data, simulated data, example data or other data or combination of data. The training data may be analyzed to identify variations within the data (e.g., patterns, distributions, trends) and based on the variations, a set of one or more thresholds can be determined for a KPI. Such adaptive thresholding can be dynamic (performed continuously or periodically (e.g., based on schedule, interval or the like) or event driven (e.g., performed in response to a user request).


Adaptive thresholds and static thresholds may be displayed and configured using a graphical user interface (GUI). The GUI may include one or more presentation schedules that may display one or more time frames associated with time policies. Each presentation schedule may include multiple time slots and span a portion of one or more time cycles. Some of the time slots may be associated with a specific time policy and may have a unifying appearance that distinguishes the time slots from time slots associated with other time policies. In one example, the presentation schedule may have a time grid arrangement (e.g., calendar grid view). In another example, the presentation schedule may have a graph arrangement and may include one or more depictions and threshold markers. The depiction may be one or more points, lines, bars, slices or other graphical representation and may illustrate KPI values for a point in time or duration of time. The threshold markers may be graphical display elements that illustrate the current values associated with a threshold and may also function as graphical control elements to enable a user to modify the values.


In one implementation, the GUI may include a listing of time policies and multiple presentation schedules for previewing and configuring threshold information. The listing of time policies may display time policies associated with one or more KPIs and may be integrated with the multiple presentation schedules, such that in response to a user identifying a time policy from the listing, the multiple presentation schedules may be updated to display corresponding threshold information. The multiple presentation schedules may include a first presentation schedule with a time grid arrangement and a second presentation schedule with a graph arrangement. In one example, a user may add a time policy with a time frame of workdays 9 am-5 pm and multiple thresholds (e.g., normal, warning, critical). This may generate a new entry in the listing of time policies, which may default to being the in-focus time policy. In response to a time policy being in focus, the presentation schedule with the time grid arrangement (e.g., calendar view) may display a uniform appearance for time slots associated with Monday through Friday from 9 am to 5 pm and may appear similar to a shaded horizontal bar (e.g., row) spanning the work days. The presentation schedule with the graph arrangement may also update the time slots associated with the time policy to have a uniform appearance and may display a threshold marker for each of the multiple thresholds. Each threshold marker may be positioned based on its value and within the time slots that correspond to its time frame. The user may then preview the details of the new time policy in the presentations schedules.


As will be discussed in more detail below, some aspects of the disclosure describe technology for adaptive thresholding and a graphical user interface for creating and modifying time policies to utilize static and/or adaptive thresholding. FIGS. 34AT through 34AW illustrate example graphical user interfaces and a method of displaying a graphical user interface and FIG. 34AX illustrates an example method of determining and adjusting threshold values using adaptive thresholding, in accordance with some aspects of the present disclosure.



FIG. 34AT illustrates an exemplary GUI 34610 for displaying and configuring threshold information of one or more time policies, in accordance with one or more implementations of the present disclosure. GUI 34610 may include a listing 34615, a presentation schedule 34620 and a graphical visualization 34625.


Listing 34615 may include multiple entries for time policies 34616 and may enable a user to select one or more of the time policies 34616. A time policy may be defined for one or more KPIs and may specify one or more time frames and a set of one or more thresholds associated with the time frames. Each time frame may be associated with a duration of time and may be based on any unit of time, such as for example, time of the day, day of the week, certain months, seasons, holiday or other duration of time. In one example, the time frame may be a contiguous duration of time (e.g., time block). In another example, the time frame may be multiple separate durations of time (e.g., multiple discrete time blocks) and therefore may not be contiguous duration of time. Each threshold of the set of thresholds may correspond to a KPI state and be based on a specific KPI value or a statistical metric pertaining to one or more KPI values (e.g., standard deviation, quantile, range, etc.).


Entries within listing 34615 may be displayed and organized based on a variety of mechanisms. In the example shown, an entry within listing 34615 may represent a time policy by displaying the time frame as textual data (e.g., “Weekdays, 12 am-5 am”). In another example, additional or alternate data associated with the time policy may be displayed, such as a name of the time policy, a quantity of thresholds, one or more of the threshold values or other threshold information. The entries may be organized based on the chronological order of the time frames, for example, weekday 5 am-10 am may be placed above or below weekday 10 am-12 pm depending on whether it is ascending or descending chronological order. In another example, the entries may be organized into groups (e.g., weekdays vs weekends) or in some other manner.


One or more time policies 34616 may be in-focus as illustrated by in-focus time policy 34618. An in-focus time policy may refer to a time policy that is distinguished from the other time policies via one or more visual attributes to indicate that it is a point of focus and may correspond to the information being displayed by presentation schedule 34620 and graphical visualization 34625. The visual attribute may be any visual attribute such as shading, highlighting, outlining, bolding, italicizing, underlining or any other visual indicator that would signify that the time policy is in-focus, for example, that it has been selected by a user. In some implementations, if a time policy includes multiple time frames, all of the time frames of the time policy are presented with an in-focus visual attribute. Alternatively, only one or a subset of the time frames of the time policy can be presented with an in-focus visual attribute. For example, only the most recently added time frame, the longest time frame, the shortest time frame, etc. may be presented with an in-focus visual attribute.


Presentation schedule 34620 may graphically represent the time frames associated with the time policies. Presentation schedule 34620 may include one or more timeslots 34621 displayed in a grid arrangement. Time slots 34621 may be a graphical representation of a continuous duration of time. The grid arrangement may be two-dimensional, three-dimensional or n-dimensional grid arrangement. The grid arrangement may organize timeslots 34621 in rows and columns similar to a matrix. The rows and columns may have different temporal scales and represent different durations of time. For example, the rows may correspond to narrower time blocks (e.g., more temporally granular) and the columns may correspond to broader time blocks (e.g., less temporally granular). In one example, the grid arrangement may be the same or similar to a calendar view, such as a week calendar view, wherein the rows may correspond to hour time blocks and the columns may correspond to daytime blocks. In addition, presentation schedule 34620 may also support a year calendar view, a month calendar view, a weekday calendar view, weekend calendar view, a day calendar view, or other duration of time. Presentation schedule 34620 may display a time cycle 34622 or a portion of one or more time cycles 34622.


Time cycle 34622 may be a repeatable duration of time and may be based on a day, week, month, year or a portion thereof. As shown by presentation schedule 34620, time cycle 34622 may span a week. The time cycle 34622 may be determined by accessing user settings (e.g., preferences) or default settings set by the product designer. The time cycle 34622 may also be determined at runtime based on the in-focus time policy 34618 or one or more time policies 34616 of listing 34615. In one example, the system may analyze all the time policies and determine that some or all of the included time frames are based on a week duration, in which case time cycle 34622 may be set to a week. In another example, the system may determine that the time frames of time policies 34616 cover only the weekdays or only the weekends in which case the time cycle may be set to only the weekdays or only the weekends respectively. In yet another example, if time policies 34616 cover specific days (e.g., holidays), time cycle 34622 may be set to a month or year view with those days highlighted. The time cycle displayed within presentation schedule 34620 may be adjusted (e.g., by zooming in or zooming out) by the user at run time to display more or fewer time slots or to modify the dimensions of the time slots.


Each of the time slots 34621 may represent a continuous duration of time based on any underlying unit of time measurement, such as, seconds, minutes, hours, days, weeks or any portion or variation therefrom. The time slots may vary in dimension between one another such that timeslots during a first portion of a time cycle may have smaller durations and time slots during a different portion of the time cycle may have larger durations. In one example, the duration of each time slot may align with a base time measurement, such a seconds, minutes, hours, days, weeks or may be a portion of the base time measurement. In another example, the duration of each time slot may align with a block of time corresponding to the time frame, such that the duration of time frame and the duration represented by the time slot may be the same (e.g., 5 hr block from 5 am-10 am). One or more time slots 34621 may correspond to a time frame for a time policy and may have a unifying appearance 34623 to illustrate this to the user.


Unifying appearance 34623 may be a visual attribute applied to one or more time slots to distinguish the time slots from time slots that correspond to other time policies. The visual attributes of unifying appearance 34623 may be the same or similar to the visual attribute for the in-focus time policy 34616 and may involve shading, highlighting, outlining, bolding, underlining or any other visual indicator that would signify that the time slots are associated (e.g., grouped) with one another. In the example shown in FIG. 34AT, the time slots associated with in-focus time policy 34618 may be arranged such that uniform appearance 34623 of the time slots may appear similar to a continuous shaded horizontal bar (e.g., shaded row) spanning the work days corresponding to the time frame of the in-focus time policy 34618. In other examples, unifying appearance 34623 may not be contiguous and may include multiple separate time slots that correspond to the same time frame, such as, Monday, Wednesday and Friday nights.


Hover display 34624 may be a popup window or box that appears when a user points an input device to an area associated with a time policy. Such a popup window or box (e.g., a hover box or mouse over) may be of any shape or size and may display graphical or textual information regarding the threshold information or time frame information of a corresponding time policy. For example, the graphical display may be a mouse over displaying the time frame (e.g., time block and repeat schedule) corresponding to the time slots having a unifying appearance. Hover display 34624 may be initiated by the system when the user identifies one or more time slots. A user may identify the one or more time slots by hovering over or selecting one or more time slots using an input device such as a mouse, keyboard, touch sensitive interface or other user input technology.


Graphical visualization 34625 may be the same or similar to the graphs discussed above with respect to FIGS. 30-34AS (e.g., KPI threshold graphs 3431) and may include multiple thresholds and corresponding threshold markers 34626. Graphical visualization 34625 may also include multiple depictions 34627 and one or more statistical metrics 34628.


Depictions 34627 may include a graphical representation of one or more KPI values (individual KPI values, aggregate KPI values or a combination of both). Depictions 34627 may include one or more points, lines, planes, bars (e.g., bar chart), slices (e.g., pie chart) or other graphic representations capable of identifying one or more values of a KPI. In the example shown in FIG. 34AT, depictions 34627 include six separate depictions and each depiction may illustrate the KPI values for one of a plurality of entities (e.g., a server cluster). For example, a first depiction may illustrate a contribution of a first entity to the KPI and a second depiction may illustrate a contribution of a second entity to the KPI. Displaying multiple depictions within the graphical visualization 34625 may be advantageous because it may enable the user to distinguish the performance of one entity from other similar or related entities.


Statistical metrics 34628 may be any measurements relating to the collection, analysis, or organization of data (e.g., live data, training data). The statistical metrics may be used for identifying patterns, trends, distributions or other measurement relating to a set of data and may include, for example, one or more of standard deviations, quantiles or ranges. In the example shown in FIG. 34AT, the statistical metrics may include multiple standard deviations (e.g., 0, 1 and 2 standard deviations). Each statistical metric may be displayed within graphical visualization 34625 to enable the user to visually compare portions of the one or more depictions to the statistical metric. The statistical metric may be displayed using a series of points that span a portion of the graphical visualization. For example, standard deviations 0, 1 and 2 are each displayed using a horizontal dotted line at the corresponding KPI value.


The features discussed above and below may also be configured by the user to accommodate multiple time zones by temporally normalizing the data (e.g., training data, time frames, time slots, depictions, presentation schedules, graphical visualization). The temporal normalization may be based on local time or based on a universal time (Universal Time (UTC)). Temporally normalizing based on local time may involve aligning data corresponding to time zones based on the respective local time of each time zone. For example, depictions 34627 may correspond respectively to entities in different time zones and each depiction may be aligned on the same graph based on local time so that a data point from a specific time (e.g., 5 pm-PST) in one time zone would align with a data point from the same local time (e.g., 5 pm-EST) in a second time zone. Temporally normalizing data based on a universal time may involve aligning the data from different time zones based on a universal time. For example, depictions 34627 may correspond to entities in different time zones and may be aligned on the same graph based on the universal time so that a data point from a specific local time (e.g., 5 pm-PST) in one time zone would align with a data point from a different local time (e.g., 8 pm-EST) of a second time zone. In other examples, training data for a time frame may accommodate different time zones by being temporally normalized to align the training data (e.g., KPI values, machine data) based on local time or a universal time.



FIG. 34AU illustrates an exemplary GUI 34630 for displaying a presentation schedule having time slots in a graph arrangement and one or more depictions of KPI values, in accordance with one or more implementations of the present disclosure. GUI 34630 may include a presentation schedule 34632 that may be similar to presentation schedule 34620 and may include one or more time slots for graphically representing time frames associated with one or more time policies. Presentation schedule 34632 may include one or more time slots 34634, a depiction 34636, a time cycle 34637 and threshold markers 34638A and 34638B. Time slot 34634 may be a graphical representation of a duration of time and may be the same or similar to time slots 34621 of FIG. 34AT. Each time slot 34634 may represent a continuous duration of time based on any underlying unit of time measurement, such as, seconds, minutes, hours, days, weeks or any portion or variation therefrom. One or more time slots 34634 may be arranged in a graph appearance. The graph appearance may have an X-axis (e.g., horizontal axis) and a Y-axis (e.g., vertical axis). The X-axis may represent a range of time and may display a portion of one or more time cycles. The Y-axis may represent a range of KPI values, including KPI values corresponding to threshold values. Both the X-axis and Y-axis may be customized by the user to adjust the range being displayed. For example, the user may use time range control element 34631 to adjust the range of time (e.g., time cycle) displayed along the X-axis. The user may also utilize value range control element 34633 to adjust the range of the KPI values being displayed along the Y-axis.


Presentation schedule 34630 may also include one or more depictions 34636. Depiction 34636 may include a graphical representation of one or more KPI values (i.e., individual or aggregate KPI values or a combination of both). Depiction 34636 may be similar to depictions 34627 of FIG. 34AT and may include one or more points, lines, planes, bars (e.g., bar chart), slices (e.g., pie chart) or other graphic representations capable of identifying one or more values of a KPI. In the example shown in FIG. 34AU, depiction 34636 is a graph line that illustrates variations in KPI values over time (e.g., over time cycle 34637). Depiction 34636 may be continuous throughout the graph arrangement and may overlay one or more time slots or may include discrete points or intervals within the one or more time slots.


The time slots may grouped together into time slot groups (e.g., 34635A-G), which may be a continuous group of time slots. Each time slot group 34635A-F may correspond to a time frame or portion of a time frame and may vary in dimension (e.g., width). For example, a first time slot group may have a thinner width to illustrate a smaller duration of time (e.g., time slot group 34635A) and a second time slot group may have a thicker width to represent a larger duration of time (e.g., time slot group 34635F). Multiple discrete time slot groups may correspond to the same time frame of a time policy. For example, a time frame may cover a time block (e.g., 5 am-10 am) that occurs multiple times (e.g., Monday-Friday) within a time cycle (e.g., week). Each time block of the time frame may be graphically represented by a time slot or a time slot group and may be displayed with a unifying appearance.


Unifying appearance 34639 may be a visual attribute applied to one or more time slots to distinguish them from time slots that correspond to other time policies. Unifying appearance 34639 may be the same or similar to unifying appearance 34623 and may use the same or similar visual attributes. The visual attributes of unifying appearance 34639 may involve shading, highlighting, outlining, bolding, underlining or any other visual indicator that would signify that the time slots or groups of time slots are associated with one another and the time frame of a time policy. In the example shown, each of time slots 34635A-E have a unifying appearance 34639 that includes shading that appears similar to a shaded vertical bar (e.g., shaded column). This may be advantageous because it may indicate to a user that the time frame of the in-focus policy 34618 may correspond to each of time slot groups 34635A-E. Each of the time slot groups 34635A-E may include threshold markers to indicate the corresponding thresholds.


Threshold markers 34638A and 34638B may be included within presentation schedule 34632 and may indicate the values of the thresholds of one or more time policies. Each threshold marker 34638 may be a graphical display element that is positioned at a point within the presentation schedule that indicates its corresponding time frame and threshold value. For example, threshold marker 34638A is positioned at a point along the Y-axis that indicates its threshold value and is positioned along point(s) of the X-axis that indicates the duration of time that that threshold corresponds to (e.g., 5 am-10 am). In one example, the threshold markers 34638A and 34638B may be graphical display elements that also function as graphical control elements and may receive user input to enable a user to adjust the value of a threshold. In another example, the threshold marker may be a static graphical display element that does not provide control functionality to a user.


The quantity of threshold markers for each time slot group may indicate how many thresholds are in the corresponding time policy. In the example shown in FIG. 34AU, each time slot group (e.g., 34635A-F) includes two threshold markers, which indicates that each of the corresponding time policies 34616 includes a set of two thresholds. In other examples, each time slot group may have any number of threshold markers and may include no threshold markers as shown by default time slot group 34635G.


Default time slot group 34635G may be a time slot group that is not associated with a time policy or may correspond to a default time policy. In the example shown in FIG. 34AU, default time slot group 34635G may visually represent a duration of time that is not associated with a time policy and therefore does not display threshold information. In an alternate example, default time slot group 34635G may be associated with a default time policy with one or more thresholds. In this latter example, the thresholds of the default time policy may apply to the KPI without identifying a specific time frame and may only apply when there is no time policy designated for the duration of time. In another example, default time slot group 34356 may be a blank time slot group that is displayed when a time policy is subsequently removed, deactivated, suspended, hidden, or other related action is performed.



FIG. 34AV includes exemplary GUI 34640 for displaying information about the training data such as the quantity of training data and the values of the training data and may assist a user in selecting appropriate training data for establishing one or more thresholds for a time policy. GUI 34640 may include presentation schedule 34642 and training data preview display 34644.


Presentation schedule 34642 may include multiple depictions 34646 A-D corresponding to multiple different durations of training data. Each duration of time may correspond to a user defined or system defined window of time. The training data may be stored KPI values or may be machine data (e.g., time stamped events) that may be used to derive KPI values Either the KPI values or machine data may be stored (e.g., cached) to provide faster access. For example, when the training data includes KPI values, the KPI values may be stored in a summary index discussed above in conjunction with FIG. 29C. The training data may be associated with one or more KPIs and may include the KPI that the thresholds apply to as well as one or more KPIs that are related or similar to the KPI that the threshold applies to. In one example, the user or system may configure the adaptive thresholding to use training data from a defined window of time corresponding to one of the depictions (e.g., 1 week). In another example, the user or system may define a window of time corresponding to one or more depictions (e.g., 2 weeks, 3 weeks, 4 weeks).


Training data from the defined window of time may include a portion of one or more hours, days, weeks, months or other duration of time. In one example, the window may be a fixed duration of time and may include a rolling window relative to the current time. The rolling window may include a window of training data, where new data is added and old data is removed as the window time progresses. In another example, the window of time may dynamically adjust based on any condition related to the training data or user's IT environment. For example, the window may be reduced or enlarged if the quantity of data (e.g., KPI values or machine data) is not within a predetermined range of data, which may be based on a storage or processing capacity of a computing system.


Training data may include historical data, simulated data, example data or a combination thereof. Historical data may include data generated by or about one or more entities in the user's IT environment. In one example, the historical training data may be the most recent historical data relative to the current point in time and may include historical data from a duration of time that includes one or more of the past hour, day, week month or other duration of time. In another example, the historical training data may be from a historical period not immediately preceding the current point in time (e.g., not from the past minute or hour). For example, the historical training data may be based on a past time cycle, such as yesterday or last week.


Simulated data may be similar to historical data but may be generated by a simulation algorithm as opposed to actual data generated by or about an entity of a user's IT environment. The simulation algorithm may be executed by a computing system to generate training data that attempts to mimic data that may be generated by or about one or more entities of the user's IT environment. The simulation algorithm may incorporate one or more features of the user's IT environment, such as features from the KPI definition, entity definition or service definition.


Example data may be similar to historical data and simulated data but may be associated with a different IT environment, KPI, entity or service. In one example, the example training data may be delivered by the software provider (e.g., with the software product). In another example, the training data may be associated with a different KPI and may not be associated with KPI values of a current KPI. This may be advantageous if there is little to no training data for the current KPI, in which case the data associated with a different KPI may be used for training the current KPI (e.g., boot strapping). The different KPI may be similar or related to the current KPI, for example, the current KPI and the different KPI may be defined by search queries that search a similar data source (e.g., log files) or gather data from similar entities (e.g., servers) or relate to the same service.


Presentation schedule 34642 may include depictions 34646A-D for graphically representing multiple portions of the training data. Depictions 34646A-D may include a graphical representation of one or more KPI values (individual values, aggregate values or a combination of both). Each of the depictions 34646A-D may correspond to a different portion (e.g., temporal section) of training data, which may correspond to a portion of one or more windows of time discussed above. In the example shown in FIG. 34AV, each of the depictions 34646A-D may include a series of points that illustrate KPI values for a specific window of time (e.g., week). Depiction 34646A may represent KPI values from a first portion of the training data (e.g., week one) and depiction 34646B may represent KPI values from a second portion of the training data (e.g., week two). Depiction 34646C may represent KPI values from a third portion of the training data (e.g., week three) and depiction 34646D may represent KPI values from a fourth portion of the training data (e.g., week four). Together depictions 34646A-D may represent a month of training data.


Training data preview 34644 may enable a user to view the availability of training data. As discussed above, training processes may analyze training data (e.g., KPI values or machine data) to determine threshold values. Training data preview 34644 may provide a graphical representation of the portion of training data that is available for processing. The graphical representations may include multiple progress bars with different durations (e.g., last day, last three days, last two weeks, last three weeks, last month). Each progress bar may indicate the portion of data available and unavailable within that duration. For example, graphical representation 34648 may be associated with a two week duration and may indicate that three quarters of the duration (e.g., 1.5 weeks) has available training data and that the last quarter does not have available training data. Training data preview 34644 may also provide an indicator (e.g., in the form of an image or text) as to when the training data should be available. For example, the indicator may be a textual message that indicates a date and time when the training data is expected to be available.



FIG. 34AW includes an exemplary GUI 34650 for displaying multiple presentation schedules and multiple graphical control elements for configuring one or more time policies and configuring threshold information, in accordance with one or more implementations of the present disclosure. GUI 34650 illustrates how the graphical components interact with one another and how they may be utilized to create a new time policy and add configuration information for the new time policy. GUI 34650 may include graphical components similar to those shown in FIGS. 34AT and 34AU, such as presentation schedules 34620 and 34632, listing 34615, graphical visualization 34625 and graphical control elements 34652A-D. Graphical control elements 34652A-D may enable a user to create and configure one or more time policies 34616. Graphical control elements 34652A-D may include buttons, drop-down-lists, linked text or other GUI elements and may be configured to display information and receive user input (e.g., mouse, keyboard, or touch input).


Graphical control element 34652A may enable a user to initiate or request the creation of a new time policy. Upon receiving user input, graphical control element 34652A may initiate a GUI (not shown) to enable the user to identify a KPI, a time frame and other information related to the new time policy. Identifying a time frame may involve identifying one or more blocks of time (e.g., 9 am-5 pm), days or points in time when these blocks should apply (e.g., Monday and Friday), and how often the blocks should repeat (e.g., weekly, monthly). In one example, the time policy may be selected from one or more template time policies that may come packaged with a product. The template time policies may include suggested thresholds, suggested time frames and may correspond to one or more user defined or prepackaged KPIs with preconfigured and/or customizable search queries. Once the time policy has been created, it may be added to time policies 34616 of list 34615 and may default to being the current in-focus time policy.


Graphical control element 34652B may enable a user to select whether the one or more time policies 34616 utilizes static thresholding or adaptive thresholding. Static thresholding and adaptive thresholding are techniques for determining and assigning values to thresholds. For static thresholding, the values of the threshold are provided by user input and may remain at that value until a different value for the threshold is provided by user input. For adaptive thresholding, the system may provide the values for the threshold in view of training data and may automatically determine and assign the values when initiated by a user event (e.g., user request) or may automatically determine and assign the values in a dynamic fashion (e.g., continuously or periodically such as based on a schedule, interval, etc.). The process of utilizing adaptive thresholding to determine and assign threshold values is discussed in more detail in regards to FIG. 34AY.


A user may utilize graphical control element 34652B to configure a time policy when it is created or to change the configuration of the time policy at a subsequent point in time. For example, a user may create a time policy and set it to adaptive thresholding. This may allow the system to automatically assign an initial value for the threshold and subsequently adjust the value over time based on training data. Sometime later (e.g., several minutes, hours, days or weeks later) the user may manipulate graphical control element 34652B to transition the time policy from adaptive thresholding to static thresholding to keep the threshold at a constant value or vice versa. This may be advantageous for a user (e.g., IT administrator) because when a user first configures a KPI, the user may not be familiar with the variations of the KPI and may utilize adaptive thresholds to determine the values to assign to a threshold. Once the thresholds have been set, the user may want to keep the thresholds constant to increase the predictability of when actions (e.g., alerts) can be triggered and may therefore utilize or transition to static thresholds.


Graphical control element 34652C may enable a user to add a threshold to a time policy 34616 (e.g., in-focus time policy). Graphical control element 34652C may be configured to receive such a user request and may initiate the creation of a new threshold. In response to the request, the system may determine whether the new threshold should be an adaptive threshold or a static threshold by checking the time policy or other configuration information. If the new threshold is an adaptive threshold, the system may analyze training data to determine a threshold value and may assign the threshold value to the new threshold. If the new threshold is a static threshold, the system may use a value provided by a user or assign a default value to the new threshold. The system may also display a new graphical control element 34652D to indicate that a new threshold has been created.


Graphical control element 34652D may display a threshold and may enable a user to configure the new or previously added threshold. Each graphical control element 34652D may display information for a specific threshold. The information may include the threshold value, a KPI state associated with the threshold value, a visual attribute (e.g., color) corresponding to the KPI state, or other threshold information. The functionality of the graphical control element 34652D (e.g., marker) may relate to or depend on whether the time policy or threshold utilizes static thresholding or adaptive thresholding. For example, each graphical control element 34652D representing a static threshold may be configured to receive user input to adjust the value associated with the threshold whereas each graphical control element 34652D representing an adaptive threshold may be configured to display user input without being adjustable by the user.



FIG. 34AX is a flow diagram of an exemplary method for displaying a graphical user interface including a presentation schedule with one or more time slots, in accordance with one or more implementations of the present disclosure. Method 34670 may also be used to update an existing presentation schedule at runtime to apply a unifying appearance to one or more time slots associated with an in-focus time policy. Method 34670 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 34670 may be performed by a client computing machine. In another implementation, the method 34670 may be performed by a server computing machine coupled to the client computing machine over one or more networks.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks). However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.


Method 34670 may begin at block 34672 when the computing machine may access stored threshold information for one or more time policies associated with a KPI. The KPI may be defined by a search query that derives a value (e.g., KPI value) from machine data. The value may be indicative of the performance of a service at a point in time or over a period of time and the service may be represented by a stored service definition associating one or more entities that provide the service. Each of the entities may be represented by a stored entity definition that may include an identification of the machine data pertaining to the entity. In one example, the computing system may run the search query defining the KPI to derive the value and may also assign a particular state of the KPI when the value is within a range bounded by one or more thresholds.


Each time policy may identify or be associated with a time frame and at least one threshold. The threshold may define an end of a range of values that may correspond to a KPI state. The time frame may identify one or more durations of time and may be based on any unit of time, such as for example, time of the day, days of the week, certain months, holiday seasons or other duration of time. The time frame may occur one or more times within a time cycle and may apply to prior or subsequent time cycles.


Each time policy may be a static time policy, an adaptive time policy, or a combination thereof. A static time policy may include one or more static thresholds, which may have a value provided by or based on user input and may remain at the value until another value is provided by user input. An adaptive time policy may include one or more adaptive thresholds, which may have a value provided automatically (e.g., without additional user input) by the system based on training data (e.g., historical values of the KPI) and may be automatically adjusted over time by the system. In one example, the threshold information for a KPI may have multiple time policies and at least one of the time policies may be a static time policy and at least one of the time policies may be an adaptive time policy. In another example, all of the time policies associated with a KPI may be static policies or all may be adaptive time policies. A time policy may be a combination of a static time policy and an adaptive time policy if it includes at least one static threshold and at least one adaptive threshold. In one example, a user may configure a time policy with multiple adaptive thresholds (e.g., at 2 standard deviations above and below the mean) and a static threshold at a larger value.


The computing machine may initiate an automatic adjustment of an adaptive threshold based on user input or without user input. The user input may be in the form of a user event (e.g., user request), such as a user initiating the creation of a new threshold via graphical control element 34652C (e.g., “add new threshold”) or by initiating a recalculation of an existing adaptive threshold. An adjustment without user input may be based on a schedule or frequency interval. The schedule may be any time-based schedule, such as a schedule based on an astrological calendar, financial calendar, business calendar or other schedule. The frequency interval may be based on a duration of time, such as a portion of one or more hours, days, weeks, months, seasons, years, time cycles or other time duration. When the schedule or interval indicates that an adjustment may occur, the system may initiate the adaptive thresholding process, which is discussed in more detail in regards to FIG. 34AY.


At block 34674, the computing machine may determine a correspondence between one of the time policies and one or more time slots. The time slots may be included within a presentation schedule and arranged in a grid arrangement (e.g., presentation schedule 34620), graph arrangement (e.g., presentation schedule 34632) or other arrangement. Each time slot in the presentation schedule may represent a continuous duration of time based on any underlying unit of time measurement, such as, seconds, minutes, hours, days, weeks or any portion or variation therefrom. The computing machine may analyze the time frames of the time policies to determine which of the one or more time slots correspond to which time policies, and a time policy with a single time frame (e.g., weekday nights) may correspond to multiple time slots (e.g., Mon-Fri nights).


At block 34676, the computing machine may cause display of a graphical user interface (GUI) including a presentation schedule comprising the one or more time slots, wherein the one or more time slots have a unifying appearance. The unifying appearance of the time slots in the presentation schedule comprises a visual attribute to distinguish the time slots from a time slot that corresponds to another time policy in the presentation schedule. The unifying appearance of the time slots in the presentation schedule may indicate which time slots correspond to an in-focus time policy (e.g., time policy identified based on user input). Each of the time slots in the presentation schedule may also include other visual attributes to distinguish ranges of values corresponding to different KPI states. For example, a single time slot may include multiple visual attributes related to color to indicate multiple ranges of KPI values and each visual attribute may correspond to a KPI state.


The presentation schedule may include a graph (e.g., graph arrangement of time slots) having one or more depictions. In one example, the presentation schedule may include a depiction (e.g., graph line) that represents aggregate KPI values. In another example, there may be multiple depictions and a first depiction may illustrate a contribution of a first entity into the KPI and a second depiction may illustrate a contribution of a second entity into the KPI. In yet another example, the first depiction may correspond to values of the KPI derived from a portion of training data associated with a first time cycle and a second depiction may correspond to values of the KPI derived from a portion of training data associated with a second time cycle.


The presentation schedule may include or be displayed along with one or more graphical control elements that are configured to receive user input to customize the settings of the time policies and threshold information. In one example, the computing machine may receive user input to adjust a marker (e.g., a graphical control element) of a threshold of one of the time policies and the computing machine may update the value of the threshold in view of the user input. In another example, the computing machine may receive a first user input identifying one of the time policies and receive a second user input to change the identified time policy from an adaptive time policy to a static time policy to avoid automatic changes to the thresholds of the identified time policy.


In another example, the GUI may include multiple presentation schedules and a listing of time policies. One of the presentation schedules may have timeslots in a graph arrangement and another presentation schedule may have time slots in a grid arrangement. Each of the presentation schedules may span the same duration of time and display threshold information for a time cycle (e.g., a week) or may each span a different duration, which may or may not be based on a portion of one or more time cycles. For example, the presentation schedule having a grid arrangement may display a portion (e.g., only the weekdays) of a time cycle (e.g., week) and the presentation schedule having a graph arrangement may display multiple time cycles (e.g., a month). The time policy listing may display one or more time policies associated with a KPI and may be configured to receive a selection of one or more time policies. The selection may cause one or more of the presentation schedules to be updated to display threshold information associated with the selected time policy. Conversely, a selection of a time slot in a presentation schedule may cause the corresponding time policy(ies) in the listing to include a visual attribute (e.g., highlighting).


One or more of the presentation schedules may include a hover display that provides threshold information and may be initiated by the system when the user identifies one or more of the time slots. A user may identify the one or more time slots by selecting one or more time slots with an input from a mouse, keyboard, touch gesture or other user input technology. The user may also identify the one or more time slots by hovering over the one or more timeslots using the input technology without selecting any of the timeslots. In one example, the hover display may be a hover box or mouse over of any shape or size and may display graphical or textual information regarding the threshold information or corresponding time policy. For example, the graphical display may be a mouse over displaying information related to the time frame, such as the block of time and occurrences (e.g., 5 am-10 am weekdays).


In addition to the multiple presentations schedules, the GUI may also include a graphical visualization (e.g., graph) having a graph line representing a plurality of values of the KPI over a duration of time. The duration of time may default to the most recent hour of the time frame, however any other durations of time may be used. The graphical visualization may comprise multiple graphical control elements (e.g., user adjustable threshold markers) and a graphical control element enabling a user to add an additional threshold to one of the time policies. In one example, the graphical visualization may have a horizontal axis indicating a duration of time and a vertical axis with one or more markers illustrating one or more thresholds associated with the time policy.


Responsive to completing the operations described above with references to block 34676, the method may terminate.



FIG. 34AW is a flow diagram of an implementation of a method 34680 for utilizing adaptive thresholding to automatically determine one or more or more values for a threshold. Method 34680 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 34680 may be performed by a client computing machine. In another implementation, the method 34680 may be performed by a server computing machine coupled to the client computing machine over one or more networks.


Method 34680 may begin at block 34681 when the computing machine may access information that defines one or more time frames associated with a KPI, each of the time frames may have a set of one or more thresholds. Each threshold may represent the end of a range of values corresponding to a particular state of the KPI and the KPI may be defined by a search query that derives a value indicative of the performance of a service at a point in time or during a period of time. The value may be derived from machine data pertaining to one or more entities that provide the service.


The machine data may be stored as time-stamped events and each time-stamped event may include a portion of raw machine data and may be accessed using a late-binding schema. The machine data may comprise heterogeneous machine data from multiple sources. For example, the machine data pertaining to the entity may include machine data from multiple sources on the same entity or on different entities.


At block 34683, the computing machine may select a time frame from the one or more time frames. The time frames may be associated with one or more time policies which may also specify other threshold related information, such as the quantity of thresholds, the threshold values and associated KPIs. Each time frame may occur multiple times within a time cycle and the time cycle may be based on one or more of a daily time cycle, a weekly time cycle, a monthly time cycle, a seasonal time cycle, a holiday time cycle or other time cycle. For example, a time cycle may be based on a week and the time frame may identify a block of time that occurs every night during the week.


At block 34685, the computing machine may identify training data for the time frame. Training data for a time frame may be identified based on information associated with the time policy. The time policy may identify or be associated with a KPI that may be defined by a search query and the search query may identify one or more data sources and may be associated with a summary index (e.g., cached KPI values). The computing system may utilize this information to identify training data, which may include the location of the training data and a duration of training data. The training data identified may include all training data or training data from a specific duration of time. Training data from a specific duration of time may be based on a window of time such as a portion of one or more hours, days, weeks or months.


Training data for the time frame may be any portion of the training data associated with or related to the time frame. In one example, training data for a time frame may include training data generated during the time frame. For example, the time frame may be weekday nights and the training data may include training data generated during weekday nights. In another example, training data for a time frame may not include training data generated during the time frame. For example, the time frame may include holidays and the training data for the time frame may include only training data from the previous day or week and not training data from the holiday or previous holiday.


The training data may include KPI values or machine data (e.g., time stamped events) that may be used to derive the KPI values. As discussed above, the training data may include historical data, simulated data, example data or a combination thereof. When the training data includes KPI values, the KPI values may be simulated values, historical values, or example values of the KPI. When the training data includes machine data, the training data may be simulated machine data, historical machine data, or example machine data. In one example, the training data may be the most recent historical data and may include data (e.g., machine data or KPI values) corresponding to a specific duration relative to the current time (e.g., yesterday, last week, etc.).


At block 34687, the computing machine may determine one or more thresholds for the time frame in consideration of the identified training data. Determining a threshold may involve identifying a new value to be assigned to a new threshold or to determine a change for an existing threshold value, wherein the change is based on a delta value, a percentage value or an absolute value. Determining the one or more thresholds may involve analyzing the training data, which may include KPI values from one or more KPIs, to determine a statistical metric indicating changes in the training data and updating the set of one or more thresholds for the time frame based on the KPI value corresponding to the statistical metric. The statistical metric may be any measurement for identifying patterns, trends, distributions or other measurement for a set of data and may include one or more of standard deviations, quantiles or ranges. In one example, multiple statistical metrics related to standard deviation may be used (e.g., −2 standard deviation, 0 standard deviation, and +2 standard deviation) and the first statistical metric may be associated with a lower threshold (e.g., informational state), the second statistical metric may be associated with a middle threshold (e.g., warning state) and the third standard deviation may be associated with the highest threshold (e.g., critical state). When the system analyzes the training data, it may determine specific KPI values associated with each of the statistical metrics (e.g., 0 standard deviation corresponds to a value of 75) to be subsequently assigned to each respective threshold.


After determining a value for a threshold, the computing machine may decide whether the value should be assigned to a threshold. The decision may involve determining whether the new value is sufficiently different to warrant assigning it to the threshold. Calculating the difference may involve comparing a new threshold value to a previous threshold value and may be based on an absolute difference, percentage difference or other difference calculation. In one example, the computing machine may withhold assigning the value to the threshold if the difference is below a predefined difference level. In another example, the computing machine may not assign the threshold if the difference exceeds a predefined difference level or range, in which case it may be deemed to be too large of a change and may require approval from a user prior to assigning the value to the threshold.


At block 34689, the computing machine may assign values to the thresholds. Assigning a value to a threshold may involve modifying a time policy to alter the values of one or more of the thresholds. The assignment of values may occur automatically based on a schedule, a frequency interval, or other event (e.g., restart, training data exceeds a storage threshold). Assigning values to the thresholds may involve assigning a first value to a threshold and subsequently assigning a second value to the threshold, wherein the first value and the second value are based on training data from different time durations. Once a value has been assigned to a threshold, the threshold may be utilized to define a particular state (e.g., KPI state) for a KPI value derived by a search query when the value is within a range bounded by the one or more thresholds. The search query may use a late-binding schema to extract values indicative of the performance of the service from time-stamped events after the search query is initiated.


Responsive to completing the operations described above with references to block 34689, the method may terminate.


As discussed herein, some aspects of the disclosure are directed to technology for implementing adaptive thresholding. Adaptive thresholding may enable a user to configure the system to automatically determine or adjust one or more thresholds. Thresholds may enable a user (e.g., IT managers) to indicate values that may initiate an alert or some other action. Adaptive thresholding may involve identifying training data and analyzing the training data to determine a value for a threshold and may occur continuously, periodically (e.g., schedule, interval) or may be initiated by a user. For example, adaptive thresholding may occur every hour, day, week, or month and use historical training data. In addition, some aspects of the disclosure are directed to a GUI for displaying and configuring adaptive and/or static thresholds. The GUI may include one or more presentation schedules that may display one or more time frames associated with the time policies. Each presentation schedule may include multiple time slots and span a portion of one or more time cycles. Some of the time slots may be associated with a specific time policy and may have a unifying appearance that distinguishes the time slots from timeslots associated with other time policies. In one example, the presentation schedule may have a time grid arrangement (e.g., calendar grid view) and in another example, the presentation schedule may have a graph arrangement and may include one or more depictions and graphical control elements. The depiction may be one or more points, lines, bars, slice or other graphical representation and may illustrate KPI values graphical control elements may enable the user to add, configure, or preview the threshold information associated with the time policies.


Anomaly Detection


Anomaly detection may be a feature incorporated into technologies described herein and may enable users (e.g., IT managers) to identify when the values of a KPI reflect anomalous behavior (e.g., an occurrence that is relatively less predictable and/or more surprising than previously received/identified KPI values). That is, it can be appreciated that while in certain implementations defining and/or applying static thresholds to KPI values (e.g., in order to identify KPI values that lie above and/or below such thresholds) may be effective in enabling the identification of unusual behavior, occurrences, etc. In certain circumstances, however, such thresholds may not necessarily identify anomalous behavior/occurrences, such as with respect to the deviation and/or departure of a particular KPI value from a trend that has been observed/identified with respect to prior KPI values, as is described herein. For example, certain machine behavior, occurrences, etc. (as reflected in one or more KPI values) may not necessarily lie above or below a particular threshold. However upon considering a current KPI value in view of various trend(s) identified/observed in prior KPI values (e.g., training data such as historical KPI values, simulated KPI values, etc.), the current KPI value, may nevertheless reflect anomalous behavior/occurrences (in that the current KPI value, for example, deviates/departs from the identified trend).


It should be understood that while in certain implementations the referenced anomalies may correspond to behavior or occurrences as reflected in KPI values that may be greater or lesser than an expected/predicted KPI value (as described in detail below), in other implementations such anomalies may correspond to the absence or lack of certain behaviors/occurrences. For example, in a scenario in which certain KPI values have been observed/determined to demonstrate some amount of volatility, upon further observing/determining that subsequent KPI values are relatively less volatile, such behavior/occurrence can also be identified as anomalous (despite the fact that the KPI value(s) do not fall above or below a particular threshold).


FIG. 34AZ1 illustrates an exemplary GUI 34690 for anomaly detection, in accordance with one or more implementations of the present disclosure. It should be understood that GUI 34690 (as depicted in FIG. 34AZ1) corresponds to a particular KPI (here, ‘ABC KPI 2’), though in other implementations such a GUI may correspond to multiple KPIs, an aggregate or composite of KPIs, etc. GUI 34690 may include activation control 34691 and training window selector 34692. Activation control 34691 can be, for example, a button or any other such selectable element or interface item that, upon selection (e.g., by a user), enables and/or otherwise activates the various anomaly detection technologies described herein (e.g., with respect to a particular KPI or KPIs). Upon activating anomaly detection via activation control 34691, training window selector 34692 can be presented to the user via GUI 34690.


Training window selector 34692 can enable the user to define the ‘training window’ (e.g., a chronological interval) of training data (including but not limited to KPI values or machine data used for deriving KPI values and which may be based on historical data, simulated data, example data or other data or combination of data) to be considered in predicting one or more expected KPI values. It should be understood that training data from a specific duration of time may be based on a window of time such as a portion of one or more hours, days, weeks, months or other duration of time. For example, upon receiving a selection of ‘7 days’ via training window selector 34692, the described technologies can analyze the previous seven days of KPI values for KPI ‘ABC KPI 2,’ in order to predict an expected KPI value for the eighth day. Moreover, in certain implementations, the referenced training window may be a fixed duration of time and may include a rolling window relative to the current time. The rolling window may include a window of training data, where new data is added and/or old data is removed as the window time progresses. In another example, the window of time may dynamically adjust based on any condition related to the training data or user's IT environment. For example, the window may be reduced or enlarged if the quantity of data (e.g., KPI values or machine data) is not within a predetermined range of data, which may be based on a storage or processing capacity of a computing system.


It should be understood that the referenced predicted/expected KPI values can be computed using any number of techniques/technologies. In certain implementations, various time series forecasting techniques can be applied to the referenced training data such as historical KPI values (e.g., the KPI values within the training window selected by the user). Based on the historical KPI values received/identified with respect to the selected training window, a time series forecasting model can be generated. Such a model can be used, for example, to predict one or more expected subsequent KPI value(s) (e.g., an expected KPI value for the eighth day in the sequence). For example, based on KPI values corresponding to a ‘training window’ of the past seven days (reflecting, for example, that CPU usage of a service or one or more entities providing the service increases significantly at 2:00 PM on each of the past seven days), a predicted value can be computed, reflecting the expected/predicted KPI value on the eighth day (reflecting, for example, that CPU usage of the service or one or more entities providing the service is expected to increase significantly at 2:00 PM on the eighth day as well).


In certain implementations, such a model can account for any number of factors, variables, parameters, etc. For example, the model may be configured to account for one or more trends reflected in the training data such as historical KPI values, simulated KPI values, etc. and/or the seasonality (e.g., repeating patterns, such as daily, weekly, monthly, holidays, etc., occurrences) reflected in the training data. Additionally, in certain implementations various aspects of noise and/or randomness can also be accounted for in the model. Examples of the referenced model(s) include but are not limited to exponential smoothing algorithms such as the Holt-Winters model. Such models may also include various smoothing parameters that can define, for example, how loosely or tightly the model is to fit the underlying data. In order to select appropriate smoothing parameters, in certain implementations techniques such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (e.g., Limited-memory BFGS (L-BFGS)) can be employed. In doing so, smoothing parameters can be selected (e.g., with respect to the predictive model, for example, the Holt-Winters model) that are likely to minimize errors with respect to the predicted/expected KPI values. Alternatively, in certain implementations the referenced parameters (e.g., alpha and beta parameters) can be optimized using other technique(s). For example, the referenced parameters can be adjusted using stochastic gradient descent, e.g., at each forecast step. In doing so, prediction error can be minimized. For example, the gradient can be calculated analytically L2-penalized. The learning rate (gamma) can be adjusted (e.g., using AdaGrad), thereby reducing the need for hand-tuning. Being that the optimization problem is non-convex, updates to the referenced alpha and beta parameters can be alternated.


Having computed an expected/predicted KPI value, a comparison can be made (e.g., upon receiving or otherwise identifying the actual KPI value) between the expected/predicted KPI value and its corresponding actual KPI value. By way of illustration, continuing the example provided above, having predicted that CPU usage of a service or one or more entities providing the service is likely to increase significantly at 2:00 PM on the eighth day (as it did on the prior seven days), upon receiving/identifying the actual KPI value for the eighth day, a comparison can be performed between the predicted and actual KPI values, reflecting, for example, that CPU usage of the service or one or more entities actually increased significantly at 6:00 PM on the eighth day (instead of at 2:00 PM as predicted/expected). In doing so, an error value can be computed or otherwise determined. Such an error value can reflect the degree to which the referenced expected/predicted KPI value was (or was not) accurate (i.e., the degree to which the expected/predicted KPI value was relatively close to or distant from the actual KPI value). In certain implementations, those expected/predicted KPI values that are relatively more significantly different or distant from their corresponding actual KPI values can be associated with a relatively larger/higher error score, while those expected/predicted KPI values that are relatively more comparable or close to their corresponding actual KPI values can be associated with a relatively smaller/lower error score.


It should be noted that while various examples provided herein illustrate the described technologies with respect to using the referenced model(s) to predict a subsequent (e.g., future) KPI value (e.g., a value that has not yet actually been generated), and then subsequently comparing the actual KPI value (when it is received) with the value predicted using historical KPI values, in other implementations such a process can be executed using simulated KPI data for such a process. For example, the referenced model(s) can be applied to historical KPI values in order to predict (independent of the actual subsequent KPI value) what would have been expected to be the subsequent KPI value. Such a prediction can then be compared with the actual KPI value that was received/identified. In doing so, historical KPI values can be used to generate a significant number of error values with respect to a KPI, such that the degree to which subsequent error values that are computed are anomalous can be more accurately identified, as is described herein. Alternatively, the referenced comparison(s) can be performed in relation to simulated data. In certain implementations, simulated data may be similar to historical data but may be generated by a simulation algorithm as opposed to actual data generated by or about an entity of a user's IT environment. The simulation algorithm may be executed by a computing system to generate training data that attempts to mimic data that may be generated by or about one or more entities of the user's IT environment. The simulation algorithm may incorporate one or more features of the user's IT environment, such as features from the KPI definition, entity definition or service definition. Moreover, in certain implementations, the referenced comparison(s) can be performed in relation to example data. In certain implementations, example data may be similar to historical data and simulated data but may be associated with a different IT environment, KPI, entity or service. In one example, the example training data may be delivered by the software provider (e.g., with the software product). In another example, the training data may be associated with a different KPI and may not be associated with KPI values of a current KPI. This may be advantageous if there is little to no training data for the current KPI, in which case the data associated with a different KPI may be used for training the current KPI (e.g., boot strapping). The different KPI may be similar or related to the current KPI, for example, the current KPI and the different KPI may be defined by search queries that search a similar data source (e.g., log files) or gather data from similar entities (e.g., servers) or relate to the same service. In certain implementations, a summary index (e.g., cached KPI values) can also be utilized in the referenced comparison(s). Additionally, in certain implementations, value(s) associated with one or more other KPIs can also be utilized in computing an expected/predicted KPI value. For example, in a scenario in which a significant amount of historical KPI values are not available for a particular KPI, one or more other KPIs, such as KPIs that are comparable to, similar to, etc., the referenced KPI, can be utilized in order to compute an expected/predicted KPI value.


Moreover, having computed an error value (reflecting, for example, the degree to which the predicted/expected KPI value was or was not accurate as compared to the corresponding actual KPI value), the position of such an error value within a range of historical errors observed/identified with respect to the same KPI can be computed. That is, it can be appreciated that, based on a particular set of training data such as historical KPI values, simulated KPI values, etc., and/or a time series forecasting model, it may be relatively common for the expected/predicted KPI values to be computed with relatively significant error scores (e.g., in a scenario in which the training data, for example, historical KPI values, does not exhibit identifiable trend(s), thereby creating difficulty in accurately predicting subsequent KPI values). Accordingly, the position of a particular error value within a range of historical error values observed/identified with respect to the KPI can be considered/accounted for in determining whether a KPI value that corresponds to a particular error value is to be considered an anomaly. For example, in a scenario in which significant error values are frequently observed/identified with respect to a KPI (reflecting that the referenced model is often relatively inaccurate in predicting a subsequent KPI value), upon identifying yet another error value (which, for example, has an error score that is relatively comparable to those previously identified errors), such an error value will not be identified as an anomaly, by virtue of the fact that it is relatively consistent with numerous prior errors that have been observed/identified with respect to the KPI. Conversely, in a scenario in which such an error value deviates significantly from prior errors that have been observed/identified with respect to the KPI (reflecting, for example, that the referenced model was significantly less accurate in predicting the expected KPI in the present instance as compared to past instances in which the model was significantly more accurate in predicting the expected KPI), such an error value (and the underlying KPI value(s) that correspond to it) can be identified as an anomaly. Thus, a particular error value (and the underlying KPI value(s) to which it corresponds) can be identified as an anomaly based on, for example, the quantile of the current error value within the history of past error values (e.g., within the selected training window).


At this juncture it should be noted that while in certain implementations the referenced historical error values may be maintained/stored (e.g., in a historical log, database, etc.) as-is (e.g., in their current state/format), in other implementations a data structure such as a digest containing the referenced historical error values can be maintained (e.g., in lieu of the raw historical error values). Examples of such a digest include but are not limited to a t-digest. A t-digest can be a probabilistic data structure that can be used to estimate the median (and/or any percentile) from distributed data, streaming data, etc. In certain implementations, the t-digest can be configured to ‘learn’ or identify various points in the cumulative distribution function (CDF) which may be ‘interesting’ (e.g., the parts of the CDF where the CDF is determined to be changing fastest). Such points may be referred to as centroids (e.g., value, mass). The referenced digest can be configured, for example, to store a summary of the past error history such that the referenced error quantiles can be computed accurately, while obviating the need to maintain large amounts of the actual historical error values. By storing/compressing the referenced error values into a t-digest, various efficiencies can be realized and/or improved, such as with respect to storage and/or processing of such values while also retaining the ability to easily keep the repository of such values up to date. The t-digest can also be easily referenced, such as in order to determine the quantile of the current KPI value, e.g., in order to determine whether a particular error is “unusually large” (that is, anomalous).


FIG. 34AZ2 illustrates an exemplary GUI 34693 for anomaly detection, in accordance with one or more implementations of the present disclosure. GUI 34693 may include search preview selector control 34694, sensitivity setting control 34695, sensitivity setting indicator 34696, alert setting control 34697, and search preview window 34698. Search preview selector control 34694 can be, for example, a drop down menu or any other such selectable element or interface item that, upon selection (e.g., by a user) enables a user to define or select a chronological interval with respect to which those error values (and their corresponding KPI values) that have been identified as anomalies are to be presented (e.g., within search preview window 34698), as described herein.


Sensitivity setting control 34695 can be, for example, a movable slider or any other such selectable element or interface item that, upon selection (e.g., by a user), enables a user to select or define a setting that dictates the sensitivity (e.g., between ‘1,’ corresponding to a relatively low sensitivity and ‘100,’ corresponding to a relatively high sensitivity, the presently selected value of which is reflected in sensitivity setting indicator 34696) with respect to which error values (and their corresponding KPI values) are to be identified as anomalies. That is, as described above, a particular error value (and its underlying KPI value(s)) can be identified as an anomaly based on the degree to which a particular error value deviates from the history of past error values for the KPI (e.g., within the selected training window). Accordingly, the referenced sensitivity setting can dictate/define an error threshold which can be, for example, a threshold by which such deviations are to be considered/identified as anomalies. For example, a sensitivity setting of ‘10’ may correspond to the 10th percentile of the referenced deviations from historical error values. Accordingly, based on such a selection, all those error values that are above the 10th percentile with respect to their deviation from historical error values would be identified as anomalies. By way of further example, a sensitivity setting of ‘99’ may correspond to the 99th percentile of the referenced deviations from historical error values. Accordingly, based on such a selection, only those error values that are above the 99th percentile with respect to their deviation from historical error values would be identified as anomalies. In providing the referenced sensitivity setting control 34695, the described technologies can enable a user to adjust the sensitivity setting (thereby setting a higher or lower error threshold with respect to which error values are or are not identified as anomalies) and to be presented with real-time feedback (via search preview window 34698) reflecting the error values (and their underlying KPI values), as described below.


Alert setting control 34697 can be, for example, a selectable button, checkbox, etc., or any other such selectable element or interface item that, upon selection (e.g., by a user) enables a user to select or define whether or not various alerts, notifications, etc. (e.g., email alerts, notable events, etc., as are described herein), are to be generated and/or provided, e.g., upon identification of various anomalies.


FIG. 34AZ3 illustrates an exemplary GUI 34699 for anomaly detection, in accordance with one or more implementations of the present disclosure. GUI 34699 may include search preview window 34698 (as described with respect to FIG. 34AZ2), KPI value graph 34700, anomaly point(s) 34701, anomaly information 34702, and alert management control 34703. KPI value graph 34700 can be, for example, a graph that depicts or represents KPI values (here, ‘CPU usage’) over the chronological interval defined by search preview selector control 34694 (e.g., the past 24 hours). It should be understood that, in certain implementations, the referenced chronological interval may be adjusted (e.g., zoomed-in, zoomed-out) by the user, e.g., at run time (such as by providing an input via search preview selector control 34694). In doing so, only a portion of the chronological interval may be displayed in search preview window 34698, or alternatively, an additional time period can be added to the chronological interval, and the resulting extended chronological interval can be displayed in search preview window 34698. Anomaly point(s) 34701 can be visual identifiers (e.g., highlighted or emphasized points or graphical indicators) depicted along the graph. The placement of such anomaly points 34701 within search preview window 34698 can reflect the point in time in which the underlying KPI (with respect to which the anomaly was detected) occurred within the chronological interval (e.g., the past 24 hours). For example, the left-most area of search preview window 34698 can correspond to the beginning of the referenced 24-hour period while the right-most area of search preview window 34698 can correspond to the end of the referenced 24-hour period.


As described above, the anomaly point(s) 34701 that are displayed along KPI value graph 34700 are identified based on the sensitivity setting provided by the user (via sensitivity setting control 34695). Accordingly, as the user drags the slider (that is, sensitivity setting control 34695) towards the left, thereby lowering the sensitivity setting (that is, the error threshold by which error values are to be determined to be anomalies with respect to their deviation from historical error values for the KPI), relatively more anomalies are likely to be identified. Conversely, as the user drags the slider (that is, sensitivity setting control 34695) towards the right, thereby raising the sensitivity setting (that is, the error threshold by which error values are to be determined to be anomalies with respect to their deviation from historical error values for the KPI), relatively fewer anomalies are likely to be identified. In doing so, the user can actively adjust the sensitivity setting via sensitivity setting control 34695 and be presented with immediate visual feedback regarding anomalies that are identified based on the provided sensitivity setting.


Anomaly information 34702 can be a dialog box or any other such content presentation element within which further information can be displayed, such as with respect to a particular anomaly. That is, having identified various anomalies (as depicted with respect to anomaly points 34701), it may be useful for the user to review additional information with respect to the identified anomalies. Accordingly, upon selecting (e.g., clicking on) and/or otherwise interacting with (e.g., hovering over) a particular anomaly point 34701, anomaly information 34702 can be presented to the user. In certain implementations, such anomaly information 34702 can include the underlying KPI value(s) associated with the anomaly, the error value, a timestamp associated with the anomaly (reflecting, for example, the time at which the KPI had an anomalous value), and/or any other such underlying information that may be relevant to the anomaly, KPI, etc. In doing so, the user can immediately review and identify information that may be relevant to diagnosing/identifying and/or treating the cause of the anomaly, if necessary.


It should also be noted that, in certain implementations, the referenced anomaly information 34702 dialog box (and/or one or more elements of GUI 34699 can enable a user to provide various types of feedback with respect to various anomalies that have been identified and/or presented (as well as information associated with such anomalies). Examples of such feedback that a user may provide include but are not limited to feedback reflecting that: the identified anomaly is not an anomaly, the identified anomaly is an anomaly, an error value/corresponding KPI value that was not identified as an anomaly should have been identified as an anomaly, an error value/corresponding KPI value that was not identified as an anomaly is, indeed, not an anomaly, the identified anomaly is not as anomalous as reflected by its corresponding error value, the identified anomaly is more anomalous than is reflected by its corresponding error value, the identified anomaly together with one or more nearby (e.g., chronologically proximate) anomalies are part of the same anomalous event, the identified anomaly is actually two or more distinct anomalies, etc. in certain implementations, the referenced feedback may originate from a multitude of sources (similar to the different sources of training data described herein). For example, labeled examples of anomalies and non-anomalies can be gathered from similar but distinct systems or from communal databases.


It should be further noted that while in certain implementations (such as those described herein) the referenced feedback can be solicited and/or received after an initial attempt has been made with respect to identifying anomalies, in other implementations the described technologies can be configured such that a training phase can first be initiated, such as where a user is presented with some simulated or hypothetical anomalies with respect to which the user can provide the various types of feedback referenced above. Such feedback can then be analyzed/processed to gauge the user's sensitivity and/or to identify what types of anomalies are (or aren't) of interest to them. Then, upon completing the referenced training phase, a detection phase can be initiated (e.g., by applying the referenced techniques to actual KPI values, etc.). Moreover, in certain implementations the described technologies can be configured to switch between training and detection modes/phases (e.g., periodically, following some conditional trigger such as a string of negative user feedback, etc.).


Moreover, in certain implementations the described technologies can be configured to detect/identify anomalies in/with respect to different contexts. For example, it can be appreciated that with respect to different user roles, e.g., an IT manager and a security analyst, anomalies identified in one context may not be considered anomalies in another context. Thus, depending on, for example, the role of the user, different anomalies may be identified. In certain implementations, the feedback provided via the slider and/or one of the mechanisms described above can further impact the active context or some subset of contexts (but not other(s)).


Alert management control 34703 can be, for example, a selectable element or interface item that, upon selection (e.g., by a user), enables a user to further manage various aspects of alerts, notifications, etc. (e.g., email alerts, notable events, etc., as are described herein) that are to be generated and/or provided, e.g., upon identification of various anomalies.


FIG. 34AZ4 is a flow diagram of an exemplary method 34704 for anomaly detection, in accordance with one or more implementations of the present disclosure. Method 34704 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 34704 may be performed by a client computing machine. In another implementation, the method 34704 may be performed by a server computing machine coupled to the client computing machine over one or more networks.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks). However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.


Method 34704 may begin at block 34705 when the computing machine may execute a search query, such as over a period of time. In certain implementations, the referenced search query can be executed repeatedly, such as over a period of time and/or based on a frequency and/or a schedule. In doing so, values for a key performance indicator (KPI) can be produced. In certain implementations, such a search query can define the KPI. The referenced search query can derive a KPI value indicative of the performance of a service at a point in time or during a period of time. Such a value can, for example, be derived from machine data, such as machine data pertaining to one or more entities that provide the service, as is described herein. In certain implementations, such machine data may be produced by two or more sources. Additionally, in certain implementations, such machine data may be produced by another entity. Moreover, in certain implementations, such machine data may be stored as timestamped events (each of which may include a segment of raw machine data). Such machine data may also be accessed according to a late-binding schema.


At block 34706, a graphical user interface (GUI) enabling a user to indicate a sensitivity setting can be displayed. For example, as described herein with respect to FIGS. 34AZ1-34AZ3, upon activating activation control 34691, a sensitivity setting control 34695 can be displayed. As described above, sensitivity setting control 34695 can enable a user to define an error threshold above which, for example, a computed error value (which corresponds to one or more underlying KPI values) is to be identified as an anomaly (and below which such an error is not to be identified as an anomaly). In some implementations, sensitivity setting control 34695 can be a slider.


At block 34707, a user input can be received. In certain implementations, such user input can be received via the GUI (e.g., sensitivity setting control 34695). Moreover, in certain implementations, such input can indicate the sensitivity setting desired by the user (e.g., an error threshold above which a computed error value is to be identified as an anomaly and below which such an error is not to be identified as an anomaly). In some implementations, the user input can be received when the user moves the slider to a certain position.


At block 34708, zero or more of the values can be identified as anomalies. In certain implementations, such values can be identified as anomalies based on a sensitivity setting, such as a sensitivity setting indicated by user input (e.g., via sensitivity setting control 34695).


In certain implementations, in order to identify the referenced values as anomalies, one of the values can be compared, e.g., against a predicted or expected value. In doing so, an error value can be determined. For example, as described above, an expected KPI value can be predicted (e.g., based on historical KPI values, such as a summary index as described herein, simulated KPI values, etc.) and such an expected KPI value can then be compared to the actual subsequent KPI value. The degree to which the expected KPI value deviates/departs from its corresponding actual KPI value can be quantified as an error value. It should be understood that such a predicted value may be based at least in part on (a) one or more values for the KPI that immediately precede the predicted value, (b) a time series forecasting calculation, and/or (c) a frequency domain calculation, such as is described in detail above.


Additionally, in certain implementations having identified the referenced error value, the position of the error value within a range can be determined. Such a range can be, for example, a historical range of error values, each of which corresponds to previous instances of predicting expected KPI values and comparing such values with their corresponding actual KPI values. Accordingly, the position of a particular error value within the referenced range corresponds to how consistent (or inconsistent) a particular error value is as compared to previously computed error values (e.g., for the same KPI). Moreover, in certain implementations, the referenced sensitivity setting can be associated with the referenced range. That is, as described above, the sensitivity setting can define an error threshold within the range (e.g., less than 10%, less than 1%, or any other such value, at or near an end of the range) whereby a computed error value positioned in the portion of the range above the error threshold is to be identified as an anomaly and a computed error value positioned in the portion of the range below the error threshold is not to be identified as an anomaly (for example, the allowed values for the sensitivity setting correspond to a portion of the range). Moreover, in certain implementations such a range can be a quantile range. Such a quantile range can, for example, be represented as a digest of error values, such as may be determined over training data (e.g., training data that includes historic KPI values, such as historic KPI values computed with respect to multiple entities that provide the service).


At block 34709, a GUI that includes information related to the values identified as anomalies. In certain implementations, the information related to the values identified as anomalies can include a count of the anomalies.


Moreover, in certain implementations, a display of a graph that includes information related to zero or more of the values identified as anomalies can be adjusted. In certain implementations, such a display can be adjusted based on the user input indicating the sensitivity setting. For example, as described in detail with respect to FIGS. 34AZ1-34AZ3, upon receiving various sensitivity setting inputs via sensitivity setting control 34695, automatic (without any user input other than the sensitivity setting input) identification of anomalies can be repeated and the graph as displayed in search preview window 34698 can be dynamically adjusted, e.g., with respect to the quantity, position, etc., of various anomaly points 34701 (and their corresponding information).


At block 34710, a notable event can be generated, e.g., for an identified anomaly, such as in a manner described below (e.g., with respect to FIGS. 34O-34Z.)


Correlation Search and KPI Distribution Thresholding


As discussed above, the aggregate KPI score can be used to generate notable events and/or alarms, according to one or more implementations of the present disclosure. In another implementation, a correlation search is created and used to generate notable event(s) and/or alarm(s). A correlation search can be created to determine the status of a set of KPIs for a service over a defined window of time. Thresholds can be set on the distribution of the state of each individual KPI and if the distribution thresholds are exceeded then an alert/alarm can be generated.


The correlation search can be based on a discrete mathematical calculation. For example, the correlation search can include, for each KPI included in the correlation search, the following:

(sum_crit>threshold_crit)&&((sum_crit+sum_warn)>(threshold_crit+threshold_warn))&&((sum_crit+sum_warn+sum_normal)>(threshold_crit+threshold_warn+threshold_normal))


Input (e.g., user input) can be received that defines one or more thresholds for the counts of each state in a defined (e.g., user-defined) time window for each KPI. The thresholds define a distribution for the respective KPI. The distribution shift between states for the respective KPI can be determined. When the distribution for a respective KPI shifts toward a particular state (e.g., critical state), the KPI can be categorized accordingly. The distribution shift for each KPI can be determined, and each KPI can be categorized accordingly. When the KPIs for a service are categorized, the categorized KPIs can be compared to criteria for triggering a notable event. If the criteria are satisfied, a notable event can be triggered.


For example, a Web Hosting service may have three KPIs: (1) CPU Usage, (2) Memory Usage, and (3) Request Response Time. The counts for each state a defined (e.g., user-defined) time window for the CPU Usage KPI can be determined, and the distribution thresholds can be applied to the counts. The distribution for the CPU Usage KPI may shift towards a critical state, and the CPU Usage KPI is flagged as critical accordingly. The counts for each state in a defined time window for the Memory Usage KPI can be determined, and the distribution thresholds for the Memory Usage KPI may also shift towards a critical state, and the Memory Usage KPI is flagged as critical accordingly.


The counts of each state in a defined time window for the Request Response Time KPI can be determined, and the distribution thresholds for the Request Response Time KPI can be applied to the counts. The distribution for the Request Response Time KPI may also shift towards a critical state, and the Request Response Time KPI is flagged as critical accordingly. The categories for the KPIs can be compared to the one or more criteria for triggering a notable event, and a notable event is triggered as a result of each of the CPU Usage KPI, Memory Usage KPI, and Request Response Time KPI being flagged as critical.


Input (e.g., user input) can be received specifying one or more criteria for triggering a notable event. For example, the criteria may be that when all of the KPIs in the correlation search for a service are flagged (categorized) a critical state, a notable event is triggered. In another example, the criteria may be that when a particular KPIs is flagged a particular state for a particular number of times, a notable event is triggered. Each KPI can be assigned a set of criteria.


For example, a Web Hosting service may have three KPIs: (1) CPU Usage, (2) Memory Usage, and (3) Request Response Time. The counts of each state in a defined (e.g., user-defined) time window for the CPU Usage KPI can be determined, and the distribution thresholds can be applied to the counts. The distribution for the CPU Usage KPI may shift towards a critical state, and the CPU Usage KPI is flagged as critical accordingly. The counts of each state in a defined time window for the Memory Usage KPI can be determined, and the distribution thresholds for the Memory Usage KPI can be applied to the counts. The distribution for the Memory Usage KPI may also shift towards a critical state, and the Memory Usage KPI is flagged as critical accordingly. The counts of each state in a defined time window for the Request Response Time KPI can be determined, and the distribution thresholds for the Request Response Time KPI can be applied to the counts. The distribution for the Request Response Time KPI may also shift towards a critical state, and the Request Response Time KPI is flagged as critical accordingly. The categories for the KPIs can be compared to the one or more criteria for triggering a notable event, and a notable event is triggered as a result of each of the CPU Usage KPI, Memory Usage KPI, and Request Response Time KPI being flagged as critical.


Alarm Console—KPI Correlation



FIG. 34B illustrates a block diagram 3450 of an example of monitoring one or more services using key performance indicator(s), in accordance with one or more implementations of the present disclosure. As described above, a key performance indicator (KPI) for a service can be determined based on a monitoring period. For example, a service may have two KPIs (e.g., KPI1 3461A and KPI2 3461B). Each KPI 3461A-B can be set with a monitoring period 3457A-B of “every 5 minutes”, and a value for each KPI 3461A-B can be calculated every 5 minutes, as illustrated in timelines 3451A-B. One implementation of setting a monitoring period via a GUI is described above in conjunction FIG. 29C.


Referring to FIG. 34B, each time a KPI value is calculated for each KPI 3461A-B, the value can be mapped to a state 3455A-B (e.g., Critical (C), High (H), Medium (M), Low (L), Normal (N), and Informational (I)) based on, for example, the KPI thresholds that are set for a particular KPI. The thresholds that map a KPI value to a KPI state may differ between KPIs. For example, a value of “75” may be calculated for KPI1 3461A, and the value “75” may map to a “High” state for KPI1 3461A. In another example, the same value of “75” may be calculated for KPI2 3461BA, but the value “75” may map to a “Critical” state for KPI2 3461B. One implementation for configuring thresholds for a KPI is described above in conjunction with FIG. 31D.


Referring to FIG. 34B, each time a value and corresponding state is determined for each KPI, the KPI value and corresponding KPI state are stored as part of KPI data for the particular KPI in a service monitoring data store. The service monitoring data store can store KPI data for any number of KPIs for any number of services.


A KPI correlation search definition can be specified for searching the KPI data in the service monitoring data store to identify particular KPI data, and evaluating the particular KPI data for a trigger determination to determine whether to cause a defined action. A KPI correlation search definition can contain (i) information for a search, (ii) information for a triggering determination, and (iii) a defined action that may be performed based on the triggering determination.



FIG. 34C illustrates an example of monitoring one or more services using a KPI correlation search, in accordance with one or more implementations of the present disclosure. As described above, the KPI correlation search definition can contain (i) information for a search, (ii) information for a triggering determination, and (iii) a defined action that may be performed based on the triggering determination.


The information for the search identifies the KPI names and corresponding KPI information, such as values or states, to search for in the service monitoring data store. The search information can pertain to multiple KPIs. For example, in response to user input, the search information may pertain to KPI1 3480A and KPI2 3480B. A KPI that is used for the search can be an aspect KPI that indicates how a particular aspect of a service is performing or an aggregate KPI that indicates how the service as a whole is performing. The KPIs that are used for the search can be from different services.


The search information can include one or more KPI name—State value pairs (KPI-State pair) for each KPI that is selected for the KPI correlation search. Each KPI-State pair identifies which KPI and which state to search for. For example, the KPI1-Critical pair specifies to search for KPI values of KPI1 3480A that are mapped to a Critical State 3481A. The KPI1-High pair specifies to search for KPI values of KPI1 3480A that are mapped to a High State 3481B.


The information for the search can include a duration 3477A-B specifying the time period to arrive at data that should be used for the search. For example, the duration 3477A-B may be the “Last 60 minutes,” which indicates that the search should use the last 60 minutes of data. The duration 3477A-B can be applied to each KPI-State pair.


The information for the search can include a frequency 3472 specifying when to execute the KPI correlation search. For example, the frequency 3472 may be every 30 minutes. For example, when the KPI correlation search is executed at time 3473 in timeline 3471, a search may be performed to identify KPI values of KPI1 3480A that are mapped to a Critical State 3481A within the last 60 minutes 3477A, and to identify KPI values of KPI1 3480A that are mapped to a High State 3481B within the last 60 minutes 3477A.


For KPI2 3480B, the search may be performed at time 3473 based on three KPI-State pairs. For example, the search may be performed to identify KPI values of KPI2 3480B that are mapped to a Critical State 3491A within the last 60 minutes 3477B, KPI values of KPI2 3480B that are mapped to a High State 3491B within the last 60 minutes 3477B, and KPI values of KPI2 3480B that are mapped to a Medium State 3491C within the last 60 minutes 3477B.


The information for a trigger determination can include one or more trigger criteria 3485A-E for evaluating the results (e.g., KPIs having particular states) of executing the search specified by the search information to determine whether to cause a defined action 3499. There can be a trigger criterion 3485A-E for each KPI-State pair that is specified in the search information.


The trigger criterion 3485A-E for each KPI-State pair can include a contribution threshold 3483A-E that represents a statistic related to occurrences of a particular KPI state. In one implementation, a contribution threshold 3483A-E includes an operator (e.g., greater than, greater than or equal to, equal to, less than, and less than or equal to), a threshold value, and a statistical function (e.g., percentage, count). For example, the contribution threshold 3483A for the trigger criterion 3485A may be “greater than 29.5%,” which is directed to the number of occurrences of the critical KPI state for KPI1 3480A that exceeds 29.5% of the total number of all KPI states determined for KPI1 3480A over the last 60 minutes. For example, the state for KPI 3480A is determined 61 times over the last 60 minutes, and the KPI correlation search evaluates whether KPI 3480A has been in a critical state more than 29.5% of the 61 determinations. The total number of states in the duration is determined by the quotient of duration and frequency. The total number can be calculated based upon KPI monitoring frequency defined in a KPI definition and search time defined in the KPI correlation search. For example, total=(selected time/frequency time).


In one implementation, when there are multiple trigger criteria pertaining to a particular KPI, the KPI correlation search processes the multiple trigger criteria pertaining to the particular KPI disjunctively (i.e., their results are logically OR'ed). For example, the KPI correlation search can include trigger criterion 3485A and trigger criterion 3485B pertaining to KPI1 3480A. If either trigger criterion 3485A or trigger criterion 3485B is satisfied, the KPI correlation search positively indicates the satisfaction of trigger criteria for KPI1 3480A. In another example, the KPI correlation search can include trigger criterion 3485C, trigger criterion 3485D, and trigger criterion 3485E pertaining to KPI2 3480B. If any one or more of trigger criterion 3485C, trigger criterion 3485D, and trigger criterion 3485E is satisfied, the KPI correlation search positively indicates the satisfaction of trigger criteria for KPI2 3496B.


In one implementation, when multiple KPIs (e.g., KPI1 and KPI2) are specified in the search information, the KPI correlation search treats the multiple KPIs conjunctively in determining whether the correlation search trigger condition has been met. That is to say, the KPI correlation search must positively indicate the satisfaction of trigger criteria for every KPI in the search or the defined action will not be performed. For example, only after the KPI correlation search positively indicates the satisfaction of trigger criteria for both KPI1 3480A and KPI2 3480B will the determination be made that the correlation search trigger condition has been met and defined action 3499 can be performed. Said another way, satisfaction of the trigger criteria for a correlation search is determined by first logically OR'ing together evaluations of the trigger criteria within each KPI, and then logically AND'ing together those OR'ed results from all the KPI's.



FIG. 34D illustrates an example of the structure 34000 for storing a KPI correlation search definition, in accordance with one or more implementations of the present disclosure. A KPI correlation search definition can be stored in a service monitoring data store as a record that contains information about one or more characteristics of a KPI correlation search. Various characteristics of a KPI correlation search include, for example, a name of the KPI correlation search, information for a search, information for a triggering determination, a defined action that may be performed based on the triggering determination, one or more services that are related to the KPI correlation search, and other information pertaining to the KPI correlation search.


The KPI correlation search definition structure 34000 includes one or more components. A component may pertain to search information 34003 or trigger determination information 34011 for the KPI correlation search definition. Each KPI correlation search definition component relates to a characteristic of the KPI correlation search. For example, there is a KPI correlation search name component 34001, one or more record selection components 34005 for the information for the search, a duration component 34007, a frequency component 34009 for the frequency of executing the KPI correlation search, one or more contribution threshold components 34013 for the information for the triggering determination, one or more action components 34015, one or more related services components 34017, and one or more components for other information 34019. The characteristic of the KPI correlation search being represented by a particular component is the particular KPI correlation search definition component's type.


One or more of the KPI correlation search definition components can store information for an element. The information can include an element name and one or more element values for the element. In one implementation, an element name—element value(s) pair within a KPI correlation search definition component can serve as a field name-field value pair for a search query. In one implementation, the search query is directed to search a service monitoring data store storing service monitoring data pertaining to the service monitoring system. The service monitoring data can include, and is not limited to, KPI data (e.g., KPI values, KPI states, timestamps, etc.) and KPI specifications.


In one example, an element name—element value pair in the search information 34003 in the KPI correlation search definition can be used to search the KPI data in the service monitoring data store for the KPI data that has matching values for the elements that are named in the search information 34003.


The search information 34003 can include one or more record selection components 34005 to identify the KPI names and/or corresponding KPI states to search for in the service monitoring data store (e.g., KPI-state pairs). For example, the record selection component 34005 can include a “KPI1-Critical” pair that specifies a search for values for KPI1 corresponding to a Critical state. In one implementation, there are multiple KPI-state pairs in a record selection component 34005 to represent various states that are selected for a particular KPI for the KPI correlation search definition. For example, two states for KPI1 may be selected for the KPI correlation search definition. The record selection component 34005 can include another KPI-state pair “KPI1-High” pair that specifies a search for values for KPI1 corresponding to a High state. In one implementation, a single KPI name can correspond to multiple state values. For example, the record selection component 34005 can include a KPI-state pair “KPI1-Critical,High”. In one implementation, the multiple values are treated disjunctively. For example, a search query may search for values for KPI1 corresponding to a Critical state or a High state. In one implementation, the KPI is continuously monitored and the states of the KPI are stored in the service monitoring data store. The KPI correlation search searches the service monitoring data store for the particular states specified in the search information in the KPI correlation search.


There can be one or multiple components having the same KPI correlation search definition component type. For example, there can be multiple record selection components 34005 to represent multiple KPIs. For example, there can be a record selection component 34005 to store KPI-state value pairs for KPI1, and another record selection component 34020 to store KPI-state value pairs for KPI2. In one implementation, some combination of a single and multiple components of the same type are used to store information pertaining to a KPI correlation search in a KPI correlation search definition.


In one implementation, the search information 34003 includes a duration component 34007 to specify the time period to arrive at data that should be searched for the KPI-state pairs. For example, the duration may be the “Last 60 minutes”, and the KPI states that are to be extracted by execution of the KPI correlation search can be from the last 60 minutes. In another implementation, the duration component 34007 is not part of the search information 34003.


The trigger determination information 34011 can include one or more trigger criteria for evaluating the results of executing the search specified by the search information to determine whether to cause a defined action. The trigger criteria can include a contribution threshold component 34013 for each KPI-state pair in the record selection components 34005. Each contribution threshold component 34013 can include an operator (e.g., greater than, greater than or equal to, equal to, less than, and less than or equal to), a threshold value, and a statistical function (e.g., percentage, count). For example, the contribution threshold 34013 may be “greater than 29.5%”.


The action component 34015 can specify an action to be performed when the trigger criteria are considered to be satisfied. An action can include, and is not limited to, generating a notable event, sending a notification, and displaying information in an incident review interface, as described in greater detail below in conjunction with FIGS. 34O-34Z. The related services component 34017 can include information identifying services to which the KPI(s) specified in the search information 34003 pertain. The frequency component 34009 can include information specifying when to execute the KPI correlation search. For example, the KPI correlation search may be executed every 30 minutes.


A KPI correlation search definition can include a single KPI correlation search name component 34001 that contains the identifying information (e.g., name, title, key, and/or identifier) for the KPI correlation search. The value in the name component 34001 can be used as the KPI correlation search identifier for the KPI correlation search being represented by the KPI correlation search definition. For example, the name component 34001 may include an element name of “name” and an element value of “KPI-Correlation-1846a1cf-8eef-4”. The value “KPI-Correlation-1846a1cf-8eef-4” becomes the KPI correlation search identifier for the KPI correlation search that is being represented by KPI correlation search definition.


Various implementations may use a variety of data representation and/or organization for the component information in a KPI correlation search definition based on such factors as performance, data density, site conventions, and available application infrastructure, for example. The structure (e.g., structure 34000 in FIG. 34D) of a KPI correlation search definition can include rows, entries, or tuples to depict components of a KPI correlation search definition. A KPI correlation search definition component can be a normalized, tabular representation for the component, as can be used in an implementation, such as an implementation storing the KPI correlation search definition within an RDBMS. Different implementations may use different representations for component information; for example, representations that are not normalized and/or not tabular. Different implementations may use various data storage and retrieval frameworks, a JSON-based database as one example, to facilitate storing KPI correlations search definitions (KPI correlation search definition records). Further, within an implementation, some information may be implied by, for example, the position within a defined data structure or schema where a value, such as “Critical”, is stored—rather than being stored explicitly. For example, in an implementation having a defined data structure for a KPI correlation search definition where the first data item is defined to be the value of the name element for the name component of the KPI correlation search, only the value need be explicitly stored as the KPI correlation search component and the element name (name) are known from the data structure definition.



FIG. 34E is a flow diagram of an implementation of a method 34030 for monitoring service performance using a KPI correlation search, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine.


At block 34031, the computing machine causes display of a graphical user interface (GUI) that includes a correlation search portion that enables a user to specify information for a KPI correlation search definition. An example GUI that enables a user to specify information for a KPI correlation search definition is described in greater detail below in conjunction with FIG. 34G.


Referring to FIG. 34E, the KPI correlation search definition can include (i) information for a search, (ii) information for a triggering determination, and (iii) a defined action that may be performed based on the triggering determination. The information for the search identifies KPI values in a data store. Each KPI value is indicative of a KPI state. Each of the KPI values in the data store is derived from machine data pertaining to one or more entities identified in a service definition for a service using a search query specified by a KPI definition associated with the service.


The information for the trigger determination includes trigger criteria. The trigger determination evaluates the identified KPI values using the trigger criteria to determine whether to cause a defined action.


At block 34033, the computing machine causes display of a trigger criteria interface for a particular KPI definition that is specified in the KPI correlation search definition. An example trigger criteria interface is described in greater detail below in conjunction with FIG. 34J.


Referring to FIG. 34E, at block 34035, the computing machine receives user input, via the trigger criteria interface for the particular KPI definition (KPI), selecting one or more states. The KPI can be associated with one or more states. Example states can include, and are not limited to, Critical, High, Medium, Low, Normal, and Informational. The states can be configurable. The trigger criteria interface is populated based on the states that are defined for the particular KPI, for example, via GUI 3100 in FIG. 31A.


Referring to FIG. 34E, at block 34037, the computing machine receives user input specifying a contribution threshold for each selected state via the trigger criteria interface. In one implementation, a contribution threshold includes an operator (e.g., greater than, greater than or equal to, equal to, less than, and less than or equal to), a threshold value, and a statistical function (e.g., percentage, count). For example, the contribution threshold for a particular state may be “greater than 29.5%”.


At block 34039, the computing machine determines whether one or more contribution thresholds are to be specified for another KPI that is included in the KPI correlation search definition. The KPI correlation search definition may specify multiple KPIs (e.g., KPI1 3480A and KPI2 3480B in FIG. 34C).


If one or more contribution thresholds are to be specified for another KPI, the computing machine returns to block 34033 to cause the display of a trigger criteria interface that corresponds to the other KPI, and user input can be received selecting one or more states at block 34035. User input can be received specifying a contribution threshold for each selected state at block 34037.


If no other contribution thresholds are to be specified for another KPI (block 34039), the computing machine stores the contribution threshold(s) as trigger criteria information of the KPI correlation search definition at block 34041. In one implementation, the contribution threshold(s) are stored in contribution threshold components (e.g., contribution threshold components 34013 in FIG. 34D) in a KPI correlation search definition.



FIG. 34F illustrates an example of a GUI 34050 of a service monitoring system for initiating creation of a KPI correlation search, in accordance with one or more implementations of the present disclosure. In one implementation, GUI 34050 is displayed when an item in a list (e.g., list 706 in FIG. 7) to create correlation searches is activated.


GUI 34050 can include a list 34051 of correlation searches that have been defined. GUI 34050 can include a button 34055 for creating a new correlation search. When the button 34055 is activated, a list 34053 of the types of correlation search (e.g. “correlation search”, “KPI correlation search”) that can be created is displayed. A “KPI correlation search” includes searching for specific data produced for one or more KPI's and evaluating that data against a trigger condition so as to cause a predefined action when satisfied. In one embodiment, the “KPI correlation search” in this context of GUI element 34057 includes a search for KPI state values or indicators for one or more KPI's and evaluating that data against a trigger condition specified using state-related trigger criteria for each KPI so as to cause a predefined action, such as posting a notable event, when satisfied. A “correlation search” in the context of GUI element 34053 includes searching for specified data and evaluating that data against a trigger condition so as to cause a predefined action when satisfied, as described in greater detail in conjunction with FIGS. 34O-34Z. When an item 34057 in the list 34053 for creating a KPI correlation search is activated, a GUI for defining a KPI correlation search is displayed, as described below.



FIG. 34G illustrates an example of a GUI 34060 of a service monitoring system for defining a KPI correlation search, in accordance with one or more implementations of the present disclosure. GUI 34060 includes a services portion 34061, a KPI portion 34069, and a correlation search portion 34085. The services portion 34061 includes a list 34067 of services that have been defined, for example, using GUIs of the service monitoring system. In one implementation, the list 34067 is populated using the service definition records that are stored in a service monitoring data store. Each service in the list 34067 can correspond to an existing service definition record. The element value in the name component of the service definition record can be displayed in the list 34067.


In one implementation, the services in the list 34067 are ranked. In one implementation, the ranking of the services in the list 34067 is based on the KPI values of the services in the service monitoring data store. As described above, for each KPI of a service, the KPI values can be calculated for a service based on a monitoring period that is set for the KPI. The calculated KPI values can be stored as part of KPI data in the service monitoring data store. The ranking of the services can be based on, for example, the number of KPI values that are stored for a service, the timestamps for the KPI values, etc. For example, the monitoring period for a KPI may be “every 5 minutes” and the values are calculated for the KPI every 5 minutes. In another example, the monitoring period for a KPI may be set to zero and the KPI values may not be calculated. For example, if Sample Service 34064 has 10 KPIs, but the monitoring period for each of the KPIs has been set to zero, then the values for the 10 KPIs will not have been calculated and stored in the service monitoring data store. Sample Service 34064 will then be ranked below than other services with KPI monitoring periods greater than zero, in the list 34067.


One or more services in the list 34067 can be selected via a selection box (e.g., check box 34063) that is displayed for each service in the list 34067. When a service (e.g., Monitor CPU Load 34062) is selected from the list 34067 via a corresponding check box 34063, dependency boxes 34065 can be displayed for the corresponding selected service. The dependency boxes 34065 allow a user to optionally further specify whether to select the service(s) that depend on the selected service (e.g., Monitor CPU Load 34062) and/or to select the services which the selected service (e.g., Monitor CPU Load 34062) depends upon. As described above, a particular service can depend on one or more other services and/or one or more other services can depend on the particular service.


When one or more services are selected from the list 34067, the KPIs that correspond to the selected services can be displayed in the KPI portion 34069 in the GUI 34060. For example, the KPI “KPI for CPU Load” 34076 corresponds to the selected service “Monitor CPU Load” 34062, and the KPI “Memo Load” 34078 corresponds to the selected service “Check Mem Load on Environment” 34066. When a service is selected from the list 34067 and its “Depends on” or “Impacts” check box is selected, the KPI's that correspond to the services having the indicated dependency relationship with the selected service can be displayed in the KPI portion 34069 in the GUI 34060, as well. The KPI portion 34069 can be populated using data (e.g., KPI definitions, KPI values, KPI thresholds, etc.) that is stored in the service monitoring data store.


The KPI portion 34069 can include KPI data 34071 for the KPIs of the selected services. In one implementation, the KPI data 34071 is presented in a tabular format in the KPI portion 34069. The KPI data 34071 can include a header row and followed by one or more data rows. Each data row can correspond to a particular KPI. The KPI data 34071 can include one or more columns for each row. The header row can include column identifiers to represent the KPI data 34071 that is being presented in the KPI portion 34069. For example, the KPI data 34071 can include, for each row, a column that has the KPI name 34073, a column for the service name 34075 of the service that pertains to the particular KPI, and a column for a KPI health indicator 34077.


The KPI health indicator 34077 for each KPI can represent the performance of the corresponding KPI for a duration specified via button 34079. For example, the duration of the “Last 15 Minutes” has been selected as indicated by button 34079, and the KPI health indicator 34077 for each KPI can represent the performance of the corresponding KPI for the last 15 minutes relative to the point in time when the KPI data 34071 was displayed in the GUI 34060.


In one implementation, GUI 34060 includes a filtering text box to provide an index based case sensitive search functionality to filter out services. For example, if the service name is “Cpu load monitor service,” a user can search using different options, such as “C”. “c”, “cpu”, “Cpu”, “load”, and “cpu load monitor service”. In one implementation, GUI 34060 includes a filtering text box to provide an index based case insensitive search for KPI name, service name and severity name. The text box can support key=value index based case insensitive search. For example for a selected service “Cpu load monitor service” there may be a KPI with named “Cpu percent load,” which is monitored every minute and has state data with low=2, critical=9, high=4. A user can perform a search using for example, a name (KPI or Service)—key value pair. For example l=2 or low=2, can return all KPIs where low=2. In another example, where high=4, the search can return all KPIs where high value is 4.


When button 34079 is activated, for example, to select a different duration, a GUI enabling a user to specify a duration for determining the performance of the KPI is displayed. FIG. 34H illustrates an example GUI 34090 for facilitating user input specifying a duration to use for a KPI correlation search, in accordance with one or more implementations of the present disclosure. When button 34093 is activated, list 34092 can be displayed. The list 34092 can include buttons 34091A-E for selecting a duration for specifying the time period to arrive at data that should be searched for the KPI-state pairs. When button 34091A is selected, a list 30495 of preset durations is displayed. The list 34095 can include durations (e.g., Last 15 minutes) that are relative to the execution of the KPI correlation search and other types of preset durations (e.g., “All time”). For example, the duration that is selected may be the “Last 15 minutes,” which points to the last 15 minutes of data, from the time the KPI correlation search is executed, that should be searched for the KPI-state pairs.


When button 34091B is selected, an interface for defining a relative duration is displayed. The interface can include a text box for specifying a string indicating the relative duration to use. For example, user input can be received via the text box specifying the “Last 3 days” as the duration. When button 34091C is selected, an interface for defining a date range for the duration is displayed. For example, user input can be received specifying the date range between Dec. 18, 2014 and Dec. 19, 2014 as the duration. When button 34091D is selected, an interface for defining a date and time range for the duration is displayed. For example, user input can be received specifying the earliest date/time of Dec. 18, 2014 12:24:00 and the latest date time of Dec. 18, 2014 13:24:56 as the duration. When button 34091E is selected, an interface for an advanced definition for the duration is displayed. For example, user input can be received specifying the duration using search processing language. The selected duration can be stored in a duration component (e.g., duration component 34007 in FIG. 34D) in a KPI correlation search definition.


Referring to FIG. 34G, the KPI portion 34069 can display an expansion button 34068 for each KPI in the KPI data 34071. When an expansion button 34068 is activated, the KPI portion 34069 displays detailed performance data for the corresponding KPI for the selected duration (e.g., Last 15 minutes).



FIG. 34I illustrates an example of a GUI 34100 of a service monitoring system for presenting detailed performance data for a KPI for a time range, in accordance with one or more implementations of the present disclosure. GUI 34100 can correspond to KPI portion 34069 in FIG. 34G. Referring to FIG. 34I, GUI 34100 can include an expansion button (e.g., expansion button 34101) for each KPI in the GUI 34100. When an expansion button 34101 is activated, the GUI 34100 displays a detailed performance interface 34105 in association with the KPI health indicator 34107 for the particular KPI (e.g., “KPI for CPU Load” 34103) for the duration 34108 (e.g., “Last 60 Minutes”). The detailed performance interface 34105 displays detailed information about KPI performance corresponding to the indicator 34107.


The detailed performance interface 34105 can include a list 34115 of states that have been defined for the particular KPI. In one implementation, the states in the list 34115 are defined for the particular KPI via GUIs in FIGS. 31A-C described above. Referring to FIG. 34I, in one implementation, the states are displayed in a color that corresponds to a color that was defined for the particular state when the KPI thresholds for the particular KPI were defined.


The detailed performance interface 34105 can include a statistic 34117 for each state in the list 34115, which corresponds to the occurrences of a specific KPI state over duration 34108. For example, the KPI “KPI for CPU Load” 34103 may have a monitoring period of every one minute, and the value for the KPI “KPI for CPU Load” 34103 is calculated every minute. The statistic 34117 (e.g., “61”) indicates how the KPI “KPI for CPU Load” 34103 performs during time period 34108 of “Last 60 Minutes,” which shows that the KPI has been in a Medium state 61 times over the time period 34108 of “Last 60 Minutes.” The total for the counts in the list 34115 corresponds to the number of calculations performed according to the monitoring period (e.g., every minute) of the KPI during time period 34108 (e.g., for the last 60 minutes) specified for the KPI correlation search.


The detailed performance interface 34105 can include an open KPI search button 34111, which when selected displays a search GUI presenting the search query defining the KPI. The detailed performance interface 34105 can include an edit KPI button 34109, which when selected can display a GUI for editing the definition of the particular KPI. The detailed performance interface 34105 can include a deep dive button 34113, which when selected can display a GUI for presenting a deep dive visualization for the particular KPI.


Referring to FIG. 34G, one or more KPIs in the KPI portion 34069 can be selected for the KPI correlation search definition. Each KPI in the KPI portion 34069 can have a selection box 34081 and/or a selection link 34083 for selecting individual KPIs. The KPI portion 34069 can include a bulk selection box 34072 for selecting all of the KPIs in the KPI portion 34069. A bulk action link (e.g., add to selection link 34070A, view in deep dive link 34070B) can be activated to apply an action (e.g., select for KPI correlation search definition, view in deep dive) to the selected KPIs.


The one or more KPIs that have been selected from the KPI portion 34069 can be used to populate the correlation search portion 34085, as described in greater detail below. In one implementation, when one or more KPIs have been selected from the KPI portion 34069, a trigger criteria interface for a particular KPI is displayed. In one implementation, the trigger criteria interface for the first selected KPI in the KPI portion 34069 is displayed. For example, if the KPI “KPI for CPU Load” 34076 and the KPI “Mem Load” 34078 have been selected, the trigger criteria interface for the KPI “KPI for CPU Load” 34076 is displayed, as described below in conjunction with FIG. 34J.



FIG. 34J illustrates an example of a GUI 34120 of a service monitoring system for specifying trigger criteria for a KPI for a KPI correlation search definition, in accordance with one or more implementations of the present disclosure. In response to a KPI being selected from the KPI portion (e.g., KPI portion 34069 in FIG. 34G), the correlation search portion 34137 is updated to display the selected KPI(s). In one implementation, also in response to a KPI being selected from the KPI portion, a trigger criteria interface 34121 for a particular selected KPI is displayed. In one implementation, trigger criteria interface 34121 is displayed in the foreground and the correlation search portion 34137 is displayed in the background.


The trigger criteria interface 34121 enables a user to specify triggering conditions for the particular KPI to trigger a defined action (e.g., generate a notable event, send notification, display information in an incident review interface, etc.). The trigger criteria interface 34121 can display, for each state defined for the particular KPI, a selection box 34123, a slider bar 34125 with a slider element 34127, an operator indicator 34129, a value text box 34131, a statistical function indicator 34133, and a state identifier 34135.


In one implementation, when the trigger criteria interface 34121 is first displayed, for example, in response to a user selection of the particular KPI, the trigger criteria interface 34121 automatically displays the information reflecting the current performance of the states for the particular KPI based on the selected duration 34139 (e.g., Last 60 minutes). For example, the performance of the KPI as illustrated by indicators 34141A and 34141B can be presented in the trigger criteria interface 34121. For example, the trigger criteria interface 34121 may initially only display the information in portion 34143 indicating that the KPI was in the Low state 100% for the last 60 minutes. A user may use the currently displayed data as a contribution threshold for the particular state.


User input selecting one or more states can be received, for example, via the selection box 34123, slider element 34127, and value text box 34131 for a particular state. A contribution threshold can be specified for each selected state via user interaction with the trigger criteria interface 34121, as described in greater detail below.



FIG. 34K illustrates an example of a GUI 34150 of a service monitoring system for specifying trigger criteria for a KPI for a KPI correlation search definition, in accordance with one or more implementations of the present disclosure. The trigger criteria interface 34151 displays user selection of two trigger criteria 34167A-B, for the particular KPI, that correspond to the High state and the Critical state respectively.


For each selected state, user input of a contribution threshold can be received. The user input can include an operator (e.g., greater than, greater than or equal to, equal to, less than, and less than or equal to), a threshold value, and a statistical function (e.g., percentage, count). The user input for the operator can be received via an operator indicator 34159, which when selected can display a list of operators to select from. For example, a greater than (e.g., “>”) operator has been selected.


The user input of the statistical function to be used can be received via a statistical function indicator 34163, which when selected can display a list of statistical functions (e.g. percent, count, etc.) to select from. For example, the percentage function has been selected.


The user input for the threshold value can be received, for example, via a value entered in the text box 34161 and/or via a slider element 34157. In one implementation, when a user slides the slider element 34157 across a corresponding slider bar 34155 to select a value, the corresponding value can be displayed in the corresponding text box 34161. In one implementation, when a user provides a value in the text box 34161, the slider element 34157 is moved (e.g., automatically without any user interaction) to a position in the slider bar 34155 that corresponds to the value. (Text box 34161 and slider control element 34157 are, accordingly, operatively coupled.) For example, the value “29.5” has been selected. In one embodiment, slider bar 34155 appears in relationship with an actuals data graph bar. The actuals data graph bar depicts a value determined from actual data for the associated KPI in the associated state over the current working time interval (e.g. the “Last 60 minutes” of 34139 of FIG. 34J). The actuals data graph bar can be narrower or wider than the slider bar, appear in front of or behind the slider bar, be centered on axis with the slider bar, be visually distinct from the slider bar (e.g. a darker, lighter, variant, or different color, or have a different pattern, texture, or fill than the slider bar), and have the same scaling as the slider bar.


In one implementation, when a trigger criterion has been specified for a particular state, one or more visual indicators are presented in the trigger criteria interface 34151 for the particular state. For example, the contribution threshold for the Critical state may be “greater than 29.5%”, and the contribution threshold for the High state may be “greater than 84.5%”, and visual indicators are displayed for the two trigger criteria 34167A-B that have been specified.


For example, for the Critical state, the trigger criteria interface 34151 can present the selection box 34153 as being enabled, the slider bar 34155 as having a distinct visual characteristic to visually represent a corresponding value using a scale of the slider bar 34155, the slider element 34157 as being shaded or colored, an operator indicator 34159 as being highlighted, a value being displayed in a text box 34161, a statistical function indicator 34163 being highlighted, and/or a state identifier 34165 being highlighted. The distinct visual characteristic for the slider bar 34155 can be a color, a pattern, a shade, a shape, or any combination of color, pattern, shade and shape, as well as any other visual characteristics.


In one implementation, when multiple trigger criteria are specified for a particular KPI, the trigger criteria are processed disjunctively. For example, the trigger criteria of the KPI can be considered satisfied if either the KPI is in the Critical state more than 29.5% within the duration (e.g., Last 60 minutes) or the KPI is in the High state more than 84.5% within the duration.


GUI 34150 can include a save button 34169, which when activated, can display another trigger criteria interface 34151 that corresponds to another KPI, if another KPI has been selected for the KPI correlation search. If no other KPIs have been selected for the KPI correlation search, a GUI for creating the KPI correlation search based on the KPI correlation search definition is displayed.



FIG. 34L illustrates an example of a GUI 34170 of a service monitoring system for creating a KPI correlation search based on a KPI correlation search definition, in accordance with one or more implementations of the present disclosure. GUI 34170 can be displayed in response to a user activating a save button (e.g., save button 34169 in FIG. 34K) in a trigger criteria interface. The correlation search portion 34179 in the GUI 34170 can display information for the KPIs (e.g., KPI 34181A, KPI 34181B) that are part of the KPI correlation search definition.


The information for each KPI can include the name of the KPI, the service 34183 which the KPI pertains to, KPI performance indicator 34187, and a trigger criteria indicator 34189A for the particular KPI. The correlation search portion 34179 can include a selection button 34171 and/or a link 34173 for each KPI for receiving user input specifying that the selected KPI should be removed from the KPI correlation search definition.


The trigger criteria indicators 34189A-B for a particular KPI can display the number of trigger criteria that has been specified for the KPI. For example, KPI 34181A may have two trigger criteria (e.g., Critical state more than 29.5% within the duration, High state more than 84.5% within the duration).


In one implementation, the trigger criteria indicators 34189A-B are links, which when selected, can display a corresponding trigger criteria interface (e.g., trigger criteria interface 34121 in FIG. 34J) for the particular KPI to enable a user to edit the trigger criteria.


The correlation search portion 34179 can include summary information 34175 that includes the information for a trigger determination for the KPI correlation search to determine whether to cause a defined action (e.g., generate notable event, sending a notification, display information in an incident review interface). The summary information 34175 can include the number of KPIs that are specified in the KPI correlation search definition and the total number of trigger criteria for the KPI correlation search.


As described above, in one implementation, when there are multiple trigger criteria that pertain to a particular KPI, the trigger criteria are processed disjunctively. For example, if one of the two triggers that have been specified for KPI 34181A are satisfied, then the trigger criteria for KPI 34181A are considered satisfied. If any one of the three triggers that have been specified for KPI 34181B are satisfied, then the trigger criteria for KPI 34181B are considered satisfied.


In one implementation, when there are multiple KPIs that are specified in the KPI correlation search definition, the multiple KPIs are treated conjunctively. Each KPI must have at least one trigger criteria satisfied in order for all of the triggering criteria that are specified in the KPI correlation search definition to be considered satisfied. For example, when any of the two trigger criteria for KPI1 34181A is satisfied, and any of the three trigger criteria for KPI2 34181B is satisfied, then the trigger condition determined using five trigger criteria is considered satisfied for the KPI correlation search, and a defined action can be performed. If none of the two trigger criteria for KPI1 is satisfied 34181A or none of the three trigger criteria for KPI2 34181B is satisfied, then the trigger condition for the KPI correlation search is considered as not being satisfied.


The correlation search portion 34179 can include a create button 34177, which when activated displays a GUI for creating the KPI correlation search as a saved search based on the KPI correlation search definition that has been specified using, for example, GUI 34170.



FIG. 34M illustrates an example of a GUI 34200 of a service monitoring system for creating the KPI correlation search as a saved search based on the KPI correlation search definition that has been specified, in accordance with one or more implementations of the present disclosure. The defined KPI correlation search can be saved as a saved search that can be executed automatically based on, for example, a user-selected frequency (e.g., every 30 minutes) 34211. When a saved search is created for the defined KPI correlation search, a search query of the KPI correlation search will be executed periodically, and the search result set that is produced by the search query of the KPI correlation search can be saved. An action can be performed based on an evaluation of the search result set using the trigger criteria for the KPI correlation search.


A user (e.g., business analyst) can provide a name 34203 for the KPI correlation search, optionally a title 34205 for the KPI correlation search, and optionally a description 34207 for the KPI correlation search. In one implementation, when a title 34205 is specified, the title 34205 is used when an action is performed. For example, if no title 34205 is specified, the name 34203 can be displayed in an incident review interface if an action of displaying information in the incident review interface has been triggered. In another example, if a title 34205 is specified, the title 34205 can be displayed in an incident review interface if an action of displaying information in the incident review interface has been triggered. In another example, if a title 34205 is specified, the title 34205 can be included in the information of a notable event that is posted as the result of the trigger condition being satisfied for the KPI correlation search.


User input can be received via a selection of a schedule type via a type button 34209A-B for executing the KPI correlation search. The type can be a Cron schedule type or a basic schedule type. For example, if the basic schedule type is selected, user input may be received, via a button 34210, specifying that the KPI correlation search should be performed every 30 minutes. When button 34210 is activated a list of various frequencies is displayed which a user can select from. GUI 34200 can automatically be populated with the duration 34213 (e.g., Last 60 minutes) that is selected for example, via button 34079 in FIG. 34G.


Referring to FIG. 34M, user input can be received for assigning a severity level to an action that is performed from the KPI correlation search via a list 34215 of severity types. For example, if the action is to display information in an incident review interface, and the selected severity is “Medium”, when the action is performed, the severity “Medium” will be displayed with the information for the KPI correlation search in the incident review interface. Similarly, if the action is to post a notable event, and the severity selected is “Medium,” information for the notable event will include an indication of the “Medium” severity, when the action is performed.


In one implementation, default values for schedule type and severity are displayed. The default values can be configurable. User input can be received via button 34201 for storing the definition of the KPI correlation search. The KPI correlation search definition can include the parameters that have been specified via GUI 34200 and can be stored in a structure, such as structure 3400 in FIG. 34D.


Graphical User Interface for Adjusting Weights of Key Performance Indicators


Implementations of the present disclosure provide an aggregate KPI that spans multiple services and a graphical user interface that enables a user to create and configure the aggregate KPI. The aggregate KPI may characterize the performance of one or more services and may be displayed to the user as a numeric value (e.g., score). The graphical user interface may enable a user to select KPIs of one or more services and to set or adjust the weights (e.g., importance) of the KPIs. The weight of each KPI may define the influence that the KPI has on a calculation of an aggregate KPI value.


The graphical user interface may include multiple display components for configuring the aggregate KPI. Some of the display components may illustrate existing services and their corresponding KPIs and may enable the user to select some or all of the KPIs. Another display component may display the selected KPIs and provide graphical control elements (e.g., sliders) to enable the user to adjust the weight(s) of one or more of the KPIs. The user may adjust the weight to a variety of values including, for example, values that cause the KPI to be excluded from an aggregate KPI calculation, values that cause the KPIs to be prioritized over some or all of the other KPIs, and so on. The graphical user interface may also display an aggregate KPI value (e.g., health score) and may dynamically update the aggregate KPI value as the user adjusts the weights. This may provide near real-time feedback on how adjustments to the weights affect the aggregate KPI value. This may be advantageous because it may enable the user to adjust the weights of the KPIs to more accurately reflect the influence the constituent KPIs should have on characterizing the overall performance of the service(s).



FIG. 34NA illustrates an example of a graphical user interface (GUI) 34300 for selecting KPIs and adjusting the weights of KPIs, in accordance with some implementations. GUI 34300 may include a services display component 34310, a KPI display component 34320 and a weight adjustment display component 34330. Services display component 34310 may display services that exist in a user's IT environment and may enable the user to select one or more services of interest. Services display component 34310 may include a list 34312 with services 34314A-E. In one example, the list 34312 may be populated using the service definition records that are stored in a service monitoring data store. One or more services in the list 34312 may be selected via a selection box (e.g., check box 34316) that is displayed for each service in the list 34312. When a service (e.g., Machine Resources) is selected from list 34312 via a corresponding check box 34316, dependency boxes may be displayed for the corresponding selected service. The dependency boxes allow a user to optionally further specify whether to select the service(s) that depend on the selected service (e.g., impacted services) and/or to select the services which the selected service depends upon (e.g., impacting services). As described above, a particular service may depend on one or more other services and/or one or more other services may depend on the particular service. These services may be obtained from a service definition of the particular service or determined in view of one or more service definitions, such as a service definition of the particular service and the service that depends on the particular service. That is to say, options exist for where and how to record, reflect, or represent in storage the defined dependencies between and among services represented by service definitions. When one or more services are selected from list 34312, the KPIs that correspond to the selected services can be displayed in the KPI display component 34320.


KPI display component 34320 may display multiple KPIs and may enable the user to select some or all of the KPIs associated with the services selected in services display component 34310. KPI display component 34320 may include KPIs 34322A-C and display KPI data for each KPI. In one example, KPI data may be presented in a table that may include a header row and one or more data rows. Each data row may correspond to a particular KPI. The table may include one or more columns for each row. The header row can include column identifiers to represent the KPI data in the respective columns. For example, the table may include, for each row, a column for the KPI name, a column for the service name of the service that pertains to the particular KPI, and a column for a KPI health indicator. As discussed above, a KPI health indicator can represent the performance of the particular KPI over a certain duration. The KPI data may be referenced by the user when determining which KPIs to select for inclusion within an aggregate KPI.


Weight adjustment display component 34330 may display the KPIs selected by the user and may provide a mechanism for the user to adjust the weights of the KPIs and display a resulting aggregate KPI value. Weight adjustment display component 34330 may include aggregate KPI value 34332, weights 34334A-C and graphical control elements 34336A-C. Aggregate KPI value 34332 may be a numeric value (e.g., score), non-numeric value, alphanumeric value, symbol, or the like, that may characterize the performance of one or more services. In one example, the aggregate KPI value 34332 may be used to detect a pattern of activity or diagnose abnormal activity (e.g., decrease in performance or system failure). Aggregate KPI value 34332 may be determined in view of weights 34334A-C, which may indicate the importance or influence a particular KPI has on a calculation of the aggregate KPI. Weights 34334A-C may be considered when calculating the aggregate KPI value for the services and a KPI with a higher weight may be considered more important or have a larger influence on the aggregate KPI value than other KPIs. The weights of the KPIs may be adjusted by the user by manipulating graphical control elements 34336A-C. Each of graphical control elements 34336A-C may correspond to a specific KPI and may be used to adjust a weight of a specific KPI.


Changes to any of the display components discussed above (e.g., 34310, 34320 and 34330) can cause respective changes to the other display components. In one example, GUI 34300 may receive a first user selection that identifies a subset of services from a list of services within an IT environment. In response to the first selection, GUI 34300 may display a list of KPIs associated with the one or more selected services within KPI display component 34320. GUI 34300 may then receive a second user selection of a subset of the KPIs in the KPI display component 34320. In response to the second selection, GUI 34300 may display one or more user-selected KPIs and graphical control elements in the weight adjustment display component 34330. The functionality of weight adjustment component 34330 is discussed in more detail below, in regards to FIG. 34NB.



FIG. 34NB illustrates an example of a weight adjustment display component (e.g., GUI 34300) that enables a user to adjust the weights of KPIs and illustrates the effect of the adjustment on the aggregate KPI value, in accordance with some implementations. GUI 34300 may include graphical control elements 34436A-C that each correspond to one of KPIs 34434A-C and may display the weight of each KPI relative to the other KPIs. Graphical control elements 34436A-C may be any graphical control element capable of displaying and modifying a weight value. As shown, graphical control elements 34436A-C may be similar to a slider or track bar control element and may enable a user to set a value by moving an indicator element 34437 along an axis, such as a horizontal axis or vertical axis. In other examples, the graphical control elements 34436A-C may include display fields and arrows, which may enable a user to increment or scroll through different values. In yet another example, each of the graphical control elements 34436A-C may include a field that accepts keyboard input and the user may provide a weight by typing a corresponding value (e.g., a numeric value) into the field. When a weight is adjusted, it may be included in the service definition specifying the KPIs, or in a separate data structure together with other settings of a KPI.


The weights displayed by the graphical control elements 34436A-C may be assigned automatically (e.g., without any user input) or may be based on user input or a combination of both. For example, weights may be automatically assigned when graphical control elements 34436A-C are initiated (e.g., default values, historic values) and the user may subsequently adjust the weights. A weight may be automatically assigned based on characteristics of the KPI. In one example, a KPI deriving its value from machine data of a single entity may be automatically assigned a lower weight than a KPI deriving its value from machine data pertaining to multiple entities. Alternatively or in addition, a KPI may be automatically assigned a higher or lower weight based on the frequency in which the search query defining the KPI is executed. For example, a higher weight may be assigned to a KPI that is run more frequently or vice versa.


The weights may also be assigned (e.g., adjusted) based on user input of one or more values within a weight range 34438. As shown in FIG. 34NB, graphical control element 34436C may include an indicator element 34437 to receive a user request to assign or adjust a weight of a KPI. Upon selecting indicator element 34437, a user may position (e.g., drag) the indicator element 34437 to a specific weight within weight range 34438. In one example, the weight range may include values from one to ten and a higher value may indicate a higher importance of the KPI relative to the other KPIs selected to represent the service(s).


Weight range 34438 may include an exclusion value 34439A and a priority value 34439B within its range. In one example, the range may extend from 0-11 and the exclusion value 34439A may be a minimum value (e.g., 0) and the priority value 34439B may be a maximum value (e.g., 11). Though generally shown and discussed as such for ease of illustration, an embodiment is not limited to a weight range of continuous values, numeric, alphabetic, or otherwise. Exclusion value 34439A may be a value that causes the corresponding KPI to be excluded from a calculation of the value of the aggregate KPI. The priority value 34439B may be a value that causes the corresponding KPI to override one or more of the other KPIs selected to represent the services. A weight having priority value 34439 may indicate the status (e.g., state) of the corresponding KPI should be used to represent the overall status of the aggregate KPI. In one example, there may be only one particular KPI that has a weight at the priority value at which point the values of only the particular KPI and no other KPIs may be used to calculate the value of the aggregate KPI. In another example, there may be multiple particular KPIs that have a weight at the priority value at which point only one of the particular KPIs may be selected and used for the calculation of the aggregate KPI. The selection may be based on a variety of factors such as the states, values, frequency, or how recent the KPI value has been determined. For example, the multiple particular KPIs may be analyzed and the KPI with the highest state (e.g., critical, important) may be the particular KPI selected to calculate the value of the aggregate KPI. In this latter example, the priority value may not cause a KPI to be used for the aggregate KPI calculation because there may be another KPI set with the priority value but it may still cause the KPI to override other KPIs that are not set to the priority value. Accordingly, a priority weight value may indicate priority in the sense of overriding dominance, preeminence, exclusivity, preferential treatment, or eligibility for the same.


Aggregate KPI value 34432 may be a numeric value (e.g., score) that may be calculated based on the user-selected weights to better characterize performance of one or more services. In some implementations, an aggregate KPI value 34432 may also be based on impact scores of relevant KPIs. As discussed in more detail above, an impact score of a KPI can be based on a user-selected weight of the KPI and/or the rating associated with a current state of the KPI. In particular, calculating an aggregate KPI value 34432 may involve one or more of: determining KPI values for the KPIs; determining impact scores using the KPI values; weighting the impact scores; and combining the impact scores. Each of these steps will be discussed in more detail below.


Determining the KPI values for the KPIs may involve deriving the values by executing search queries or retrieving previously stored values from a data store. Each KPI value may indicate how an aspect of a service is performing at a point in time or during a period of time and may be derived by executing a search query associated with the KPI. As discussed above, each KPI may be defined by a search query that derives the value from machine data associated with the one or more entities that provide the service. The machine data may be identified using a user-created service definition that identifies the one or more entities that provide the service. The user-created service definition may also identify information for locating the machine data pertaining to each entity. In another example, the user-created service definition may also identify, for each entity, information for a user-created entity definition that indicates how to identify or locate the machine data pertaining to that entity. The machine data associated with an entity may be produced by that entity and may include for example, and is not limited to, unstructured data, log data and wire data. In addition or alternatively, the machine data associated with an entity may include data about the entity, which can be collected through an API for software that monitors that entity.


Determining the KPI values may also or alternatively involve retrieving previously stored values from a data store. In one example, the most recent values for each respective KPI may be retrieved from one or more data stores. The values of each of the KPIs may be from different points in time. This may be, for example, because each KPI may be based on a frequency of monitoring assigned to the particular KPI and when the frequency of monitoring for a KPI is set to a time period (e.g., 10 minutes, 2 hours, 1 day) a value for the KPI is derived each time the search query defining the KPI is executed. Different KPIs may have different frequencies so the most recent value of one KPI may be from a different time than the most recent value of a second KPI.


Once the KPI values have been determined, an impact score may be determined using a variety of factors including but not limited to, the weight of the KPI, one or more values of the KPI, a state of the KPI, a rating associated with the state, or a combination thereof. In one example, the impact score of each KPI may be based on the weight and the corresponding KPI value (e.g., Impact Score of KPI=(weight)×(KPI value)). In another example, the impact score of each KPI may be based on both the weight of a corresponding KPI and the rating associated with a current state of the corresponding KPI. (e.g., Impact Score of KPI=(weight)×(rating)×(KPI value)). In other examples, the impact score of each KPI may be based on the rating associated with a current state of a corresponding KPI and not on the weight (e.g., Impact Score of KPI=(rating)×(KPI value)) and the weight may or may not be used in another step.


The aggregate KPI value may be calculated by combining the one or more impact scores. The combination may involve multiplication, division, summation, or other arithmetic operation or combination of operations such as those that involve deriving a mean, median or mode, or performing one or more statistical operations. In one example, the combining may involve performing an average of multiple individually weighted impact scores.



FIGS. 34NC and 34ND depict flow diagrams of exemplary methods 34800 and 34900 for adjusting the weights of KPIs associate with an aggregate KPI that spans one or more IT services, in accordance with some implementations. Method 34800 describes a machine method of effecting a graphical user interface (e.g., weight adjustment component 34330), which enables a user to adjust weights for multiple KPIs of an aggregate KPI with feedback, in an automated service monitoring system. Method 34900 describes a machine method that embraces the graphical user interface and enables a user to select multiple KPIs that span multiple different services and adjust the weights of the selected KPIs. Methods 34800 and 34900 may be performed by processing devices that may comprise hardware (e.g., circuitry, dedicated logic), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. Methods 34800 and 34900 and each of their individual functions, routines, subroutines, or operations may be performed by one or more processors of a computer device executing the method.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks, steps). Acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media. In one implementation, methods 34800 and 34900 may be performed to produce a machine GUI as shown in FIGS. 34NA and 34NB


Referring to FIG. 34NC, method 34800 may be performed by processing devices of a server device or a client device and may begin at block 34802. At block 34802, the processing device may determine multiple KPIs associated with one or more services selected by a user. Each of the plurality of KPIs may be defined by a search query and may indicate an aspect of how a service provided by one or more entities is performing at a point in time or during a period of time. The search query may derive a value for the respective KPI from machine data produced by one or more entities that provide the one or more services. Each entity of the one or more entities may correspond to an entity definition having an identification of machine data from or about the entity and one or more of the services may be represented by a service definition that references the entity definition.


At block 34804, the processing device may cause for display a GUI that displays a plurality of key performance indicators (KPIs) and graphical control elements (e.g., slider-type control elements) for the KPIs. The KPIs displayed may be only a subset of the KPIs associated with the one or more services selected by a user. For example, the user may be able to review all of the KPIs associated with one or more services and may determine that only a subset of the KPIs reflects the performance of the services. A user may make the determination by using information illustrated by the GUI, such as the KPI values and states (e.g., critical, warning, info).


The graphical control elements displayed within the GUI may enable the user to adjust the weights of one or more of the KPIs. Each graphical control element may accept weights from a range of values. In one example, a user may use the graphical control element to adjust the weight of the respective KPI to an exclusion value that causes the respective KPI to be excluded from a calculation of the value of the aggregate KPI. The exclusion value may be any value within a range of potential weighting values, such as a minimum value (e.g., 0, 1, −1). In another example, the graphical control element may enable the user to adjust the weight of a respective KPI to a priority value that causes the respective KPI to override other KPIs when calculating the value of the aggregate KPI. The value of the aggregate KPI may be calculated based on only one of the KPIs that has the priority value, which may be a maximum value associated with a range of weighting values.


At block 34806, the processing device may cause for display within the graphical user interface a value of an aggregate KPI that is determined in view of the weights and values of one or more of the KPIs. In one example, the values of the KPIs may be determined by retrieving a most recent value for each of a plurality of KPIs from a data store and the most recent value for a first KPI and the most recent value for a second KPI may be derived from different time periods. In another example, the values of the KPIs may be derived by executing search queries defining each of the one or more KPIs. The search query may derive the value for the KPI by applying a late-binding schema to events containing raw portions of the machine data and using the late-binding schema to extract an initial value from machine data.


In addition to displaying a value of the aggregate KPI, the GUI may also display a state corresponding to the aggregate KPI or states corresponding to the KPIs. The state of a constituent KPI or aggregate KPI may correspond to a range of values defined by one or more thresholds. The states are discussed in more detail in regards to FIGS. 30-32 and may include an informational state, a normal state, a warning state, an error state, or a critical state. At any instant in time, a constituent KPI or aggregate KPI may be in one of the states depending on the range in which its value falls in. The GUI may display the state using at least one visual indication such as a textual label (e.g., “critical,” “medium”), a symbol (e.g., exclamation point, a shape) and/or a color (e.g., green, yellow, red). The GUI may display the state of the aggregate KPI, the state of the constituent KPIs or a combination of both. In one example, the GUI may display a state corresponding to the value of the aggregate KPI that includes the label “critical” when the value of the respective KPI exceeds a threshold value.


At block 34808, the processing device may determine whether it has received a user adjustment of the weight of a KPI via a corresponding graphical control element. The graphical control elements may be configured to initiate an event when the graphical control element is adjusted by a user. The event may identify the adjustment as a new value (e.g., 7.1) or a difference (e.g., change) in values (e.g., +2.5 or −1.7).


At block 34810, the processing device may modify, in response to the user adjustment, the value of the aggregate KPI in the GUI to reflect the adjusted weight. In one example, the aggregate KPI may be recalculated using the newly adjusted weight applied against the same KPI values used for a previous calculation. In another example, the aggregate KPI may be recalculated using the newly adjusted weights along with updated KPI values. As discussed in more detail above in regards to FIG. 34NB, the aggregate KPI may be calculated using multiple different formulas and may incorporate different factors. For example, the aggregate KPI value may be based on a weighted average of values from KPIs of multiple different services. Responsive to completing the operations described herein with references to block 34810, the method may terminate.


Referring to FIG. 34ND, method 34900 may be similar to or subsume method 34800 and may include the generation of a correlation search for the aggregate KPI based on the user-selected weights. Method 34900 may begin at block 34902, wherein the processing device may determine values for a plurality of key performance indicators (KPIs) associated with multiple different services. Each KPI may indicate a different aspect of how one of the plurality of services is performing at a point in time or during a period of time and may be defined by a search query that derives a value for the respective KPI from the machine data associated with the one or more entities that provide the plurality of services. The machine data associated with an entity may be produced by that entity. In one example, the machine data may include unstructured log data. The unstructured data may include continuous or string data, and positions, formats, and/or delimitations of semantic data items may vary among instances of corresponding data segments (e.g., entries, records, posts, or the like). In another example, the machine data associated with an entity includes data collected through an API (application programming interface) for software that monitors that entity.


At block 34904, the processing device may receive a plurality of weights for the plurality of KPIs. As discussed above, the weights may be received via graphical user interface 34400. In other examples, the weights may be received from a command line interface or from updates to one or more of a service definition, entity definition, KPI definition, or any configuration data (e.g., configuration record or configuration file) or a combination thereof.


At block 34906, the processing device may calculate a value of an aggregate KPI for the plurality of services in view of the weights and values of one or more of the KPIs. The aggregate KPI value may be a numeric value (e.g., score) that may be calculated based on the user-selected weights to better characterize activity (e.g., performance) of the plurality of services. As discussed above in regards to FIG. 34NB, calculating an aggregate KPI value may involve one or more of: (1) determining KPI values for the KPIs; (2) determining impact scores using the KPI values; (3) weighting the impact scores; (4) and combining the impact scores. In one example, calculating the value of the aggregate KPI may include performing a weighted average of values for each of the plurality of KPIs selected by the user to be associated with the aggregate KPI.


At block 34908, the processing device may receive a user adjustment of the weight of a KPI, which may result in a modification of the value of the aggregate KPI. This block is similar to block 34810 discussed above.


At block 34910, the processing device may receive a user indication to notify (e.g., alert) the user when the value of the aggregate KPI exceeds a threshold, such as a threshold associated with a critical state. In one example, the user indication may be the result of a user selecting a button to create a correlation search. The alert may be advantageous because it may be configured to identify a pattern of interest to a user and may notify the user when the pattern occurs. In response to receiving the user indication, the method may proceed to 34912.


At block 34912, the processing device may create a new correlation search to generate a notification based on a plurality of user-selected KPIs and respective user-selected weights. Creating the correlation search may include storing the correlation search in a definition data store of the service monitoring system. The correlation search may execute periodically to calculate the aggregate KPI based on the user-selected KPIs and user-selected KPI weights. The correlation search may include triggering criteria to be applied to the aggregate KPI and an action to be performed when the triggering criteria is satisfied. The processing device may utilize the triggering criteria to evaluate a value of the aggregate KPI. This may include comparing an aggregate KPI value to a threshold and causing generation of a notification (e.g., alert) based on the comparison. In one example, it may generate an entry in an incident-review dashboard based on the comparison. Responsive to completing the operations described herein above with references to block 34912, the method may terminate.


As discussed herein, the disclosure describes an aggregate key performance indicator (KPI) that spans multiple services and a GUI to configure an aggregate KPI to better characterize the performance of the services. The GUI may enable a user to select KPIs and to adjust weights (e.g., importance) associated with the KPIs. The weight of a KPI may affect the influence a value of the KPI has on the calculation of an aggregate KPI value (e.g., score). The GUI may provide near real-time feedback concerning the effect the weights have on the aggregate KPI value by displaying the aggregate KPI value (e.g., score) and updating the aggregate KPI value as the user adjusts the weights.


Incident Review Interface


Implementations of the present disclosure are described for providing a GUI that presents notable events pertaining to one or more KPIs of one or more services. Such a notable event can be generated by a correlation search associated with a particular service. A correlation search associated with a service can include a search query, a triggering determination or triggering condition, and one or more actions to be performed based on the triggering determination (a determination as to whether the triggering condition is satisfied). In particular, a search query may include search criteria pertaining to one or more KIPs of the service, and may produce data using the search criteria. For example, a search query may produce KPI data for each occurrence of a KPI reaching a certain threshold over a specified period of time. A triggering condition can be applied to the data produced by the search query to determine whether the produced data satisfies the triggering condition. Using the above example, the triggering condition can be applied to the produced KPI data to determine whether the number of occurrences of a KPI reaching a certain threshold over a specified period of time exceeds a value in the triggering condition. If the produced data satisfies the triggering condition, a particular action can be performed. Specifically, if the data produced by the search query satisfies the triggering condition, a notable event can be generated.


A notable event generated by a correlation search associated with a service can represent anomalous incidents or patterns in the state(s) of one or more KPIs of the service. In one implementation, an aggregate KPI for a service can be used by a correlation search to generate notable events. Alternatively or in addition, one or more aspect KPIs of the service can be used by the correlation search to generate notable events.


As discussed above, a graphical user interface is presented that allows a user to review notable events or other incidents created by the system. This interface may be referred to herein as the “Incident Review” interface. The Incident Review interface may allow the user to view notable events that have been created. In order to focus the user's review, the interface may have controls that allow the user to filter the notable events by such criteria as severity, status, owner, name, service, period of time, etc. The notable events that meet the filtering criteria may be displayed in a results section of the interface. A user may select any one or more of the notable events in the result section to edit or delete the notable event, view additional details of the notable event or take subsequent action on the notable event (e.g., view the machine data corresponding to the notable event in a deep dive interface). Additional details of the Incident Review interface are provided below.



FIG. 34O is a flow diagram of an implementation of a method of causing display of a GUI presenting information pertaining to notable events produced as a result of correlation searches, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 34500 is performed by a client computing machine. In another implementation, the method 34500 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 34501, the computing machine performs a correlation search associated with a service provided by one or more entities that each have corresponding machine data. The service may include one or more key performance indicators (KPIs) that each indicate a state of a particular aspect of the service or a state of the service as a whole at a point in time or during a period of time. Each KPI can be derived from the machine data pertaining to the corresponding entities. Depending on the implementation, the KPIs can include an aggregate KPI and/or one or more aspect KPIs. A value of an aggregate KPI indicates how the service as a whole is performing at a point in time or during a period of time. A value of each aspect KPI indicates how the service in part (i.e., with respect to a certain aspect of the service) is performing at a point in time or during a period of time. As discussed above, the correlation search associated with the service may include search criteria pertaining to the one or more KPIs (i.e., an aggregate KPI and/or one or more aspect KPIs), and a triggering condition to be applied to data produced by a search query using the search criteria.


At block 34503, the computing machine stores a notable event in response to the data produced by the search query satisfying the triggering condition. A notable event may represent a system occurrence that is likely to indicate a security threat or operational problem. Notable events can be detected in a number of ways: (1) an analyst can notice a correlation in the data and can manually identify a corresponding group of one or more events as “notable;” or (2) an analyst can define a “correlation search” specifying criteria for a notable event, and every time one or more events satisfy the criteria, the system can indicate that the one or more events are notable. An analyst can alternatively select a pre-defined correlation search provided by the application. Note that correlation searches can be run continuously or at regular intervals (e.g., every hour) to search for notable events. Upon detection, notable events can be stored in a dedicated “notable events index,” which can be subsequently accessed to generate various visualizations containing security-related information. As discussed above, the creation of a notable event may be the resulting action taken in response to the KPI correlation search producing data that satisfies the defined triggering condition. In addition, a notable event may also be created as a result of a correlation search (also referred to as a trigger-based search), that does not rely on a KPI, or the state of the KPI or of the corresponding service, but rather operates on any values produced in the system being monitored, and has a triggering condition and one or more actions that correspond to the triggering condition.


At block 34505, the computing machine causes display of a graphical user interface presenting information pertaining to a stored notable event. The presented information may include an identifier of the correlation search that triggered the storing of the notable event and an identifier of the service associated with the correlation search. In other implementations, the graphical user interface may present additional information pertaining to the stored notable event, and may receive user input to modify or take action with respect to the notable event, as will be described further below.



FIG. 34PA illustrates an example of a GUI 34550 presenting information pertaining to notable events produced as a result of correlation searches, in accordance with one or more implementations of the present disclosure. In one implementation GUI 34550 includes a filtering controls section 34560 and a results display section 34570. Results section 34570 displays one or more notable events and certain information pertaining to those notable events. Filtering controls section 34560 includes numerous controls that allow the user to filter the notable events displayed in results section 34570 using certain filtering criteria. Certain elements of filtering controls section 34560 also provide high-level summary information for the notable events, which the user can view at a glance. In one implementation, filtering controls section 34560 includes severity chart 34561, status field 34562, name field 34563, owner field 34564, search field 34565, service field 34566, time period selection menu 34567, and timeline 34568.


Severity chart 34561 may visually differentiate (e.g., using different colors) between different severity levels and include numbers of notable events that have been categorized into different severity levels. The severity levels may include, for example, “critical,” “high,” “medium,” “low,” “info,” etc. In one implementation, the number corresponding to each of the severity levels in severity chart 34561 indicates the number of notable events that have been categorized into that severity level out of all notable events that meet the remaining filtering criteria in filtering controls section 34560. During creation of a KPI correlation search, a corresponding severity level may be defined such that if the data produced by the search query satisfies the triggering condition, the resulting notable event will be categorized into the defined severity level. In addition, different triggering conditions may be associated with different severity levels. In one implementation, each severity level in severity chart 34561 may be selectable to filter the notable events displayed in results section 34570. When one or more severity levels in severity chart 34561 are selected, the notable events displayed in results section 34570 may be limited to notable events having the selected severity level(s).


Status field 34562 may receive user input to filter the notable events displayed in results section 34570 by status. In one implementation, status field 34562 may include a drop down menu from which the user can select one or more status values. One example of drop down menu 34569 is shown in FIG. 34PB.


Referring to FIG. 34PB, the available options for filtering the status of a notable event in drop down menu 34569 may include, for example, “all,” “unassigned,” “new,” “in progress,” “pending,” resolved,” “closed,” or other options. During creation of a KPI correlation search, a default initial status may be defined such that if the data produced by the search query satisfies the triggering condition, the resulting notable event will be assigned an initial status (e.g., “new”). In addition, different initial status values may be associated with different notable events. In one implementation, a notable event may be edited in GUI 34550 in order to update or modify the current status. For example, if an analyst is assigned to investigate a particular notable event to determine its cause or whether additional action is needed, the status of a notable event can be updated from its initial status (e.g., “new”) to a different status (e.g., “pending” or “resolved”) to reflect the current situation.


Referring again to FIG. 34PA, name field 34563 may receive user input to filter the notable events displayed in results section 34570 by name and/or title. During creation of a KPI correlation search, a name and/or title of the KPI correlation search may be defined such that if the data produced by the search query satisfies the triggering condition, the resulting notable event will be associated with that name. When the notable event is stored, one piece of associated information is the name of the correlation search from which the notable event is generated. Multiple notable events that are generated as a result of the same correlation search may then be given the same name, although they may have different timestamps to allow for differentiation. Accordingly, the notable events can be filtered by name in response to user input from name field 34563.


Owner field 34564 may receive user input to filter the notable events displayed in results section 34570 by owner. In one implementation, owner field 34564 may include a drop down menu from which the user can select one or more possible owners. During creation of a KPI correlation search, the owner of the KPI correlation search may be defined such that if the data produced by the search query satisfies the triggering condition, the resulting notable event will be associated with that owner. The owner may include for example, the name of an individual who created the correlation search, the name of an individual responsible for maintaining the service, an organization or team of people, etc. When the notable event is stored, one piece of associated information is the owner of correlation search from which the notable event is generated. Multiple notable events that are generated as a result of the same correlation search (or different correlation searches) may then have the same owner. Accordingly, the notable events can be filtered by name in response to user input from owner field 34564.


Search field 34565 may receive user input to filter the notable events displayed in results section 34570 by keyword. When one or more search terms is input to search field 34565, those search terms may be compared against the data in each field of each stored notable event to determine if any keywords in the notable event(s) match the search terms. As a result, the notable events displayed in results section 34570 can be filtered by keyword in response to user input from search field 34565.


Service field 34566 may receive user input to filter the notable events displayed in results section 34570 by service. During creation of a KPI correlation search, the related services of the KPI correlation search may be defined such that if the data produced by the search query satisfies the triggering condition, the resulting notable event will be associated with those services. Since the KPI correlation search, whether an aggregate KPI or aspect KPI, indicates a state of a service at a point in time or during a period of time and derives values from corresponding machine data for the one or more entities that make up the service, the service associated with the notable event generated from the KPI correlation search is known. When the notable event is stored, one piece of associated information is the associated service(s) of the correlation search from which the notable event is generated. In one implementation, other services having a dependency relationship with the KPI may also be stored as part of the notable event record. (A dependency relationship may include an inbound or outbound dependency relationship, i.e., an “is depended on by” or a “depends upon” relationship.) Accordingly, the notable events can be filtered by service in response to user input from service field 34566.


Time period selection menu 34567 receive user input to filter the notable events displayed in results section 34570 by time period during which the events were created. In one implementation, time period selection menu 34567 may include a drop down menu from which the user can select one or more time periods. The time periods may include, for example, the last minute, last five minutes, last hour, last five hours, last 24 hours, last week, etc. When a notable event is stored, one piece of associated information is a time stamp indicating a time at which the correlation search from which the notable event is generated was run. In one implementation, each time period from menu 34567 may be selectable to filter the notable events displayed in results section 34570. When one or more time periods are selected, the notable events displayed in results section 34570 may be limited to notable events that were generated during the selected time period(s).


Timeline 34568 may include a visual representation of the number of notable events that were created during various subsets of the time period selected via time period selection menu 34567. In one implementation, timeline 34568 includes the selected period of time displayed along the horizontal axis and broken into representative subsets (e.g., 1 minute intervals, 1 hour intervals, etc.). The vertical axis may include an indication of the number of notable events that were generated at a given point in time. Thus, the visual representation may include, for example a bar or column chart that indicates the number of notable events generated during each subset of the period of time. In other implementations, the visual representation may include a line chart, a heat map, or some other time of visualization. In one implementation, a user may select a period of time represented on timeline 34568 in order to filter the notable events displayed in results section 34570. When a period of time is selected from timeline 34568 (e.g., by clicking and dragging or otherwise highlighting a portion of the timeline 34568, the notable events displayed in results section 34570 may be limited to notable events that were generated during the selected period of time.


In one implementation, results section 34570 of GUI 34550 displays one or more notable events that meet the filtering criteria entered in filtering controls section 34560, and displays certain information pertaining to those notable events. In one implementation, a corresponding entry for each notable event that satisfies the filtering criteria may be displayed in results section 34570. In one implementation, various columns are displayed for each entry in results section 34570, each including a different piece of information pertaining to the notable event. These columns may include, for example, time 34571, service(s) 34572, title 34573, severity 34574, status 34575, owner 34576, and actions 34577. In other implementations, additional and/or different columns may be displayed in results section 34570. Each column may correspond to one of the filtering controls in section 34560. For example, time column 34571 may display a time stamp indicating the time at which the correlation search from which the notable event is generated was run, services column 34572 may display the service(s) with which the correlation search from which the notable event is generated are associated, and title column 34573 may display the name of the correlation search from which the notable event is generated. Similarly, severity column 34574 may display the severity level of the notable event as defined during creation of the corresponding correlation search, status column 34575 may display a status of the notable event, and owner column 34576 may display the owner of correlation search from which the notable event is generated. In one implementation, actions column 34577 may include a drop down menu from which the user can select one or more actions to take with respect to the notable event. The action options may vary according to the type of notable event, such as whether the notable event was generated as a result of a general correlation search or a KPI correlation search. The actions that can be taken are discussed in more detail below with respect to FIGS. 34R-34S. In one implementation, results section 34570 further includes editing controls 34578 which can be used to edit one or more of the displayed notable events. The editing controls are discussed in more detail below with respect to FIG. 34Q.



FIG. 34Q illustrates an example of a GUI 34580 editing information pertaining to a notable event created as a result of a correlation search, in accordance with one or more implementations of the present disclosure. In response to selecting editing controls 34578 and one or more notable event records in GUI 34550 of FIG. 34PA, GUI 34580 of FIG. 34Q may be displayed. For example, GUI 34580 can include multiple fields 34582-34588 for editing a notable event record. In one implementation, status field 34582 may receive user input to change or set the status of the notable event. Status field 34582 may include a drop down menu from which the user can select one or more status values, such as for example, “unassigned,” “new,” “in progress,” “pending,” resolved,” “closed,” or other options. Severity field 34584 may receive user input to change or set the severity level of the notable event. Severity field 34584 may include a drop down menu from which the user can select one or more severity levels, such as for example, “critical,” “high,” “medium,” “low,” “info,” etc. Owner field 34586 may receive user input to change or set the owner of the notable event. Owner field 34586 may include a drop down menu from which the user can select one or more possible owners. Comment field 34588 may be a text input field where the user can add a note, memo, message, annotation, comment or other piece of information to be associated with the notable event record. In one implementation, upon changing or setting one of the values in GUI 34580, the corresponding notable event record may be updated in the notable events index and the change may be reflected in results section 34570 of GUI 34550 of FIG. 34PA.



FIG. 34R illustrates an example of a GUI presenting options for actions that may be taken for a corresponding notable event created as a result of a KPI correlation search, in accordance with one or more implementations of the present disclosure. When actions column 34577 for a particular notable event entry in results section 34570 of GUI 34550 is selected, a number of action options are displayed. In one implementation, when the selected notable event was generated as a result of a KPI correlation search, the action options include “Open contributing kpis in deep dive” 34591 and “Open correlation search in deep dive” 34592. Selection of either option 34591 or 34592 may generate a deep dive visual interface, which includes detailed information for the notable event. A deep dive visual interface displays time-based graphical visualizations corresponding to the notable event to allow a user to visually correlate the values over a defined period of time. Option 34591 may generate a separate graphical visualization for each aspect KPI or aggregate KPI that contributed to the KPI correlation search, where each graphical visualization is displayed on the same timeline. These KPIs are selected during creation of the KPI correlation search, as described above. Option 34592 may generate a single graphical visualization for the values (e.g., the state of the KPI) returned by the KPI correlation search. Deep dive visual interfaces are described in greater detail below in conjunction with FIG. 50A.



FIG. 34S illustrates an example of a GUI presenting options for actions that may be taken for a corresponding notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure. When actions column 34577 for a particular notable event entry in results section 34570 of GUI 34550 is selected, a number of action options are displayed. In one implementation, when the selected notable event was generated as a result of a correlation search, the action options include “Open drilldown search in deep dive” 34593, “Open correlation search in deep dive” 34594. “Open service kpis in deep dive” 34595, and “Go to last deep dive investigation” 34596. Selection of any of options 34593-34596 may generate a deep dive visual interface, which includes detailed information for the notable event. Option 34593 may generate a graphical visualization for the values returned by a drilldown search associated with the correlation search. In one implementation, during creation of the correlation search, a separate drilldown search may be defined such that if the data produced by the search query of the original correlation search satisfies the triggering condition, the separate drilldown search may be run. The drilldown search may return additional values from among the data originally produced by the search query of the correlation search. Option 34594 may generate a single graphical visualization for the values produced by the search query of the correlation search. Option 34595 may generate a separate graphical visualization for each KPI, whether an aspect KPI or an aggregate KPI, that is associated with the service corresponding to the selected notable event, where each graphical visualization is displayed on the same timeline. Option 34596 may open the last deep dive visual interface that was generated for the selected notable event, which may have been generated according to any of options 34593-34595, as described above.



FIG. 34T illustrates an example of a GUI presenting detailed information pertaining to a notable event created as a result of a correlation search, in accordance with one or more implementations of the present disclosure. When a particular notable event entry in results section 34570 of GUI 34550 (of FIG. 34PA) is selected, detailed information section 34600 of FIG. 34T may be displayed. In one implementation, detailed information section 34600 includes the same information in columns 34571-34577, as discussed above, as well as additional information. That additional information may include, for example, possible affected services 34601, contributing KPIs 34602, a link to the correlation search that generated the notable event 34603, a history of activity for the notable event 34604, the original notable event 34605, a description of the notable event 34606, and/or other information.


The services identified in the list of possible affected services 34601 may be obtained from the service definitions of the services indicated in column 34572. The service definition may include service dependencies. The dependencies indicate one or more other services with which the service has a dependency relationship. For example, a set of entities (e.g., host machines) may define a testing environment that provides a sandbox service for isolating and testing untested programming code changes. In another example, a specific set of entities (e.g., host machines) may define a revision control system that provides a revision control service to a development organization. In yet another example, a set of entities (e.g., switches, firewall systems, and routers) may define a network that provides a networking service. The sandbox service can depend on the revision control service and the networking service. The revision control service can depend on the networking service, and so on. The KPIs identified in the list of contributing KPIs 34602 may include any KPIs, whether aspect KPIs or aggregate KPIs, that were specified in the KPI correlation search that generated the notable event. The link to the correlation search 34603 may display the KPI correlation search generation interface that was used to create the KPI correlation search that generated the notable event. History 34604 may show all review activity related to the notable event, including when the notable event was generated, when information pertaining to the notable event was edited (e.g., status, severity, owner), what actions were taken with respect to the notable event (e.g., generation of a deep dive), etc. The original notable event 34605 and the description of the notable event 34606 may display an explanation of how and why the notable event was generated. For example, the explanation may include a written description of what KPIs were monitored in the KPI correlation search, the period of time that was considered and what the triggering condition was that caused generation of the notable event. In other implementations, detailed information section 34600 may include different and/or additional information pertaining to the notable event.


Service Now Integration



FIG. 34U illustrates an example of a GUI for configuring a ServiceNow™ incident ticket produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure. In one implementation, GUI 34700 accepts user input to configure the creation a ticket in an incident ticketing system as the action resulting from the data produced by a correlation search query satisfying the associated triggering condition. In one implementation, the system may create a ticket in the ServiceNow™ incident ticketing system. In other implementations, other incident ticketing or service management systems may be used. The generated ticket serves as a record of the incident or event that triggered the correlation search and can be used to track analysis and service of the incident or event.


In one implementation, GUI 34700 may include a number of user input fields that receive user input to configure creation of the ticket. Ticket type field 34701 receives input to specify the whether the ticket type is an incident or an event. When the ticket type is set as “incident,” fields 34702-34706 are displayed. Category field 34702 receives input to specify whether the ticket should be categorized as a request, inquiry, software related, hardware related, network related, or database related. Contact type field 34703 receives input to specify whether the ticket was created as a result of an email, a phone call, self-service request, walk-in, form or forms. Urgency field 34704 receives input to specify whether an urgency for the ticket should be set as low, medium or high. State field 34705 receives user input to specify whether an initial state of the ticket should be set as new, active, awaiting problem, awaiting user information, awaiting evidence, resolved or closed. Description field 34706 receives textual input specifying any other information related to the ticket that is not included above.



FIG. 34V illustrates an example of a GUI for configuring a ServiceNow™ event ticket produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure. When the ticket type is set as “event,” fields 34707-34712 are displayed in GUI 34700. Node field 34707 receives input to identify the host, node or other machine on which the event occurred (e.g., hostname). Resource field 34708 receives input to identify a subcomponent of the node where the event occurred (e.g., CPU, Operating system). Type field 34709 receives input to specify the type of the event that occurred (e.g., hardware, software). Severity field 34710 receives user input to specify a severity of the event (e.g., critical, high, medium, normal, low). Description field 34711 and additional information field 34712 receive textual input specifying any other information related to the ticket that is not included above.


Once the creation of a ticket is configured as the action associated with a correlation search, a new ticket will be created each time the correlation search is triggered. As described above, the correlation search may be run periodically in the system and when the data generated in response to the correlation search query satisfies the associated triggering condition, an action may be performed, such as the creation of a ticket in the incident ticketing system, according to the configuration parameters described above.



FIG. 34W illustrates an example of a GUI presenting options for actions that may be taken for a corresponding notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure. If the creation of a ticket was not configured to be the action resulting from a correlation search, a ticket can be created from any notable event that was previously created through the Incident Review interface. In another implementation, a ticket can be created from any notable event in the Incident Review interface, even if the creation of another ticket was configured as part of the correlation search. As described above, when actions column 34577 for a particular notable event entry in results section 34570 of GUI 34550 is selected, a number of action options are displayed. In one implementation, the action options additionally include “create ServiceNow ticket” 34718. Selection of option 34718 may create a single ticket for the selected notable event(s). In one implementation, selection of option 34718 causes display of modal window 34720 which contains the configuration options for creating an incident ticket, as shown in FIG. 34X, or for creating an event ticket, as shown in FIG. 34Y. In one implementation, the configuration options are the same as the options illustrated in FIG. 34U and FIG. 34V, respectively.



FIG. 34Z illustrates an example of a GUI presenting detailed information pertaining to a notable event produced as a result of a correlation search, in accordance with one or more implementations of the present disclosure. As discussed above, when a particular notable event entry in results section 34570 of GUI 34550 is selected, detailed information section 34600 may be displayed. In one implementation, detailed information section 34600 additionally includes a ServiceNow option 34730. The presence of option 34730 indicates that a ticket has been created for the selected notable event, whether as an action resulting from the correlation search or manually through the Incident Review interface. In one implementation, selection of the ServiceNow option 34730 may cause display of an external ServiceNow incident ticketing system interface for further review, editing, etc. of the associated ticket. In another implementation, selection of the ServiceNow option 34730 may trigger a search in a new window showing the user all of the tickets created in ServiceNow™ corresponding to this notable event in a tabular format. One such column in the table would be the URL of the ticket in the ServiceNow™ ticketing system. Clicking this URL may open the ServiceNow™ ticketing system interface for further review, editing, etc. of the associated ticket. Other columns in the table may include a unique ID of the ticket in ServiceNow™, a ticket number of this ticket etc. “Event” and “Incident” are specific to the ServiceNow™ implementation. In other implementations, when other ticketing systems are used for integration, the terms pertaining to these systems may be used.


Example Service Detail Interface


FIG. 34ZA1 illustrates a process embodiment for conducting a user interface for service monitoring based on service detail. Method 34920 is an illustrative example and embodiments may vary in the number, selection, sequence, parallelism, grouping, organization, and the like, of the various operations included in an implementation. At block 34921, the computing machine receives the identity of a particular service as may be defined in the service monitoring system. In an embodiment, the service identity may be received by receiving a service identifier, or an indication for it, based on user input to a GUI. In an embodiment, the service identity may be received by receiving an indication of a service identity that has been programmatically passed to the service monitoring system from within or without. Other embodiments are possible. At block 34922, detail information related to the identified service may be gathered. In one embodiment, gathering of the detail information includes identifying the desired detail information related to the service. In one embodiment identifying the desired detail information includes locating it for retrieval. In one embodiment, the identified desired detail information is copied to a common collection, location, data structure, or the like, directly or indirectly, by value or by reference, as part of the gathering operation. In one embodiment, the identified desired detail information is utilized as it is identified from its original location, more or less, without necessarily bringing the information into a common location, structure, construct, or the like. In one embodiment, gathering may include co-locating some items of the identified detail information, and not others. Gathering of the detail information may include, for example, gathering definitional data related to the service such as information from a stored service definition for the service itself, information from stored definitions for entities that provide the service, and information that defines or describes KPIs related to the service. Gathering of the detail information may include, for example, gathering dynamically produced machine or performance data related directly or indirectly to the service, such as current, recent, and/or historic KPI and entity data.


At block 34923, gathered information is presented to the user in a service detail interface. In an embodiment, the gathered detail information may be organized into a number of distinct display areas, regions, portions, frames, windows, segments, or the like. In an embodiment, the gathered detail information may use higher density formats to display the information in order to increase the amount of readable and/or perceivable service detail information available to the user from a single view. Examples of higher density formats may include smaller font sizes, closer spacing, color coding, and iconography, to name a few. In an embodiment, service detail information presented in an interface may be refreshed automatically on a regular basis. In such an embodiment, the regular refreshment of performance or other metric data may provide a user with a real-time or near real-time representation of the service for service monitoring. In an embodiment, a user may be able to suspend an automatic refresh of the displayed information, for example, to study the service date for problem determination. In an embodiment, items of detail information presented by the service detail interface may be enabled for user interaction.


At block 34924, user interaction with the service detail interface is received. The user interaction may be received as data or other signals, received directly or indirectly, from hardware, drivers, or other software, as a result of user interaction with human interface devices such as keyboards, mice, touchpads, touchscreens, microphones, user observation cameras, and the like. At block 34925, a determination is made whether the received user interaction is to perform a navigation away from the service detail interface. If so, at block 34926, the desired navigation is performed and may include carrying certain information forward from the service detail interface to the navigation destination. If not, at block 34927, processing indicated by the user interaction is performed which may include returning to block 34923 to present an updated view of the service detail interface.


Method 34920 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine. Many combinations of processing apparatus to perform the method are possible.


FIG. 34ZA2 illustrates a user interface as may be employed to enable of user to view and interact with service detail information in one embodiment. Interface 34930 illustrates the display of a user interface as might be presented in the processing of block 34923 of FIG. 34ZA1. Interface 34930 of FIG. 34ZA2 is shown to include system title bar area 34931, application menu/navigation bar area 34932, service identifier 34933, timeframe component 34934, service relationship component 34935, KPI detail information component 34936, and entity detail information component 34937. System title bar area 34931 is comparable to system title bar area 27102 of FIG. 27A2 discussed in detail elsewhere. Application menu/navigation bar area 34932 is comparable to application menu/navigation bar area 27104 of FIG. 27A2 discussed in detail elsewhere. Service identifier component 34933 of FIG. 34ZA2 is shown to include an identifier for the service to which the detail information of interface 34930 pertains. Here, “Splunk” is shown as the service name.


In an embodiment, service identifier component 34933 may be enabled for user interaction. In an embodiment interaction with service identifier component 34933 may cause the computing machine to present a drop-down list of available services in the user interface from which the user can make a selection. In an embodiment, making a selection from such a list for a different service may result in the identifier for the selected service appearing in place of “Splunk” in service identifier component 34933 and in the replacement of the information appearing in interface 34930 relating to the “Splunk” service with information relating to the newly selected service. An example of such a drop-down list is illustrated in FIG. 34ZA5.


FIG. 34ZA5 illustrates an embodiment of a service selection interface aspect. User interface displays portion 34960 represents a modified portion of the display of interface 34930 of FIG. 34ZA2 after a user interaction with element 34933. Notably, the display is modified to include the appearance of drop-down list 34961 of FIG. 34ZA5. Drop-down list 34961 is shown to include four selection list entries 34961a-d. Each selection list entry displays an identifier for a service recognized by the service monitoring system. In an embodiment, a list entry may include an indication of the currently selected service such as by highlighting or such as the checkmark as appearing in relation to list entry 34961d. Each of the list entries 34961a-d may be interactive so as to allow a user to indicate the selection of a service as the current service of interface 34930 of FIG. 34ZA2.


The various processing just described in relation to interaction with service identifier component 34933 are examples of the types of processing as may be included in the processing of block 34927 of FIG. 34ZA1.


Service relationship component 34935 of interface 34930 of FIG. 34ZA2 is shown as providing a graphical depiction of the current service (“Splunk”) and its relationships with one or more other services. In an embodiment, the graphical depiction may include a representation of a topology of the relationships. In an embodiment, the relationships may indicate dependencies between the services. In an embodiment, the relationships may be directional such that, in the case of directional dependency relationships, the first of two related services may be said to “impact” the second service, and the second may be said to “depend on” the first. The service relationship component 34935 may be enabled for user interaction such that a user action to indicate the selection of a service represented in the component, such as the “Change Analysis” service that impacts the current service (“Splunk”) as shown, causes processing so as to make the newly selected service the current service of interface 34930. Such processing is an example of the processing as may be included in the processing of block 34927 of FIG. 34ZA1.


Many embodiments to present information about services related to the current service are possible for service relationship component 34935 of FIG. 34ZA2. Many embodiments are also possible where the service relationship component 34935 includes the service topology navigator based on service dependency relationships. Consideration of topology graph component 75310 of FIG. 75C and topology graph component 75410 of FIG. 75D, for example, and the related discussion, may be instructive.


KPI detail information component 34936 is now considered by reference to FIG. 34ZA3. FIG. 34ZA3 illustrates a KPI portion of a service detail user interface in one embodiment. Interface portion 34936a represents matter of a user interface display as may appear in the KPI portion of the service detail interface such as KPI portion 34936 of interface 34930 of FIG. 34ZA2. Interface portion 34936a of FIG. 34ZA3 is shown to include first header section component 34940, second header section component 34941, and KPI detail display component 34946. KPI detail display component 34946 is shown to include KPI list entry components 34942-34945, one entry for each of 4 individual KPIs. In this illustrative embodiment, the list entry for each KPI is shown to include multiple items. For example, list entry 34942 is shown to include color-coded KPI status icon 34942a, KPI state indicator 34942b, KPI name/title/identifier 34942c, KPI sparkline 34942d, and KPI value 34942e. In an embodiment, components of a list entry such as 34942 or the entire list entry itself may be enabled for user interaction. For example, in one embodiment a single mouse click or touchscreen press on the list entry may result in the selection of the associated KPI as a filter criteria to be used elsewhere, such as a filter criteria for the entities displayed in entity detail area 34937 of FIG. 34ZA2. As another example, in one embodiment a double mouse click or double touchscreen press on the list entry may result in navigation to a different user interface that perhaps displays different, additional, or other information related to the particular KPI associated with the list entry. An embodiment may enable both of the interactions just described. Many variations and embodiments are possible.


First header section component 34940 of FIG. 34ZA3 is shown to include a color-coded icon (circle) representing the state of the current service, followed by a fixed title portion (“KPIs in”) and a variable title portion reflecting the identity of the current service (“Splunk”). Second header section component 34941 is shown to include a count of the KPIs in the current service (“4 KPIs”) followed by text indicating a navigation option (“Open in Deep Dive”). Navigation option text (“Open in Deep Dive”) may be enabled for interaction in an embodiment such that a user interaction (e.g., a mouse click) will cause the computing machine to navigate to a different user interface, while possibly passing or carrying forward information from the working context of the current interface to the different user interface. In an embodiment, user interaction with the navigation option text may result in navigation to a user interface that includes a time-based graph lane for each of the KPIs of a service, such as an embodiment of a deep dive GUI as discussed in regards to FIGS. 50A-70, for example. Such navigational processing is an example of the type of processing that may be included in the processing of block 34926 of FIG. 34ZA1.


Entity detail information component 34937 of FIG. 34ZA2 is now considered by reference to FIG. 34ZA4. FIG. 34ZA4 illustrates an entity portion of a service detail user interface in one embodiment. Interface portion 34937a represents matter of a user interface display as may appear in the entity portion of the service detail interface such as entity portion 34937 of interface 34930 of FIG. 34ZA2. Interface portion 34937a of FIG. 34ZA4 is shown to include first header section component 34950, second header section component 34951, content navigation component 34952, and an entity detail display component that includes column header component 34953a and entity detail list data area 34953b. Entity detail list data area 34953b is shown to include multiple list entries, each occupying a row, and each corresponding to an entity related to the current service and possibly to a particular KPI. Column header component 34953a provides an indication of the data items as may be presented in each entity list entry. For example, entity list entry 34954 is shown to include a color-coded icon (circle) and text (“Normal”) that correspond to a column heading of “Alert_Level”, the text “/services/apps/local” that corresponds to a column heading of “Entity_Title”, a graphical spark line that corresponds to a column heading of “spark line” and that represents a time series of entity data for the current KPI and timeframe, and the value 16.000000 that corresponds to a column heading of “alert value”.


In an embodiment, components of a list entry such as 34954 or the entire list entry itself may be enabled for user interaction. For example, in one embodiment a single mouse click or touchscreen press on the list entry may result in the selection of the associated entity as a search or filter criteria to be used elsewhere. As another example, in one embodiment a double mouse click or double touchscreen press on the list entry may result in navigation to a different user interface that perhaps displays different, additional, or other information related to the particular entity associated with the list entry. The entity detail interface described elsewhere in relation to FIG. 34ZB3 is one possible navigation target. An embodiment may enable both of the interactions just described. Many variations and embodiments are possible.


First header section component 34950 of FIG. 34ZA4 is shown to include a color-coded icon (circle) representing the state of a KPI, followed by a fixed title portion (“entities in”), and a variable title portion reflecting the identity of the current KPI (“Splunk KPI 1”). In an embodiment, the current KPI may be indicated by a selection of one of the KPI entries appearing in a KPI detail portion of the interface, such as KPI detail portion 34936 of interface 34930 of FIG. 34ZA2. In an embodiment, no current KPI may be indicated and all entities for a service may populate the detail pages of an entity detail portion, such as entity detail portion 34937 of interface 34930 of FIG. 34ZA2. Second header section component 34951 of FIG. 34AZ4 displays a count of the number of entities included in the service/KPI identified in 34950. Content navigation component 34952 is shown to include interactive elements that enable a user to navigate multiple logical pages of entity list entries. In one embodiment, a content navigation component may include scrolling controls, for example. Other embodiments are possible.


In one embodiment, the contents of the entity detail display component shown here may be replaced with an array, matrix, or other arrangement of tiles that each singularly represent an entity (not shown). Each tile may display an icon. Each tile may be color coded, for example, to indicate a state or status of the entity it represents. In an embodiment, a tile may or may not include additional information beyond its color coding. In an embodiment, a tile may be relatively small and tiles may be spaced closely so as to provide a very high degree of representational density for entities as may be useful when using the SMS to monitor an environment where a large number of entities exist or are likely to exist for a service. A tile may be considered to be relatively small where, for example, the tile occupies less display area than an entity list entry with the information as shown for list entry 34954 of FIG. 34ZA4, or less display area than any single column of an entity list entry with the information as shown for list entry 34954.


Timeframe component 34934 of FIG. 34ZA2 is now considered by reference to FIG. 34ZA6. FIG. 34ZA6 illustrates a timeframe selection interface display in one embodiment. User interface display portion 34963 represents a modified portion of the display of interface 34930 of FIG. 34ZA2 as may appear after a user interaction with element 34934. Notably, the display is modified to include the appearance of time frame selection component 34964. Time frame selection component 34964 is shown to include time frame selection mode component 34964a and timeframe selection mode options (including, in this example, earliest time value component 34965a, earliest time units component 34965b, and earliest time calendar value component 34965c, latest time value component 34966), and an Apply action component 34967. Timeframe selection mode component 34964a is shown to include an identifier for a time frame selection mode—“Real-time” in this example. Timeframe selection mode component 34964 a may be enabled for user interaction, for example to display a drop-down list that enables a user to indicate a selection of a time frame selection mode from a list which may include options such as “Real-time”, “Offset Time”, and “Fixed Period”. An interaction by a user to make a selection from such a list may result in the modification of timeframe selection component 349642 to display timeframe selection mode options relevant to the newly selected timeframe selection mode. Earliest time value component 34965a may be an editable text box that enables a user to indicate a value for the earliest time of the time frame being specified by the user. Earliest time units component 34965b may be a drop-down list component that enables the user to designate a time unit applicable to the value shown by the earliest time value component 34965a. Earliest time units component 34965b may display a default units value, such as “Hours Ago”, or “Hours Ago” as shown may reflect the latest selection made by the user during an interaction with a drop-down list of 34965b. Earliest time calendar value component 34965c may display a computer-generated value of the calendar time (date and time) that corresponds to the information reflected in 34965a-b relative to the current time. Latest time value component 34966 may describe the time value for the end of the time frame being specified by the user. In an embodiment, latest time value component 34966 may indicate “now” whenever the Real-time timeframe selection mode is active for 34964, and user interaction with latest time value component 34966 may be disabled. Apply action component 34967 may be enabled for user interaction so as to permit a user to indicate the acceptance and desirability of a time frame specified by the information appearing in 34964. After such an interaction, the computing machine may remove the display of 34964 and place a descriptor of its designated timeframe timeframe selection component 34934. Further, as a result of such interaction, the information displayed in interface 34930 of FIG. 34ZA2 may be updated to reflect the selected timeframe as appropriate.


Example Entity Detail Interface


FIG. 34ZB1 illustrates a process for conducting a user interface for service monitoring based on entity detail. Method 34970 is an illustrative example and embodiments may vary in the number, selection, sequence, parallelism, grouping, organization, and the like, of the various operations included in an implementation. At block 34970a, the computing machine receives the identity of a particular entity as may be defined in the service monitoring system. In an embodiment, the entity identity may be received by receiving an entity identifier, or an indication for it, based on user input to a GUI. In an embodiment, the entity identity may be received by receiving an indication of an entity identity that has been programmatically passed to the service monitoring system from within or without. Other embodiments are possible. At block 34970b, detail information related to the identified entity may be gathered. In one embodiment, gathering of the detail information includes identifying the desired detail information related to the entity. In one embodiment identifying the desired detail information includes locating it for retrieval. In one embodiment, the identified desired detail information is copied to a common collection, location, data structure, or the like, directly or indirectly, by value or by reference, as part of the gathering operation. In one embodiment, the identified desired detail information is utilized as it is identified from its original location, more or less, without necessarily bringing the information into a common location, structure, construct, or the like. In one embodiment, gathering may include co-locating some items of the identified detail information, and not others. Gathering of the detail information may include, for example, gathering definitional data related to the entity such as information from a stored entity definition for the entity itself, information from stored definitions for services the entity may perform, and information that defines or describes KPIs related to the entity. Gathering of the detail information may include, for example, gathering dynamically produced machine or performance data related directly or indirectly to the entity, such as current, recent, and/or historic KPI and service data.


At block 34970c, gathered information is presented to the user in an entity detail interface. In an embodiment, the gathered detail information may be organized into a number of distinct display areas, regions, portions, frames, windows, segments, or the like. In an embodiment, the gathered detail information may use higher density formats to display the information in order to increase the amount of readable and/or perceivable entity detail information available to the user from a single view. Examples of higher density formats may include smaller font sizes, closer spacing, color coding, and iconography, to name a few. In an embodiment, entity detail information presented in an interface may be refreshed automatically on a regular basis. In such an embodiment, the regular refreshment of performance or other metric data may provide a user with a real-time or near real-time representation of the entity for service monitoring. In an embodiment, a user may be able to suspend an automatic refresh of the displayed information, for example, to study the entity data for problem determination. In an embodiment, items of detail information presented by the entity detail interface may be enabled for user interaction.


At block 34970d, user interaction with the entity detail interface is received. The user interaction may be received as data or other signals, received directly or indirectly, from hardware, drivers, or other software, as a result of user interaction with human interface devices such as keyboards, mice, touchpads, touchscreens, microphones, user observation cameras, and the like. At block 34970e, a determination is made whether the received user interaction is to perform a navigation away from the entity detail interface. If so, at block 34970f, the desired navigation is performed and may include carrying certain information forward from the entity detail interface to the navigation destination. If not, at block 34970g, processing indicated by the user interaction is performed which may include returning to block 34970c to present an updated view of the entity detail interface.


Method 34970 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine. Many combinations of processing apparatus to perform the method are possible.


FIG. 34ZB2 illustrates an entity lister interface in one embodiment. Interface 34971 illustrates a user interface display as it might appear for an embodiment during processing that precedes the processing of FIG. 34ZB1. Interface 34971 of FIG. 34ZB2 may be implemented among a robust set of interfaces that make up the command-and-control console of a data input and query system such as an event processing system, in an embodiment. Interface 34971 may provide an entity list view for all of the entities defined or recognized in an embodiment. Interface 34971 is shown to include entity list header bar 34972. Entity list header bar 34972 in an embodiment may include interface components such as entity count component 34972a for displaying a total count of the entities in the list (e.g., “1 Entity”), Bulk Action drop-down component 34972b for providing an interactive set of action options selectable by the user to perform against one or more selected entities represented in the entity list, filter component 34972c for enabling a user to specify filter criteria to limit the entities included in the list of the interface, and advanced filter component 34972d for enabling a user to specify additional filter criteria and/or parameters, possibly via a drop-down menu, pop-up window, or the like.


Interface 34971 is shown to further include an entity list area having column header component 34973a and entity list entry area 34973b. Entity list entry area 34973b may display one or more entity list entries appearing as list items or list rows such as entity list entry 34974. Each entity list entry may correspond to a single entity having an entity definition or otherwise recognized by an embodiment. Entity list entry 34974 is shown to include the entity title or name identifier “apps-demo05” corresponding to the “Title” column heading of 34973a, the entity alias “apps-demo05” corresponding to the “Aliases” column heading of 34973a, and the associated service identifiers “This is name” and “Splunk OS Host Monitoring” corresponding to the “Services” column heading of 34973a. Entity list entry 34974 is shown to further include navigation link component “View Health” corresponding to the “Health” column heading of 34973a, and action drop-down component “Edit” corresponding to the “Actions” column heading of 34973a. Embodiments may vary as to the number, content, and arrangement of items as may be included in the display of an entity list entry. In one embodiment, an entity list entry may only include an entity identifier.


One or more components within an entity list entry such as 34974, or the entity list entry as a whole and may enable user interaction. A particular user interaction, such as a mouse click, may engage processing to transition to an interface display other than 34971. Such transition processing in an embodiment may include the identifying, collecting, and formatting, or the like, of information of interface 34971 or its working context (e.g., window size, user identity, recent history) to pass or carry forward to the navigation target. In one embodiment, double clicking on the entity title of the entity list entry may cause the computing machine to perform a method of a service monitoring system such as method 34970 of FIG. 34ZB1 which may cause the display of an entity detail interface screen or page, such as entity detail interface display 34980 of FIG. 34ZB3. In such an embodiment, the processing of the double click for interface 34971 of FIG. 34ZB2 may include passing or carrying forward display and context information of the interface, including an entity identifier, such that the initial display of interface 34980 of FIG. 34ZB3 may be pre-populated with information related to the entity represented by the entity list entry that was double clicked (e.g. 34974 of FIG. 34ZB2).


FIG. 34ZB3 illustrates a user interface as may be employed to enable of user to view and interact with entity detail information in one embodiment. Interface 34980 illustrates a user interface display as it might appear for an embodiment during the processing of blocks 34970c-g of FIG. 34ZB1. Interface 34980 of FIG. 34ZB3 is shown to include system title bar area 34931, application menu/navigation bar area 34932, entity information area 34982, timeframe component 34981, entity-specific navigation component 34983, service detail information component 34984, and KPI detail information component 34985. System title bar area 34931 is comparable to system title bar area 27102 of FIG. 27A2 discussed in detail elsewhere. Application menu/navigation bar area 34932 is comparable to application menu/navigation bar area 27104 of FIG. 27A2 discussed in detail elsewhere. Entity information area 34982 of FIG. 34ZB3 is shown to include an identifier for the entity to which the detail information of interface 34980 pertains. Here, “apps-demo05” is shown as the entity identifier. Entity information area 34982 is shown to also include a number of names or descriptors of data fields or information items, followed by corresponding values. The values may represent properties, attributes, characteristics, metadata, or other information pertaining to the entity that is the subject of the display, in one embodiment. In one embodiment, information presented in entity information area 34982 may exclusively be information represented in a formal stored entity definition. In one embodiment, information presented in entity information area 34982 may include information represented in a formal stored entity definition for the subject entity and from other sources. Entity information area 34982 of the present example is shown to display a title value of “apps-demo05”, a host value of “apps-demo05”, a role value of “operating_system_host”, and a vendor_product value of “hardware”.


Timeframe component 34981 of FIG. 34ZB3 is now considered by reference to FIG. 34ZB6. FIG. 34ZA6 illustrates a timeframe selection interface display in one embodiment. User interface display portion 34980a represents a modified portion of the display of interface 34980 of FIG. 34ZB3 as may appear after a user interaction with element 34981. Notably, the display is modified to include the appearance of time frame selection component 34996. Time frame selection component 34996 is shown to include time frame selection mode components 34996a-f. In an embodiment, each time frame selection mode component may be a collapsible interface section and may be interactive to enable a user to toggle between the collapsed and expanded views or states. As shown, time frame selection mode components 34996a-b and 34996d-f are in the collapsed state, while time frame selection mode component 34996c is in the expanded state. When in one expanded state, in an embodiment, the time frame selection mode component may display one or more time frame selection mode options, action buttons, or other elements. The expanded display of time frame selection mode component 34996c, identified as a “Real-time” time frame selection mode, is shown with time frame selection mode options including, in this example, earliest time value component 34997a, earliest time units component 34997b, and earliest time calendar value component 34997c, latest time value component 34998a, and an Apply action component 34998b. Earliest time value component 34997a may be an editable text box that enables a user to indicate a value for the earliest time of the time frame being specified by the user. Earliest time units component 34997b may be a drop-down list component that enables the user to designate a time unit applicable to the value shown by the earliest time value component 34997a. Earliest time units component 34997b may display a default units value, such as “Hours Ago”, or “Hours Ago” as shown may reflect the latest selection made by the user during an interaction with a drop-down list of 34997b. Earliest time calendar value component 34997c may display a computer-generated value of the calendar time (date and time) that corresponds to the information reflected in 34997a-b relative to the current time. Latest time value component 34998a may describe the time value for the end of the time frame being specified by the user. In an embodiment, latest time value component 34998a may always indicate “now” for the Real-time timeframe selection mode and user interaction with latest time value component 34998a may be disabled. Apply action component 34998b may be enabled for user interaction so as to permit a user to indicate the acceptance and desirability of a time frame specified by the information appearing for 34996c. After such an interaction, the computing machine may remove the display of 34996 and place a descriptor of its designated timeframe in timeframe selection component 34981. Further, as a result of such interaction, the information displayed in interface 34980 of FIG. 34ZB3 may be updated to reflect the selected timeframe as appropriate.


While not shown in FIG. 34ZB6, each of time frame selection mode components 34996a-b and 34996d-f when in an expanded state may display one or more time frame selection mode options, action buttons, or other elements relevant to the particular timeframe selection mode. In an embodiment, timeframe selection mode component 34996a representing a “Presets” time frame selection mode, when expanded, may display a drop-down list component enabling a to user to select a time frame from among a list of predefined timeframe option settings. In an embodiment, timeframe selection mode component 34996b representing a “Relative” time frame selection mode, when expanded, may display offset value, offset units, and duration as selection mode options. In an embodiment, timeframe selection mode component 34996d representing a “Date Range” time frame selection mode, when expanded, may display start_date and end_date as selection mode options. In an embodiment, timeframe selection mode component 34996e representing a “Date & Time Range” time frame selection mode, when expanded, may display start_date, start_time, end_date, and end_time as selection mode options. In an embodiment, timeframe selection mode component 34996f representing an “Advanced” time frame selection mode, when expanded, may display user-supplied text representing programming code or an expression language, for example, that may specify filtering procedure or criteria related to time and date information. In an embodiment, each of the time frame selection mode components may include a common element such as an Apply action button. Embodiments of the above may vary.


Entity-specific navigation component 34983 in an embodiment may present the user with a number of navigation option elements, such as the “OS Host Details” navigation option element shown in FIG. 34ZB3. Entity-specific navigation component 34983 may be entity-specific (i.e., specialized to a particular entity) in the sense that, in an embodiment, one or more of the presented navigation option elements were selected or filtered for inclusion in the interface display based on a determined relationship, association, or affinity to the particular entity. For example, one service monitoring system embodiment may permit the installation, selection, or activation of modules having configuration and control data and related content. The total content of the module may be related to a functional role or class occupied by one or more services or entities in the service monitoring system. The presence of an operational module in the service monitoring system may extend the functionality of the system by providing, for example, visualizations and interfaces custom tailored to the functional role. Modules may be used to meet the needs of subject matter domain experts or to leverage the expertise of a subject matter domain expert in the creation of such a module. The presently described service monitoring system embodiment may include modules for such service/entity roles as OS hosts, web servers, load balancers, and authentication servers, for example. In such an embodiment, a navigation option element targeting a visualization or other interface of a module may be included in entity-specific navigation component 34983 by virtue of an association between the entity of interface 34980 and a functional role associated with the module. For example, the “OS Host Details” navigation option element shown in 34983 may target a visualization interface of a module related to an operating_system_host role, and may have been selected or filtered for inclusion in 34983 because entity “apps-demo05” is associated with the operating_system_host role as indicated in 34982, perhaps by an information field of its entity definition indicating the role association.


Service detail information component 34984 of FIG. 34ZB3 is now considered by reference to FIG. 34ZB4. FIG. 34ZB4 illustrates a service portion of an entity detail user interface in one embodiment. Interface portion 34984a represents matter of a user interface display as may appear in the services portion of the entity detail interface such as services portion 34984 of interface 34980 of FIG. 34ZB3. Interface portion 34984a of FIG. 34ZB4 is shown to include a service detail display component that includes column header component 34990a and service detail list data area 34990b. Service detail list data area 34990b is shown to include multiple list entries, each occupying a row, and each corresponding to a service related to the current entity of the user interface. Column header component 34990a provides an indication of the data items as may be presented in each service list entry. For example, service list entry 34991 is shown to include a color-coded icon (green circle) and text (“Normal”) under column heading “Severity” of 34990a, the text “Splunk OS Host Monitoring” under column heading “Service” of 34990a, a graphical spark line that represents a time series of data related to the interface timeframe and to the service represented by entry 34991 (perhaps a time series of data for an aggregate KPI of the service) under column heading “Sparkline” of 34990a, and the value 100.0 under column heading “Score” that perhaps is from aggregate KPI data of the service. Service list entry 34991 is but an example in an illustrative embodiment, and embodiments may vary widely as to the number, content, organization, and the like, of items included in a service list entry.


KPI detail information component 34985 is now considered by reference to FIG. 34ZB5. FIG. 34ZB5 illustrates a KPI portion of an entity detail user interface in one embodiment. Interface portion 34985a represents matter of a user interface display as may appear in the KPI portion of the entity detail interface such as KPI portion 34985 of interface 34980 of FIG. 34ZB3. Interface portion 34985a of FIG. 34ZB5 is shown to include header section component 34992, and a KPI detail display component including list column heading component 34994a and list entry area component 34994b. List entry area component 34994b may include multiple individual KPI list entry components, such as KPI list entry component 34995. In this illustrative embodiment, the list entry for each KPI is shown to include multiple items. For example, list entry 34995 is shown to include a color-coded KPI status icon (e.g., green circle) and a state or status descriptor (i.e., “Normal”) under the column heading “Severity” of 34994a, a KPI name/title/identifier (i.e., “CPU Overutilization: % System”) under the column heading “KPI” of 34994a, a service name/title/identifier (i.e., “Splunk OS Host Monitoring”) under the column heading “Service” of 34994a, and the leftmost portion of a spark line 34993a that extends beyond the edge of the KPI portion of the interface under the column heading “Sparkline” 34993 of 34995 which also extends beyond the edge of the KPI portion. In an embodiment, components of a list entry such as 34994a or the entire list entry itself may be enabled for user interaction. For example, in one embodiment a user interaction such as a double mouse click or double touchscreen press on the list entry may result in navigation to a different user interface that perhaps displays different, additional, or other information related to the particular KPI associated with the list entry. The processing of the user interaction may cause the computing machine to navigate to the different user interface, while possibly passing or carrying forward information from the working context of the current interface to the different user interface. Many variations and embodiments are possible. In an embodiment, user interaction with a KPI list entry may result in navigation to a user interface that includes a time-based graph lane for the KPI represented by the list entry, such as an embodiment of a deep dive GUI as discussed in regards to FIGS. 50A-70, for example. Such navigational processing is an example of the type of processing that may be included in the processing of block 34970f of FIG. 34ZB1.


A KPI portion of an entity detail user interface such as discussed in relation to 34985 of FIG. 34ZB3 and in relation to FIG. 34BZ5, may be further illuminated to the skilled artisan by consideration of the KPI portion of a service detail user interface such as discussed in relation to KPI portion 34936 of interface 34930 of FIG. 34ZA2 and in relation to FIG. 34ZA3.


Header section component 34992 of interface portion 34985a of FIG. 34ZB5 may include interactive elements that enable a user to move through KPI list entries that cannot all appear in the visible KPI portion at one time. The interactive elements may use a paging paradigm to move through the KPI list entries. In another embodiment scrolling controls may be used.


Maintenance Periods/Windows


An advantage of a service monitoring system (SMS) as illustrated generally by FIG. 2, and as elaborated and expanded herein, may be its ability to summarize information and to prioritize or elevate information about a monitored service or system based on importance or urgency to the user. The value of this advantage cannot be overstated such as in the case of an SMS that monitors an IT environment, where the machine components of the IT environment may generate an overwhelming amount of data that reflects performance in the environment. Such an SMS may augment its advantage by implementing features next described that permit a system user or administrator to define maintenance periods to the SMS that adapt its operation during periods of time when the monitored service or system is expected to depart from normal, such as a period of time when a component, device, or system is offline or reduced in capacity while maintenance is performed. Such features may adapt the reporting aspects of the SMS so as to prevent the generation and/or display of numerous and unimportant alerts that may be caused by the departure from normal during the maintenance period, for example. Such features may lower the priority or rank of information from the non-normal period, in juxtapose to the higher priority and rank with which non-normal information may be treated during regular service monitoring.


It should be recognized that a period of time where the measurements and data of a monitored system are expected to depart from the norm may be referred to as a maintenance period, maintenance window, maintenance time frame, downtime interval, off-line window, exception interval, or by using other terminology. Such time periods may not necessarily be for maintenance but for any anticipated or known timeframes where monitoring data (e.g., data produced by an SMS in normal course of operation to characterize or measure the monitored system/service) is expected to depart from normal. Moreover, in an embodiment, a maintenance period capability may be implemented to most readily support the definition of one-time, ad hoc maintenance periods as contrasted with regular and recurring maintenance or downtime periods, such as a weekly backup window.


FIG. 34ZC1 is a system flow diagram illustrating methods in one embodiment to implement maintenance periods. The methods may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, at least a portion of method is performed by a client computing machine. In another implementation, at least a portion of method is performed by a server computing machine. Computer storage containing instructions and data used by the processing logic in performance of the methods or portions thereof may include one or more types of storage in one or more locations and coupled to the processing logic by one or more communication means including, for example, buses, cables, and networks.


FIG. 34ZC1 illustrates methods and certain related components of a system implementation 90100 permitting maintenance periods and having: processing block 90110 associated with control and command console operations of a service monitoring system (SMS), processing blocks 90130 associated with background monitoring operations of an SMS, processing block 90150 associated with output and report generation operations of an SMS, SMS control datastore 90120, a monitoring measurements/data datastore 90128, and user interface device 90190. SMS control datastore 90120 is shown to include maintenance period (MP) definition 90122. Processing block 90140 is illustrated as both part of background monitor processing block 90130 and output/report generation processing block 90150, to illustrate that various embodiments may employ the processing of block 90140 during various aspects of their operation. FIG. 34ZC1 is not intended to illustrate a complete SMS but rather to illustrate portions of an underlying SMS and material specific to explain by way of illustration an SMS embodiment that provides a novel maintenance period (MP) capability. Aspects of an underlying SMS and its diverse range of possible embodiments and capabilities can be understood by consideration of the whole of this specification.


Processing block 90110 is a logical grouping of processing blocks that serve to implement control and command console functions for an SMS. An SMS such as generally illustrated by FIG. 2 exposes control mechanisms to a user, such as an administrator, that permits the user to direct, control, manage, or specify the operation of the SMS, here now called control and command console functions. Control and command console functions related to maintenance periods in one embodiment are now described and these functions entail exposing an interactive interface that enables a user to specify and activate a maintenance period (MP) definition that the SMS relies on to direct its operations. Simply put, the control and command console functions of block 90110 in this embodiment involve creating and storing a definition of a maintenance period. The maintenance period definition in an embodiment may be created and stored by a computing machine, utilizing information indicated by user input.


At block 90112 of FIG. 34ZC1 the SMS presents a maintenance period (MP) interface to a user, perhaps making use of a client computing device such as 90190. The presented interface may be interactive and may permit the user to provide inputs that control SMS operation with respect to maintenance periods. At block 90114, user inputs are received and processed. In one embodiment, the user inputs may relate to creating, modifying, or deleting the information, in whole or in part, of a maintenance period (MP) definition that is stored in proper location and fashion to exert control over the ongoing operation of the SMS. Arrow 90119 connecting blocks 90112 and 90114 illustrates the iterative processing that may occur between these blocks to implement an interactive, back and forth exchange between the SMS and the user as may be required to prompt for, receive, verify, validate, and the like all of the input needed to specify an MP definition and to commit it to control storage. An example of such an interface is illustrated and discussed below in relation to FIGS. 34ZC2-34ZC6. At some point in the processing of block 90114 a user input is recognized to indicate that at least one MP definition is to exert active control over SMS operation and the processing of block 90116 ensues. At block 90116, the subject MP definition is stored as SMS control information such as by placing MP definition 90122 into an SMS control datastore 90120. In an embodiment, the placement of the MP definition 90122 into SMS control datastore 90120 is sufficient to activate the MP definition to exert control over the operation of the SMS. In an embodiment, an additional separate action may be required to activate the MP definition. In an embodiment, the separate action may be a scheduled or on-demand copy of control information from an inactive staging area to an active control area. These and other embodiments are possible. It is to be further understood that the SMS control datastore 90120 and the MP definition 90122 represent logical constructs and their further logical and physical implementation may or may not correspond to the simple, unitary description here.


In one embodiment, MP definition 90122 includes information to identify a maintenance object and to specify a time period. The maintenance object (MO) is the construct that is expected to experience non-normal data and/or measurements during the maintenance period. The maintenance object may be physical (e.g., a host computer used to perform a service that is, perhaps, actually undergoing maintenance) or logical (e.g., a service defined to the SMS). In one embodiment, maintenance objects may be restricted to entities defined in the SMS. In one embodiment, maintenance objects may be freely selected from a variety of the SMS objects in the system (i.e., objects known and recognized by the SMS, perhaps by virtue of definitional entries in the SMS system; e.g., entities, services, and KPIs). Information to identify the maintenance object in the MP definition may include a name, a key value, a unique reference, or other identifier, for example. Information to identify the maintenance object in the MP definition may be express or implied and may directly or indirectly identify the MO. In an embodiment, an MP definition may be limited to including information able only to identify a single maintenance object. In an embodiment, an MP definition may be able to include information to identify any number of maintenance objects. These and other embodiments are possible.


Information to specify a time period in the MP definition may include a start date and/or time, an end date and/or time, a time duration value, or the like, alone or in combination. In one embodiment information to specify a time period may include a start time without a duration or end time, for example, permitting the creation of open-ended MPs. These and other embodiments are possible. Time values in an MP definition may be absolute (e.g., 01:00:00 PM) or relative (e.g., +00:30:00), and calendar (e.g., Jul. 4, 2016 01:00:00 PM) or durational (e.g., 2 hours). In an embodiment, an MP definition may be limited to including information able only to specify a single time period. In an embodiment, an MP definition may be able to include information to specify multiple time periods. These and other embodiments are possible.


Block 90132 represents monitoring performed by the SMS. Such monitoring may include, for example, the automatic and ongoing generation of values for defined KPIs as described elsewhere herein. (See, for example, monitoring frequency aspects of KPI definitions as discussed, for example in relation to FIG. 23 and elsewhere.) The automatic and ongoing nature of such generation is suggested by the inclusion of block 90132 in processing block 90130, identified as the background monitor processing of the SMS. Despite such designation, one of skill will understand that the monitoring of block 90132 may be ad hoc monitoring, perhaps performed in the foreground, and perhaps performed in direct response to contemporaneous user interactions. The output of block 90132 is a collection of one or more measurements/data points 90128. In an embodiment, a measurement/data point may be a KPI in accordance with disclosure herein.


In one embodiment, as part of the processing 90130 that includes block 90132, the processing of block 90140 is utilized to determine or associate a maintenance state for one or more of the measurements/data points produced at block 90132. In such an embodiment the processing of block 90140 may be used in order to tag or otherwise record or reflect an association between a measurement/data point and a maintenance state. In one embodiment a maintenance state is a bit, flag, tag, value, attribute, indicator, metadata item, or other information representation, indicating that an active and/or applicable maintenance period (MP) pertains to the measurement/data point. In an embodiment, the association of a maintenance state with a measurement/data point may be indicated directly or indirectly, and may be express or implied. In an embodiment, a maintenance state indicator or value may be part of a broader scheme of state values, such as described earlier in the discussion related to FIGS. 19 and 31A-C, for example. In such an embodiment, the processing to accommodate a maintenance state as determined by a defined maintenance period may be integrated with processing for other states. For example, assigning the maintenance state a low position in an ordered range of state values may produce a desired result of ascribing maintenance state items a comparatively low level of criticality and placing them near the bottom of reported lists when state values are used to determine an order of appearance (e.g., most critical first). Of course, an embodiment may use the maintenance state of one or more measurements/data points to determine an order of appearance, criticality level, or the like, independent of any other scheme of states.


The determination of an active and/or applicable maintenance period at block 90142 may use the information of MP definition 90122 in making the determination. In one embodiment, the processing of block 90142 may, in an instance, ascertain whether the measurement/data point pertains to the time period specified in the MP definition by determining whether a time value associated with the measurement/data point (e.g., its creation time, its interval start time, its interval end time, or its interval span) is partly or wholly contained within the time period specified by the MP definition information. (Embodiments may vary as to the degree of intersection required between time aspects of a measurement/data point and the time period specified in the MP definition to make a determination that the measurement/data point pertains to, or has correspondence to, the time period defined for the MP.)


Further, the processing of block 90142 may, in an instance, ascertain whether the measurement/data point has a relationship with the maintenance object identified by information of the MP definition. In an embodiment where the measurement/data point is a KPI value, a definitional topology path that includes the KPI (such as the interrelated service, entity, and KPI definitions discussed elsewhere herein (see, for example, FIGS. 2 and 4, and the related discussion)) may be traversed to locate the maintenance object (MO) identified in the MP definition. For example, SMS definitional data may be searched to determine whether the MO of the MP definition is an entity that performs the service which the KPI measures. In an embodiment, where the measurement/data point is a KPI value, language for the search query that defines the KPI may be searched to determine or ascertain whether the MO identified in the MP definition (e.g., an entity) is also identified in the search query. These and other embodiments are possible for determining or ascertaining a relationship between a measurement/data point (e.g., KPI) and the maintenance object of a maintenance period definition.


In one embodiment, if the processing of block 90142 ascertains both that (i) the measurement/data point pertains to the time period specified in the MP definition, and (ii) the measurement/data point has a relationship with the maintenance object identified by the MP definition, then a positive determination exists that an active/applicable maintenance period is associated with the measurement/data point. Additionally, the measurement/data point may be thereby determined to be in a maintenance state, and that determination may be returned to program logic that invoked the processing of block 90140. In an embodiment, the measurement/data point may be thereby determined to be in a maintenance state (and perhaps in a particular one of many maintenance states) and be associated with that state, for example, by tagging the measurement/data point with maintenance state information during the processing of block 90140.


While an SMS embodiment may perform monitoring and produce measurement/data points at block 90132 at first without obvious regard to a defined maintenance period, and seemingly thereafter utilize the processing of block 90140 to associate appropriate measurements/data points with an applicable maintenance state or period, an embodiment of block 90132 may, in contrast, use the processing of 90140 to determine the existence of an active/applicable maintenance period (MP) for a measurement/data point currently under production and modify or adapt the production of the measurement/data point in view of the defined maintenance period. For example, such an embodiment of block 90132 may adapt, in the case of an active/applicable MP, by removing or excluding data regarding the defined maintenance object(s) from the calculation, determination, or production of the measurement/data point. As another example, such an embodiment of block 90132 may adapt its processing, in the case of an active/applicable MP, by calculating, determining, or producing two values for the measurement/data point, one produced normally (tainted) and one produced by removing or excluding data regarding the defined maintenance object(s) (untainted). As another example, a similar embodiment of block 90132 may adapt its processing, in the case of an active/applicable MP, by calculating, determining, or producing two values for the measurement/data point, one produced normally (tainted) and one produced applying some factor, calculation, adjustment, or other derivation to account for data regarding the defined maintenance object(s) (corrected). These and other embodiments and variations are possible.


In foregoing examples, the aspect of the SMS for the generation of monitoring data (seen in FIG. 34ZC1 as background monitor block 90130) was adapted in various ways in accordance with the defined maintenance period. In a contrasting SMS embodiment, the monitoring activity of block 90130 does not utilize the processing of block 90140 and may perform identically to an SMS that includes no provision for maintenance periods. In such an embodiment, the MP definition 90122 in the SMS control datastore 90120 may affect downstream SMS operation, such as during the processing of block 90150 discussed next.


At block 90150, processing is performed related to output/report generation aspects of a service monitoring system (SMS). In an embodiment, the monitoring measurements/data points 90128 produced by monitoring activity 90130 may be a chief input to output/report generation and presentation processing 90150. The outputs or reports of 90150 should be considered broadly without limiting their form or content. An output or report may be a complete user interface (UI) display page, image, or more, or a small part, element, or attribute thereof, for example. Presenting an output or report should be considered similarly broadly so as to encompass, for example, any publication (i.e., exposure to access), possibly by immediately displaying, storing for downstream display, conveying for downstream display, or storing for downstream use by other automation or processing. Examples, here, will focus on presenting outputs for display to an interested user, perhaps via a client device such as 90190. As SMS facilities for reporting SMS information to a user are discussed in great detail elsewhere, herein, the discussion here focuses on extended processing or adaptations to what is elsewhere fully discussed, in order to implement maintenance period (MP) functionality. Notably, an embodiment implement output and reporting aspects for a maintenance window processing capability by making adaptations to base output and reporting embodiments. After consideration of the discussion that follows, particularly in relation to FIG. 34ZC8, one of skill will appreciate adaptions to implement maintenance period functionality on foundational user interface displays such as the SMS reporting outputs discussed elsewhere, including a service monitoring home page (see FIGS. 48-49F and related discussion, for example), service monitoring dashboard (see FIGS. 35-47C and related discussion, for example), deep dive visualizations (see FIGS. 50A-70K and related discussion, for example), incident review interfaces (see FIG. 82B and related discussion, for example), service detail interfaces (see FIGS. 34ZA1-34ZA6 and related discussion, for example), entity detail interfaces (see FIGS. 34ZB1-34ZB6 and related discussion, for example), and others.


As part of the output and report generation processing of block 90150, an output presentation is determined at block 90152. The processing of block 90152 determines, calculates, formulates, constructs, adapts, derives, devices, or otherwise ascertains, the whole or part, of an SMS reporting output. In an embodiment, the processing of block 90152 may include a determination that the output should be null or nothing; for example, suppressing an output such as the display, in whole or in part, that may be otherwise indicated, or excluding information from a display that composites information about a number of SMS objects (e.g., services, entities). Notably, the determination of an output presentation at block 90152 includes a consideration of any maintenance state determined for, or associated with, measurements/data points 90128 that factor into an output display, whether discretely or by representation in an aggregation or derivation. In one embodiment, the maintenance state information for measurements/data points 90128 that is used at block 90152, was recorded or reflected for the measurement/data points 90128 when they were produced by the processing of block 90130 (including the maintenance state determination/association processing of block 90140). In one embodiment, the maintenance state information for measurements/data points 90128 that is used at block 90152, is produced by certain processing of block 90150 in advance of block 90152 (not specifically shown; and including the maintenance state determination/association processing of block 90140). In one embodiment, the maintenance state information for measurements/data points 90128 that is used at block 90152, is produced during the processing of block 90152 by itself utilizing the maintenance state determination/association processing of block 90140. These and other embodiments are possible.


After the determination of the output presentation at block 90152, any output that exists is presented at block 90154. Presentation of the output may include sending data for a user interface display to client device such as 90190.


FIGS. 34ZC2 through 34ZC6 illustrate user interfaces as may be used to implement command and control console functions related to maintenance periods as described in relation to the processing of block 90110 above.


FIG. 34ZC2 illustrates one embodiment of a user interface for displaying and creating maintenance period definitions in the control data of an SMS. User interface 90200 includes header area 90212, footer area 90216, and main display area 90214. Header area 90212 and footer area 90216 parallel the format and content of header and footer areas shown and or described for other user interfaces herein (cs., e.g., FIG. 27A2). Main display area 90214 includes description area 90220, action button 90222, and a maintenance period (MP) listing area including list management area 90230, column heading area 90242, and list item area 90244. Description area 90220 displays a name or other descriptive information about the user interface: “Maintenance Windows, Viewer for all the maintenance Windows.” MP list management area 90230 includes maintenance period definitions count display 90232 that displays the number of maintenance period definitions in the SMS, list action options area 90234 including a “Bulk Action” item and a “View All” item, and filter area 90236. User interaction with the “Bulk Action” item of 90234 may result in the appearance of a drop-down menu (not shown) indicating actions that may be applied to all listed MP definitions that are selected for action via a check box appearing in column 90252. The Bulk Action drop-down menu may include a “delete” action option, for example. User interaction with the “View All” item of 90234 may result in the appearance of a drop-down menu (not shown) indicating one or more filter options that may be applied to the displayed list of maintenance period definitions. The View All drop-down menu may include a “view active only” filter option, for example. Filter area 90236 may allow user interaction to enter filter criteria for the displayed list that may not be otherwise addressed using the View All drop-down of 90234, For example, a user may be enabled to enter filter criteria via 90236 based on the title or start time of a maintenance period definition. Column heading area 90242 is shown to include an “information” icon in column 90251, an interactive check box in column 90252 for selecting all of the maintenance period definitions listed on the interface, and MP definition field names “Title”, “State”, “Start Time”, “Duration”, and “End Time”, in columns 90253-90257, respectively, and “Actions” column name in column 90258. The MP definitions list shown in the interface 90200 includes only the single MP definition appearing in list item area 90244 and displaying: an interactive “information” icon in column 90251 that permits a user to indicate a request for certain additional information about the MP definition, an interactive check box in column 90252 that permits the user to select the MP definition for certain additional processing such as might be invoked by the Bulk Action drop-down of 90234, “Downtime for Jira” as the title of the MP definition in column 90253, “Active” as the state of the MP definition in column 90254, “0/9/2016, 4:44:19 PM” as the start time of the MP definition in column 90255, “1 Hour” as the duration of the MP definition in column 90256, “Jun. 9, 2016, 5:44:19 PM” as the end time of the MP definition in column 90257, and an interactive “Edit” drop-down menu activator in column 90258 that permits a user to indicate an action to perform against the MP definition in that row/entry.


The maintenance period (MP) definition represented in the list item area 90244 of interface 90200 may have been created by the workflow entered into when a user earlier activated action button 90222 of interface 90200, the “Create New Maintenance Window” button. User interaction with button 90222 may result in the display of a pop-up or other window to begin the process of creating a new maintenance period definition such as interface 90300 of FIG. 34ZC3.


FIGS. 34ZC3 and 34ZC4 illustrate an example of a possible user interface embodiment for creating a maintenance period definition in the control data of an SMS. FIG. 34ZC3 illustrates a user interface 90300 that permits a user to specify a first portion of information for a maintenance period (MP) definition. User interface 90300 includes header area 90310, footer area 90320, and main display area 90330. Header area 90310 includes descriptive window information and an interactive “X” cancel action button to terminate the interface. Footer area 90320 includes interactive “Cancel” action button 90322 to terminate the interface without saving any user specified data, interactive “Back” action button 90324 (shown deactivated) to navigate to a prior user interface display, interactive “Next” action button 90326 to navigate to a subsequent user interface display for entering a different portion of MP definition information, and interactive “Finish” action button 90328 to signal completion and acceptance of the information entered for a new MP definition and to signal the desire to save/store the new MP definition in the control data of the SMS.


Main display area 90330 of interface 90300 includes interactive elements allowing the user to specify a certain portion of data for creating a new MP definition. Text box 90332 enables a user to specify a title for the new MP definition. Text boxes 90334 and 90336 enable a user to specify a date and a time, respectively, for the start time of a new MP definition. Drop-down selector box 90338 enables a user to select from a predefined set of durations for maintenance periods. Text boxes 90340 and 90342 enable a user to specify a date and a time, respectively, for the end time of the new MP definition. In one embodiment, a user interaction with interface 90300 to specify the start time and a duration results in a calculation of an end time from those values that is automatically populated into text boxes 90340 and 90342. Selection buttons 90344a and 90344b enable the user to specify classes or types maintenance objects that can be selected for inclusion in the MP definition. The available options shown for this embodiment include entities and services. In one embodiment, a maintenance period may include only one type of maintenance object and interacting with either of buttons 90344a and 90344b will inactivate the other. In one embodiment, a maintenance period may include multiple types of maintenance objects, still buttons 90344a and 90344b remain mutually exclusive with the last activated button determining the type of maintenance objects that will be listed on a subsequent interface for user selection. In one embodiment, a maintenance period may include multiple types of maintenance objects and multiple buttons may be activated simultaneously. An embodiment with more or fewer maintenance object types may have more or fewer object buttons than 90344a-b if implementing an interface similar in fashion to 90300. Once an object button is activated, Next action button 90326 may be activated, for example by a mouse click, to navigate toward a companion user interface that permits the selection of the specific maintenance objects of the selected type(s) to be included in the MP definition.


FIG. 34ZC4 illustrates a user interface that permits the selection of the specific maintenance objects to be included in a newly created MP definition, thereby supplying a second portion of MP definition information. Interface 90400 includes header area 90410 and footer area 90420 that parallels the format and content of the header and footer areas (90310 and 90320, respectively) of interface 90300 of FIG. 34ZC3. Main display area 90430 of FIG. 34ZC4 includes a list of potential maintenance objects from which the maintenance objects for the new MP definition can be selected. Main display area 90430 includes a list display options drop-down component 90432, showing “10 per page” as the default value for the drop-down or as the last value chosen by the user as the result of interaction with 90432. Selection list column heading display area 90434 is shown to include a checkbox to select all list entries, in column 90442, and the column name “Title” in column 90444. The selection list shows the five entries 90436a-e. Entries 90436a, 90436c, and 90436e are shown as selected for use as maintenance objects by virtue of the activated checkboxes appearing in column 90442 for each of those entries. In one embodiment, a user can proceed to select additional maintenance objects of additional types by interacting with Back button 90424 to navigate to interface 90300 of FIG. 34ZC3, there activating a different one of the object buttons (90344a-b), clicking the Next button 90326, to proceed once more to interface 90400 of FIG. 34ZC4 which at this point is populated with a list of potential maintenance objects of the most recently selected object type. When complete, correct, and adequate information for a MP definition has been specified using interfaces 90300 and 90400 of FIGS. 34ZC3 and 34ZC4, respectively, the user can activate a Finish button to save/store the new MP definition in SMS control storage. Processing may then proceed resulting in the display of a user interface presenting a Maintenance Period Definition Detail display.


FIG. 34ZC5 illustrates a maintenance period definition detail user interface in one embodiment. Interface 90500 enables a user to view and initiate editing of a maintenance period (MP) definition in the control storage of the SMS. Main display area 90514 of interface 90500 includes MP definition title display 90520, navigation element 90522, Edit action button 90526, End Now action button 90528, definition information area 90524, and maintenance object tabbed interface area 90530. Interaction with navigation element 90522 may result in processing to cause the display of a different user interface, such as 90200 of FIG. 34ZC2. User interaction with Edit action button 90526 of FIG. 34ZC5 invokes processing to permit editing of the displayed MP definition. In one embodiment, an interaction with Edit button 90526 will alter the presentation of the existing user interface 90500 to enable the editing of definition field information. In one embodiment, an interaction with Edit button 90526 will result in the display of a different user interface (or different display component of the same user interface) that enables editing. Such a different user interface may parallel the MP definition creation user interfaces discussed in relation to FIGS. 34ZC3 and 34ZC4. User interaction with the End Now button 90528 of interface 90500 of FIG. 34ZC5, terminates the interface, possibly navigating to home screen for the application or to an interface such as 90200 of FIG. 34ZC2 which may serve as a home screen for the maintenance period (MP) aspect of the SMS control and command console functionality. Definition information area 90524 of FIG. 34ZC5 may include information from the MP definition itself, such as a start time and a stop time, and information about the defined MP, such as its state. In one embodiment, the state of an MP may be indicated as Active meaning the current time is within the timeframe specified in the MP definition, as Inactive meaning that the current time is past the end time of the MP definition, or as Pending meaning that the current time is before the start time of the MP definition.


Maintenance object tabbed interface area 90530 is shown to include two tab controls: Affected Entities 90532 and Affected Services 90534. Affected Entities tab control 90532 is shown to be active so that the display area of tabbed interface area 90530 below the tab controls displays information about the affected entities of the MP definition. Affected entities can appear in the list in one embodiment by virtue of being defined as maintenance objects for the MP, or by virtue of having been automatically determined to be affected by the MP based on relationships known to the SMS. For example, an entity may be designated as affected when the only service it provides is a defined maintenance object of the MP. Different embodiments may employ different schemes for identifying “affected” objects or may rely entirely on the MP definition to include its “affected” objects as maintenance objects. Varying embodiments are possible. The detail portion of tabbed interface area 90530 is shown to include column heading row 90566 and two affected entity detail rows/entries 90568a-b. Two columns appear: Entity column 90562 and Service column 90564. Interaction with tab control 90534 causes the tabbed interface detail information area to transition to a display of information associated with the Affected Services tab control 90534.


FIG. 34ZC6 illustrates the interface of FIG. 34ZC5 modified by the selection of an alternate tab control. Interface 90580 replicates interface 90500 of FIG. 34ZC5 except for the detail content of the tabbed interface display area 90530 of FIG. 34ZC6. There, a column header row 95084 is display showing the column heading, “Service”, for the single column 90582. There, also, the single row/entry 90586 of the list of Affected Services identifies the affected service as “DB Service”. Again, embodiments may vary as to the criteria and analysis used to determine services that qualify as “affected” services.


FIG. 34ZC7 illustrates examples of different content as may be useful to populate a tab display area such as tabbed interface display area 90530 of FIG. 34ZC6. The examples shown do not limit the possibilities but rather illustrate that many implementations are possible without departing from inventive aspects taught herein, just as explaining inventive embodiments with a web-based or windows paradigm-based graphical user interface that includes a tabbed interface display area does not restrict the practice of inventive aspects to those embodiments, which have been chosen to rapidly convey to the reader an appreciation for inventive teachings.


Example (a) of FIG. 34ZC7 illustrates content for a tabbed interface display area as may usefully appear in an SMS output display such as shown in interface 90580 of FIG. 34ZC6, and more particularly in a tabbed interface display area of such an interface (e.g. 90530). Example (a) of FIG. 34ZC7 in an embodiment may provide a list of configured services. The example illustrates a tab control element 90710, “Configured Services”, that when selected by a user, such as by a mouse click or tap on a touchscreen, displays a list of services that have been defined as maintenance objects for the instant maintenance period. Example (a) illustrates such a list with a single column 90716, a column heading row 90712 displaying “Service” as the single column heading, and a list item row 90714 displaying “Jira Service” as the service name for that list item. Example (a) may be utilized, for example, in an embodiment where services, such as services defined in the SMS, may be included in a maintenance period (MP) definition as a maintenance object (MO). In one embodiment, the single service name column 90716 may be supplemented with additional columns that may contain static information from the SMS definition for the service and/or dynamic information about the service determined by SMS operation, or other information, for example. Inasmuch as it may be useful to populate a tabbed interface display area with a list of objects, e.g., SMS objects, that may be associated with an MP definition, e.g., MO's of the MP definition, and may be of a particular type, the concept of example (a) can be extended beyond the services. Example (b) of FIG. 34ZC7 may be utilized, for example, in an embodiment where entities, such as entities defined in the SMS, may be included in a maintenance period (MP) definition as a maintenance object (MO). Example (b) may provide a list of configured entities. The example illustrates a tab control element 90720, “Configured Entities”, that when selected by a user, such as by a mouse click or tap on a touchscreen, displays a list of entities that have been defined as maintenance objects for the instant maintenance period. Example (b) illustrates such a list with a single column 90726, a column heading row 90722 displaying “Entity” as the single column heading, and a list item row 90724 displaying “Wayne Manor” as the entity name for that list item. In one embodiment, the single entity name column 90726 may be supplemented with additional columns that may contain static information from the SMS definition for the entity and/or dynamic information about the service determined by SMS operation, or other information, for example.


Example (c) of FIG. 34ZC7 illustrates content for a tabbed interface display area as may usefully appear in an SMS output display such as shown in interface 90580 of FIG. 34ZC6, and more particularly in a tabbed interface display area of such an interface (e.g. 90530). Example (c) of FIG. 34ZC7 in an embodiment may provide a list of KPI's that are impacted by the maintenance period. In one embodiment, a KPI may be determined to be impacted by a maintenance period where a service measured or characterized by the KPI is defined as a maintenance object (MO) for the maintenance period (MP). In one embodiment, a KPI may be determined to be impacted by a maintenance period where an entity defined as an MO for the maintenance period has data from or about it utilized in the determination of the KPI. In one embodiment, a KPI may be determined to be impacted by a maintenance period only if the entities having data from or about them utilized in the determination of the KPI are all MO's of the maintenance period. These and other embodiments are possible.


In one or more embodiments, a KPI may be assigned one or more of a number of levels or categories of impact based on criteria that may vary widely among those embodiments. In one such embodiment, for example, a KPI may be assigned a “possible” level of impact where the service to which the KPI corresponds is defined as an MO of the maintenance period; and a KPI may be assigned an “expected” level of impact where the KPI is an aggregate KPI that may represent the overall health of the service to which it corresponds and which service is defined as an MO for the maintenance period. In one such embodiment, for example, a KPI may be assigned a “partial” level of impact where at least one but less than all of the entities which have data from or about them utilized in the determination of the KPI are defined as MO's of the maintenance period; and the KPI may be assigned a “full” level of impact where all of the entities which have data from or about them utilized in the determination of the KPI are defined as MO's of the maintenance period; and the KPI may be assigned a “low” level of impact where 20% or less of the entities which have data from or about them utilized in the determination of the KPI are defined as MO's of the maintenance period. These and other embodiments are possible.


Example (c) of FIG. 34ZC7 illustrates a tab control element 90730, “Impacted KPIs”, that when selected by a user, such as by a mouse click or tap on a touchscreen, displays a list of KPI's that have been determined to be impacted by, or experience some level of impact by, the maintenance periods. Example (c) illustrates such a list with a “KPI” name column 90736, a “Service” name column 90737, and “Impacted” level or indicator column 90738, and an “Entities” name column 90739. The list of example (c) is shown to have a column heading row 90732 appropriately displaying “KPI”, “Service”, “Impacted”, and “Entities” as the headings for columns 90736-90739, respectively. The list of example (c) is further shown to have multiple list item rows including list item row 90734 displaying the KPI name “Error Count”, the service name “Gotham”, the impact indicator/level “Partially”, and the entity name “Wayne Manor” in columns 90736-90739, respectively. Example (c) may be utilized, for example, in an embodiment where maintenance period processing of the SMS identifies SMS objects (e.g., KPI's) that are related directly or indirectly to MO's of an MP definition (e.g., services, entities) to list, characterize (such as by determining an impact level or classification), or otherwise utilize them.


It is noted here that the impact levels or categories discussed in relation to example (c) of FIG. 34ZC7 may, in an embodiment, reflect a degree of “taint” of a measurement or data point in the SMS, which may be to say, a degree, level, category, or classification characterizing the actual, predicted, estimated, or otherwise determined impact of a maintenance period on the measurement or data point, and/or perhaps to say a degree, level, category, or classification characterizing the actual, predicted, estimated, or otherwise determined. In one embodiment, such impact levels, or degrees of taint, may be included within a scheme for the maintenance state. In one embodiment, impact level data is one attribute of the maintenance state. In one embodiment, an impact level is the maintenance state. These and other embodiments are possible. In an embodiment with such a more robust indication of the maintenance state that is transiently or persistently associated with measurements/data points in the SMS, adaptations of SMS processing to accommodate and account for maintenance periods can also be more robust and refined because processing can be conditioned on the additional information. For example, an adjustment to the criticality of a measurement produced using data from a maintenance period can be varied in accordance with an impact level associated with the data.


While the preceding user interfaces and interface components/elements addressed control and command console aspects of an SMS for implementing maintenance periods, the user interface elements described and discussed in the next figure relate principally to output/reporting/presentation aspects of an SMS adapted to implement support for defined maintenance periods.


FIG. 34ZC8 illustrates examples of user interface elements for implementing output presentation related to maintenance periods in an embodiment as contemplated by the processing associated with block 90150 of FIG. 34ZC1. User interface component 90610 of FIG. 34ZC8 (example (a)) illustrates a maintenance state alert message component window of one embodiment. The window 90610 is shown to include maintenance state alert message text 90612 and an interactive “X” button 90614 to clear, close, terminate, cancel, or otherwise exit the window. In one embodiment window 90610 is a modal window that demands the attention of the user by preventing interaction with any other user interface component until the alert message window has been cleared. In one SMS embodiment, the maintenance state alert message is uniformly presented across a variety of interfaces when an instance of any of the interfaces is first presenting information that is or is derived from measurements/data points that may be associated with a maintenance state. In one embodiment, the text 90612 of the alert message is fixed. In one embodiment, the text 90612 of the alert message is specialized to the interface. In one embodiment, the text 90612 of the alert message is specialized to the relevant data presented by the interface. These and other embodiments are possible.


Display tile 90620 (example (b)) is indicative of a display tile as might appear on a service monitoring interface such as illustrated and described in relation to FIGS. 49C-49F. Tile 90620 illustrates the integration of, or adaptation for, maintenance period functionality with foundational visualizations of an SMS. In an embodiment of a service monitoring interface using colored tiles, an adaptation may be made to assign an otherwise unused tile color for use with tiles representing data that includes or is derived from maintenance state measurements/data points. A visual attribute, a color, is associated with the maintenance state and substitutes as the background tile color 90624 when a maintenance state is implicated. Accordingly, the visual tile representation is adapted by the change in its visual attribute of background color. In one embodiment, a visual tile may be alternatively or additionally adapted to indicate maintenance state by the addition of a visual element, such as icon 90622. A maintenance state-indicative visual element (e.g., icon) may overlap, wholly or partially, an interface element representing suspect data, or be placed in proximity thereto such that an observer would likely perceive the two to be associated. Overlap by the visual element may include visual integrations as may be achieved using transparency, dissolves, mixing, or other effects, while remaining an overlap.


Service topology map display 90630 (example (c)) presents another example of a display adaptation to express maintenance state. Topology navigator maps are illustrated and described in relation to FIGS. 75C-75D. Here again, a visual element, such as an icon 90632, is added to a foundational display to indicate that underlying data being represented is suspect because it may have been impacted by a maintenance period. In an embodiment, as with tile 90620, a visual attribute for an affected node in the map, such as color, may be assigned a maintenance state value to adapt the display to indicate maintenance state information. These and other embodiments are possible.


Automated Event Groups


One of skill understands service monitoring system (SMS) embodiments using inventive aspects disclosed herein can provide effective service monitoring, in an IT environment for example, by deriving meaningful performance indicators from a huge volume of machine data in raw and disparate representations. The derived performance indicators, themselves, may be further utilized to derive additional information related to system performance, such as notable event records produced by correlation searches. Despite such distillation, in a large or busy environment, the detail information may still overwhelm even an efficient, inquiring user. This may be particularly true when a system is stressed or experiencing problems. A problem in one area of a system can affect other areas of the system and take some time to correct. This can lead to repetitive and multiple alerts from multiple system areas while a problem is being identified and resolved, say, when a failed router is being identified and rebooted. Apart from the possible inefficiency of user time to address the alerts or notable events on an individual basis, there is inefficiency imposed on the computing platform. Inventive aspects now discussed relate to automatic event grouping for consolidated display and for consolidated processing (by predefined actions in response to predefined conditions that are automatically detected) of related events joined together into an event group. These consolidation aspects improve the computing efficiency of the service monitoring system (SMS) by reducing bandwidth demands for human-machine interfaces, reducing resource demands such as processing overhead for maintaining interactive sessions even during substantial idle time, and by improving processing characteristics of the workload (by maximizing cache hits, for example, by the automated, rapid fire repetition of processing against multiple events).


FIG. 34ZD1 is a system diagram with methods for implementing automated event group processing in one embodiment. System 90800 is shown to include computer data block 90810, processing block 90820, event group action process 90840, user interface device 90802, event data store 90852, service monitoring system (SMS) 90854, and KPI data store 90856. Computer data block 90810 is a figurative depiction of logical collections, constructs, organizations, or the like, that may be diversely and variously represented in computer storage media, mechanisms, and devices. Computer data block 90810 shown to include command/control/configuration data 90812, notable events data 90814, and event groups data 90816.


Processing block 90820 is shown to include a number of processing blocks that may each be described in terms of a logical portion of processing performed by an SMS in an embodiment to effectuate automated event groups. One of skill understands that the processing described for a specific block may be variously implemented (e.g., by a hardware-only, hardware-software solution, single host, distributed system, or other solution, as indicated passim) and may utilize the computer processing capabilities of multiple components (e.g., an SMS (such as 90854), a data input and query system such as an event processing system (EPS), or a host operating system).


For the embodiment described in relation to FIG. 34ZD1 the processing of blocks 90822 and 90824 effect the creation of an event group policy definition in command/control/configuration (CCC) data store 90812. CCC data store 90812 in an embodiment may be used to contain information that directs operational aspects of an SMS such as 90854. At block 90822, user input is acquired that is used to create a new or revised stored representation of an event group policy definition. The event group policy definition includes information used by the SMS to direct its operations in relation to identifying events as members of an event group, recognizing conditions relevant to the group, and performing actions against the members of the group (individually or collectively), for example. User input acquired at block 90822 may indicate much or all of the content of an event group policy definition. Such indications may variously be direct, such as by the user entering a literal value into a text entry box of an interface, or indirect, such as by the user signaling assent to one or more default options. In one embodiment, user input is acquired at block 90822 through the display and processing of user interaction with a graphical user interface (GUI), perhaps utilizing user interface device 90802, for example. In one embodiment, user input is acquired at block 90822 through the use of a IVR system that audibly prompts the user for required information and processes the user's voice responses. These and other embodiments capitalizing on technologies the engage various modes and methodologies of human-machine interaction are possible. To facilitate an understanding of inventive aspects, and embodiment utilizing GUI technology is discussed later in relation to FIGS. 34ZD2 through 34ZD8.


At block 90824, user input acquired at block 90822 is factored into the creation of a stored representation of an event group policy definition. Event Group Policy 1 (EGP1) 90813 is depicted as an example of such a stored representation. In one embodiment, an event group policy such as EGP1 may include (i) information for identifying the events that are members of the group, such as information about field contents to be matched and constraints on group size and/or timeframes, etc.; (ii) information for determining or recognizing the existence of a condition that should result in the processing of a predefined action against the group/members; (iii) information specifying a predefined action; and (iv) information about the group (as a collective representation of its member events) such as a title, description, or other ascribed attribute, property, or characteristic. One of skill appreciates that an event group policy definition, such as EGP190813, is illustrated and described as a unitary logical construct that may or may not reflect one or more layers of underlying logical and/or physical representations used to implement it in a particular embodiment. In one embodiment, CCC data set 90812 that includes event group policy definitions such as EGP190813 may exclusively hold event group policy definitions and possibly related data. In one embodiment, CCC data set 90812 exclusively holds event group policy definitions as an identifiable subset of a larger CCC data set/collection broadly containing command/control/configuration data for an SMS such as 90854. In one embodiment, CCC data set 90812 includes event group policy definition information, such as EGP190813, unsegregated from other types of CCC data used to control the operation of the SMS. These and other embodiments are possible. For purposes of the present discussion, CCC data set 90812 may be considered to be the broad collection of CCC data as may be used by an SMS embodiment. One of skill may also gain an appreciation of event group policy definitions as CCC data by consideration of other representative examples of CCC-type data described elsewhere in this detailed description, such as service definitions (see, e.g., FIGS. 4, 17B, 17C, and the related discussions), entity definitions (see, e.g., FIGS. 4, 10B, 10C, 17C, and the related discussions), and KPI definitions (see, e.g., FIG. 4, and the related discussions), for example.


At block 90826, processing actions that create notable events, such as correlation searches, are performed. In one embodiment, the processing of block 90826 may be part of the ongoing operation of a service monitoring system (SMS) largely irrespective of, and possibly without any special accommodation for, automated event groups. Processing of such an embodiment described here is given to illustrate an operating context. In such an embodiment, the SMS 90854 accesses machine data of an machine data event data store 90852, possibly in conjunction with an event processing system, to produce a collection of KPI values for one or more key performance indicators (KPI) reflecting the performance of one or more services as may be provided by an IT environment, for example. Processing of block 90826, such as the execution of a correlation search, may utilize data such as the KPI data of 90856 to monitor and assess system performance and to create records of notable events related thereto. Data store 90814 represents a collection of such notable events. Operations of SMS 90854, including the processing of block 90826, in one embodiment is controlled or directed, at least in part, by the command/control/configuration information of a CCC data store such as 90812. Such operations may be automatic and ongoing without any substantial user interaction as indicated by the circular arrow appearing at the corner of block 90826. Many figures, details, and descriptive matter regarding event data stores, service monitoring systems, key performance indicators, notable events, and correlation searches is easily found elsewhere in this detailed description and will not be repeated here.


In one embodiment, the notable events of 90814 are all produced as the result of correlation searches against KPI data as described. In one embodiment, correlation searches that produce the notable events of 90814 may use KPI data and non-KPI data, alone or together, to produce the notable events. In one embodiment, the processing of block 90826 may produce notable events without a correlation search of KPI data, for example, by searching out notable events already represented in the machine data of machine data event store 90852 and inserting them, possibly with little to no modification, as notable events of 90814. These and other embodiments are possible.


In one embodiment, machine event data store 90852 is a field-searchable event datastore as richly described in detail elsewhere in this written description (see, for example, FIG. 76, et seq, and the related discussions). In one embodiment, event datastore 90814 may be such a field-searchable event datastore. In one embodiment, machine event data store 90852 and notable event datastore 90814 are each a different field-searchable event datastore. In one embodiment, machine event data store 90852 and notable event datastore 90814 are part of one same, common event field-searchable event datastore system. These and other embodiments are possible.


The processing of block 90828 as now described begins from a context where one or more event group policies, such as EGP190813, are created, active, and enabled, and SMS processing has been, and may be, producing a collection of notable events, such as 90814. The processing of block 90828, in one embodiment, uses group membership criteria information of an event group policy to identify matching notable events and to make a record of the existence and membership of event groups. In one embodiment, the existence of a group is reflected in computer storage as an event group description instance, for example Group1 instance 90817 of event groups data 90816. In one embodiment, at least one event group description instance exists for each event group policy defined in CCC data store 90812, even where the group is empty (i.e., has no member events). In one embodiment, an event group description instance is created in 90816 when a first member event is identified for the group by the processing of 90828. In one embodiment, all event group description instances that are created are retained as a historical record, possibly subject to deletion in accordance with the retention policy. In one embodiment, an event group description instance is deleted in response to a termination or deactivation event for the group or all of its member events. These and other embodiments are possible. An event group description instance, such as 90817, provides a representation of the group as a singular, identifiable object, and a collective representation for the multiple, individual events that may belong to the group. In its capacity to provide a collective representation, an event group description instance may contain little, much, or all of the information or categories of information that commonly describe or apply to each of the individual member events. Embodiments may vary as to the distribution and replication of information common to the group/members. The collective representation provided by an event group description instance may stand in contrast to the individual representation provided by a stored representation of individual notable events in 90814.


In one embodiment, the processing of 90828 identifies notable events for membership in an event group according to an event group policy by performing a search of notable events 90814 using criteria determined, at least in part, by information in an event group policy. In one embodiment, the notable events of 90814 are maintained in an event index of an event processing system (EPS) and the SMS may utilize the search capabilities of the underlying EPS to perform the search of 90828 to identify member events. In one embodiment, the notable events of 90814 are maintained by the SMS apart from an EPS. In one embodiment, the processing of 90828 evaluates each notable event as it is created by 90826 and ascertains any group membership at that time. In one embodiment, the new membership of a notable event in a notable event group is reflected in a stored representation of each such notable event as in, for example, notable event store 90814. In one embodiment, the membership of a notable event in a notable event group is logically organized together with other group information in an event group description instance such as 90817. These and other embodiments are possible.


The processing of block 90828 in the presently described embodiment is repeatedly performed as part of the ongoing operation of an SMS, such as 90854. The repetitive, automatic, ongoing operation is signified by the circular arrow appearing at the corner of block 90828. In one embodiment, the processing of block 90828 recurs on the basis of a regular or irregular frequency or schedule. In one embodiment, the processing of block 90828 recurs on the basis of event record volumes. In one embodiment, the processing of block 90828 recurs on a frequency or schedule that may be individually determined for each event group policy. In one embodiment, the processing of block 90828 recurs on a frequency or schedule applied universally to all event group policies. These and other embodiments are possible.


At block 90830, criteria that define a precondition to a defined action are evaluated against the current state of relevant data and/or system operation and context. The precondition criteria are reflected in the information of an event group policy definition such as EGP190813. In one embodiment, the processing of 90830 evaluates precondition factors by executing a search using search criteria reflecting the precondition information of an event group policy definition. In one embodiment, the search may be performed by an event processing system (EPS) utilized for the SMS. In one embodiment, the search may be performed independent of any EPS. In one embodiment, a mechanism other than a search is used to evaluate the satisfaction of the precondition. These and other embodiments are possible.


The processing of block 90830 in the presently described embodiment is repeatedly performed as part of the ongoing operation of an SMS, such as 90854. The repetitive, automatic, ongoing operation is signified by the circular arrow appearing at the corner of block 90830, and may be repeated in various ways such as the examples discussed in relation to block 90828.


At block 90832, a determination is made whether the evaluation performed at block 90830 indicates that the criteria that define a precondition to a defined action are satisfied. If not, the evaluation of block 90830 is eventually repeated. If so, processing proceeds to block 90834.


At block 90834, one or more event group actions corresponding to the satisfied precondition of the event group policy are initiated, performed, or otherwise caused to be performed. In one embodiment, an event group action may update or otherwise modify (including deleting) the stored representation of each notable event in the group such as may be stored as notable event data 90814. For example, the event group action may set the value of a status, state, priority, criticality, or severity field or indicator as may be maintained for the representation of a notable event. In one embodiment, an event group action may update or otherwise modify (including deleting) the relevant event group description instance, such as 90817 of event groups data 90816. For example, an event group action may modify the event group description instance to indicate that the event group is expired, suspended, or retired. In one embodiment, an event group action may update or otherwise modify the stored representation of each notable event in the group as well as the relevant event group description instance. In one embodiment, an event group action may perform processing unrelated to updating or otherwise modifying data representing the event group and/or its individual member events.


Information in a relevant event group policy, such as EGP190813 of CCC data store 90812, informs the initiation of the event group action at block 90834. In one embodiment, initiation of the event group action may include initiation of an independent process, such as event group action process 90840, to perform the event group action. In one embodiment, the event group action may be initiated and performed under the auspices of the processing of block 90834. These and other embodiments are possible.


At block 90836, command console functions and reporting functions for the SMS are performed that utilize, for example, data of 90810 related to or affected by event group processing of 90820 already described. In one embodiment, the processing of 90836 may perform a command console function of the SMS enabling a user to manage the event group policies used to direct system operation, including displaying a list of the policies of 90812 via a user interface device such as 90802, for example. In one embodiment, the processing of 90836 may perform a system monitoring reporting function enabling a user to display notable event information that effectively subsumes multiple entries for multiple notable events into a single entry for the event group. These and other embodiments are possible.


Block diagram 90800, illustrating certain system components and processing of one embodiment to effect automated event group processing in a service monitoring system (SMS) in order to improve the computer resource utilization profile of the SMS, was discussed in terms of many details in order to illuminate inventive aspects. One of skill will appreciate that details discussed are not intended to limit the practice of inventive aspects. One of skill will further appreciate inventive aspects of automated event group processing by consideration of user interfaces illustrated and described in relation to the figures that now follow. The illustrated user interfaces, components, and portions are such as might be used in an embodiment during various aspects of the processing of 90820 of FIG. 34ZD1. In similar fashion to the foregoing, the user interface matter next discussed provides illustrative matter to aid an understanding of the automatic event group processing taught herein, and the details are intended to illuminate rather than limit the inventive aspects.


FIGS. 34ZD2 through 34ZD7 illustrate user interface display aspects as may be useful in the creation or maintenance of an event group policy definition (such as EGP190813 of FIG. 34ZD1) of the command/control/configuration data (such as 90812 of FIG. 34ZD1) of a service monitoring system. Such user interface components may be useful in the processing described earlier in relation to blocks 90822 and 90824 of FIG. 34ZD1, for example.


FIG. 34ZD2 depicts a user interface related to group membership criteria for an event group in one embodiment. Such an interface may be used in an embodiment to prompt for and acquire user input indicating the desired content for an event group policy being created (or edited), and particularly information of the event group policy definition related to identifying events as members of the event group. The user interface display 90900 is shown to include system title bar area 90902, application menu/navigation bar area 90904, workflow process information and navigation area 90910, workflow section 90920, and vertical scrollbar 90908. Workflow process information and navigation area 90910 is shown to include the title, “Create New Policy,” workflow process status bar 90912, workflow progress status indicator 90914, Previous navigation button 90916, and Next navigation button 90918. Workflow section 90920 is shown to include workflow section title and information area 90921, event content criteria section 90930, multi-group criteria section 90940, group limiting criteria section 90950, and group information section 90960. (It is noted that sections 90921, 90930, 90940, 90950, and 90960 may properly be referred to as sections or as subsections, each being a subsection of workflow section 90920 in the illustrated embodiment.)


System title bar 90902 of FIG. 34ZD2 is comparable to system title bar 27102 of FIG. 27A2 discussed in detail elsewhere. Application menu/navigation bar 90904 is comparable to application menu/navigation bar 27104 of FIG. 27A2 discussed in detail elsewhere.


Workflow section title and information area 90921 of workflow section 90920 is depicted to show “Filtering Criteria” as the section title, and a description of “Create filtering criterion to group notable events.” The reference to filter or filtering criteria is a reference to one or more criteria that may be used in an embodiment to select or identify, or to control the selection or identification, of events such as notable events that qualify for, or are, affirmatively included in, or designated for inclusion in, the membership of an event group.


Event content criteria section 90930 of the illustrated embodiment, depicts a graphical user interface (GUI) portion or component that may be used to prompt and enable user input to indicate filter criteria useful to include or exclude notable events in or from the group based on particular data that each may contain. Event content criteria section 90930 is shown to include section header 90931 which may enable user interaction to selectively collapse or expand the presentation of the section 90930. Event content criteria section 90930 is shown to further include interactive action element 90932 enabling a user to indicate the desire to specify an additional event content criterion. In the presently described embodiment, this and other criteria may be presented and/or specified using a rule paradigm, though other paradigms are possible. A user interaction with action element 90932 of the present embodiment, such as by a mouse click or finger press to a touchscreen, results in a modification to the presentation of user interface 90900—perhaps by the expansion of section 90930, or perhaps by the overlay of a pop-up display component (e.g., modal window), or perhaps by another process/mechanism—to enable the user to indicate additional event content-based criteria for the event group policy being defined. In one embodiment, user interaction with action element 90932 results in the expansion of section 90930 so as to present a display as illustrated, in pertinent part, by FIG. 34ZD3.


FIG. 34ZD3 depicts user interface matter related to group membership criteria for an event group in one embodiment, and particularly to specifying event content criteria. GUI display portion or component 91000 is shown to include section header 90931, conjunctive rules section 91020, and action element 91030. Conjunctive rules section 91020 is shown to include rule entry 91022 and interactive action element 91024. Rule entry 91022 is shown to include field identifier component 91032, comparison operation component 91034, and comparand component 91036, reflecting the structure for individual rules used to specify event content criteria for an event group policy definition. Field identifier component 91032 is shown as a text box containing the user prompt “field name.” Field identifier component 91032 is interactive enabling a user to edit or modify text indicating the name of the field in a notable event, the value of which is evaluated against a comparand as one criterion for determining the membership of the event in an event group created in accordance with the event group policy definition being defined. Comparison operation component 91034 is shown as a drop-down selection box containing the default or most-recently-selected comparison operation value of “matches.” User interaction with comparison operation component 91034 may result in the appearance of a drop-down selection list (not shown) of comparison operations from which a user may make a selection and may include options such as “matches”, “sounds like”, “does not match”, “is greater than”, “is less than”, and others. Comparand component 91036 is shown as an empty text box. Comparand component 91036 is interactive enabling a user to edit or modify text indicating a value or pattern to be compared in the way specified in 91034 with the value of a field identified in 91032. Here, as elsewhere, embodiments may vary and the text box of 91306 may be implemented as any number of appropriate user interface components including, for example, a selection list, perhaps with support for the concurrent selection of multiple list items. Multiple rule entries such as 91022 may be created as event content criteria of an event group policy definition. Interaction with action element 91024 in an embodiment may present the user with an additional rule entry like 91022 to specify a rule to be evaluated in some way conjunctively with the rule of 91022. Interaction with action element 91030 in an embodiment may present the user with an additional rule entry like 91022 to specify a rule to be evaluated in some way disjunctively with the one or more rules of 91020. Embodiments may vary in their implementation and support for compound rules and logic including conjunctive and disjunctive processing.


Multi-group criteria section 90940 of interface 90900 of FIG. 34ZD2 depicts a graphical user interface (GUI) portion or component that may be used to prompt and enable user input to indicate filter criteria (group splitting criteria) useful to separate into multiple event groups, events that otherwise satisfy a common set of group membership criteria, such as the event content criteria as may be specified using section 09030 of interface display 90900. Multi-group criteria section 90940 is shown to include section header 90941 which may enable user interaction to selectively collapse or expand the presentation of the section 90940. Multi-group criteria section 90940 is shown to further include the descriptive text “Split events into multiple groups by” and text box 90942 containing the user prompt “field name.” Text box 90942 is interactive enabling a user to edit or modify text indicating the name of a field in the event data. If specified as part of an event group policy definition, the multi-group or split-by field will segregate events satisfying other group membership criteria into distinct groups on the basis of the value the event has in its split-by field. Such processing may be performed, for example, by the processing of block 90828 of FIG. 34ZD1 and may result in the creation of an event group description instance such as 90817 of FIG. 34ZD1 for each different value occurring in the split-by fields of events satisfying the other membership criteria of the event group policy definition. An embodiment implementing a multi-group or split-by aspect to event group policy definitions can improve processing and storage efficiency by allowing the reuse or multiplexing of a single policy definition in substantial part for multiple groups as opposed to requiring the creation, maintenance, and processing of multiple event group policy definitions, one for each prospective value of the split-by field.


Group limiting criteria section 90950 of FIG. 34ZD2, depicts a graphical user interface (GUI) portion or component that may be used to prompt and enable user input to indicate group limiting criteria useful to include or exclude notable events in or from the group based, for example, on characteristics pertaining to the group or events that may be external to the group events. The group limiting or breaking criteria addressed by 90950 may specify criteria for terminating or closing group membership in an embodiment, i.e., preventing the addition of new members. Group limiting criteria section 90950 is shown to include section header 90951 which may enable user interaction to selectively collapse or expand the presentation of the section 90950. Group limiting criteria section 90950 is shown to further include interactive action element 90952 enabling a user to indicate the desire to specify an additional group limiting criterion. A user interaction with action element 90952 of the present embodiment, such as by a mouse click or finger press to a touchscreen, results in a modification to the presentation of user interface 90900—perhaps by the expansion of section 90950, or perhaps by the overlay of a pop-up display component (e.g., modal window), or perhaps by another process/mechanism —to enable the user to indicate additional group limiting criteria for the event group policy being defined. In one embodiment, user interaction with action element 90952 results in the expansion of section 90950 so as to present a display as illustrated, in pertinent part, by FIG. 34ZD4.


FIG. 34ZD4 depicts user interface matter related to group membership termination criteria for an event group in one embodiment, and particularly to specifying group limiting criteria. Group limiting criteria may be used in an embodiment of to freeze the membership of the group and/or to prevent the addition of any more events to the group once the criteria is satisfied, for example. Such processing may be performed, for example, by the processing of block 90828 of FIG. 34ZD1 and may result in update of an event group description instance such as 90817 of FIG. 34ZD1 to indicate its membership is closed. Events excluded on the basis of a group limiting criteria may be included in a new group dynamically created in accordance with the same event group policy definition and for which a new event group description instance is created.


GUI display portion or component 91100 of FIG. 34ZD4 is shown to include section header 90951, group limiting type selection component 91110, group limiting type selection list 91120, and group limiting parameter component 91112. Group limiting type selection component 91110 is shown as a drop-down selection box containing the default or most-recently-selected value of “the following event occurs.” User interaction with group limiting type selection component 91110 may result in the appearance of a drop-down selection list of group limiting types 91120 from which a user may make a selection, and may include options such as “the following event occurs” 91122, “this group existed for” 91124, and “the number of events in this group” 91126, for example. A selection of option 91122 indicates a user desire for an event-occurrence group limiting type and the user may enter identifying information for a group-limiting event using interface component 91112. A selection of option 91124 indicates a user desire for a time-duration group limiting type and the user may enter information indicating a time duration for this group timespan limit using interface component 91112. A selection of option 91126 indicates a user desire for an group-size group limiting type and the user may enter information indicating a member count or other size measure for this group size limit using interface component 91112.


In one embodiment, GUI display portion or component 91100 of FIG. 34ZD4 expands the drop-down selection list of group limiting types 91120 to present the additional option “The flow of events into the group has paused for” (not shown), the selection of which indicates a user desire for an idle-time group limiting type and the user may enter information indicating a time duration for this group idle-time limit using interface component 91112. The idle time duration indicates the maximum amount of time that is allowed to elapse without the addition of a new member to the group. Such a group limiting type may be useful, for example, where the event group relates to a particular error condition and the cessation of new alerts likely means the error condition no longer obtains, a useful predicate for causing an action to dispose of the event group in some way. Clearly, embodiments may vary greatly as to the use and variety of group-limiting criteria and types.


Group information section 90960 of FIG. 34ZD2 of the illustrated embodiment, rather than specifying filter criteria, relates to descriptive or metadata information as may be associated with an event group created by an application of the filter criteria such as already discussed (and, by association, its member events). Group information section 90960 is shown to include section header 90961 which may enable user interaction to selectively collapse or expand the presentation of the section 90960. A fuller depiction of the contents of group information section 90960 in an embodiment, such as may be seen as the result of user interaction with vertical scrollbar 90908, is shown in FIG. 34ZD5.


FIG. 34ZD5 depicts user interface matter related to event group information in one embodiment. Event group information of an event group policy definition may identify or otherwise specify the value or source of values for descriptive group information as may be included in an event group description instance, perhaps at the time a new group is created in accordance with the respective event group policy definition. GUI display portion or component 91200 of FIG. 34ZD5 is shown to include section header 90961, group title component 91212, group description component 91214, and group severity component 91216. Each of interface components 91212, 91214, and 91216, is shown as a drop-down selection box containing the default or most-recently-selected value of “Same as the first event.” User interaction with any of these components may result in the appearance of a drop-down selection list from which a user may make a selection, and may include options such as “Same as the first event”, “Fixed value”, or “Substitution pattern or string”. Selection of an option requiring additional information, such as “Fixed value” or “Substitution pattern or string” may result in the dynamic presentation of interface components enabling a user to indicate the additional information.


After indicating desired choices for group membership criteria and any other information using interface 90900 of FIG. 34ZD2, a user may indicate acceptance of the user interface content by interacting with “Next” action button 90918. User interaction with action button 90918 may result in the computing machine populating a portion of a nascent event group policy definition in computer storage and causing the display of a user interface such as depicted in FIG. 34ZD6.


FIG. 34ZD6 depicts a user interface related to automated actions for an event group in one embodiment. Such an interface may be used in an embodiment to prompt for and acquire user input indicating the desired content for an event group policy being created (or edited), and particularly information of the event group policy definition related to recognizing conditions relevant to the group and performing actions, possibly against the members of the group (individually or collectively). The user interface display 91300 is shown to include system title bar area 90902, application menu/navigation bar area 90904, workflow process information and navigation area 90910, and workflow section 90920. Workflow section 90920 is shown to include workflow section title and information area 91321, and action rule section 91330. Action rule section 91330 is shown to include action entry section 91332 and Add-Rule interaction component 91334. Action entry section 91332 is shown to include an interactive rule header 91340, a precondition section 91350, and an action specification section 91360. Action rule header 91340 is shown to include a summary description of the rule as specified, “If stop grouping criteria is met, Then change severity to Low on all the events in this group.” Action rule header 91340 may enable user interaction to selectively collapse or expand the presentation of the section 91332, showing only the action rule header 91340 in the collapsed state. Precondition section 91350 is shown to include precondition selection component 91352 and And-If interactive component 91354. Precondition selection component 91352 is shown as a drop-down selection box containing the default or most-recently-selected value of “stop grouping criteria is met.” User interaction with precondition selection component 91352 may result in the appearance of a drop-down selection list of available preconditions from which a user may make a selection, and may include options such as “stop grouping criteria is met”, “the following event occurs”, “this group existed for”, and “the number of events in this group is”, for example. Selection of a precondition option that requires additional information or parameters may result in the display of additional GUI components to elicit indications of the information or parameters from the user (not shown). For example, if the user selects “the number of events in this group is” as the precondition option, and additional GUI element may be displayed, such as a text box or spin counter enabling a user to indicate the desired event count. User interaction with And-If interactive component 91354 may result in the display of an additional precondition selection component such as 91352 permitting the user to build a compound precondition. The multiple individual preconditions of a compound precondition may be evaluated in the conjunctive in an embodiment, or the specification of conjunctive and/or disjunctive evaluation could be made by the user. These and other embodiments are possible.


Action specification section 91360 is shown to include action selection component 91362, action parameter component 91364, action target selection component 91366 and + and interactive component 91368. A combination of information indicated by the user via components 91362, 91364, and 91366, in this illustrative embodiment, together specify an action that may be caused as the result of processing an event group in accordance with the event group policy definition now being created. Action selection component 91362 is shown as a drop-down selection box containing the default or most-recently-selected value of “change severity to.” User interaction with action selection component 91362 may result in the appearance of a drop-down selection list of available actions from which a user may make a selection, and may include options such as “change severity to”, “change status to”, “change owner to”, and “add a comment”, for example. Action parameter component 91364 is shown as a drop-down selection box containing the default or most-recently-selected value of “Low.” User interaction with action parameter component 91364 may result in the appearance of a drop-down selection list of available options from which a user may make a selection, and may include options relevant to the action specified by 91362. Action target selection component 91366 is shown as a drop-down selection box containing the default or most-recently-selected value of “all events in this group.” User interaction with action target selection component 91366 may result in the appearance of a drop-down selection list of available action targets from which a user may make a selection, and may include options such as “all events in this group”, “the following events in this group”, and “only events specified by the left-hand criteria”, for example. Selection of a target option that requires additional information or parameters may result in the display of additional GUI components to elicit indications of the information or parameters from the user (not shown). For example, if the user selects “the following events in this group” as the target option, an additional GUI element may be displayed, such as a text box or selection drop-down enabling a user to indicate identification criteria for the desired target events. User interaction with + and interactive component 91368 may result in the display of an additional action specification components/groups such as the combination of 91362, 91364, and 91366. These and other embodiments are possible. User interaction with Add-Rule interaction component 91334 may result in the display of additional action entry sections such as 91332. In one embodiment, changes made as the result of user interaction with interface components such as 91352, 91362, 91364, 91366, may result in a corresponding change to the summary description of 91340.


In some embodiments, the available options such as provided by interface components of 91330 may be built-in, user-specified, or a combination of both. In an embodiment, user interface elements may be provided enabling a user to specify preconditions and/or actions in a programmatic way, such as by entering partial or complete SPL, JAVA, Python, or other programming language/code. One of skill will appreciate many illustrative ways to provide user customization and extensibility by consideration of the complete written description.


After indicating desired choices for action rules and any other information using interface 91300 of FIG. 34ZD6, a user may indicate acceptance of the user interface content by interacting with “Next” action button 90918. User interaction with action button 90918 may result in the computing machine populating a portion of a nascent event group policy definition in computer storage and causing the display of a user interface such as depicted in FIG. 34ZD7.


FIG. 34ZD7 depicts a user interface related to event group policy information in one embodiment. Such an interface may be used in an embodiment to prompt for and acquire user input indicating the desired content for an event group policy being created (or edited), and particularly information of the event group policy definition related to identifying, describing, or characterizing the event group policy definition. The user interface display 91400 is shown to include system title bar area 90902, application menu/navigation bar area 90904, workflow process information and navigation area 90910, and workflow section 90920. Workflow section 90920 is shown to include workflow section title and information area 91421, and policy information section 91430. Policy information section 91430 is shown to include policy title component 91432, policy description component 91434, and status component 91436. Policy title component 91432 is shown as a text box that is interactive, enabling a user to edit or modify text indicating the title of the event group policy definition being created (as opposed to the title for an event group created as the result of processing in accordance with the policy definition). Policy description component 91434 is shown as a text box that is interactive, enabling a user to edit or modify text indicating the title of the event group policy description being created (as opposed to the description for an event group created as the result of processing in accordance with the policy definition). Status component 91436 is shown as a pair of mutually exclusive interactive buttons for indicating either an “enabled” or “disabled” status for the event group policy description being created. Event group policy definitions in a disabled state may be ignored by the processing of blocks 90828, 90830, 90832, and 90834 of FIG. 34ZD1, for example.


After indicating desired choices for action rules and any other information using interface 91400 of FIG. 34ZD7, a user may indicate acceptance of the user interface content by interacting with “Next” action button 90918. User interaction with action button 90918 may result in the computing machine populating a portion of a nascent event group policy definition in computer storage, and may result in the computing machine placing a now sufficient and/or complete, nascent event group policy definition in an active region of a command/control/configuration data store for an SMS, such as 90812 of FIG. 34ZD1, and causing the display of a user interface such as depicted in FIG. 34ZD8. These and other embodiments are possible.


FIG. 34ZD8 depicts a user interface related to event group policies. Such an interface may be useful in relation to the command console and reporting processing block 90836 of FIG. 34ZD1. The user interface display 91500 is shown to include system title bar area 90902, application menu/navigation bar area 90904, application header area 91510, event group policy list area 91520, and “Create New Policy” action button 91590. Application header area 91510 is shown to include the title “Aggregation Policy” indicative of the usefulness of the event group policy definitions and processing, thereof and thereby, to aggregate or consolidate multiple individual events, such as notable events, under a representative group identification and affiliation. Event group policy list area 91520 is shown to include policy list header component 91530 and policy list display table 91540. Policy list header component 91530 is shown to include policy count indicator 91532, bulk action drop-down element 91534, and filter component 91536. Bulk action drop-down element 91534 is interactive and enables a user to select from a list of available actions to perform against one or more selected policies appearing in policy list display table 91540. Filter component 91536 is shown as a text box displaying the user prompt “filter.” Filter component 91536 is interactive enabling the user to enter or edit filter criteria for determining the event group policies appearing in policy list display table 91540.


Policy list display table 91540 is shown to include column header row 91542 and event group policy list entries 91544 and 91546. Policy list display table 91540 displays a possibly filtered list of event group policy definitions in a tabular format. Column header row 91542 includes an informational-“i” identifier for column 91550, a checkbox identifier for column 91552, a “Title” identifier for column 91554, a “Status” identifier for column 91556, and an “Actions” identifier for column 91558. A representative event group policy list entry row 91544, for example, displays: an interactive token, “>”, in column 91550 enabling a user to navigate to an interface display (not shown) allowing a user to view and/or edit significant information pertaining to the event group policy definition represented by the list entry; a check box in column 91552 enabling the user to select or deselect the event group policy definition represented by the list entry for a bulk processing action as may be selected and activated by interaction with element 91534; the text “Event from localhost” in column 91554 as the title of the event group policy definition represented by the list entry as may have been initially entered using element 91432 of interface 91400 of FIG. 34ZD7, for example; the options “Enabled” and “Disable” in column 915562, respectively, to indicate the current status and change the current status of the event group policy definition represented by the list entry; and an “Editv” interactive element in column 91558 enabling a user to navigate to an interface display or perform other processing for the event group policy definition represented by the list entry, as may be selected from a drop-down list associated with the interactive element.


In one embodiment, user interaction with “Create New Policy” action button 91590 may cause the computing machine to engage processing for the creation of a new event group policy definition which may, in turn, cause the display of an interface such as 90900 as shown and discussed earlier in relation to FIG. 34ZD2.


FIG. 34ZD9 depicts a user interface example including aspects related to automated event group processing. Such an interface may be useful in relation to the command console and reporting processing block 90836 of FIG. 34ZD1. Such an interface may report to the SMS user summary information about notable events and/or groups thereof, and may report detail information about a particular notable event or group thereof. In the present context of explaining automated event group processing embodiments of an SMS, this discussion of the notable events review interface of FIG. 34ZD9 emphasizes event groups over individual events, though one of skill will recognize that information for groups and for individual events may be intermixed in an embodiment. The user interface display 91600 is shown to include application header area 91610, information and options area 91620, notable events/groups list component 91630, and notable event/group detail component 91650. Application header area 91610 is shown to include slide out lister control 91611, the title “Grouped Notable Events”, timeframe selection element 91612, “Save as” action button 91614, and “Save” action button 91616. Timeframe selection element 91612 is an interactive selection component enabling a user to select a timeframe option from a drop-down list that limits, filters, or selects the notable event data that may be included in the report display. Information and options area 91620 is shown to include a count of the events/groups available for viewing via the interface presently 91622, an indication of one or more filter criteria 91624 use to limit, filter, or select the notable event data included for the report display, and “Show Timeline” action element 91624 enabling a user to indicate the selection of an alternate display mode for the event/group data, for example, a timeline presentation rather than a simple list format. Notable events/groups list component 91630 is shown to include a list header area having a sort indicator/selection element 91632 enabling a user to select a sort order for the displayed information, and refresh action element 91634 enabling a user to indicate to the computing machine and resultingly cause a refresh of the data underlying the display and of the display itself. Notable events/groups list component 91630 is shown to further include individual notable event/group list entries of which 91640 is an example. Notable event/group list entry 91640 is the list entry for an event group created in accordance with an event group policy definition. Notable events/group list entry 91640 is shown to include: an indicator 91642 displaying the number of individual events that are members of the group; a title, “Alert on itsi.backfill_services at 147 . . . ”, as may have been determined in view of event group policy definition information populated as a result of user interaction with element 91212 of interface 91200 of FIG. 34ZD5; a group time frame or time span indicator 91644 as may be recorded in an embodiment as part of an event group description instance for the group as part of the processing of block 90828 of FIG. 34ZD1; a severity indicator, “Critical”, as may have been determined in view of event group policy definition information populated as a result of user interaction with element 91216 of interface 91200 of FIG. 34ZD5; a status indicator, “New”; and a description, “Subcomponent [itsi_backfill] [do_run] [19856]”, as may have been determined in view of event group policy definition information populated as a result of user interaction with element 91214 of interface 91200 of FIG. 34ZD5.


Notable events/group list entry 91640 is shown with a distinctive background color or highlighting distinguish it from other notable events/group list entries of 91630. In one embodiment the distinctive background color indicates the selection of, or focus on, the particular list entry and commensurately the presentation of its detail information in notable events/group detail component 91650. Notable events/group list entry 91640 is further shown with a color band along its left edge, with the color band coded to indicate its severity level.


Notable event/group detail component 91650 is shown to include detail header area 91660 and tabbed display area 91670. Detail header area 91660 is shown to include the event group title 91662 and the event group time frame or time span 91664. Tabbed display area 91670 is shown to include: tab controls area 91672 including an “Overview” active tab control 91674, a “Grouped Events” inactive tab control 91675, a “Comments” inactive tab control 91676, and an “Activity” inactive tab control 91678; a description information area 91682 including the group description, a count of the events in the group, the title of the associated event group policy definition as an interactive element for navigating to the display of event group policy definition information, and a color-coded list of the counts of group events by severity; a tickets information area 91684 including a list of trouble tickets associated with the group or the member events thereof; a Contributing KPIs information area 91686 including a list of KPIs contributing to the member events of the group (none shown), each possibly presented as an interactive element for navigating to the display of related and/or more detailed KPI data/information, and including an interactive element for navigating to a deep dive display populated with the KPIs of the list; and a Possible Affected Services information area 91688 including a list of services potentially impacted in light of the notable events of the group, each presented as an interactive element for navigating to the display of other service data/information, and including an interactive element for navigating to a deep dive display populated with the services of the list. User interaction with “Grouped Events” tab control 91675 in one embodiment may result in the transition of the display of tabbed display area 91670 to a display of grouped events information as next shown and discussed in relation to FIG. 34ZD10.


FIG. 34ZD10 depicts a user interface or portion for a grouped events information display in one embodiment. User interface 91700 is such as may appear in notable event/group detail component 91650 after user interaction with tab control 19675, here shown as the active tab control of tab control area 91672. Notable event/group detail component 91650 of interface 91700 is shown to include graph display area 91710 and event detail list area 91740. Graph display area 91710 is shown to display a grouped events severity-by-time graph/timeline for a particular event group, such as notable events/group list entry 91640, highlighted as the selected entry in FIG. 34ZD9. The severity-by-time graph shown in display area 91710 of FIG. 34ZD10 is shown to include severity axis value indicators 91722, time axis value indicators 91724, a number of event subgroup tokens such as 91730 and 91732, and an event subgroup detail element 91734. Each event subgroup token of the example embodiment is located at the intersection of the appropriate severity axis and time axis values. Each event subgroup token is color-coded in accordance with its severity, and sized according to the number of member events in the subgroup (i.e., events ascribed with the same severity and time values). Other such visual representation, or textual representation, could be used in an embodiment to convey additional subgroup information. In one embodiment, each event subgroup token is interactive to enable the user to designate the selection of the subgroup token. A selected subgroup token such as 91732 may indicate the selected state by the appearance of a differently colored outline for the token. In one embodiment, an event subgroup detail element 91734 may appear for a selected subgroup token and display information about the subgroup represented by the token including, for example, a relevant timestamp and the number of events in the subgroup.


Event detail list area 91740 is shown to include a tabular display of information for a particular event group, such as the event group represented in notable events/group list entry 91640 of FIG. 34ZD9, there highlighted as the selected entry. Event detail list area 91740 of FIG. 34ZD10 is shown to include column header area 91742, and event list entries 91744a to 91744J. Each of the event list entries includes a color-coded indicator of the event severity (shown as a colored dot), a textual description of the event severity in column 91750, the event title in column 91752, an event timestamp in column 91754, and an action element in “Search” column 91756. Action elements in search column 91756 may be interactive enabling a user to initiate navigation to an investigative user interface display possibly pre-populated with, or customized in accordance with information related to the event represented in the event list entry.


Automated Event Correlations


As with the automated event groups, already discussed, one of skill understands service monitoring system (SMS) embodiments using inventive aspects disclosed herein can provide effective service monitoring, in an IT environment for example, by deriving meaningful performance indicators from a huge volume of machine data in raw and disparate representations. The derived performance indicators, themselves, may be further utilized to derive additional information related to system performance, such as notable event records produced by correlation searches. Despite such distillation, in a large or busy environment, the detail information may still overwhelm even an efficient, inquiring user. This may be particularly true when a system is stressed or experiencing problems. A problem in one area of a system can affect other areas of the system and take some time to correct. This can lead to repetitive and multiple alerts from multiple system areas while a problem is being identified and resolved, say, when a failed router is being identified and rebooted. Apart from the possible inefficiency of user time to address the alerts or notable events on an individual basis, there is inefficiency imposed on the computing platform. Beyond the inventive aspects already discussed in relation to automated event groups, such as the grouping of events for consolidated display and processing based principally on user supplied rules or policies, the inventive aspects now discussed provide SMS control mechanisms for the grouping of events for consolidated display and processing based principally on event correlations determined by machine learning and/or artificial intelligence. Moreover, such determinations may be made in realtime (<˜2 minutes) or near realtime (<˜5 minutes). These consolidation aspects improve the computing efficiency of the service monitoring system (SMS) by reducing bandwidth demands for human-machine interfaces, reducing resource demands such as processing overhead for maintaining interactive sessions even during substantial idle time, and by improving processing characteristics of the workload (by maximizing cache hits, for example, by the automated, rapid fire repetition of processing against multiple events).


FIG. 34ZE1 is a system diagram including automated event correlation (AEC) processing in one embodiment. The system 92300 illustrated in FIG. 34ZE1 includes aspects of both automated event groups (AEG), previously discussed, and the presently discussed automatic event correlation (AEC) to help illustrate that embodiments may variously practice aspects of each, alone or in combination. When a notable/alert event is received, system 92300 is shown to perform AEG-style processing against the event first, and later to conditionally perform AEC-style processing against the event. This arrangement is by illustrative example and the practice of inventive aspects is not so limited. For example, in one embodiment only AEC-style event grouping is performed and no AEG-style event grouping is implemented. For example, in one embodiment AEG-style event grouping and AEC-style event grouping are both performed against any notable/alert event received by the system. These and other embodiments are possible.


System 92300 is shown to include processing blocks for a method of handling the grouping of a notable/alert event 92310-92320, notable/alert event sources 92324-92328, seed group control information processors 92322-92323, and data storage 92330. Data storage 92330 is shown to include notable events storage 92332, command/configuration/control (CCC) data 92334, and event groups data 92336. Notable event storage 92332 is shown to include an example notable/alert event entry, item, or record 92338, which in an expanded view is shown to include examples of common metadata fields 92340a-d and other event data 92342. CCC data 92334 is shown to include event group policy EGP192344, AEC seed group definition AEC-S1 92346, and AEC active group definition AEC-A1 92348. Event groups data 92336 is shown to include EGP group information EGP-Group1 92350, AEC seed group information AEC-Seed1 92352, and AEC active group information AEC-Active1 92354.


Notable events data 92332 is comparable to notable events data 90814 of FIG. 34ZD1. Notable events data 92332 of FIG. 34ZE1 represents a source of data with notable/alert event information. While a correlation search, discussed elsewhere herein, may be the source for a notable event, i.e., the information, record, entry, or the like that represents a notable event in computer storage, other sources for notable events are possible in an embodiment, possibly alone or in combination, without limitation. The illustrative embodiment of system 92300 of FIG. 347E1, for example, shows correlation search processor 92324, HTTP/REST interface 92326, and Other processing 92328 as sources for populating notable event information 92332. HTTP/REST interface 92326 may represent a processing component of the service monitoring system (SMS) that can receive and respond to requests from other computing processes, components, machines, systems, or the like, to supply information that the SMS should recognize and treat as one or more notable events. The supplied information in such an embodiment may be represented in any number of ways including in a recognized standard representation format, such as XML or JSON, and the supplied information may omit, include, or inhere applicable schema information. As a part of such processing, HTTP/REST interface 92326 may take information received and use it directly as one or more notable event instances or may take the information received and perform any processing necessary to normalize, format, or otherwise process the information into one or more acceptable instances of a notable event conforming with the requirements of the SMS. For example, in one embodiment, a notable event instance (e.g., 92338) may be required to include a common set of metadata fields such as an event identifier 92340a, a severity indicator field 92340b, a status indicator field 92340c, and an owner identification 92340d, over and above other data 92342 that specifies, describes, represents, or relates to the event. The other event data 92342 may vary widely in and among embodiments and may include some or all of the raw data of one or more events collected and processed by an event processing system such as system 7100 of FIG. 76. Data of other event data 92342, in an embodiment, may be processable using a late binding schema to extract field values, and other event data 92342 may include data already having been extracted using a schema such as a late binding schema. The existence, makeup, and content of notable event metadata and event descriptive data may vary among embodiments, and the example shown in FIG. 347E1 is for purposes of illustration and discussion.


A notable/alert event record, entry, or instance in computer storage, such as 92338, represents the occurrence of a condition recognized in the data representing the environment monitored by the SMS (data from the environment or produced about the environment, perhaps by the monitoring itself) that is ascribed the likelihood of meriting further processing or attention by the SMS, its users, or its clients. The ascription of likelihood may be achieved as simply as creating and processing a record of the event using processing that is specifically associated with notable events as opposed to other processing associated with one or more other classes of events. In one embodiment, the ascription of likelihood may include processing event information to satisfy any one or more notability criteria. In one embodiment, a notable/alert event record, entry, or instance is maintained in a mutable fashion, particularly because certain of its metadata fields may be expected to change in value over time (e.g., status, owner), perhaps in contrast to other classes of event information, such as events that make a history and thus should normally be immutable Because of the likelihood of further processing or attention of notable events, an embodiment may utilize higher performance storage and retrieval means for notable event data 92332. Based on the foregoing, it is to be understood that, at least in the context of discussions regarding event grouping, the events being grouped are notable events and the source of such notable events is not limited to correlation searches, described elsewhere.


Command/control/configuration data 92334 is comparable to CCC data 90812 of FIG. 34CD1. Event group policy definition EGP192344 of FIG. 34ZE1 is comparable to event group policy EGP190813 of FIG. 34ZD1, used to direct the operation of the SMS when performing automated event group (AEG) processing such as the declarative rule-based grouping of events. CCC data 92334 of FIG. 34ZE1 further includes information to direct the operation of the SMS when performing the now described automated event correlation (AEC) processing to perform dynamic machine learning-based grouping of notable events. CCC data 92334 is shown to include AEC seed group definition AEC-S1 92346 which includes information about a seed group initially derived from historical event data and used during the automatic event correlation processing performed by the SMS. CCC data 92334 is shown to further include AEC active group definition AEC-A1 92348 which includes information about an active group arising from operation of automatic event correlation processing of incoming notable events by the SMS. Definitional data items 92344, 92346, and 92348 are representative as multiple instances of each type of definition may be used in an embodiment to carry out event grouping. In illustrative system 92300, processing block 92322 represents processing performed by the SMS embodiment to establish seed group definitions for AEC processing of received notable events. The processing of block 92322 may include an interactive user component enabling a user, such as a system administrator or system expert, to engage the process of developing the seed groups. Such processing will be better appreciated after consideration of the figures and related discussion that follows below. Processing block 92323 represents processing performed by the SMS embodiment to refresh seed group definitions, possibly on a recurring and automatic basis as indicated by the circular arrow appearing with block 92323.


Event group data 92336 is comparable to event group data 90816 of FIG. 34ZD1. The content shown for event group data 92336 of FIG. 34ZE1 has a parallel to the content shown for CCC data 92334. In system 92300, CCC data 92334 may be illustratively said to hold definitional data that directs SMS operations to properly group notable events as they arise in real-time or near real-time. In system 92300, event group data 92336 may be illustratively said to hold the dynamic membership information, i.e. which events are members of which groups and possibly other group information, that results from the SMS grouping operations directed by the definitional information. As CCC data 92334 is shown to hold definitional data for event group policies (e.g. 92344), AEC seed groups (e.g., 92346), and AEC active groups (e.g., 92348), so event group data 92336 is shown to hold corresponding event group data for groups resulting from event group policies (AEG processing) (e.g., 92350), seed groups (AEC processing)(e.g., 92352), and active groups (also AEC processing)(e.g., 92354). One of skill recognizes that data representation and organization details discussed here, and generally, are included in the illustrative examples to help teach inventive aspects and that different data representations and organizations may be employed in an embodiment without departing from the scope of the inventive aspects.


Assuming the data environment 92330 of system 92300 is populated in accordance with the preceding discussion—including that notable events data 92332 includes a notable event from a notable/alert event source (any of 92324-92328) and that CCC data 92334 includes group definition data including seed group definitional data as may have been provided by the operation of blocks 92322 and/or 92323—a description of the handling of the grouping of a notable/alert event by the processing of blocks 92310-92320 is now described. At block 92310 a notable/alert event is received. Block 92310 of system 92300 is shown to receive the notable event record/entry/instance from notable event data 92332. Notable event data 92332 may be implemented, for example, as the computer storage of a queue, stream, file, FIFO stack, or other embodiment. Processing then proceeds to block 92312 where automated event grouping such as rule-based grouping is performed against the notable event received at block 92310. In one embodiment, the processing of block 92312 may result in the notable event being placed into the membership of the first group for which the notable event matches the group's event group policy. In an embodiment, the processing of block 92312 may result in the notable event being placed into the membership of a highest scoring group for which the notable event matches the group's event group policy. In an embodiment, the processing of block 92312 may result in the notable event being placed into the membership of all groups where the notable event matches the respective event group policy. Embodiments may vary. In an embodiment, the processing of block 92312 may complete without the notable event being associated with any group because, for example, it fails to match any event group policy. At block 92314 a determination is made whether the processing of block 92312 resulted in the notable event being placed into the membership of an event group such as may be recorded at 92350, for example. If so, for this embodiment, the event grouping activity for this notable event is complete and processing proceeds to block 92310 where a next notable event is received. If not, for this embodiment, processing proceeds to block 92316 where a determination is made whether automatic event correlation processing is enabled. Such a determination may be made by inquiring against configuration data in CCC data 92334 not specifically shown. If the processing of block 92316 determines that AEC is not enabled, processing proceeds to block 92320 where certain default processing is performed for a notable event that has not been associated with any group. In an embodiment, such default processing may include updating statistics about ungroupable events which may be used to inform future grouping membership criteria. In an embodiment, such default processing may include modifying the notable event instance itself, such as by changing its severity, status, or owner information. These and other embodiments are possible.


If the processing of block 92316 determines that AEC is enabled, processing proceeds to block 92318. At block 92318, automatic event correlation processing is conducted in an attempt to determine group membership for the notable event. In one embodiment, and AEC group definition such as 92346 or 92348 identifies a list of one or more fieldname-value pairs that serve as group membership criteria. The notable event instance is compared to the group membership criteria in some way (e.g., exact match, fuzzy match, weighted match, etc.) and the notable event is placed into the group when a successful match is found. In an embodiment, the processing of block 92318 may result in the notable event being placed into the membership of the first matching group. In an embodiment, the processing of block 92318 may result in the notable event being placed into the membership of a highest scoring group to which the notable event matches. In an embodiment, the processing of block 92318 may result in the notable event being placed into the membership of every group to which the notable event matches the membership criteria. These and other embodiments are possible.


In an embodiment, AEC processing uses definitions only for seed groups (e.g., 92346) that were created by an analysis of historical notable event data in interactive time frames (<˜2 minutes using <˜10,000 sample events), such as by the processing of block 92322. In an embodiment, AEC processing uses definitions for both seed groups (e.g. 92346) and active (non-seed) groups (e.g., 92348). In such an embodiment, an attempt may be first made to match the notable event to a seed group. If no match is found, an attempt may be made to match the notable event to an active group. If still no match is found, a new active group may be created with the notable event being its first and only member, and information from and about the notable event may be used to establish active group membership criteria for future notable events. In one embodiment, an active group may grow in significance—perhaps as measured by membership count, membership growth rate, or other factors—to the point where the active group may be promoted to seed group status and may take on a persistent representation. In an embodiment, AEC processing uses definitions only for active groups (e.g., 92348) that arose only from the processing of notable events in real time or near real time. These and other embodiments are possible. When the processing of block 92318 completes, processing may proceed to block 92310 where the next notable event is received.


FIG. 34ZE2 depicts a method for creating seed group control information to direct automated event correlation processing in one embodiment. Method 92400 of FIG. 34ZE2 illustrates processing as may be performed at block 92322 of FIG. 34ZE1. Method 92400 of FIG. 34ZE2 produces a set of seed group definitions 92452 that may be utilized for the real-time grouping of notable events. Embodiments of method 92400 have advantages over other methods of recognizing characteristics of meaningful groupings from historical data particularly in the substantial reduction of computing resources (e.g. CPU time, memory) needed to provide the meaningful groupings. The reduction of computing resources is substantial enough to enable the production of a computing machine that delivers seed group generation from historical data in the context of an interactive user session. Long-running, brute force, batch methods are not required and user attention and continuity of thought are better maintained in the interactive context. These and other advantages will become obvious by consideration of what follows.


Method 92400, is now described in terms of an embodiment where an event group definition for AEC processing includes a list of one or more factors that serve as matching criteria for event membership in the group represented by the definition. Each group inherently has a level or complexity value that corresponds to the number of factors in its factor list. For example, a group defined by a list having only one factor is a level-1 group with a complexity of 1. In the illustrative embodiment, each factor is a coupled fieldname-value pair. The factor list for a level-1 group will contain a single fieldname-value factor. The factor list for a level-2 group will contain two fieldname-value factors, and so on.


At block 92410, a sample set of notable/alert events 92450 is used to add definitions for level-1 seed groups to a seed group candidate pool 92454. Block 92410 ingests the collection of notable event samples 92450. Notable event samples 92450 may consist of historical notable event data but could include, for example, a set of expertly curated training data. Embodiments may vary. Ingestion of the notable event samples produces a list of fieldnames, perhaps normalized to a common data model, found in the sample data. Further, a list is produced for each of the fieldnames of the distinct values found among the sample data for that fieldname. Statistical information may also be produced by the ingestion processing. For example, block 92410 may determine the count of distinct values for each fieldname appearing among the sample data. For example, block 92410 may determine a count or percentage of the events in the sample that contain the field. Block 92410 may produce this and other information and may maintain it in any format and organization. In an embodiment, the processing of block 92410 may identify one or more fields/fieldnames in the list as recommended fields. Recommended fields are those fields determined likely to contribute to identifying events for meaningful grouping. In an embodiment, determining the recommended fields may include comparison of the fieldname with information about fieldnames known from historical records or identified by a user as having a positive contribution to event identification for grouping. The information about fieldnames may include weights, scores, ratings, rankings, or other information that may be used to make the determination of recommended fields. In an embodiment, determining the recommended fields may include consideration of information produced by the ingestion processing. For example, the determination of recommended fields may include consideration of the number of events, percentage of coverage, or number of distinct values determined for a fieldname. For example, in one embodiment, a fieldname may be recommended where it is included in at least 10% of the events in the sample data and has no more than 10 distinct values for the field among the sample data. As another example, in one embodiment, a fieldname may be recommended where it appears in a list of known, valuable fieldnames and it is included in at least 20 of the events in the sample data. Embodiments considering the aforementioned determinants and others, alone or in combinations, are possible.


In one embodiment, the list of fieldnames found in the sample data, statistical or other information about them, and their recommended status may be presented to the user via an interface for user confirmation and/or adjustment. Such a user interface may be interactively presented on a user interface device such as 92408. Such a user interface is described and discussed later in reference to FIG. 34ZE8. In an embodiment, the recommended status of a fieldname may be presented to the user as a default selection for the field. From the list of fieldnames, the processing of block 92410 of FIG. 34ZE2 in one embodiment produces an identification of selected fields, perhaps based on the recommended fields, user input, or other information, alone or in combination. The processing of block 92410 then creates a level-1 seed group definition in seed group candidate pool 92454 for each of the distinct values occurring in the sample data for each of the selected fieldnames—the fieldname-value pairings used as the single factor in the level-1 definition.


Definition (A) of FIG. 34ZE3 illustrates an example of a seed group definition as may appear in seed group candidate pool 92454 of FIG. 34ZE2, in one embodiment. Seed group candidate definition 92510 of FIG. 34ZE3 shows factor list 92512 having Factor192514a, Factor292514b, Factor392514c, intervening factors 92514d, and a final factor, FactorN 92514e. Factor192514a is depicted with a solid outline indicating that the illustrated embodiment will always include at least one factor in the factor list 92512. The remaining factors, Factor2 through FactorN 92514b-92514e, are depicted with a dashed outline indicating that they may not be required for an instance of the seed group definition. An embodiment of a seed group definition such as 92510 may include information other than factor list 92512 as illustrated by event count data 92518 and event correspondence array 92516. Event count 92518 may include, for example, the count of events from the sample data that match the group identifying factors. Event correspondence array 92516 may include, for example, a bitmap where each bit corresponds to a respective event in the sample data with the bits having true or false (e.g. 1 or 0) values indicating the presence or absence of a field value for the fieldname in the data of the respective event.


Once all level-1 seed group candidate definitions have been created by the processing of block 92410 of method 92400 of FIG. 34ZE2, processing may proceed to block 92412. At block 92412, seed group candidate pool 92454 is culled of weak candidates, i.e., seed group candidates having relatively low prospects of value in identifying meaningful event groups. In one embodiment, a candidate seed group is culled if its membership from the sample data falls beneath a level-1 culling threshold. In one embodiment, the level-1 culling threshold is expressed in terms of a percentage of the total number of events in the sample data 92450. In one embodiment, 5% is used as the level-1 culling threshold; in another, a number in the range of 3 to 7% is used as the threshold; in another, 10% is used as the culling threshold; in another, a number in the range of 5 to 15% is used as the culling threshold; in another, a number in the range of 15 to 25% is used as the culling threshold; in another, a number in the range of 25 to 50% is used as the culling threshold; and in another, a number in the range of 1 to 99% is used as the culling threshold. In one embodiment, a candidate seed group is culled if it is among some percentage or count of candidate seed groups having the lowest event membership, the percentage or count determined as the level-1 culling threshold. These and other embodiments are possible. Embodiments may vary in the methodology used to determine seed group candidates having relatively low prospects of value in identifying meaningful event groups for culling. The culling operation of block 92412 may substantially reduce the number of candidate seed groups in pool 92454, eliminating them from future consideration, and improving the overall efficiency of method 92400. Processing may then proceed to block 92414.


At block 92414, prominent candidate seed groups may be identified, possibly merged, and promoted to seed group status. In one embodiment, prominent candidate seed groups are identified by determining each remaining level-1 candidate seed group having sample event membership that exceeds a prominence threshold. In one embodiment, the prominence threshold may be expressed as a percentage of the total number of notable events in sample data 92450. In one such embodiment, the prominence threshold is 10%; in another, 20%; in another, 30%; in another, 40%; in another, 50%; in another a number in the range of 20-40%; in another, a number in the range of 25-75%; in another, a number in the range of 10-100%; etc. In one embodiment, the prominence threshold may be expressed as a count of events. In one embodiment, all of the candidate seed groups identified as prominent are promoted to seed group status. In one embodiment, promoting a candidate seed group to seed group status may involve adding a representative definition to seed group definition data 92452 while eliminating its representative definition from seed group candidate pool 92454.


A seed group definition, as summarily represented by 92346 of FIG. 34ZE1, is illustrated by the somewhat more detailed illustration of definition (B) of FIG. 34ZE3. Seed group definition 92520 of FIG. 34ZE3 shows factor list 92522 having Factor192524a, Factor292524b, Factor392524c, intervening factors 92524d, and a final factor, FactorN 92524e. Factor192524a is depicted with a solid outline indicating that the illustrated embodiment will always include at least one factor in the factor list 92522. The remaining factors, Factor2 through FactorN 92524b-92524e, are depicted with a dashed outline indicating that they may not be required for an instance of the seed group definition. An embodiment of a seed group definition such as 92520 may include information other than factor list 92522 as illustrated by statistics data 92528. Statistics data 92528 may include, for example, the count of events that are currently members of the group, the maximum number of events ever belonging to the group, the average number of events belonging to the group, the shortest, average, and longest lifetime of the group, the shortest, average, and longest intervals of zero membership of the group, measures of user interactivity with instances of the group, and other information.


In one embodiment, candidate seed groups identified as prominent are merged, where possible, before promotion to seed group status. In one such embodiment, subsets of prominent candidates are identified where some measure of the overlap of event membership between or among the candidates in the subset exceeds a merger threshold. In one embodiment, the merger threshold may be expressed as a percentage of the total number of notable events in each respective candidate, in the candidate with the largest membership, in the candidate with the smallest membership, in the average membership of the candidates in the subset, or another value. In one such embodiment, the merger threshold is 10±5%; in another, 20±5%; in another, 30±5%; in another, 40±5%; in another, 50±5%; in another, 60±5%; in another, 70±5%; in another, 80±5%; in another, 90±5%; in another, a value from 10-25%; in another, a value from 25-50%; in another, a value from 50-75%; in another, a value from 50-100%; in another, a value from 20-100%; and in another, some other value. In one embodiment, the merger threshold may be expressed as a count of overlapping events. In an embodiment, consideration may be given to creating the minimum number of subsets possible among the prominent candidate seed groups. In an embodiment, consideration may be given to maximizing the average number of prominent seed groups in each subset. These and other considerations may be made. Additionally, event correspondence information such as illustrated by array 92516 of FIG. 34ZE3 may be useful in assessing group membership and membership overlap between and among groups.


Subsets of prominent candidate seed groups qualifying for merger may be promoted in merged form to seed group status, in one embodiment. Such promotion in merged form may entail creating an entry in seed group definition data 92452 of FIG. 34ZE2 at the level or complexity equaling the number of prominent candidate seed groups in the subset. For example, a subset for merger having three prominent candidate seed groups would result in the creation of a seed group definition having three factors in its factor list, each of the three factors coming from the single factor in the seed group candidate definition of a respective prominent seed group in the subset. In an embodiment, the merger and promotion of the subset may also result in the elimination of the definitions from seed group candidate pool 92454 for the prominent level-1 seed groups merged. The processing of block 92414 provides the early promotion of prominent candidates to seed group status and may produce the size of the seed group candidate pool moving forward, improving the overall efficiency of method 92400.


For the example illustrative method 92400, the processing of block 92414 completes the processing focused on level-1, including the creation of the level-1 seed group candidates, culling out the unpromising, and promoting and/or merging the most prominent. The method then enters an iterative loop at block 92416 that progresses to seed groups of higher level and complexity. At block 92416, the current working level number is increased, possibly by a simple increment. In one example, the 1st execution of the processing of block 92416 will increment the current working level/complexity number from 1 to 2. At block 92418, candidate seed groups at the current level/complexity meeting basic requirements are added to seed group candidate pool 92454. In one embodiment, the potential pool of all possible seed group candidates for the current level includes permutations from crossing the seed group candidates of the prior level (providing one fewer factors than needed for the current complexity level) with the seed group candidates remaining in pool 92454 for level-1 (providing the last factor needed at the current complexity level). For example, the potential pool of possible seed group candidates at level 3 includes 3-factor permutations from crossing 2-factor level-2 seed group candidates with 1-factor level-1 seed group candidates. A culling threshold may be applied to the possible seed group candidates of the current working level to eliminate unpromising candidates as was performed for level-1 at block 92412. The processing of block 92418 in one embodiment uses a culling threshold calculated for the current working level. In one such embodiment, the culling threshold is calculated in consideration of a measure of group membership size in one or more preceding levels. In one such embodiment, the culling threshold is calculated in consideration of the average group membership event count for this seed group candidates of the prior level. In one such embodiment, the culling threshold is calculated by multiplying a scaling factor by the average group membership of groups of the prior level. For example, the culling threshold for potential level-2 seed group candidates is calculated by multiplying a scaling factor by the average group membership event counts of the remaining level-1 seed group candidates in pool 92454. In one embodiment, the scaling factor is between 0 and 1. In one embodiment, a scaling factor of about 0.7±0.1 is used. Possible seed group candidates for the current working level that survive application of the culling threshold may be added to seed group candidate pool 92454.


At block 92420, a determination is made whether to terminate the creation of seed group candidates of higher complexity. In one embodiment, the determination to terminate seed group candidate creation is based on exhaustion, i.e., attempts to create seed groups at the current working level failed or were unsatisfactory. In one embodiment, the determination to terminate seed group candidate creation is based on defined limits, e.g., a maximum complexity level has been reached. These and other embodiments are possible. If the processing of block 92420 determines that the creation of seed group candidate should not be terminated, processing proceeds to block 92416 where the complexity level is increased. If the processing of block 92420 determines that the creation of seed group candidates should be terminated, processing proceeds to block 92430.


At block 92430, method 92400 begins a process of working down through the levels of seed group candidates in pool 92454 to promote or eliminate seed group candidates. At block 92430 the current working level is established as the highest level/complexity for which a seed group candidate was earlier created and maintained as indicated by the presence of a seed group candidate definition of that level/complexity in pool 92454. At block 92432, all seed group candidates in pool 924542 are promoted to seed group status, perhaps by removing their corresponding definition from candidate pool 92454 and adding a corresponding entry to seed group definition data 92452. At block 92434 an overlap threshold is established for the current working level that may be used to cull lower level candidates from pool 92454. In one embodiment, the overlap threshold may be determined at least in part on the complexity/level of the current working level, and may in an embodiment generally trend from a higher overlap threshold value to a lower overlap threshold value as the complexity/level moves from higher to lower. In such an embodiment, the same overlap threshold value may be used for multiple levels. At block 92436, in one embodiment, seed group candidates of pool 92454 having a complexity that is one level lower than the current working level are compared factor-by-factor to seed groups of the current working level. If the lower-level candidate meets or exceeds the allowable overlap threshold, i.e., the number of factors in the candidate definition that can be found in the factor list in the definition of any seed group of the current working level meets or exceeds the allowable overlap threshold number, the lower-level candidate is culled from pool 92454.


At block 92438, the current working level is decreased, possibly by a simple decrement. At block 92440, a determination is made whether the promotion of candidate seed groups to seed groups data should be terminated. In one embodiment, the determination to terminate seed group candidate promotion is based on exhaustion, i.e., the levels have already been processed. In one embodiment, the determination to terminate seed group candidate promotion is based on defined limits, e.g., a minimum complexity desired for seed groups. These and other embodiments are possible. If the processing of block 92440 determines that seed group candidate promotion should not be terminated, processing proceeds back to block 92432. If the processing of block 92440 determines that seed group candidate promotion should be terminated, processing proceeds to block 92442.


At block 92442, any further processing desired for the seed group definitions represented in seed group definition data 92452 prior to their use for realtime automatic event correlation is performed. In one embodiment, the seed group definitions of 92452 are scored, weighted, or ranked. In one such embodiment, multiple merit scores which may be used for ranking are determined for each seed group definition. A factor-based merit score may be determined for a seed group in consideration of its factors, where each factor is associated with a processing regime used to score the similarity of the factor, and where each regime may be associated with a weighting factor. Further consideration may be made of the maximum number of factors associated with each particular processing regime appearing in any seed group (Nmax), and of the number of factors associated with each particular processing regime appearing in the seed group being scored. The factor-based merit score may be calculated in one embodiment as the sum of the product for each regime type of a regime weighting factor and the number of factors in the seed group of that regime type, divided by the sum of the product for each regime of a regime weighting factor and the Nmax of that regime type.


In one embodiment, an event-based merit score may be determined for a seed group in consideration of the number of sample events qualifying for membership in the seed group and of the maximum number of sample events qualifying for membership in any of the seed groups. The event-based merit score may be calculated in one embodiment as the seed membership count divided by the maximum membership count.


In one embodiment, a combined merit score may be calculated for a seed group in consideration of its factor-based merit score, its event-based merit score, and the relative weight ascribed to each. The relative weight may be indicated by a single number between 0 and 1 that in one embodiment directly indicates the importance of the factor-based merit score leaving the remainder of the importance for the event-based merit score. In an embodiment, the combined merit score may be calculated as one half of the sum of the products of the factor-based merit score times the relative weight and the event-based merit score times 1 minus the relative weight.


In one embodiment, the seed group definitions of 92452 may be stored with a particular representation, in a particular format, at a particular location, or with a particular accessibility, so as to prepare and activate them for use as live command/configuration/control data that directs the active operation of an SMS to perform event grouping, as part of the processing of block 92442.


The processing of block 92442 may also include user interaction as indicated by user interface device 92408, such as to display information about the seed groups, enter processing parameters such as weighting factors, enter other information relevant to seed group definition, or indicate acceptance of the seed group definitions that were created. Such user interaction may be further considered in view of FIG. 34ZE9 in the related discussion.


FIG. 34ZE4 depicts a method for performing AEC processing against notable events in or near realtime in one embodiment. The processing depicted and discussed in relation to method 92550 is such as might be performed to effect block 92318 of FIG. 34ZE1. Seed group definitions 92482 of computer data 92480 of FIG. 34ZE4 represents seed group definitions such as 92452 of FIG. 34ZE2, and such as individual definition examples 92346 of FIG. 34ZE1 and 92520 of FIG. 34ZE3. Seed group definitions 92482 of FIG. 34ZE4 may be an aspect of the command/control/configuration data of a service monitoring system (SMS) at least inasmuch as the seed group definition data directs the operations of the SMS to perform automatic event correlation grouping as illustrated by method 92550. Active group definitions 92486 of computer data 92480 represents active group definitions such as the individual example 92348 of FIG. 34ZE1. Definition (C) 92530 of FIG. 34ZE3 illustrates one embodiment of an active group definition. Active group definition 92530 can be seen to have parallel content to seed group definition 92520, already discussed. Active group factor list 92532 and its constituent factors 92534a-e correspond to seed group definition factor list 92522 and its constituent factors 92524a-e. Statistics data 92538 of active group definition 92530 has a general correspondence to statistics data 92528 of seed group definition 92520 although the statistics and other information represented thereby may differ between a seed group definition and an active group definition, given any differences in their function and use. In one embodiment, seed group definitions such as 92482 of FIG. 34EZ4 may initially be created by a user such as a system administrator during an interactive session and may persist in storage until such time as they are re-created, edited, or refreshed. In one embodiment, active group definitions such as 92486 may be created, modified, and destroyed dynamically during system operation apart from any user involvement and may or may not be stored persistently. In one embodiment, a record of the membership of an event group, whether seed or active, is maintained separately from the group definition. The record of the membership of an event group for real time processing may include information for associating the group identity with notable events determined to belong to the group, and may be achieved in many ways. In an embodiment, a record of membership of an event group may be maintained somehow together with group definition information. Seed group membership data 92484 may reflect the current membership of seed groups represented in definitions of 92482. Active group membership data 92488 may reflect the current membership of active groups represented in definitions of 92486. In one embodiment, defined seed groups may have 0 members as the seed groups are predefined and may persist throughout operation. In one embodiment, currently defined active groups always have at least one member as the dynamically created group may be dynamically destroyed during operation when it no longer has members, for example, after all its member events are closed out.


At block 92560, a notable/alert event is received into the method. At block 92562, factors in the event useful for AEC processing are identified and a factor list for the event is populated. Here again, each factor in the factor list is a fieldname-value pair representing fields and corresponding values of data representing the event. At block 92564, one or more matches between the event and criteria established for membership for a seed group are determined. An embodiment of such processing is represented and discussed in relation to FIG. 34ZE5 elsewhere, herein. At block 92466 of method 92550 of FIG. 34ZE4 a determination is made whether the processing of block 92564 resulted in a match of the event to a seed group. If so, AEC processing for the event is completed and processing proceeds to block 92560 where a next notable event is received. If not, processing proceeds to block 92568. At block 92568, one or more matches between the event and criteria established for membership for an active group are determined. An embodiment of such processing is represented and discussed in relation to FIG. 34ZE6 elsewhere, herein. At block 92470 of method 92550 of FIG. 34ZE4, determination is made whether the processing of block 92568 resulted in a match of the event to an active group. If so, AEC processing for the event is completed and processing proceeds to block 92560 where a next notable event is received. If not, processing proceeds to block 92472 where, in this embodiment, a new active group is created which may include adding an entry to active group definitions 92486 based on the notable event and updating active group membership data 92488 to reflect membership of the notable event in the newly created active group. These and other embodiments are possible.


FIG. 34ZE5 depicts a method for matching an event to seed groups in one embodiment. Method 92600 is entered at block 92610. Processing continues at block 92612 where initialization processing sets a 1st seed group as the current working seed group for a loop iterating over all of the defined seed groups, in one embodiment. In an embodiment, such a loop may iterate over some subset of the defined seed groups. At block 92614, initialization processing for a nested loop iterating over the factors of a particular seed group sets a 1st factor as the current factor for the current seed group. In an embodiment, such a loop may iterate over some subset of the factors. At block 92616, a determination is made whether the notable event being evaluated for grouping is a match for the current factor. In one embodiment, a match is determined to exist where a factor in the factor list for the event is an identical match to the current factor of the seed group. In one embodiment, a match is determined to exist where a factor in the factor list for the event is a fuzzy match to the current factor of the seed group. In one embodiment, a match may be determined to exist using even complex matching function logic that determines a match condition even by indirect means (e.g., lookup functions, external data references, calculations), given the current factor of the seed group and the factor list of the event. In one embodiment, the processing of block 92616 may include the use of one or more similarity scoring regimes. If the processing of block 92616 determines a match does not exist between the current event factor and the event, processing proceeds to block 92628 where a determination is made whether there are more membership factors to consider for the current seed group. If not, processing proceeds to block 92624 so that any remaining seed groups can be considered. If so, processing proceeds to block 92630 so that remaining membership factors of the current seed group can be considered. At block 92630 another membership factor of the seed group is established as the current factor of the seed group. Processing then proceeds to block 92616 to look for a match in the event factors. If the processing of block 92616 determines a match does exist between the current event factor and the event, processing proceeds to block 92618 where the event matching score is adjusted. In one embodiment, the event matching score is a count of the number of matches between current seed group factors and factors of the event. In an embodiment, more complex calculations and determination logic may be used to adjust the matching score of an event, for example, weighted multivariable calculations using real numbers. In an embodiment, different logic modules, routines, calculations, formulae, or the like, may be employed with different factors to adjust the event matching score; and the seed group definition may include information identifying or informing the event matching score logic to be used for each factor, directly or indirectly, explicitly or implicitly. In an embodiment, a similarity scoring regime may be such a logic module. These and other embodiments are possible.


At block 92620, a determination is made whether the event matching score has reached the threshold necessary to be considered a match. If not, processing proceeds to block 92628 so any remaining membership factors may be considered. If so, processing proceeds to block 92622. At block 92622, the event is associated with the current seed group i.e., the event is added to the membership of the seed group. The processing of block 92622 may entail additional related processing, for example, updating statistics associated with the current seed group such as its membership count. Processing then proceeds to block 92624 were determination is made whether there is another potential seed group for the event to match. If so, another seed group is identified and established as the current seed group at block 92626 and processing proceeds for another iteration at block 92614. If not, method 92600 exits at block 92632.


FIG. 34ZE6 depicts a method for matching an event to active (non-seed) groups in one embodiment. Method 92650 is entered at block 92652. Processing continues at block 92654 where initialization processing sets a 1st active group as the current working active group for a loop iterating over all of the defined active groups, in one embodiment. In an embodiment, such a loop may iterate over some subset of the defined active groups. At block 92656, initialization processing for a nested loop iterating over the factors of a particular active group sets a 1st factor as the current factor for the current active group. In an embodiment, such a loop may iterate over some subset of the factors. At block 96258, a determination is made whether the notable event being evaluated for grouping is a match for the current factor. In one embodiment, a match is determined to exist where a factor in the factor list for the event is an identical match to the current factor of the active group. In one embodiment, a match is determined to exist were a factor in the factor list for the event is a fuzzy match to the current factor of the active group. In one embodiment, a match may be determined to exist using even complex matching function logic that determines a match condition even by indirect means (e.g., lookup functions, external data references, calculations), given the current factor of the active group in the factor list of the event. In an embodiment, similarity scoring regimes may be used. If the processing of block 92658 determines a match does not exist between the current membership factor and the event, processing proceeds to block 92678 where a determination is made whether there are more membership factors to consider for the current active group. If not, processing proceeds to block 92674 so that any remaining active groups can be considered. If so, processing proceeds to block 92680 so that remaining membership factors for the current active group can be considered. At block 92680 another membership factor of the active group is established as the current factor of the active group. Processing then proceeds to block 92658 to look for a match in the event factors. If the processing of block 92658 determines a match does exist between the current membership factor and the event, processing proceeds to block 92660 where the event matching score is adjusted. In one embodiment, the event matching score is a count of the number of matches between current active group factors and factors of the event. In an embodiment, more complex calculations and determination logic may be used to adjust the matching score of an event, for example, weighted multivariable calculations using real numbers. In an embodiment, different logic modules, routines, calculations, formulae, or the like, may be used with different factors to adjust the event matching score; and the active group definition may include information identifying or informing the event matching score logic to be used for each factor, directly or indirectly, explicitly or implicitly. These and other embodiments are possible.


At block 92662, a determination is made whether the event matching score has reached the threshold necessary to be considered a as qualifying for membership in the current active group. If not, processing proceeds to block 92678 so any remaining membership factors may be considered. If so, processing proceeds to block 92664. At block 92664, the event is associated with the current active group, i.e., the event is added to the membership of the active group. The processing of block 92664 may entail additional related processing, for example, updating statistics associated with the current active group such as its membership count. Processing then proceeds to block 92666 where a determination is made whether the number of factors in the factor list of the current active group is in excess of the desired target. If not, processing proceeds to block 92670, discussed below. If so, processing proceeds to block 92668 where the current active group definition may be updated based on information related to the new member event. In one embodiment, the factor list of the current active group definition is updated to reflect the intersection between the factor list of the current active group and the factor list of the new member event, which may result in a reduction in the number of factors in the current active group factor list. Processing then may proceed to block 92670.


At block 92670, a determination is made whether the current active group has reached a particular level of significance or prominence such that it should be converted to a seed group and persisted. In one embodiment, such a determination may include consideration of whether the factor list of the active group definition has the desired target number of factors. In one embodiment, such a determination may include consideration of the number of member events associated with the active group and, perhaps, consideration of how that number compares to some reference value such as the smallest number of events in a group at a particular complexity level. In one embodiment, such a determination may include consideration of some measure of membership strength for the current active group and, perhaps, consideration of how that number compares to some reference value such as the lowest membership strength measure of a group at a particular complexity level. In one embodiment, a measurement of membership strength may be calculated by dividing the number of events belonging to the active group by the maximum number of events belonging to any active group. In an embodiment, one or more of the aforementioned considerations, or other considerations, may be used alone or in any combination to determine a significance or prominence value, measure, or level. These and other embodiments are possible.


If the processing of block 92670 determines that the current active group has not reached a particular level of significance or prominence, processing proceeds to block 92674, discussed below. If the processing of block 92670 determines that the current active group has reached a particular level of significance or prominence, processing proceeds to block 92672 where the current active group is converted to a seed group. Such processing may entail removing the entry for the current active group from the active group definition data and adding a new entry based on the current active group to the seed group definition data. Such processing may also entail, similarly, effectively moving the membership information for the current active group from the active group membership data to the seed group membership data and, of course, associating it with the new seed group definition, directly or indirectly, implicitly or explicitly. Additional processing, such as updating or initializing statistics, may also be performed, and embodiments may vary.


At block 92674, a determination is made whether there is another potential active group for the event to match. If so, another active group is identified and established as the current active group at block 92676 and processing proceeds for another iteration at block 92656. If not, method 92650 exits at block 92682.


FIG. 34ZE7 depicts a user interface for AEC control functions in one embodiment. Interface presentation 92700 is such as might be utilized in the processing of block 92410 of FIG. 34ZE2. Such an interface may be used in an embodiment to prompt for and acquire user input, and display information, regarding control information that directs the operation of automatic event correlation processing performed by a service monitoring system (SMS). Interface 92700 of FIG. 34ZE7 is shown to include system title bar area 90702, application menu/navigation bar area 92704, application header area 92710, tabbed interface display area 92720, and footer area 92730. Tabbed interface display area 92720 is shown at the top to include a tab control area having one tab control 92722, “Grouping Regimes”. Tabbed interface display area 92720 is shown to further include, for the Grouping Regimes tab, tab header area 92724, collapsible interface area 92726, and tabbed display area 92728. Application header area 92710 is shown to include the title “Default Policy” and interactive Smart Mode selection control 92712. Footer area 92730 is shown to include interactive Cancel button 92732, interactive Save button 92736, and interactive Enable/Disable toggle selection control comprising mutually exclusive selection elements 92734a and 92734b.


Interactive Smart Mode selection control 92712 is interactive, enabling a user to signal an indication of whether AEC processing should be used to group notable events in real time. The selection signaled by a user through interaction with interactive control 92712 may be reflected in the command/control/configuration data of the SMS, such as CCC data store 92334 of FIG. 34ZE1, and be utilized by an active SMS to direct its operations such as to invoke or bypass AEC processing against the notable event as illustrated and discussed in relation to processing block 92316 of FIG. 34ZE1. Smart Mode selection control 92712 of interface 92700 of FIG. 34ZE7 is shown in the “on” position, whether by default or by earlier user interaction with the interface. In an embodiment, the “on” position of Smart Mode switch 92712 may be functionally linked with the activation of the interface elements associated with Grouping Regimes tab 92722. Collapsible interface area 92726 of the Grouping Regimes tab is shown to include three collapsible section headers, “Adjust the importance of each regime” 92740, “Stop grouping if” 92742, and “Group Information” 92744. Collapsible section headers 92742 and 92744 are shown in the collapsed state as indicated by the rightward facing arrowhead shown in each section header. Collapsible section header 92740 is shown in the expanded state as indicated by the downward facing arrowhead shown in the section header and by the appearance of a section display area beneath the header. The section display area is shown to include action button 92750, “Select Fields to Analyze”. The appearance of interface 92700 is such as might be presented to a user when, for example, processing commences for block 92322 of method 92300 of FIG. 34ZE1. At such a point, as prompted by the text shown for tabbed display area 92728, a user may interact with action button 92750 to signal to the computing machine the desire to select fields to analyze from sample event data in order to establish a set of seed groups. In response to user interaction with action button 92750 of interface 92700 of FIG. 34ZE7, the computing machine may cause the display of an interface such as depicted in FIG. 34ZE8.


FIG. 34ZE8 depicts a user interface for AEC control functions related to seed group determinations in one embodiment. Interface 92800 is shown to include title bar 92810, process information area 92812, field information table display area 92814, and footer area 92816. Process information area 92812 is shown to include Rerun Analysis control button 92820, timeframe control 92822, selected field count information area 92824, 1st regime field count information area 92826, and 2nd regime field count information area 92828. Field information table display area 92814 is shown to include column header row 92830, and table data row area 92840. Field information table display area 92814 includes columns 92832a-f Table data row area 92840 is shown to include data rows 92842a-f Footer area and 92816 is shown to include interactive Cancel action button 92850 and interactive Done action button 92852.


Interface 92800 of FIG. 34ZE8 is such as might be used during the processing of block 92322, “Seed Group Establishment”, of method 92300 of FIG. 34ZE1. Interface 92800 of FIG. 34ZE8 is such as may be used in an embodiment to prompt for and acquire user input, and display information, regarding control information that directs the operation of automatic event correlation processing performed by a service monitoring system (SMS), particularly the analysis of sample historical event data for the purpose of creating seed group definitions for AEC processing. Interactive timeframe control 92822 is shown as a selection list control displaying the current selection for the timeframe, “Last 24 hours”, and responding to user interaction by presenting a drop-down list of timeframe options from which a selection can be made (not shown). The timeframe selected using control 92822 is the historical timeframe that supplies the sample event data used to create AEC seed groups. The selected option appearing in control 92822 when interface 92800 is initially presented to the user during an interactive session may be a system default option, the option last selected by the user, the 1st option in the selection list, or some other option. In one embodiment, the selection of a new timeframe by user interaction with control 92822 may result in the immediate analysis of sample notable event data from the newly selected timeframe to repopulate the content of interface 92800. In one embodiment the selection of a new timeframe by user interaction with control 92822 may result in the recognition and display of the new user timeframe selection, with an analysis of sample notable event data from the newly selected timeframe not performed until some other triggering event, such as user interaction with Rerun Analysis action button 92820. In response to any user interaction with Rerun Analysis action button 92820 the computing machine may perform an analysis of sample notable event data from the historical time period indicated by timeframe control 92822, to repopulate the content of interface 92800.


In one embodiment, the analysis performed over sample notable event data from the historical timeframe indicated by timeframe selection control 92822 may include identifying each distinct fieldname for fields appearing in the sample event data and, for each such fieldname, identifying each distinct value for the field appearing in the sample event data. Such analysis may also include the generation and capture of statistical information such as the number of sample events corresponding to each fieldname and the number of sample events having each of the distinct field values. Such analysis may also include looking at the distinct field values appearing in the sample data for each fieldname, and perhaps other metadata about the field, and perhaps other information, in order to recommend a selection of fields most likely to contribute to meaningful event grouping by AEC processing and to recommend the regime of similarity scoring to be associated with each fieldname. These and other embodiments are possible.


The user interface embodiments illustrated and discussed in relation to FIGS. 34ZE7-34ZE10 assume for illustration that two regimes for similarity scoring are implemented: Text (Textual) and Category (Categorical). A similarity scoring regime as discussed here in regard to AEC processing generally refers to an implementation of processing logic that implements a specific method for producing a similarity score, i.e., a measure of similarity or likeness, for its input comparands. Embodiments may vary as to the number and types of similarity scoring regimes implemented. In one embodiment, the SMS implements a modular framework for similarity regimes, defining format and content requirements for individual similarity scoring regime modules such that the SMS is able to seamlessly integrate relevant matter from each installed and/or activated module into user interface content, processing flows, and the like, where similarity regimes are relevant.


A Text-type regime, in one embodiment, produces a binary similarity score, match or no match, on two text comparands. Each comparand may be stripped of noise words, punctuation, special characters, and extraneous whitespace, and the resulting text may be reduced to a list of word units. A match value is returned as the similarity score if 80% of word units in the 1st list can be found in the 2nd list, without regard to order of appearance. In one embodiment, after similar processing, a match value is returned as the similarity score if 100% of the word units in the 1st list can be found in the identical order in the 2nd list. These and other embodiments are possible. Moreover, it is noted that while general purpose regimes may be implemented that are closely associated with a programming data type such as text, string, integer, or floating-point, regimes may not be so limited. Regimes may be implemented, for example, that provide similarity scoring for comparand having differing programming data types. Too, different regimes may be implemented, for example, that provide substantially different similarity scoring results even though all directed to comparands having the same particular programming data type. Accordingly, regimes are perhaps sometimes better characterized by the logic and assumptions of their comparison methodology, or by the characteristics or content of comparand data with which they are recognized to produce good results, rather than by any particular programming data type.


A Category-type regime, in one embodiment, produces a binary similarity score, match or no match, on two text or numeric comparands. Text comparands are indicated to have a match score where after the removal of any leading or trailing white space, and conversion to all uppercase characters, the two comparands are identical. Numeric comparands are first ascribed a category name corresponding to a range of numeric values in which the comparand belongs, then a match score is indicated where the category names ascribed to the comparands are identical.


As discussed earlier, regimes beyond those discussed here for illustration purposes may be implemented in an embodiment. For example, an embodiment may implement a topology regime. Such a regime may produce a similarity score based on the distance between nodes associated with each comparand in a topology, such as a service, entity, or network topology. Such a regime may produce a similarity score based on the distances between a common reference node in a topology and nodes associated with each comparand. As another example, an embodiment may implement a time sequence mapping regime. Such a regime may produce a similarity score based on characteristics of the events that partially or completely match user-provided criteria to identify events occurring in a particular sequence of time with particular sets of field values. As another example, an embodiment may implement a timing regime. Such a regime may produce a similarity score based on compliance of events with a multi-event timing pattern. The multi-event timing pattern may be based on a known or predicted rate of propagation of problem-related activity in an environment for a given causal path through the environment. These and other embodiments are possible.


Similarity scoring regimes may be useful at many points in processing where the values for multiple instances of field data need to be compared to determine some measure of similarity, especially so where the measure of similarity requires more than a binary determination of whether the instances of field data are identical. Such is the case, for example with a topology regime as discussed above which requires references to external data, such as a stored representation of a topology, in order to make determine the similarity measurement. Similarity scoring regimes may also be useful especially where the measure of similarity requires more than a binary, yes/no, 1/0, true/false, value for the similarity score or measure.


Similarity scoring regimes, accordingly, may have application in comparing fields of notable events, or factors in factor lists derived therefrom, to fields or factor lists of other notable events or event group definition data. In one embodiment, a seed group establishment aspect of AEC processing may use similarity scoring regimes when comparing factors characterizing the notable event with factors in the factor list of a seed group candidate definition to determine membership of a notable event in a seed group candidate. In one embodiment, a realtime notable event processing aspect of AEC processing may use similarity scoring regimes when comparing field data of the notable event with values in factors of the factor list of a seed group or active group definition to determine membership of a notable event in a group. These are but examples.


Returning to the discussion of interface 92800 as illustrated in FIG. 34ZE8, much of the content of the interface is populated as the result of the aforementioned analysis performed against historical notable event sample data covering a timeframe indicated by timeframe selection control 92822, possibly as modified by user interaction. Process information area 92812 is shown to include an indication of the number of fields currently selected for use in a subsequent seed group generation process 92824, and an indication of the number of fields currently associated with each utilized similarity scoring regime. Here, 92826 indicates the number of fields currently associated with the Text regime type, and 92828 indicates the number of fields currently associated with the Categorical regime type.


The tabular information display presented at 92814 includes column header row 92830 showing: an informational-i as the column heading for column 92832a containing interactive collapse/expand control elements; an unchecked checkbox as the column heading for column 92832b containing interactive checkboxes enabling the selection/de-selection of fields to be used from historical notable event sample data to create seed groups for AEC processing; “Field” as the column heading for column 92832c containing fieldnames (and, possibly, values when expanded) of fields discovered in the historical notable event sample data; “Type” as the column heading for column 92832d containing identifications of associated similarity scoring regimes; “# of Values” as the column heading for column 92832e containing counts of the number of distinct values found in the historical notable event sample data for a particular field; and “Event Coverage” as the column heading for column 92832f containing percentages of the total number of events in the historical notable event sample data that include a particular field.


Each of the data rows 92842a-f in the tabular display represents a distinct field, values for which appear among the events of the historical notable event sample data. The fieldname appears in column 92832c. The selection or non-selection of the field for subsequent use in creating seed groups for AEC processing is indicated by a checkbox in column 92832b. Accordingly, the title field is selected for use in seed group creation as indicated by the checked box appearing in row 92842a, and the severity field is not currently selected for seed group creation as indicated by the unchecked box in row 92842d, for example. The current selection from a drop-down list box in column 92832d indicates the similarity scoring regime to be used for the field. Accordingly, the title field is associated with the Text regime as indicated by 92834 in row 92842a, and the host field is associated with the Category regime as indicated by 92836 in row 92842b, for example. The count displayed in column 92832e indicates the number of distinct values appearing in the historical notable event sample data for the field. The percentage displayed in column 92832f indicates the percentage of the total events of the historical notable event sample data having the field. A user may interact with a collapse/expand control element in column 92832a to expand a row to show additional information related to the field or to collapse a row to suppress the display of such additional information. Row 92842b is shown in the expanded state resulting in the presentation of field value detail information 92838 for the host field. Field value detail information 92838 is shown as presenting a listing of values for the host field found among the events of the historical notable event sample data, and a related percentage for each. The percentages shown for 92838 represent the proportion of events that have the particular value. Embodiments may vary as to the number, proportion, order, and the like of field values so displayed. Embodiments may vary as to the format and content of detail display information presented in such an expanded row.


While interface 92800 is illustrated with drop-down list boxes in column 92832d enabling the selection of a single similarity scoring regime for association with a particular field, it is to be understood that an embodiment may support associating multiple similarity scoring regimes with a particular field. In such an embodiment, similarity scoring for the field may be performed in each instance using as many of the specified similarity scoring regimes as necessary to achieve a particular scoring target, such as any single identical match. In an embodiment, similarity scoring for the field may be performed in each instance using all of the specified similarity scoring regimes, with perhaps the highest score, or some combination of the scores, being used as a singular similarity score for the field instance, perhaps where a single score is preferred over an array of multiple scores. Other embodiments are possible.


In one embodiment, when interface 92800 is initially displayed, or after user interaction with Rerun Analysis command button 92820, the field selection represented in the checkboxes of column 92832b and the regime selection represented in column 92832d, are based on computer-generated recommendations. Thereafter, the user may interact with the checkboxes of column 92832b and the drop-down selection lists of column 92832d to change the field selection and regime assignments, respectively. Whether or not having made any changes, the user may interact with Done action button 92852 to indicate to the computing machine acceptance of the field selection and regime assignments indicated by the interface. In response to such an indication, the computing machine may initiate processing to perform seed group creation, such as processing described in relation to method 92400 of FIG. 34ZE2. When seed group creation is complete, the computing machine may present an interface showing certain results, such as the interface illustrated in FIG. 34ZE9.


Computer-generated recommendations for field selection and scoring regime may be based on a wide variety of information including, for example, data type used to represent the field values, field metadata including information from a data model or schema, statistical information determined from analysis of the historical notable event sample data, and other information. In one embodiment, fields represented with a text data type and having a relatively large number of distinct values among the sample data may be recommended for a Text-type scoring regime, while text data type fields having a relatively small number of distinct values and fields represented with a numeric data type may be recommended for a Category-type scoring regime, in one embodiment. In one embodiment, fields having a high occurrence in user supplied event group policies, or high event coverage and a large number of distinct values, or low event coverage and a high number of distinct values, may be recommended for selection to the seed group creation process, while fields having high event coverage and a small number of distinct values, or low event coverage and a low number of distinct values may be recommended against selection to the seed group creation process, for example. Machine learning techniques may be applied to improve the computer-generated recommendations made over time. These and other embodiments are possible.


FIG. 34ZE9 depicts a user interface for AEC control functions in one embodiment that includes example seed group creation controls and information. Interface 92900 of FIG. 34ZE9 is such an interface as may be displayed as the result of updating interface 92700 of FIG. 34ZE7 after the seed group creation process conditioned and invoked using interface 92800 of FIG. 34ZE8 is performed. Interface 92900 such an interface as may be utilized during the processing of block 92442 of method 92400 of FIG. 34ZE2. Much of interface 92900 of FIG. 34ZE9 duplicates what is shown for interface 92700 of FIG. 34ZE7, and elements duplicated across the interfaces are numbered identically. The differences in interface 92900 of FIG. 34ZE9 are now discussed. The content appearing in collapsible section 92740, “Adjust the importance of each regime”, has been replaced.


Collapsible section 92740 of interface 92900 shows field selection action button 92914, 1st interactive regime weighting control 92916, and 2nd interactive regime weighting control 92918. Field selection action button 92914 displays a count of the fields selected for seed group creation processing, such as may have resulted from user interaction with interface 92800 of FIG. 34ZE8. Field selection action button 92914 of interface 92900 of FIG. 34EZ9 may be interactive, enabling a user to navigate to interface 92800 of FIG. 34ZE8 to reconfirm or modify field selection and analysis information, for example. Each distinct selection scoring regime type appearing for a selected field in column 92832c of interface 92800 has a corresponding weighting control in collapsible section 92740 of interface 92900 of FIG. 34ZE9. In one embodiment, regime weightings have values between 0 and 1 represented by an interactive slider control in the user interface. The Text/Textual similarity regime is shown to have weighting control 92916 set to a value of 0.7 and Categorical/Category similarity regime is shown to have weighting control 92918 set to a value of 0.5. In one embodiment, the default value for all regime weightings is 0.5, or at the midpoint of the range of acceptable values. Regime weighting values such as indicated by, or modified using, interactive controls of a user interface, such as interactive controls 92916 and 92918 of interface 92900, are such as may be used in the processing of block 92442 of method 92400 of FIG. 34ZE2, as previously discussed.


Interface 92900 shows newly introduced collapsible section 92910 entitled “Split events by field”. Split-by collapsible section 92910 is shown to include interactive text box 92912 enabling a user to view and edit an optional list of fields. Designating a field as a split-by field by entering its fieldname in text box 92912 will segregate events satisfying other group membership criteria into distinct groups on the basis of the value the event has in its split-by field. Split-by functionality in AEC processing may closely parallel or essentially duplicate the split-by functionality described elsewhere herein in relation to AEG event group processing.


Tab header area 92724 of interface 92900 shows newly introduced interactive controls 92920, 92922 for viewing and adjusting the schedule for automatically performed refreshes of the seed group definitions by the SMS. The schedule information displayed and modified by controls 92920 and 92922 may be reflected in a command/control/configuration data store such as 92334 of FIG. 34ZE1 to control SMS operations regarding the initiation of seed group refresh processing, such as illustrated at block 92323 of FIG. 34ZE1. Interactive frequency control 92920 is shown as a drop-down selection box enabling a user to view or update control information regarding the frequency with which to perform seed group refresh operations automatically. Interactive frequency control 92920 is shown with “Daily” is the currently selected automatic refresh frequency. User interaction with frequency control 92920 may result in the display of a list of available frequency options (not shown) from which the user may make a selection. Interactive time-of-day (TOD) control 92922 is shown as a drop-down selection box enabling the user to view or update control information regarding a base time of day from which to schedule the performance of seed group refresh operations automatically. Interactive TOD control 92922 is shown with “12:00 AM” as the currently selected automatic refresh base time. User interaction with TOD control 92922 may result in the display of a list of available TOD options (not shown) from which the user may make a selection. In one embodiment, TOD control 92922 is a spinner selection control rather than a drop-down selection box. Many embodiments are possible for providing a user with interactive access to control information of the SMS regarding automatic seed group refresh operations.


Tabbed display area 92728 of interface 92900 shows newly introduced preview timeframe control 92928 followed by a newly introduced tabular display of seed group creation information. The tabular display consists of column header row 92930 and individual seed group entry rows 92940a-f. Column header row 92930 is shown to include column name “Count” for column 92932c, column name “Title” for column 92932d, column name “Time” for column 92932e, column name “Severity” for column 92932f, column name “Status” for column 92932g, column name “Service” for column 92932h, and column name “Assignee” (Owner) for column 92932i. Two columns of the tabular display do not have titles in column header row 92930. The 1st untitled column is 92932a which contains a color-coded graphical representation of the severity level associated with the event seed group represented in each row. The 2nd untitled column is 92932b which contains a collapse/expand control element to add or remove additional detail information in rows for each seed group having more than one event in its membership. User interaction with such a collapse/expand control element, in an embodiment, may result in the expansion of the row to accommodate and include the display of additional information about the member events of the group, perhaps on a per-member basis. These and other embodiments are possible.


Each row in the tabular display of 92728 of interface 92900 contains information regarding a seed group as would exist at the end of the preview period if the subject seed group definitions had been in operation over the preview period indicated at 92928. In one embodiment, the seed group rows 92940a-f are ordered in accordance with their merit scores or some other ranking or sorting criteria. As an example, row 92940a is shown to include: a red graphical block in column 92932a in accordance with its “Critical” severity as indicated in column 92932f; a collapse/expand control in column 92932b; the value 93 in column 92932c indicating the number of member events for the seed group had it been active during the preview period; the title “Web Service Response Timeout” in column 92932d indicating a title in the definition data for the seed group; the value “10.15.2015 7:00:32 am” in column 92932e indicating the timestamp associated with the notable event first added into the membership of the group; the value “Critical” in column 92932f indicating a severity level for the group as may have been determined from the severities of one or more of the member events of the seed group; the value “New” in column 92932g indicating a status associated with the group as may have been determined from the statuses of one or more of the member events of the seed group; and the value “Unassigned” appearing in column 92932i indicating the name or other identifier of an assignee (owner) associated with the seed group.


One of skill can appreciate that interface 92900, and more particularly the tabular display of 92728, can be readily adapted from the seed group definition-time context (cs., e.g., block 92322 of FIG. 34ZE1) to the AEC real time context (cs., e.g., block 92318 of FIG. 34ZE1) by using live operational data of the active SMS rather than the definition-time data reflecting a set of historical notable event data samples used for training.


The user may indicate acceptance of the seed group definitions used to produce the tabular display of 92728, and other information and settings shown in interface 92900, by interacting with Save command button 92736, perhaps by a mouse click or finger press to a touchscreen. In response to the user interaction, the computing machine may perform any processing necessary to save and activate the newly created seed group definitions for realtime AEC processing of notable events.


FIG. 34ZE10 depicts a user interface for AEC control functions in one embodiment that includes an augmented information display. Interface 92950 of FIG. 34ZE10 is such an interface as may be displayed as the result of updating interface 92900 of FIG. 34ZE9 in response to a user interaction with interface 92900. The user interaction for purposes of illustration here is positioning a mouse and hovering over the count value, or any part, of row 92940a of interface 92900 of FIG. 34ZE9, for example. In response to such user interaction, the computing machine updates interface 92900 to produce interface display 92950 of FIG. 34ZE10. Interface display 92950 is shown to include hover-over display box 92952 which displays a textual description of the conditions, considerations, contributing factors, or other information, that caused the relevant notable events to be combined into a group by AEC processing.


By-Service Automatic Event Correlation and Uses


The improved service monitoring machine that results from implementing one or more aspects of event grouping already discussed is readily appreciated by one of skill in the art. An improved service monitoring machine also results from implementation of one or more aspects of by-service notable/alert event group processing next described, by itself or as extended to implement aspects such as automated action initiation. Such an improved service monitoring machine may effect the prevention of predicted, undesirable events, or drive scenario assessment of undesirable events in order to ascertain and alleviate the probable cause, so that undesirable—possibly, critical—states may be quickly resolved. The by-service notable/alert event processing now described results in an improved computing machine that performs higher levels of service monitoring without requiring the levels of computing resources (e.g., CPU, memory) that would otherwise make it prohibitive or impossible to produce real-time (RT) or near real-time (NRT) results. Such results may include automated communications to one or more network attached devices targeted to resolve or contribute to the resolution of a critical state. One of skill will appreciate after consideration of what follows how inventive aspects may, for example, reduce an exponential computing problem to a linear one.


FIG. 34ZF1 is a flow diagram of an example process for associating a notable/alert event with a service in one embodiment. Processing discussed in relation to subsequent diagrams presumes an association between a notable/alert event (a notable event) and a particular service, and the processing depicted in the flow diagram of FIG. 34ZF1 illustrates one example of how such associations may be made in an embodiment. Process 93000 for associating a service with a notable event is shown to begin at block 93010 where an aspect of the computing machine receives a notable/alert event. In one embodiment, where processing a notable event that has already been established, receiving the notable event may involve gaining access to a representation of the notable event in computer storage. In one embodiment, where processing a notable event currently under construction (being established or created), receiving the notable event may involve gaining access to information used during the construction of the notable event as currently represented in computer storage. Once the information representing the notable event is identified at block 93010, processing proceeds to block 93012.


At block 93012 an association between the notable event and an entity is determined. In one embodiment, the determination may be made by identifying a field value in the data of the notable event known or suspected to identify an entity, and then searching entity definitions for the field value. When a match is found, the matching entity definition, for example, the entity definition showing the field value as a name, alias, or other identifying field, is associated with the notable event.


At block 93014, a service is identified that is associated with the entity determined at block 93012. In one embodiment, the service may be identified by searching through service definitions to locate one having an association with the entity definition associated with the notable event at block 93012.


At block 93016, an association between the notable event and the service or service definition identified at block 93014 is recorded or reflected in some manner in storage accessible to the computing machine. Embodiments may vary as to the method and mechanisms for reflecting the association in storage. In one embodiment, a data store of notable events indexed by service may be used, such as illustrated and discussed in relation to FIG. 34ZF2. In one embodiment, a data store of notable events may be used wherein information associating the service is reflected directly in the stored content of each notable event, such as illustrated and discussed in relation to FIG. 34ZF3. These and other embodiments are possible.


The processing described for method 93000 and its variants are but a few examples of the possible ways one of skill will now understand for determining and retaining information reflecting the association between a notable event and a service with which it is identified. Other methods may be used in an embodiment to associate a notable event with a service. For example, a fixed association between a notable event type and a particular service may be established in the definition for the notable event type. In such an embodiment, modes for creating notable events may include functionality for specifying the fixed association to a service. As an example, a GUI related to defining a correlation search such as illustrated in FIG. 34M may be adapted to include an interactive component such as a text, list, or combo selection box enabling a user to indicate a service to be associated with notable events resulting from executions of the defined correlation search. These and other embodiments are possible.


FIG. 34ZF2 illustrates an embodiment of notable/alert events datastore indexed by service. Indexed data store 93020 includes notable events data store 93040 and by-service indexes 93030. Notable events data store 93040 is a collection of notable event representations or records where notable event record instance 93042 is one such example. By-service indexes 93030 includes multiple indexes, index segments, or subindexes that each represent a respective service and have information, such as pointers or record identifiers, to locate or identify the individual notable events in notable event data store 93040 that are associated with the respective service. By-service indexes 93030 is shown to include service indexes Service193032, Service293034, and ServiceN 93036,


FIG. 34ZF3 illustrates an example storage representation of a notable/alert event in one embodiment. The illustrated storage representation for notable event record/representation/instance 93050 may be seen to have close parallels to the illustrated notable event instance 92338 of FIG. 34ZE1. Common metadata fields ID 92340a, Severity 92340b, Status 92340c, Owner 92340d, and Other event data 92342 of FIG. 34ZE1 can be seen to correspond to common metadata fields ID 93052a, Severity 93052b, Status 93052c, Owner 93052d, and Other event data 93054, respectively, of FIG. 34ZF3. Notably, notable event record 93050 of FIG. 34ZF3 is shown to further include a Service common metadata field 93052e. In one embodiment, Service field 93052e contains information identifying the service with which the particular notable event record is associated. In one embodiment, Service field 93052e may further contain information about one or more other services topologically related to the service with which the particular notable event record is immediately associated. Such embodiments may vary as to the information reflected for a topologically related service and vary as to the number, degree, and direction of service nodes in the topology to be represented in the Service common metadata field information 93052e of a notable event record as illustrated by 93050. These and other embodiments and variations are possible.


FIG. 34ZF4 depicts a method for creating by-service seed group control information to direct automated event correlation processing on a by-service basis in one embodiment. Method 93100 of FIG. 34ZF4 illustrates processing as may be performed at block 92322 of FIG. 34ZE1 as adapted for by-service event group processing.


A computing machine implementing method 93100 of FIG. 34ZF4 in this example embodiment begins processing at block 93110 where a 1st service for which seed groups are to be determined is identified. In one embodiment, a 1st service may be identified by selecting a 1st service definition representing a service matching some criteria from among a collection of service definitions of the service monitoring system (SMS). Such matching criteria may vary among embodiments and may include, for example, a 1st service in a collated order, a 1st service occupying a terminal node in a service dependency topology, a 1st service identified among a collection of notable events, or some other criteria. The identified service is the current working service during the processing of 93100 and the service for which event seed groups will be created.


At block 93112 notable/alert event sample data associated with the current service for seed group creation is identified. In one embodiment, notable events associated with the current service are identified from a larger pool of notable/alert event samples such as 92450 of FIG. 34ZE2. In one embodiment, a by-service index such as 93032 of FIG. 34ZF2 may be used to identify the notable events associated with the current service. In one embodiment, service identifying information within the notable events, such as Service metadata fields 93052e of FIG. 34ZF3 may be used to identify notable events associated with the current service. Other embodiments are possible. The identified events are the pool of notable/alert event samples used for the creation of seed group definitions for the current service as represented in FIG. 34ZF4 by 93116.


At block 93114 seed groups are created for the current service. The processing of block 93114 corresponds to the processing of the method described for 92400 of FIG. 34ZE2 and is not described again here. Notable/alert event samples 93116 of FIG. 34ZF4 corresponds to notable/alert event samples 92450 of FIG. 34ZE2. Resulting seed group definitions 93118 of FIG. 34ZF4 corresponds to a seed group definitions 92452 of FIG. 34ZE2.


At block 93120 the service-specific seed groups created by the processing of block 93114 are reflected in the storage of the computing machine. One of skill recognizes that the processing described for block 93120 may be subsumed by, or duplicative of, the processing of block 93114, but will understand the usefulness of discussing it separately here to help convey an understanding of embodiments employing inventive aspects. The processing of block 93120 may include representing the resulting seed group definitions 93118 in an Automatic Event Correlation (AEC) by-service event groups datastore 93130. For example, where Service1 is the current service, the resulting seed group definitions 93118 may be recorded as the seed group definition (SGD) information of Service1 event groups datastore 93140. For example, where Service2 is the current service, the resulting seed group definitions 93118 may be recorded as the seed group definition (SGD) information of Service2 event groups datastore 93150; and so on. By-service event groups data store collection 93130 represents a logical depiction of the data, illustrating that event group information, including seed group definitions, is distinct for each service.


At block 93122 a determination is made whether there are any more services to be considered for seed group creation. If so, processing is shown to proceed to block 93126 where the next current service is identified before returning to the top of the processing loop at block 93112. If not, processing is shown to proceed to block 93124 where the by-service seed group definitions are finalized. The processing of block 93124 may include any processing needed within an embodiment to finalize the by-service seed group definitions which may include, for example, registering the completed by-service event groups data store 93130 for active use in the command/configuration/control data of the service monitoring system, moving data store 93130 to persistent storage, or copying data store 93130 to a particular location.


One of skill will understand that while the processing of method 93100 is described in terms of a serial processing loop, embodiments practicing inventive aspects are not so limited. Operations related to different services may be performed in parallel or interleaved, for example, without departing from inventive teachings. Many embodiments are possible as is generally true for methods that may be implemented with combinations of hardware and software.


FIG. 34ZF5 illustrates an example service monitoring interface GUI portion for effecting by-service automatic event correlation in one embodiment. GUI portion 93170 represents a possible adaptation of the contents of collapsible interface area 92726 of FIG. 34ZE9 to include support for optional, user-selected, by-service event grouping. Collapsible section headers 93172, 93174, 93176, and 93178 of FIG. 34ZF5 correspond to collapsible section headers 92740, 92910, 92742, and 92744 appearing in FIG. 34ZE9, respectively. GUI portion 93170 of FIG. 34ZF5 is shown to include interactive component 93180, labeled “Group by Service”, and showing an “Enabled” setting or state. Interactive component 93180 is depicted as a toggle switch enabling the user to indicate to the computing machine whether by-service grouping should be enabled or disabled; i.e., whether the by-service seed group creation and real time notable event processing as contemplated by FIG. 34ZF4 and FIG. 34ZF6 is to be performed (i.e. the enabled setting), or whether generalized seed group creation and real time notable event processing as contemplated by FIG. 34ZE2 and FIG. 34ZE4 is to be performed (i.e., the disabled setting), in one embodiment. In one embodiment, when interactive component 93180 indicates the enabled setting, both by-service and generalized seed group creation and real time notable event processing may be performed. These and other embodiments are possible.


FIG. 34ZF6 depicts a method for performing by-service AEC processing against notable events in or near realtime in one embodiment. The method depicted by FIG. 34ZF6 can be seen to have many parallels to the method depicted by FIG. 34ZE4 which addresses performing generalized (i.e., not service-specific or by-service) AEC processing against notable events in or near realtime, in one embodiment. The processing depicted and discussed in relation to method 93200 of FIG. 34ZF6 is such as might be performed to effect block 92318 of FIG. 34ZE1 as adapted to by-service event grouping.


By-service event groups datastore 93130 of FIG. 34ZF6 for Automatic Event Correlation (AEC) is shown to include Service1 event groups data store 93140, Service2 event groups data store 93150, and ServiceN event groups data store 93160. When seen previously, the by-service event groups data stores 93140, 93150, and 93160 of AEC by-service event groups datastore 93130 were each shown to include Seed Group Definition (SGD) data, as repeated here. These per-service event groups data stores 93140, 93150, and 93160, are now shown to each also include Seed Group Memberships information (SGM), Active Groups Definition information (AGD), and Active Group Memberships (AGM) information.


Current Service event groups data 93240 of FIG. 34ZF6 represents the instance of a by-service event groups data store corresponding to the current service, i.e., the service associated with a notable event currently being processed by method 93200. For example, if a notable event currently being processed is associated with Service1, then Current Service event groups data 93240 reflects the Service1 event groups data store 93140. One of skill will appreciate that Current Service event groups data 93240 may reflect the underlying by-service event groups data store by reference, rather than by duplication or some other method, in an embodiment.


The content depicted for Current Service event groups data 93240 is now illustrated by use of an example where a notable event associated with Service1 is currently being processed by method 93200. Seed Group Definitions 93242 of Current Service event groups data 93240 represents seed group definitions, also depicted as the seed group definition (SGD) data of Service1 event groups data store 93140, such as may have been produced by the processing of method 93100 of FIG. 34ZF4. Seed Group Membership data 93244 of Current Service event groups data 93240 represents seed group membership data, also depicted as the Seed Group Membership (SGM) data of Service1 event groups data store 93140, such as may have been produced by the current and prior executions of processing of method 93200 of FIG. 34ZF6. Active Group Definitions 93246 of Current Service event groups data 93240 represents active group definition data, also depicted as the Active Group Definition (AGD) data of Service1 event groups data store 93140, such as may have been produced by the current and prior executions of processing of method 93200. Active group membership 93248 of Current Service event groups data 93240 represents active group membership data, also depicted as the Active Group Membership (AGM) data of Service1 event groups data store 93140, such as may have been produced by the current and prior executions of processing of method 93200.


At block 93210, a notable/alert event is received into the method. At block 93212, the service associated with the received notable/alert event is identified and used to establish the corresponding per-service event groups data store (from among the collection of datastore 93130) as the Current Service event groups data 93240. At block 93214, factors in the event useful for AEC processing are identified and a factor list for the event is populated. Here again, each factor in the factor list is a fieldname-value pair representing fields and corresponding values of data representing the event. At block 93216, one or more matches between the event and criteria established for membership for a seed group of the service are determined. Where matches are found, the event is added to the seed group membership. An embodiment of such processing is represented and discussed in relation to FIG. 34ZE5 elsewhere, herein. At block 93218 of method 93200 of FIG. 34ZF6 a determination is made whether the processing of block 93216 resulted in a match of the event to a seed group of the service. If so, by-service AEC processing for the event as to the current service is completed and processing proceeds to block 93210 where a next notable event is received. If not, processing proceeds to block 93220. At block 93220, one or more matches between the event and criteria established for membership for an active group are determined. Here again, event membership in the groups is reflected where matches are found. An embodiment of such processing is represented and discussed in relation to FIG. 34ZE6 elsewhere, herein. At block 93222 of method 93200 of FIG. 34ZF6, determination is made whether the processing of block 93220 resulted in a match of the event to an active group of the service. If so, by-service AEC processing for the event as to the current service is completed and processing proceeds to block 93210 where a next notable event is received. If not, processing proceeds to block 93224 where, in this embodiment, a new active group is created which may include adding an entry to active group definitions 93246 based on the notable event and updating active group membership data 93248 to reflect membership of the notable event in the newly created active group. These and other embodiments are possible.


FIG. 34ZF7 depicts a system diagram showing activation of a network attached action target device using causal determinations from by-service event groups in one embodiment. As an overview, system 93250 shows processing of first block 93260 which draws on a history of notable events grouped by-service and information about a service topology to produce predictor rules or patterns that represent causal relationships between earlier predictive events having some causal force in relationship to a subsequent (predicted/response/caused) event. System 93250 further shows processing of second block 93270 which draws on a history of notable events, often the most recent history, and the predictor rules or patterns produced by 93260, to identify matches with predictive event patterns to predict, anticipate, or warn of some likelihood of a future occurrence of the corresponding predicted event type. Processing of second block 93270 may include activating a network attached Action Target Device based on the event prediction to initiate action(s) leading to the prevention of, correction of, historical capture of, or other appropriate response to, the event prediction. Accordingly, the employment of inventive aspects as illustrated by the example of system 93250 may provide a heretofore unrealized service monitoring system capability to predict critical events or states in an IT environment and automatically initiate preventive or corrective measures on network attached devices, and doing so in real-time or near real-time in a computationally efficient manner.


System 93250 of FIG. 34ZF7 is shown to include a number of data resources particularly useful in illustrating inventive aspects through the discussion of an example embodiment. System 93250 is shown to include by-service-by-group notable event data store 93252 which is shown in one embodiment to include notable events data store 93254 and AEC by-service event groups datastore 93130. AEC by-service event groups data store 93130, as before, is shown to include a collection, set, aggregation, or the like of per service event groups data stores 93140, 93150, 93160, each corresponding to a particular service and each including seed group definition (SGD) information, seed group membership (SGM) information, active group definition (AGD) information, and active group membership information (AGM). In one embodiment, Notable Events data store 93254 may include representations of all notable events arising during the operation of the service monitoring system (SMS) over some period, and information of the SGM and AGM components of 93130 serve as indexes, pointers, locators, or the like, to the notable events of 93254 having corresponding group membership. For example, the membership information for one of many possible seed groups of Service1 represented in the SGM information of 93140 includes the information necessary to identify the notable events of 93254 that belong to the seed group membership. In a similar example, the membership information for one of many possible active groups of Service2 represented in the AGM information of 93150 includes the information necessary to identify the notable events of 93254 that belong to the active group membership. Accordingly, by-service-by-group notable event data store 93252 may permit rapid and efficient access to the notable event representations of each of the various event groups for any particular service. These and other embodiments are possible.


System 93250 is shown to include Service Topology information 93300 as a data resource. Service Topology information 93300 may represent dependency relationships between and among services. The dependency relationships may be directional and may indicate that a 1st service is influenced by, is affected by, or is subordinate to, a related 2nd service which may be said to be an influencer of, to affect, or to be superior to the 1st service. The dependency relationships may be represented in the form of directed dependency graphs where the nodes of a graph represent the services and the edges of a graph represent the dependency relationships. Other embodiments are possible and one of skill in the art can appreciate that many ways exist to represent such graphs in computer storage using various structures, representations, formats, and the like. Because services or groups of services may be independent of others in the environment monitored by the service monitoring system, multiple independent graphs may be used to represent all the dependency relationships expressed for the environment. The set or collection of services interconnected to one or another, directly or indirectly, by dependency relationships may be considered a dependency group or a topology group, in one embodiment. The collection or set of graphs or other informational representations for the topology groups may be included in Service Topology information 93300. For illustration, Service Topology information 93300 is shown to include topology group 93310 having node 93310 representing Service1 (S1), node 93312 representing Service2 (S2), and node 93314 representing Service3 (S3), and directed edges (arrows) representing the inter-nodal dependency relationships in the direction of influence such that Service1 is shown to be dependent upon, or influenced by, Service2, and Service2 is shown to be dependent upon, or influenced by, Service3. Service Topology information 93300 is shown to further include topology group 93320 indicating that ServiceA (SA) is influenced by ServiceB (SB) and ServiceC (SC), and that ServiceC (SC) is influenced by ServiceD (SD) and ServiceE (SE). In one embodiment, the inter-service dependency relationships are indicative of performance, i.e., a service influences another service where it affects the performance of that other service.


System 93250 is shown to include Predictor Models/Rules/Patterns information 93290 as a data resource. As will be better understood after consideration of the discussion of at least the processing of block 93260 of FIG. 34ZF7 that appears below, predictor models/rules/patterns are used to represent causal relationships between and among notable events. While many embodiments are possible, predictor rule representation 93292 is an illustrative example of the information that may be used in an embodiment to represent a model/rule/pattern of 93290. Predictor rule 93292 is illustrated to have the capacity for timestamp information (e.g., creation time, modification time, last matched time), ID information that may uniquely identify the particular rule, Service information that may identify the service (and the event group information) to which the rule applies, Response Event Type information for identifying the caused event type in the causal relationship, Predictor Service-EventType(s) information for identifying the causing event type(s) in the causal relationship, Weight information for representing proportional influence for different factors when identifying a match for the rule, Time information for factoring in relative inter-event occurrence times when identifying a match for the rule, Count information for factoring in event-type count information when identifying a match for the rule, Severity information for factoring in event-type severity information when identifying a match for the rule, Match Criteria information for specifying some or all of the method and criteria used to determine a match for the rule, Prediction Action(s) information for specifying one or more actions to be performed or initiated whenever the pattern of predictor event types is discovered in real time data before the related occurrence of the response event type, Root Cause Action(s) information for specifying one or more actions to be performed or initiated whenever the pattern of predictor event types is discovered in real time data after the related occurrence of the response event type, and Other information to represent additional or different information as may be useful in an embodiment to represent a predictor rule, model, or pattern. Predictor rule 93292 may include information items that essentially define the predictor rule (e.g., Response EventType, Predictor Service-EventType(s)) and metadata information items that provide information about the essential predictor rule (e.g., Timestamp(s), ID). Example Predictor Row 93292 is merely illustrative, and many embodiments are possible. Moreover, while using automated methods of system 93250 to populate the contents of Predictor Models/Rules/Patterns information 93290 is chiefly discussed now in relation to FIG. 34ZF7, one of skill appreciates that the contents of Predictor Models/Rules/Patterns information 93290, including the contents of an individual predictor rule such as 93292, may be otherwise supplied, including by using GUI processes which enable a user or administrator to add, change, or delete some or all content.


Having discussed the data resources illustrated for example system 93250, the processing depicted for system 93250 will now be described. Block 93260 depicts processing which draws on a history of notable events grouped by service, and information about a service topology, to produce predictor models, rules, or patterns that represent causal relationships between earlier predictive events having some causal force in relationship to a subsequent (predicted/response/caused) event. At block 93262, a history of notable events, perhaps including the very most recent history, is searched to identify response event types occurring for a particular service. In one embodiment, response event types are notable event types for an occurrence of which it is desired to quickly and automatically identify one or more probable root causes. In one embodiment, response event types are automatically determined by recognizing in the historical record, event types exceeding some severity level, occurring with some frequency, and factoring in the topological importance of the related service, perhaps by scoring a service based on the number and importance of services in the topology it influences. In one embodiment, response event types may be identified based largely on user input received earlier from a user interface and maintained as identification criteria. These and other embodiments are possible. As one example, the processing of block 93262 may identify response event types for a Service1 by searching through the notable events belonging to any of the Service1 seed groups as indicated by the SGM data of 93140. In this example, Service1 is considered the affected service because it is the service associated with the response (caused) event type.


At block 93264, possible predictor event types are identified for an influencing service. The service is considered the influencing service because it is the service associated with the predictor event types that potentially cause an occurrence of the response event type for the affected service. The influencing service for which predictor event types are sought may be identified by reference to Service Topology information 93300. Continuing with the example where Service1 is the affected service, Service2 may be selected as the influencing service for the processing of block 93264 because Service Topology information 93300 indicates that Service2 influences or impacts Service1. Accordingly, the processing of block 93264 may identify possible predictor event types for Service2 by searching through the notable events belonging to any of the Service2 seed groups as indicated by the SGM data of 93150 and capturing a list of all event types encountered.


In one embodiment, only a service represented by an immediate superior node in the service topology may be used as the influencing service for the processing of block 93264. In one embodiment, any service represented by an immediate superior node in the service topology may be used as the influencing service for the processing of block 93264. In one embodiment, only a highest scoring service represented by any connected superior node in the service topology may be used as the influencing service for the processing of block 93264. In one embodiment, the processing of block 93264 may be adapted to identify possible predictor event types for any selection or combination of nodes in the service topology, though perhaps restricted to the nodes within a single topology group, and perhaps restricted to the superior nodes of the single topology group. These and other embodiments are possible.


At block 93266, a predictor-response relationship model is determined for each response event type identified at block 93262. In one embodiment, a predictor-response relationship model may be determined by creating a specific instance of a model using a model template. In one embodiment, a single model template is used. In one embodiment, the model template is selected from among a set of model templates based on factors that may include the number of predictor event types identified at block 93264, the count of occurrences of the predictor event types found in the notable events of the seed group membership of the influencing service, and/or other factors. These and other embodiments are possible. As an example, the model template in one embodiment may represent an assumption that “The occurrence of an event of type [ResponseEventType] in service [AffectedService] is preceded by the occurrence of predictor event types [PredictorEventType(s)] in service [InfluencingService] within a duration of [T] seconds.” Using such a template, a specific model can be determined by substituting the ResponseEventType and AffectedService parameters with information from the processing of block 93262, substituting the PredictorEventType(s) and InfluencingService parameters with information from the processing of block 93264, and substituting the T parameter with a value recorded in the configuration data of the SMS. An example of an instance of a model determined as described may be “The occurrence of an event of type E100 in service Service1 is preceded by the occurrence of predictor event types B314, E400, and Z999 in service Service2 within a duration of 300 seconds.” One of skill will appreciate that while a model template as discussed in this example may be entirely effective, embodiments are not so limited. An embodiment may readily employ more sophisticated model templates designed to account for greater complexity of relationships and topology, for example. Many embodiments are possible.


At block 93268, each model determined at block 93266 is applied against historical data to determine a corresponding predictor rule for operational use that may be employed to detect event patterns that predict or anticipate the occurrence of a response event type. Determining a predictor rule may be performed in one embodiment for a model by identifying each occurrence of a response event type for the affected service. For each such occurrence, the subset of predictor event types from the model that preceded such occurrence in the timeframe specified by the model is ascertained, resulting in patterns of the actual predictor events that preceded actual occurrences of the response event in the historical data. The actual patterns of predictor events may be aggregated to determine the predictor rule for operational use. In one embodiment, the simple intersection of the actual patterns of predictor events may be used to determine the operational predictor rule. In one embodiment, a weighted sum of the frequency of occurrence of event types in the actual patterns of predictor events may be used to determine the prediction rule. These and other embodiments are possible. Predictor rules determined by the processing of block 93268 may be recorded in Predictor Models/Rules/Patterns information 93290.


For simplicity to facilitate an understanding of inventive aspects, the processing of block 93260 was described in terms of response event types for a single service, but one of skill readily understands how to apply such processing across multiple services to determine predictor rules for each service. Similarly, the processing of block 93260 was described in terms of identifying inter-service causation patterns (i.e., predictor-response), but one of skill understands by consideration thereof how to adapt the process to the narrower case of determining predictor rules based on intra-service event patterns, which may include adapting the processing of block 93264 to identify possible predictor event type(s) for the affected service, rather than for the influencing service.


Also for simplicity to facilitate an understanding of inventive aspects, the processing of block 93260 was described in terms of an embodiment employing a single predictor rule for a particular response event type for a particular service. Is to be understood that an embodiment may employ multiple predictor rules for a particular response event type for a particular service. Such an embodiment may adapt the processing of block 93268, for example, to identify different subsets of predictor event types having a high co-occurrence with one another and with the response event type within the designated timeframe, and to create a predictor rule for each such subset. Such an embodiment may provide improved predictor rules (e.g., more precise, more accurate, more reliable) particularly useful for addressing operational situations in the monitored environment where multiple causation paths may lead to a common outcome, namely, a high priority or critical situation as may be recognized by the occurrence of the response event type. (Response event types do not necessarily signal a high priority or critical situation but are commonly discussed in such terms because of the regularity with which that may be the actual use case.) These and other embodiments are possible.


The discussion of the processing depicted for system 93250 now proceeds to a discussion of the operational use of predictor rules to predict or anticipate critical response events in real time or near realtime as part of the operation of the service monitoring system. System 93250 shows processing block 93270 which draws on a history of notable events 93252, often the most recent history, and the predictor rules 93290 as may be produced by 93260, to identify matches with predictive event patterns to predict, anticipate, or warn of some likelihood of a future occurrence of the corresponding predicted response event type, and to initiate an action to address the predicted occurrence. The processing described in relation to block 93270, may be included as part of a general processing loop of the service monitoring system. Such a loop may begin at block 93280 where a determination is made whether predictor rule processing should begin or continue. In one embodiment, predictor rule processing occurs on a periodic basis triggered by a timer. Once triggered, predictor rule processing continues for one complete cycle through all of the active operational predictor rules. Accordingly, the processing of block 93280 in such an embodiment determines to proceed with predictor rule processing if a new cycle has been triggered by the timer or if the predictor rules have not been exhausted for the previous cycle, and upon such a determination processing proceeds to block 93272.


At block 93272, a determination is made as to a timeframe of the historical record to be used for event pattern matching. In one embodiment, the end time of the timeframe is the current time and the start time of the timeframe is T seconds before the current time, where T is derived from the predictor rule to be evaluated and may represent the maximum time that a predictor event type from the predictor rule may precede the response event type from the predictor rule. At block 93274, information of the historical record for the determined timeframe is searched to find a matching pattern of predictor event types specified by the predictor rule. In one embodiment, the historical data will only match the pattern of predictor event types if there is at least one occurrence of each of the predictor event types identified in the rule within the timeframe specified by the rule. In one embodiment, the historical data will match the pattern of predictor event types if at least some proportion of the predictor event types identified in the rule have an occurrence within the timeframe specified by the rule. In one embodiment, the historical data will match the pattern of predictor event types if it complies with matching criteria also spelled out in the predictor rule. These and other embodiments are possible.


At block 93276, a determination is made whether a match was found at block 93274 between the historical data of the relevant timeframe and the pattern of predictor event types identified in the predictor rule. If not, processing returns to block 93280. If so, processing proceeds to block 93278.


At block 93278, an action is initiated or caused for responding to the prediction of an occurrence of the response event defined in the predictor rule. In one embodiment, the action to be initiated is specified as part of the information defining a predictor rule and its context, such as predictor rule definition 93292. In one embodiment, at least part of the processing of block 93278 always includes logging information about the response event prediction including, for example, the exact time of the prediction, the predictor rule in effect, and an identification of the predictor event set used to satisfy the rule.


In one embodiment, initiating the predicted response event action includes transmitting an action message or sequence to action target device 93284 via network 93282. Network 93282 may be any data communication network, including internetworks, and may include LANs, WANs, the Internet, mobile networks, or the like, and combinations thereof. Action Target Device 93284 may be any of a large variety of devices capable of receiving the transmitted action message via network 93282. In one embodiment, the transmitted action message may consist of properly formatted commands directed to the control interface of an IT system component that directs its operation, for example, a command string or sequence that instructs a network router or other device to reboot; or, as another example, commands to adjust the loading configuration sent to a load balancer; or, as another example, configuration commands to a cloud computing complex to add or modify its configuration of resources. In one embodiment, the transmitted action message may be an SMS alert or HTML page sent to a human interface device (HID) such as a smart phone, perhaps with detailed instructions for recommended preventive or corrective actions to take. These and other embodiments are possible.


It will be recognized that the critical response event prediction of a system such as 93250 of FIG. 34ZF7 cannot always result in critical event prevention. Where the time difference between the occurrence of a recognizable matching predictor event pattern and the occurrence of the response event is small, perhaps measured in a few seconds or milliseconds, and some human intervention is required, response event prevention may be unlikely. Nonetheless, early detection leads to early resolution, and the detection can drive corrective action, if not preventive. And even so, where the time difference between the occurrence of a recognizable matching predictor event pattern and the occurrence of the responsive event is small, prevention of the undesirable responsive event may be possible where automated communication at network speeds takes place with an action target device capable of an instantaneous response.


FIG. 34ZF8 depicts a processing diagram for activating a network attached action target device using backtracking of predictor rules to make plausible causal determinations from by-service event groups, in one embodiment. The process here, in a way, may be seen as the converse of the processing performed by block 93270 of FIG. 34ZF7. The processing performed by block 93270 of FIG. 34ZF7 was described as a forward-motion use of the causal relationship information embodied in predictor rules. That is to say, the predictor rules were used in a forward looking manner, starting with the earlier predictor events to anticipate a future, or later, response event. The processing performed by block 93360 of FIG. 34ZF8 will be seen by the discussion that follows to backtrack through a predictor rule, i.e., use the predictor rule in a backward looking manner, by starting with the later, response event to reach a plausible identification of the events that caused it, i.e., the predictor events.


System 93350 shows processing block 93360 which draws on a history of notable events 93252, often the most recent history, and the predictor rules 93290 as may be produced by the processing of block 93260 of FIG. 34ZF7, to identify the occurrence of a response event type and determine whether it was preceded by a pattern of actual events matching the predictor event pattern of a predictor rule in order to highlight that pattern of events as the plausible probable cause. The ability to highlight the likely probable cause events so as to stand out from a noisy background of event activity greatly speeds problem resolution and makes possible a focused automated responsive action likely to correct the condition of concern. The processing described in relation to block 93360 of FIG. 34ZF8 may be included as part of a general processing loop of the service monitoring system. Such a loop may begin at block 93352 where a determination is made whether predictor rule processing should begin or continue. In one embodiment, predictor rule processing occurs on a periodic basis triggered by a timer. Once triggered, predictor rule processing may continue for one complete cycle through all of the active operational predictor rules. Accordingly, the processing of block 93352 in such an embodiment determines to proceed with predictor rule processing if a new cycle has been triggered by the timer or if the predictor rules have not been exhausted for the previous cycle, and upon such a determination processing proceeds to block 93362.


At block 93362, the occurrence of a response event type represented in the active set of predictor rules of Predictor Models/Rules/Patterns information 93290 is detected. In one embodiment, Notable Events data 93254 relating to a particular time period is searched for the event type occurrence using seed groups membership (SGM) data for the relevant service to identify and restrain the Notable Events data to be searched. In one illustrative example, a particular predictor rule definition of 93290 indicates its association with the Service1 service, indicates a response event type of E100, and indicates the predictor event types as J100 and K100 in service Service1. Accordingly, the processing of block 93362 will search the notable events of Notable Events data 93254 relating to a particular time period that are identified in the seed groups membership (SGM) data of 93140 (Service1), looking for a notable event with an event type of E100 (response event type). The particular time period may be based on the triggering timer interval for predictor rule processing as utilized by block 93352, or some other time period. If such an event is detected by the search, processing proceeds to block 93364, otherwise the processing of block 93362 may be repeated for a different predictor rule definition or processing may return to block 93352.


At block 93364, a pattern of actual events matching the predictor events of the predictor rule is identified, if possible. In one embodiment, Notable Events data 93254 relating to a particular time period is searched for predictor event type occurrences using seed groups membership (SGM) data for the relevant service to identify and restrain the Notable Events data to be searched. Continuing with the earlier illustrative example, the processing of block 93364 will search the notable event of Notable Events data 93254 relating to a particular time period that are identified in the seed groups membership (SGM) data of 93140 (Service1), looking for notable events with an event type of J100 or K100 (predictor event types). The particular time period may be based on (i) time information of the predictor rule indicating the greatest amount of time that he predictor event may occur in advance of the response event while still matching the rule, and (ii) the actual time that the response event that matched the rule occurred. Other determinations for the particular time period may be used. Embodiments are possible where advance determinations of the particular time period are not used, though it may be possible to improve the process speed and efficiency by constraining a search of notable events to a particular time period.


At block 93366, a determination is made whether a matching predictor event pattern was identified by the processing of 93364. If not, processing may return to block 93352. If so, processing proceeds to block 93368. At block 93368, a root cause action is initiated. In one embodiment, the root cause action to initiate is determined from information in the predictor rule definition. Accordingly, the root cause action may be tailored not just to the response event type, but also to the likely cause of the response event, i.e., the pattern of predictor of events defined in the rule. Root cause actions may generally be of the same type and variety as described for the predicted event actions discussed in relation to the processing of block 93278 of FIG. 34ZF7, though corrective rather than preventive actions may be most appropriate to the processing of block 93368 of FIG. 34ZF8. In one embodiment, it may be significant in an IT environment that the root cause action may include transmitting an action message over a network such as 93282 to an action target device 93370. Network transmission of root cause action messages to action target devices may greatly increase the speed with which a critical situation signaled by the occurrence of the response event may be resolved, and permits the resolution of such a critical situation in rapid fashion in a completely automated way. These and other advantages will be understood by one of skill in the art after consideration of the foregoing, and many embodiments practicing the variety of inventive aspects in various combinations are possible.


Example Service-Monitoring Dashboard



FIG. 35 is a flow diagram of an implementation of a method 3500 for creating a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 3501, the computing machine causes display of a dashboard-creation graphical interface that includes a modifiable dashboard template, and a KPI-selection interface. A modifiable dashboard template is part of a graphical interface to receive input for editing/creating a custom service-monitoring dashboard. A modifiable dashboard template is described in greater detail below in conjunction with FIG. 36B. The display of the dashboard-creation graphical interface can be caused, for example, by a user selecting to create a service-monitoring dashboard from a GUI. FIG. 36A illustrates an example GUI 3650 for creating and/or editing a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. In one implementation, GUI 3650 includes a menu item, such as Service-Monitoring Dashboards 3652, which when selected can present a list 3656 of existing service-monitoring dashboards that have already been created. The list 3656 can represent service-monitoring dashboards that have data that is stored in a data store for displaying the service-monitoring dashboards. Each service-monitoring dashboard in the list 3656 can include a button 3658 for requesting a drop-down menu listing editing options to edit the corresponding service-monitoring dashboard. Editing can include editing the service-monitoring dashboard and/or deleting the service-monitoring dashboard. When an editing option is selected from the drop-down menu, one or more additional GUIs can be displayed for editing the service-monitoring dashboard.


The dashboard creation graphical interface can be a wizard or any other type of tool for creating a service-monitoring dashboard that presents a visual overview of how one or more services and/or one or more aspects of the services are performing. The services can be part of an IT environment and can include, for example, a web hosting service, an email service, a database service, a revision control service, a sandbox service, a networking service, etc. A service can be provided by one or more entities such as host machines, virtual machines, switches, firewalls, routers, sensors, etc. Each entity can be associated with machine data that can have different formats and/or use different aliases for the entity. As discussed above, each service can be associated with one or more KPIs indicating how aspects of the service are performing. The KPI-selection interface of the dashboard creation GUI allows a user to select KPIs for monitoring the performance of one or more services, and the modifiable dashboard template of the dashboard creation GUI allows the user to specify how these KPIs should be presented on a service-monitoring dashboard that will be created based on the dashboard template. The dashboard template can also define the overall look of the service-monitoring dashboard. The dashboard template for the particular service-monitoring dashboard can be saved, and subsequently, the service-monitoring dashboard can be generated for display based on the customized dashboard template and KPI values derived from machine data, as will be discussed in more details below.


GUI 3650 can include a button 3654 that a user can activate to proceed to the creation of a service-monitoring dashboard, which can lead to GUI 3600 of FIG. 36B. FIG. 36B illustrates an example dashboard-creation GUI 3600 for creating a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. GUI 3600 includes a modifiable dashboard template 3608 and a KPI-selection interface 3606 for selecting a key performance indicator (KPI) of a service. GUI 3600 can facilitate input (e.g., user input) of a name 3602 of the particular service-monitoring dashboard that is being created and/or edited. GUI 3600 can include a button 3612 for storing the dashboard template 3608 for creating the service-monitoring dashboard. GUI 3600 can display a set of identifiers 3604, each corresponding to a service. The set of identifies 3604 is described in greater detail below. GUI 3600 can also include a configuration interface 3610 for configuring style settings pertaining to the service-monitoring dashboard. The configuration interface 3610 is described in greater detail below. GUI 3600 can also include a customization toolbar 3601 for customizing the service-monitoring dashboard as described in greater detail below in conjunction with FIG. 35. The configuration interface 3610 can also include entity identifiers and facilitate input (e.g., user input) for selecting entity identifier of entities to be included in the service-monitoring dashboard.



FIG. 38B illustrates an example GUI 3810 for displaying a set of KPIs associated with a selected service for which a user can select for a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. When button 3812 is activated a list 3814 of a set of KPIs that are associated with the service can be displayed. The list 3814 can include an item 3816 for selecting all of the KPIs that are associated with the service into a modifiable dashboard template (e.g., modifiable dashboard template 3710 in FIG. 37). The list 3814 can include a health score 3818 for the service. In one implementation, the health score is an aggregate KPI that is calculated for the service. An aggregate KPI can be calculated for a service as described above in conjunction with FIG. 34.


Returning to FIG. 35, at block 3503, the computing machine optionally receives, via the dashboard-creation graphical interface, input for customizing an image for the service-monitoring dashboard and causes the customized image to be displayed in the dashboard-creation graphical interface at block 3505. In one example, the computing machine optionally receives, via the dashboard-creation graphical interface, a selection of a background image for the service-monitoring dashboard and causes the selected background image to be displayed in the dashboard-creation graphical interface. The computing machine can display the selected background image in the modifiable dashboard template. FIG. 37 illustrates an example GUI 3700 for a dashboard-creation graphical interface including a user selected background image, in accordance with one or more implementations of the present disclosure. GUI 3700 displays the user selected image 3708 in the modifiable dashboard template 3710.


Referring again to FIG. 35, in another example, at block 3503, the computing machine optionally receives input (e.g., user input) via a customization toolbar (e.g., customization toolbar 3601 in FIG. 36B) for customizing an image for the service-monitoring dashboard. The customization toolbar can be a graphical interface containing drawing tools to customize a service-monitoring dashboard to define, for example, flow charts, text and connections between different elements on the service-monitoring dashboard. For example, the computing machine can receive input of a user drawing a flow chart or a representation of an environment (e.g., IT environment). In another example, the computing machine can receive input of a user drawing a representation of an entity and/or service. In another example, the computing machine can receive input of a user selection of an image to represent of an entity and/or service.


At block 3507, the computing machine receives, through the KPI-selection interface, a selection of a particular KPI for a service. As discussed above, each KPI indicates how an aspect of the service is performing at one or more points in time. A KPI is defined by a search query that derives one or more values for the KPI from the machine data associated with the one or more entities that provide the service whose performance is reflected by the KPI.


In one example, prior to receiving the selection of the particular KPI, the computing machine causes display of a context panel graphical interface in the dashboard-creation graphical interface that contains service identifiers for the services (e.g., all of the services) within an environment (e.g., IT environment). The computing machine can receive input, for example, of a user selecting one or more of the service identifiers, and dragging and placing one or more of the service identifiers on the dashboard template. In another example, the computing machine causes display of a search box to receive input for filtering the service identifiers for the services.


In another example, prior to receiving the selection of the particular KPI, the computing machine causes display of a drop-down menu of selectable services in the KPI selection interface, and receives a selection of one of the services from the drop-down menu. In some implementations, selectable services can be displayed as identifiers corresponding to individual services, where each identifier can be, for example, the name of a particular service or the name of a service definition representing the particular service. As discussed in more detail above, a service definition can associate the service with one or more entities (and thereby with heterogeneous machine data pertaining to the entities) providing the service, and can specify one or more KPIs created for the service to monitor the performance of different aspects of the service.


In response to the user selection of a particular service, the computing machine can cause display of a list of KPIs associated with the selected service in the KPI selection interface, and can receive the user selection of the particular KPI from this list.


Referring again to FIG. 37, a user may select Web Hosting service 3701 in FIG. 37 from the set of KPI identifiers 3702, and in response to the selection of the Web Hosting service 3701, the computing machine can cause display of a set of KPIs available for the Web Hosting service 3701. FIG. 38A illustrates an example GUI 3800 for displaying a set of KPIs associated with a selected service, in accordance with one or more implementations of the present disclosure. GUI 3800 can be a pop-up window that includes a drop-down menu 3801, which when selected, displays a set of KPIs (e.g., Request Response Time and CPU Usage) associated with the service (e.g., Web Hosting service) corresponding to the selected service identifier. The user can then select a particular KPI from the menu. In another implementation, GUI 3800 also displays an aggregate KPI associated with the selected service, which can be selected to be represented by a KPI widget in the dashboard template for display in the service-monitoring dashboard.


Returning to FIG. 35, at block 3509, the computing machine receives a selection of a location for placing the selected KPI in the dashboard template for displaying a KPI widget in a dashboard. Each KPI widget can provide a numerical or graphical representation of one or more values for a corresponding KPI or service health score (aggregate KPI for a service) indicating how a service or an aspect of a service is performing at one or more points in time. For example, a user can select the desired location for a KPI widget by clicking (or otherwise indicating) a desired area in the dashboard template. Alternatively, a user can select the desired location by dragging the selected KPI (e.g., its identifier in the form of a KPI name), and dropping the selected KPI at the desired location in the dashboard template. For example, when the user selects the KPI, a default KPI widget is automatically displayed at a default location in the dashboard template. The user can then select the location by dragging and dropping the default KPI widget at the desired location. As will be discussed in greater detail below in conjunction with FIGS. 40-42 and FIGS. 44-46, a KPI widget is a KPI identifier that provides a numerical and/or visual representation of one or more values for the selected KPI. A KPI widget can be, for example, a Noel gauge, a spark line, a single value, a trend indicator, etc.


At block 3511, the computing machine receives a selection of one or more style settings for a KPI identifier (a KPI widget) to be displayed in the service-monitoring dashboard. For example, after the user selects the KPI, the user can provide input for creating and/or editing a title for the KPI. In one implementation, the computing machine causes the title that is already assigned to the selected KPI, for example via GUI 2200 in FIG. 22, to be displayed at the selected location in the dashboard template. In another example, after the user selects the KPI, the user is presented with available style settings, and the user can then select one or more of the style settings for the KPI widget to be displayed in the dashboard. In another example, in which a default KPI widget is displayed in response to the user selection of the KPI, the user can choose one or more of the available style setting(s) to replace or modify the default KPI widget. Style settings define how the KPI widget should be presented and can specify, for example, the shape of the widget, the size of the widget, the name of the widget, the metric unit of a KPI value, and/or other visual characteristics of the widget. Some implementations for receiving a selection of style setting(s) for a KPI widget to be displayed in the dashboard are discussed in greater detail below in conjunction with FIG. 39A. At block 3513, the computing machine causes display of a KPI identifier, such as a KPI widget, for the selected KPI at the selected location in the dashboard template. The KPI widget that is displayed in the dashboard template can be displayed using the selected style settings. The computing machine can receive further input (e.g., user input) for resizing a KPI widget via an input device (e.g., mouse, touch screen, etc.) For example, the computing device may receive user input via mouse device resizing (e.g., stretching, shrinking) the KPI widget.



FIG. 39A illustrates an example GUI 3900 facilitating user input for selecting a location in the dashboard template and style settings for a KPI widget, editing the service-monitoring dashboard by editing the dashboard template for the service-monitoring dashboard, and displaying the KPI widget in the dashboard template, in accordance with one or more implementations of the present disclosure. GUI 3900 includes a configuration interface 3906 to display a set of selectable thumbnail images (or icons or buttons) 3911 representing different types or styles of KPI widgets. The KPI widget styles can include, for example, and not limited to, a single value widget, a spark line widget, a Noel gauge widget, and a trend indicator widget. FIG. 39B illustrates example KPI widgets, in accordance with one or more implementations of the present disclosure. Widget 3931 is an example of one implementation of a Noel gauge widget. Widget 3932 is an example of one implementation of a spark line widget. Widget 3933 is an example of one implementation of a trend indicator widget.


Referring to FIG. 39A, configuration interface 3905 can display a single value widget thumbnail image 3907, a spark line widget thumbnail image 3908, a Noel gauge widget thumbnail image 3909, and a trend indicator widget thumbnail image 3910. For example, a user may have selected the Web Hosting service 3901, dragged the Web Hosting service 3901, and dropped the Web Hosting service 3901 on location 3905. The user may also have selected the CPU Usage KPI for the Web Hosting service 3901 and the Noel gauge widget thumbnail image 3909 to display the KPI widget for the CPU Usage KPI at the location 3905. In response, the computing machine can cause display of the Noel Gauge widget for the selected KPI (e.g., CPU Usage KPI) at the selected location (e.g., location 3905) in the dashboard template 3903. Some implementations of widgets for representing KPIs are discussed in greater detail below in conjunction with FIGS. 40-42 and FIGS. 44-46. In response to a user selection of a style setting for the KPI widget, one or more GUIs can be presented for customizing the selected KPI widget for the KPI. Input can be received via the GUIs to select a label for a KPI widget and the metric unit to be used for the KPI value with the KPI widget.


In one implementation, GUI 3900 includes an icon 3914 in the customization toolbar, which can be selected by a user, for defining one or more search queries. The search queries may produce results pertaining to one or more entities. For example, icon 3914 may be selected and an identifier 3918 for a search widget can be displayed in the dashboard template 3903. The identifier 3918 for the search widget can be the search widget itself, as illustrated in FIG. 39A. The search widget can be a shape (e.g., box) and can display results (e.g., value produced by a corresponding search query) in the shape in the service-monitoring dashboard when the search query is executed for displaying the service-monitoring dashboard to a user.


The identifier 3918 can be displayed in a default location in the dashboard template 3903 and a user can optionally select a new location for the identifier 3918. The location of the identifier 3918 in the dashboard template specifies the location of the search widget in the service-monitoring dashboard when the service-monitoring dashboard is displayed to a user. GUI 3900 can display a search definition box (e.g., box 3915) that corresponds to the search query. A user can provide input for the criteria for the search query via the search definition box (e.g., box 3915). For example, the search query may produce a stats count for a particular entity. The input pertaining to the search query is stored as part of the dashboard template. The search query can be executed when the service-monitoring dashboard is displayed to a user and the search widget can display the results from executing the search query.


Referring to FIG. 35, in one implementation, the computing machine receives input (e.g., user input), via the dashboard-creation graphical interface, of a time range to use for the KPI widget, editing the service-monitoring dashboard, and clearing data in the dashboard template.


At block 3515, the computing machine stores the resulting dashboard template in a data store. The dashboard template can be saved in response to a user request. For example, a request to save the dashboard template may be received upon selection of a save button (e.g., save button 3612 in GUI 3600 of FIG. 36). In one implementation, an image source byte for the resulting dashboard template is stored in a data store. In one implementation, an image source location for the resulting dashboard template is stored in a data store. The resulting dashboard template can be stored in a structure where each item (e.g., widget, line, text, image, shape, connector, etc.) has properties specified by the service-monitoring dashboard creation GUI.


Referring to FIG. 35, at block 3517, the computing machine can receive a user request for a service-monitoring dashboard, and can then generate and cause display of the service-monitoring dashboard based on the dashboard template at block 3519. Some implementations for causing display of a service-monitoring dashboard based on the dashboard template are discussed in greater detail below in conjunction with FIG. 47.



FIG. 40 illustrates an example Noel gauge widget 4000, in accordance with one or more implementations of the present disclosure. Noel gauge widget 4000 can have a shape 4001 with an empty space 4002 and with one end 4004 corresponding to a minimum KPI value and the other end 4006 corresponding to a maximum KPI value. The minimum value and maximum value can be user-defined values, for example, received via fields 3116,3120 in GUI 3100 in FIG. 31A, as discussed above. Referring to FIG. 40, the value produced by the search query defining the KPI can be represented by filling in the empty space 4002 of the shape 4001. This filler can be displayed using a color 4003 to represent the current state (e.g., normal, warning, critical) of the KPI according to the value produced by the search query. The color can be based on input received when one or more thresholds were created for the KPI. The Noel gauge widget 4000 can also display the actual value 4007 produced by the search query defining the KPI. The value 4007 can be of a nominal color or can be of a color representative of the state to which the value produced by the search query corresponds. A user can provide input, via the dashboard-creation graphical interface, indicating whether to apply a nominal color or color representative of the state.


The Noel gauge widget 4000 can display a label 4005 (e.g., Request Response Time) to describe the KPI and the metric unit 4009 (e.g., ms (milliseconds)) used for the KPI value. If the KPI value 4007 exceeds the maximum value represented by the second end 4006 of the shape 4001 of the Noel gauge widget 4000, the shape 4001 is displayed as being fully filled and can include an additional visual indicator representing that the KPI value 4007 exceeded the maximum value represented by the second end 4006 of the shape 4001 of the Noel gauge widget 4000.


The value 4007 can be produced by executing the search query of the KPI. The execution can be real-time (continuous execution until interrupted) or relative (based on a specific request or scheduled time). In addition, the machine data used by the search query to produce each value can be based on a time range. The time range can be user-defined time range. For example, before displaying a service-monitoring dashboard generated based on the dashboard template, a user can provide input specifying the time range. The input can be received, for example, via a drop-down menu 3912 in GUI 3900 in FIG. 39A. The initial time range, received via GUI 3900, can be stored with the dashboard template in a data store and subsequently used for producing the values for the KPI to be displayed in the service-monitoring dashboard.


When drop-down menu 3912 is selected by a user, GUI 4300 in FIG. 43A can be displayed. FIG. 43A illustrates an example GUI 4300 for facilitating user input specifying a time range to use when executing a search query defining a KPI, in accordance with one or more implementations of the present disclosure. For real-time execution, for example, used to update the service-monitoring dashboard in real-time, the time range for machine data can be a specified time window (e.g., 30-second window, 1-minute window, 1-hour window, etc.) from the execution time (e.g., each time the query is executed, the events with timestamps within the specified time window from the query execution time will be used). For relative execution, the time range can be historical (e.g., yesterday, previous week, etc.) or based on a specified time window from the requested time or scheduled time (e.g., last 15 minutes, last 4 hours, etc.). For example, the historical time range “Yesterday” 4304 can be selected for relative execution. In another example, the window time range “Last 15 minutes” 4305 can be selected for relative execution.



FIG. 43B illustrates an example GUI 4310 for facilitating user input specifying an end date and time for a time range to use when executing a search query defining a KPI, in accordance with one or more implementations of the present disclosure. When button 4314 is selected, an interface 4312 can be displayed. When a search query that defines a KPI is executed, the search query can search a user-specified range of data. For example, the search query may use “4 hours ago” to view the KPI state(s) at that end time. The start time can be determined based on whether the KPI is a service-related KPI or adhoc KPI, as described below.


In one implementation, for a service-related KPI (e.g., a KPI that is associated with a service) interface 4312 can specify the end parameter for a search query defining the service-related KPI, and the service-related KPI definition can specify the start parameter for the search query. For example, for a particular service-related KPI, the range of data “four hours of data” can be specified by a user via a service-related KPI definition GUI (e.g., “Monitoring” portion of GUI in FIG. 34AC described above). The four hours of data that are used for the search query can be relative to an end date and time that is specified via interface 4312.


In one implementation, for an adhoc KPI (i.e., a KPI that is not associated with a service), interface 4312 can specify the end parameter for a search query defining the adhoc KPI, and the particular type (e.g., spark line, single value) of widget used for the adhoc KPI can specify the start parameter for the search query. In one implementation, the use of a single value widget for an adhoc KPI specifies a time range of “30 minutes”. In one implementation, the use of a spark line widget for an adhoc KPI specifies a time range of “30 minutes”. In one implementation, the use of a single value delta widget (also referred to as a trend indicator widget) for an adhoc KPI specifies a time range of “60 minutes”. The time range associated with a particular widget type can be configurable.


The interface 4312 can present a list of preset end parameters (e.g., end date and/or end time), which a user can select from. The list can include end parameters (e.g., 15 minutes ago, etc.) that are relative to the execution of the KPI search queries. For example, if the “15 minutes ago” 4316 is selected, the search queries can run using data for a time range (e.g., last 4 hours) up until “15 minutes ago” 4316.


The interface 4312 can include a button 4320, which when selected can run the search queries for the KPIs (e.g., service-related KPIs, adhoc KPIs) in the modifiable dashboard template 4323 and update the KPIs (e.g., KPI 4326 and KPI 4328) in the modifiable dashboard template 4323 in response to executing the correspond search queries.


The interface 4312 can include one or more boxes 4318A-B enabling a user to specify a particular end date and time. In one implementation, when one of the boxes 4318A-B is selected, an interface 4322 enabling a user to specify the particular date or time is displayed. In one implementation, user input specifying the particular data and time is received via boxes 4138A-B. For example, Jan. 7, 2015 at midnight is specified. If the button 4320 is selected, the search queries for KPI 4326 and KPI 4328 can be executed using four hours of data up until midnight on Jan. 7, 2015.


When “Now” 4312 is selected, the search query for each KPI (e.g., service KPI, adhoc KPI) that is being represented in a service-monitoring dashboard is executed using a pre-defined time range, and the current information for the corresponding KPI is displayed in the service-monitoring dashboard. In one implementation, the pre-defined time range for the “Now” 4312 option is “2 minutes”. The search queries can be executed every 2 minutes using four hours of data up until 2 minutes ago. The pre-defined time range can be configurable.


When a historical preset end parameter (e.g., “Yesterday” 4319) is selected, the end parameter is relative to when the search queries for the KPI are executed for the service monitoring dashboard. For example, if the search queries for the KPI are executed for the service monitoring dashboard at 1 pm today, then the search queries use a corresponding range of data (e.g., four hours of data) up until 1 pm yesterday.


Referring to FIG. 40, the KPI may be for Request Response Time for a Web Hosting service. The time range “Last 15 minutes” may be selected for the service-monitoring dashboard presented to a user, and the value 4007 (e.g., 1.41) produced by the search query defining the Request Response Time KPI can be the average response time using the last 15 minutes of machine data associated with the entities providing the Web Hosting service from the time of the request. FIG. 42 illustrates an example GUI 4200 illustrating a search query and a search result for a Noel gauge widget, a single value widget, and a trend indicator widget, in accordance with one or more implementations of the present disclosure. A single value widget is discussed in greater detail below in conjunction with FIG. 41. A trend indicator widget is discussed in greater detail below in conjunction with FIG. 46A. Referring to FIG. 42, the KPI may be for Request Response Time. The KPI may be defined by a search query 4501 that outputs a search result having a single value 4203 (e.g., 1.41) for a Noel gauge widget, a single value widget, and/or a trend indicator widget. The search query 4201 can include a statistical function 4205 (e.g., average) to produce the single value (e.g., value 4203) to represent response time using machine data from the Last 15 minutes 4207.



FIG. 41 illustrates an example single value widget 4100, in accordance with one or more implementations of the present disclosure. Single value widget 4100 can include the value 4107, produced by the search query defining the KPI, in a shape 4101 (e.g., box). The shape can be colored using a color 4103 representative of the state (e.g., normal, warning, critical) to which the value produced by the search query corresponds. The value 4107 can be also colored using a nominal color or a color representative of the state to which the value produced by the search query corresponds. The single value widget 4100 can display a label to describe the KPI and the metric unit used for the KPI. A user can provide input, via the dashboard-creation graphical interface, indicating whether to apply a nominal color or color representative of the state.


The machine data used by the search query to produce the value 4107 is based on a time range (e.g., user selected time range). For example, the KPI may be fore Request Response Time for a Web Hosting service. The time range “Last 15 minutes” may be selected for the service-monitoring dashboard presented to a user. The value 4107 (e.g., 1.41) produced by the search query defining the Request Response Time KPI can be the average response time using the last 15 minutes of machine data associated with the entities providing the Web Hosting service from the time of the request.



FIG. 44 illustrates spark line widget 4400, in accordance with one or more implementations of the present disclosure. Spark line widget 4400 can include two shapes (e.g., box 4405 and rectangular box 4402). One shape (e.g., box 4405) of the spark line widget 4400 can include a value 4407, which is described in greater detail below. The shape (e.g., box 4405) can be colored using a color 4406 representative of the state (e.g., normal, warning, critical) to which the value 4407 corresponds. The value 4407 can be also be colored using a nominal color or a color representative of the state to which the value 4407 corresponds. A user can provide input, via the dashboard-creation graphical interface, indicating whether to apply a nominal color or color representative of the state.


Another shape (e.g., rectangular box 4402) in the spark line widget 4400 can include a graph 4401 (e.g., line graph), which is described in greater detail below, that includes multiple data points. The shape (e.g., rectangular box 4402) containing the graph 4401 can be colored using a color representative of the state (e.g., normal, warning, critical) of which a corresponding data point (e.g., latest data point) falls into. The graph 4401 can be colored using a color representative of the state (e.g., normal, warning, critical) of which a corresponding data point falls into. For example, the graph 4401 may be a line graph that transitions between green, yellow, red, depending on the value of a data point in the line graph. In one implementation, input (e.g., user input) can be received, via the service-monitoring dashboard, of a selection device hovering over a particular point in the line graph, and information (e.g., data value, time, and color) corresponding to the particular point in the line graph can be displayed in the service-monitoring dashboard, for example, in the spark line widget 4400. The spark line widget 4400 can display a label to describe the KPI and the metric unit used for the KPI.


The spark line widget 4400 is showing data in a time series graph with the graph 4401, as compared to a single value widget (e.g., single value widget 4100) and a Noel gauge widget (e.g., Noel gauge widget 4000) that display a single data point, for example as illustrated in FIG. 42. The data points in the graph 4401 can represent what the values, produced by the search query defining the KPI, have been over a time range (e.g., time range selected in GUI 4300). FIG. 45A illustrates an example GUI 4500 illustrating a search query and search results for a spark line widget, in accordance with one or more implementations of the present disclosure. The KPI may be for Request Response Time. The KPI may be defined by a search query 4501 that produces multiple values, for example, to be used for a spark line widget. A user may have selected a time range of “Last 15 minutes” 4507 (e.g., time range selected in GUI 4300). The machine data used by the search query 4501 to produce the search results can be based on the last 15 minutes. For example, the search results can include a value for each minute in the last 15 minutes. The values 4503 in the search results can be used as data points to plot a graph (e.g., graph 4401 in FIG. 44) in the spark line widget. Referring to FIG. 44, the graph 4401 is from data over a period of time (e.g., Last 15 minutes). The graph 4401 is made of data points (e.g., 15 values 4503 in search results in FIG. 45A). Each data point is an aggregate from the data for a shorter period of time (e.g., unit of time). For example, if the time range “Last 15 minutes” is selected, each data point in the graph 4401 represents a unit of time in the last 15 minutes. For example, the unit of time may be one minute, and the graph contains 15 data points, one for each minute for the last 15 minutes. Each data point can be the average response time (e.g., avg(spent) in search query 4501 in FIG. 45A) for the corresponding minute. In another example, if the time range “Last 4 hours” is selected, and the unit of time used for the graph 4401 is 15 minutes, then the graph 4401 would be made from 16 data points.


In one implementation, the value 4407 in the other shape (e.g., box 4405) in the spark line widget 4400 represents the latest value in the time range. For example, the value 4407 (e.g., 1.32) can represent the last data point 4403 in the graph 4401. If the time range “Last 15 minutes” is selected, the value 4407 (e.g., 1.32) can represent the average response time of the data in that last minute of the 15 minute time range.


In another implementation, the value 4407 is the first data point in the graph 4401. In another implementation, the value 4407 represents an aggregate of the data in the graph 4401. For example, a statistical function can be performed on using the data points for the time range (e.g., Last 15 minutes) for the value 4407. For example, the value 4407 may be the average of all of the points in the graph 4401, the maximum value from all of the points in the graph 4401, the mean of all of the points in the graph 4401. Input (e.g., user input) can be received, for example, via the dashboard-creation graphical interface, specifying type (e.g. latest, first, average, maximum, mean) of value to be represented by value 4407.



FIG. 45B illustrates spark line widget 4520, in accordance with one or more implementations of the present disclosure. Spark line widget 4520 can include a graph 4521 (e.g., line graph). The data points in the graph 4521 can represent what the values, produced by the search query defining the KPI, have been over a time range. The graph 4521 is from data over a period of time (e.g., Last 30 minutes). The graph 4521 is made of data points.


When a user hovers, for example, a point over a point in time in the graph 4521, data that corresponds to the point in time can be displayed in a box 4525. The data can include, for example, and is not limited to, a value, time, and a state corresponding to the KPI at that point in time. In one implementation, a line indicator 4523 is displayed that corresponds to the point in time.



FIG. 46A illustrates a trend indicator widget 4600, in accordance with one or more implementations of the present disclosure. Trend indicator widget 4600 can include a shape 4601 (e.g., rectangular box) that includes a value 4607, produced by the search query defining the KPI, in another shape 4601 (e.g., box) and an arrow 4605. The shape 4601 containing the value 4607 can be colored using a color 4603 representative of the state (e.g., normal, warning, critical) of which the value 4607 produced by the search query falls into. The value 4607 can be of a nominal color or can be of a color representative of the state for which the value produced by the search query falls into. A user can provide input, via the dashboard-creation graphical interface, indicating whether to apply a nominal color or color representative of the state. The trend indicator widget 4600 can display a label to describe the KPI and the metric unit used for the KPI.


The arrow 4605 can indicate a trend pertaining to the KPI by pointing in a direction. For example, the arrow 4605 can point in a general up direction to indicate a positive or increasing trend, the arrow 4605 can point in a general down direction to indicate a negative or decreasing trend, or the arrow 4605 can point in a general horizontal direction to indicate no change in the KPI. The direction of the arrow 4605 in the trend indicator widget 4600 may change when a KPI is being updated, for example, in a service-monitoring dashboard, depending on the current trend at the time the KPI is being updated.


In one implementation, a color is assigned to each trend (e.g., increasing trend, decreasing trend). The arrow 4605 can be of a nominal color or can be of a color representative of the determined trend. A user can provide input, via the dashboard-creation graphical interface, indicating whether to apply a nominal color or color representative of the trend. The shape 4607 can be of a nominal color or can be of a color representative of the determined trend. A user can provide input, via the dashboard-creation graphical interface, indicating whether to apply a nominal color or color representative of the trend.


In one implementation, the trend represented by the arrow 4605 is of whether the value 4607 has been increasing or decreasing in a selected time range relative to the last time the KPI was calculated. For example, if the time range “Last 15 minutes” is selected, the average of the data points of the last 15 minutes is calculated, and the arrow 4605 can indicate whether the average of the data points of the last 15 minutes is greater that than the average calculated from the time range (e.g., 15 minutes) prior. In one implementation, the trend indicator widget 4600 includes a percentage indicator indicating a percentage of the value 4607 increasing or decreasing in a selected time range relative to the last time the KPI was calculated.


In another implementation, the arrow 4605 indicates whether the last value for the last data point in the last 15 minutes is greater than the value immediately before the last data point.


The machine data used by the search query to produce the value 4607 is based on a time range (e.g., user selected time range). For example, the KPI may be fore Request Response Time for a Web Hosting service. The time range “Last 15 minutes” may be selected for the service-monitoring dashboard presented to a user. The value 4607 (e.g., 1.41) produced by the search query defining the Request Response Time KPI can be the average response time using the last 15 minutes of machine data associated with the entities providing the Web Hosting service from the time of the request.


As discussed above, once the dashboard template is defined, it can be saved, and then used to generate a service-monitoring dashboard for display. The dashboard template can identify the KPIs selected for the service-monitoring dashboard, KPI widgets to be displayed for the KPIs in the service-monitoring dashboard, locations in the service-monitoring dashboard for displaying the KPI widgets, visual characteristics of the KPI widgets, and other information (e.g., the background image for the service-monitoring dashboard, an initial time range for the service-monitoring dashboard).



FIG. 46B illustrates an example GUI 4610 for creating and/or editing a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. GUI 4610 can present a list 4612 of existing service-monitoring dashboards that have already been created. The list 4612 can represent service-monitoring dashboards that have data that is stored in a data store for displaying the service-monitoring dashboards. In one implementation, the list 4612 includes one or more default service-monitoring dashboards that can be edited.


Each service-monitoring dashboard in the list 4612 can include a title 4611. In one implementation, the title 4611 is a link, which when selected, can display the particular service-monitoring dashboard in a GUI in view mode, as described in greater detail below.


Each service-monitoring dashboard in the list 4612 can include a button 4613, which when selected, can present a list of actions, which can be taken on a particular service-monitoring dashboard, from which a user can select from The actions can include, and are not limited to, editing a service-monitoring dashboard, editing a title and/or description for a service-monitoring dashboard, editing permissions for a service-monitoring dashboard, cloning a service-monitoring dashboard, and deleting a service-monitoring dashboard. When an action is selected, one or more additional GUIs can be displayed for facilitating user input pertaining to the action, as described in greater detail below. For example, button 4613 can be selected, and an editing action can be selected to display a GUI (e.g., GUI 4620 in FIG. 46C described below) for editing the “Web Arch” service-monitoring dashboard.


GUI 4610 can display application information 4615 for each service-monitoring dashboard in the list 4612. The application information 4615 can indicate an application that is used for creating and/or editing the particular service-monitoring dashboard. GUI 4610 can display owner information 4614 for each service-monitoring dashboard in the list 4612. The owner information 4614 can indicate a role that is assigned to the owner of the particular service-monitoring dashboard.


GUI 4610 can display permission information 4616 for each service-monitoring dashboard in the list 4612. The permission information can indicate a permission level (e.g., application level, private level). An application level permission level allows any user that is authorized to access to the service-monitoring dashboard creation and/or editing GUIs permission to access and edit the particular service-monitoring dashboard. A private level permission level allows a single user (e.g., owner, creator) permission to access and edit the particular service-monitoring dashboard. In one implementation, a permission level include permissions by role. In one implementation, one or more specific users can be specified for one or more particular levels.


GUI 4610 can include a button 4617, which when selected can display GUI 4618 in FIG. 46BA for specifying information for a new service-monitoring dashboard.



FIG. 46BA illustrates an example GUI 4618 for specifying information for a new service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. GUI 4618 can include a text box 4619A enabling a user to specify a title for the service-monitoring dashboard, a text box 4619B enabling a user to specify a description for the service-monitoring dashboard, and buttons 4916C enabling a user to specify permissions for the service-monitoring dashboard.



FIG. 46C illustrates an example GUI 4620 for editing a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. GUI 4620 is displaying the service-monitoring dashboard in an edit mode that enables a user to edit the service-monitoring dashboard via a KPI-selection interface 4632, a modifiable dashboard template 4360, a configuration interface 4631, and a customization toolbar 4633.


The current configuration for the “Web Arch” service-monitoring dashboard that is stored in a data store can be used to populate the modifiable dashboard template 4630. One or more widgets that have been selected for one or more KPIs can be displayed in the modifiable dashboard template 4630.


A KPI that is being represented by a widget in the modifiable dashboard template 4630 can be a service-related KPI or an adhoc KPI. A service-related KPI is a KPI that is related to one or more services and/or one or more entities. A service-related KPI can be defined using service monitoring GUIs, as described in above in conjunction with FIGS. 21-33A. An ad-hoc KPI is a key performance indicator that is not related to any service or entity. For example, service-related KPI named “Web performance” is represented by Noel gauge widget 4634. The Web performance can be a KPI that is related to “Splunk Service” 4635.


The configuration interface 4631 can display data that pertains to a KPI (e.g., service-related KPI, adhoc KPI) that is selected in the modifiable dashboard template 4630. For example, an adhoc KPI can be defined via GUI 4620. For example, an adhoc search button 4621 can be activated and a location (e.g., location 4629) can be selected in the modifiable dashboard template 4630. A widget 4628 for the adhoc KPI can be displayed at the selected location 4629. In one implementation, a default widget (e.g., single value widget) is displayed for the adhoc KPI.


The configuration interface 4631 can display data that pertains to the adhoc KPI. For example, configuration interface 4631 can display source information for the adhoc KPI. The source information can indicate whether the adhoc KPI is derived from an adhoc search or data model. An adhoc KPI can be defined by a search query. The search query can be derived from a data model or an adhoc search query. An adhoc search query is a user-defined search query.


In one implementation, when the adhoc search button 4621 is activated for creating an adhoc KPI, the adhoc KPI is derived from an adhoc search query by default, and the adhoc type button 4624 is displayed as enabled. The adhoc type button 4624 can also be user-selected to indicate that the adhoc KPI is to be derived from an adhoc search query.


When the adhoc type button 4624 is enabled, a text box 4626 can be displayed for the search language defining the adhoc search query. In one implementation, the text box 4626 is populated with the search language for a default adhoc search query. In one implementation, the default adhoc search query is a count of events, and the search language “index=internal|timechart count is displayed in the text box 4626. A user can edit the search language via the text box 4626 to change the adhoc search query.


When the data model type button 4625 is selected, the configuration interface 4631 can display an interface for using a data model to define the adhoc KPI is displayed. FIG. 46D illustrates an example interface 4640 for using a data model to define an adhoc KPI, in accordance with one or more implementations of the present disclosure. If button 4641 is selected, a GUI is displayed that enables a user to specify a data model, an object of the data model, and a field of the object for defining the adhoc KPI. If button 4643 is selected, a GUI is displayed that enables a user to select a statistical function (e.g., count, distinct count) to calculate a statistic using the value(s) from the field.


Referring to FIG. 46C, one or more types of KPI widgets can support the configuration of thresholds for the adhoc KPI. For example, a Noel gauge widget, a spark line widget, and a trend indicator widget (also referred to as a “single value delta widget” throughout this document) can support setting one or more thresholds for the adhoc KPI. For example, if the Noel gauge button 4627 is activated, the configuration interface 4631 can display an interface for setting one or more thresholds for the adhoc KPI.



FIG. 46E illustrates an example interface 4645 for setting one or more thresholds for the adhoc KPI, in accordance with one or more implementations of the present disclosure. The configuration interface 4645 can include a button 4647, which when selected, displays a GUI (e.g., GUI 3100 in FIG. 31A, GUI 3150 in FIG. 31B) for setting one or more thresholds for the adhoc KPI. If the update button 4648 is activate, the widget for the adhoc KPI can be updated, as described below.


Referring to FIG. 46C, if the update button (e.g., update button 4648 in FIG. 46E) is activated, the widget 4628 can be updated to display a Noel gauge widget. If the adhoc KPI is being defined using a data model, the configuration interface 4631 can display the user selected settings for the adhoc KPI that have been specified, for example, using GUI 4640 in FIG. 46D.


Referring to FIG. 46C, if a service-related KPI widget is selected in the modifiable dashboard template 4630, the configuration interface 4631 can display information pertaining to the service-related KPI. For example, the Noel gauge widget 4634 can be selected, and the configuration interface 4631 can display information pertaining to the “Web performance” KPI that is related to the Splunk Service 4635.



FIG. 46F illustrates an example interface 4650 for a service-related KPI, in accordance with one or more implementations of the present disclosure. The text box 4651 can display the search language for the search query used to define the service-related KPI. The text box 4651 can be disabled to indicate that the service-related KPI cannot be edited from the glass table.


Referring to FIG. 46C, if the run search link 4636 is activated, a search GUI that displays information (e.g., search language, search result set) for a KPI (e.g., service KPI, adhoc KPI) that is selected in the modifiable dashboard template 4630.



FIG. 46GA depicts an exemplary graphical user interface 4649A that can configure/override the selection behavior (e.g., click-in behavior) of a widget in the modifiable dashboard template 4630. For example, upon selecting one of the referenced widgets, the various menu items/elements of graphical user interface 4649 can be selected in order to define the type of visualization that is to be presented in response to the selection of a particular widget (e.g., glass tables, deep dives, dashboards, etc.). Additionally, FIG. 46GA depicts a graphical user interface 4649B that can configure/override the default behavior of a widget in the modifiable dashboard template 4630 such that a custom URL is to be selected/presented in response to selection of a particular widget.



FIG. 46GB illustrates exemplary GUI 4653 which may be another example of a modifiable dashboard template. As shown, GUI 4653 may include a portion 4654 that provides the features of GUI 4649A and 4649B and enables a user to reconfigure or otherwise override the default drill down behavior of a widget, such as a widget that depicts various aspects of a KPI and/or an aggregate KPI (e.g., health score) GUI 4653 includes various GUI elements such as KPI/health score widgets 4652A-CA-C which may pertain to one or more services. It should be understood that while by default, upon selecting (e.g., ‘clicking on’) a particular widget the GUI may navigate a deep dive GUI associated with the particular service (e.g., a visual interface that provides an in-depth look at KPI data that reflects how a service or entity is performing over a certain period of time, as described herein), as described herein a user can also be enabled to override or reconfigure such navigation (e.g., in favor of navigation to another interface, report, etc.). For example, the referenced GUI can be reconfigured to depict one or more other items, including but not limited to: another deep dive GUI, a dashboard (e.g., a GUI that incorporates one or more widgets, each of which, for example, provides a representation of one or more aspects, values, etc. associated with a KPI, as described herein) and/or any other URL (which can refer to one or more of the referenced GUI elements and/or others).


The described functionality can be advantageous for various users in various scenarios. For example, certain users may utilize a “glass table” (e.g., GUI 4653 as depicted in FIG. 46GB) in order to build/depict views of various KPIs at different levels of abstractions (e.g., for their organization). For example, a user may wish to monitor three services, each of which includes four KPIs (totaling 12 KPIs and three health scores). In the first glass table the user may wish to depict health scores that reflect a high level overview (which depicts an overall, general sense of health). In a scenario in which a health score on such a page may change (e.g., to yellow, orange, or red), upon selecting such a health score the next level down (by default) may simply present another glass table (which may, for example, depict a process layout for that particular service with the KPI widgets for that particular service clicked on), which is not necessarily of particular interest to the user at that time. Accordingly, as described herein, users can reconfigure/override such defaults, thereby enabling the creation of multi-layered glass tables and also enabling users to drill down to deep dives, dashboards and custom URLs upon selecting various GUI elements.



FIG. 46HA illustrates an example GUI 4655 for editing layers for items, in accordance with one or more implementations of the present disclosure. The modifiable dashboard template 4658 can include multiple layers. The layers are defined by the items (e.g., widget, line, text, image, shape, connector, etc.) in the modifiable dashboard template 4658. In one implementation, the ordering of the layers (e.g., front to back, and back to front) is based on the order for when the items are added to the modifiable dashboard template 4658. In one implementation, the most recent item that is added to the modifiable dashboard template 4658 corresponds to the most forward layer.


One or more items can be overlaid with each other. The layers that correspond to the overlaid items can form a stack of layers in the modifiable dashboard template 4658. For example, items 4656A-H form a stack of layers.


A current layer for an item can be relative to the other layers in the stack. The configuration interface 4659 can include layering buttons 4657A-D for changing the layer for an item that is selected in the modifiable dashboard template 4658. A layering button can change the layer order one layer at a time for an item. For example, there can be a “Bring Forward” button 4657C to bring a selected item one layer forward, and there can be a “Send Backward” button 4657D to send a selected item one layer backward. A layering button can change the layer order more than one layer at a time. For example, there can be a “Bring to Front” button 4657A to bring a selected item to the most forward layer, and there can be a “Send to Back” button 4657B to send a selected item to the most backward layer. For example, item 4656F can be selected and the “Send to Back” button 4657B can be activated. In response to activating the “Send to Back” button 4657B, the items 4656F can be displayed in the most backward layer in the stack. FIG. 46HB illustrates an example GUI 4660 for editing layers for items, in accordance with one or more implementations of the present disclosure. Item 4661 is displayed in the most backward layer in a stack defined by selected items.



FIG. 46I illustrates an example GUI 4665 for moving a group of items, in accordance with one or more implementations of the present disclosure. A group of items 4667 can be defined, for example, by multi-selecting multiple elements in modifiable dashboard template 4669. In one implementation, a shift-click command is used for selecting multiple elements that are to be treated as a group. The group of items 4667 can initially be in location 4666. The items can be moved as a group to location 4668.


GUI 4665 can include a panning button 4675, to enable panning mode for the modifiable dashboard template 4669. When panning mode is enabled, the items in the modifiable dashboard template 4669 can be moved within the modifiable dashboard template 4669 using a panning function. In one implementation, the modifiable dashboard template 4669 is processed as having an infinite size.


GUI 4665 can include an image button 4673, which when selected, can display a GUI for selecting one or more images to import into the modifiable dashboard template 4669. For example, image 4674 has been imported into the modifiable dashboard template 4669. When an image 4674 is selected in the modifiable dashboard template 4669, the image 4674 can be resized based on user interaction with the image. For example, a user can select an image, click a corner of the image and drag the image to resize the image.


The configuration interface 4670 can include a lock position button 4671 for locking one or more selected items in a position in the modifiable dashboard template 4669. In one implementation, when an auto-layout button 4672 is activated, an item that has a locked position is not affected by the auto-layout function.


When the auto-layout button 4672 is activated, the modifiable dashboard template 4669 automatically displays the unlocked widgets (e.g., service-related KPI widgets, adhoc KPI widgets) in a serial order in the modifiable dashboard template 4669. In one implementation, the order is based when the widgets were added to the modifiable dashboard template 4669. In one implementation, the order is based on the layers that correspond to the widgets. In one implementation, when a layer is changes for a widget, the order uses the current layer. In one implementation, the order is based on the last KPI state that is associated with the particular widget. In one implementation, the order is based on any combination of the above.


In one implementation, the modifiable dashboard template 4669 automatically displays one or more items (e.g., widget, line, text, image, shape, connector, etc.) in a serial order in the modifiable dashboard template 4669. In one implementation, the order is based when the items were added to the modifiable dashboard template 4669. In one implementation, the order is based on the layers that correspond to the items. In one implementation, when a layer is changes for an item, the order uses the current layer. In one implementation, the order is based on the type (e.g., widget, line, text, image, shape, connector, etc.) of item. In one implementation, the order is based on any combination of the above.



FIG. 46J illustrates an example GUI 46000 for connecting items, in accordance with one or more implementations of the present disclosure. GUI 46000 can include a connector button 46001. When the connector button 46001 has been activated, a user can select a first item 46005 and a second item 46007 to be connected. The modifiable dashboard template can display a connector 46003 in response to the user selection of the first item 46005 and second item 46007. In one implementation, the connector 46003 is an arrow connector by default.


The direction of the arrow can correspond to the selection of the first item 46005 and the second item 46007. The type of connector (e.g., single arrow, double arrow, and no arrow) and the direction of the connector can be edited based on user input received via the modifiable dashboard template 46009. In one implementation, when one of the connected items (e.g., first item 46005, second item 46007) is moved in the modifiable dashboard template 46009, the connector 46003 moves accordingly.


When a connector 46003 is selected, the configuration interface 46011 can display text boxes and/or lists for editing the connector. For example, the color, stroke width, stoke type (e.g., solid line, dashed line, etc.), and label of a connector 46003 can be edited via user input received via the text boxes and/or lists. For example, the configuration interface 46011 can display a list of colors which a user can select from and apply to the connector.


GUI 46000 can include buttons for adding shape(s) to the modifiable dashboard template 46009. For example, when button 46013 is activated, a rectangular type of shape can be added to the modifiable dashboard template 46009. When button 46015 is activated, an elliptical type of shape can be added to the modifiable dashboard template 46009. When a shape (e.g., square 46007) is selected, the configuration interface 46011 can display text boxes and/or lists for editing the shape. For example, the fill color, fill pattern, border color, border width, and border type (e.g., solid line, dashed line, double line, etc.) of a shape can be edited via user input received via the text boxes and/or lists.


GUI 46000 can include a button 46017 for adding line(s) to the modifiable dashboard template 46009. For example, when button 46017 is activated, a line 46019 can be added to the modifiable dashboard template 46009. When a line 46019 is selected, the configuration interface 46011 can display text boxes and/or lists for editing the line. For example, the fill color, fill pattern, border color, border width, and line type (e.g., solid line, dashed line, double line, etc.) of a line can be edited via user input received via the text boxes and/or lists.



FIG. 46K illustrates a block diagram 46030 of an example for editing a line using the modifiable dashboard template, in accordance with one or more implementations of the present disclosure. A line 46031A can be displayed in the modifiable dashboard template (e.g., modifiable dashboard template 46009 in FIG. 46J). The line 46031A can include one or more control points 46033, which each can be selected and moved to create one or more vertices in the line 46031A. For example, control point 46033 in line 46031A can be dragged to location 46306 to create a vertex, as shown in line 46031B. In another example, control point 46035 in line 46031B can be dragged to location 46307 to create another vertex, as shown in line 46031C. In one implementation, a connector that is displayed in the modifiable dashboard template can include one or more control points, which each can be selected and moved to create one or more vertices in the connector.



FIG. 47A is a flow diagram of an implementation of a method 4750 for creating and causing for display a service-monitoring dashboard, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method is performed by the client computing machine. In another implementation, the method is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 4751, the computing machine identifies one or more key performance indicators (KPIs) for one or more services to be monitored via a service-monitoring dashboard. A service can be provided by one or more entities. Each entity can be associated with machine data. The machine data can include unstructured data, log data, and/or wire data. The machine data associated with an entity can include data collected from an API for software that monitors that entity.


A KPI can be defined by a search query that derives one or more values from machine data associated with the one or more entities that provide the service. Each KPI can be defined by a search query that is either entered by a user or generated through a graphical interface. In one implementation, the computing machine accesses a dashboard template that is stored in a data store that includes information identifying the KPIs to be displayed in the service-monitoring dashboard. In one implementation, the dashboard template specifies a service definition that associates the service with the entities providing the service, specifies the KPIs of the service, and also specifies the search queries for the KPIs. As discussed above, the search query defining a KPI can derive one or more values for the KPI using a late-binding schema that it applies to machine data. In some implementations, the service definition identified by the dashboard template can also include information on predefined states for a KPI and various visual indicators that should be used to illustrate states of the KPI in the dashboard.


The computing machine can optionally receive input (e.g., user input) identifying one or more ad hoc searches to be monitored via the service-monitoring dashboard without selecting services or KPIs.


At block 4753, the computing machine identifies a time range. The time range can be a default time range or a time range specified in the dashboard template. The machine data can be represented as events. The time range can be used to indicate which events to use for the search queries for the identified KPIs.


At block 4755, for each KPI, the computing machine identifies a KPI widget style to represent the respective KPI. In one implementation, the computing machine accesses the dashboard template that includes information identifying the KPI widget style to use for each KPI. As discussed above, examples of KPI widget styles can include a Noel gauge widget style, a single value widget style, a spark line widget style, and a trend indicator widget style. The computing machine can also obtain, from the dashboard template, additional visual characteristics for each KPI widget, such as, the name of the widget, the metric unit of the KPI value, settings for using nominal colors or colors to represent states and/or trends, the type of value to be represented in KPI widget (e.g., the type of value to be represented by value 4407 in a spark line widget), etc.


The KPIs widget styles can display data differently for representing a respective KPI. The computing machine can produce a set of values and/or a value, depending on the KPI widget style for a respective KPI. If the KPI widget style represents the respective KPI using values for multiple points in time in the time range, method 4750 proceeds to block 4757. Alternatively, if the KPI widget style represents the respective KPI using a single value during the time range, method 4750 proceeds to block 4759.


At block 4759, if the KPI widget style represents the respective KPI using a single value, the computing machine causes a value to be produced from a set of machine data or events whose timestamps are within the time range. The value may be a statistic calculated based on one or more values extracted from a specific field in the set of machine data or events when the search query is executed. The statistic may be an average of the extracted values, a mean of the extracted values, a maximum of the extracted values, a last value of the extracted values, etc. A single value widget style, a Noel gauge widget style, and trend indicator widget style can represent a KPI using a single value.


The search query that defines a respective KPI may produce a single value which a KPI widget style can use. The computing machine can cause the search query to be executed to produce the value. The computing machine can send the search query to an event processing system. As discussed above, machine data can be represented as events. The machine data used to derive the one or more KPI values can be identifiable on a per entity basis by referencing entity definitions that are aggregated into a service definition corresponding to the service whose performance is reflected by the KPI.


The event processing system can access events with time stamps falling within the time period specified by the time range, identify which of those events should be used (e.g., from the one or more entity definitions in the service definition for the service whose performance is reflected by the KPI), produce the result (e.g., single value) using the identified events, and send the result to the computing machine. The computing machine can receive the result and store the result in a data store.


At block 4757, if the KPI widget style represents the respective KPI using a set of values, the computing machine causes a set of values for multiple points in time in the time range to be produced. A spark line widget style can represent a KPI using a set of values. Each value in the set of values can represent an aggregate of data in a unit of time in the time range. For example, if the time range is “Last 15 minutes”, and the unit of time is one minute, then each value in the set of values is an aggregate of the data in one minute in the last 15 minutes.


If the search query that defines a respective KPI produces a single value instead of a set of multiple values as required by the KPI widget style (e.g., for the graph of the spark line widget), the computing machine can modify the search query to produce the set of values (e.g., for the graph of the spark line widget). The computing machine can cause the search query or modified search query to be executed to produce the set of values. The computing machine can send the search query or modified search query to an event processing system. The event processing system can access events with time stamps falling within the time period specified by the time range, identify which of those events should be used, produce the results (e.g., set of values) using the identified events, and send the results to the computing machine. The computing machine can store the results in a data store.


At block 4761, for each KPI, the computing machine generates the KPI widget using the KPI widget style and the value or set of values produced for the respective KPI. For example, if a KPI is being represented by a spark line widget style, the computing machine generates the spark line widget using a set of values produced for the KPI to populate the graph in the spark line widget. The computing machine also generates the value (e.g., value 4407 in spark line widget 4400 in FIG. 44) for the spark line widget based on the dashboard template. The dashboard template can store the selection of the type of value that is to be represented in a spark line widget. For example, the value may represent the first data point in the graph, the last data point the graph, an average of all of the points in the graph, the maximum value from all of the points in the graph, or the mean of all of the points in the graph.


In addition, in some implementations, the computing machine can obtain KPI state information (e.g., from the service definition) specifying KPI states, a range of values for each state, and a visual characteristic (e.g., color) associated with each state. The computing machine can then determine the current state of each KPI using the value or set of values produced for the respective KPI, and the state information of the respective KPI. Based on the current state of the KPI, a specific visual characteristic (e.g., color) can be used for displaying the KPI widget of the KPI, as discussed in more detail above.


At block 4763, the computing machine generates a service-monitoring dashboard with the KPI widgets for the KPIs using the dashboard template and the KPI values produced by the respective search queries. In one implementation, the computing machine accesses a dashboard template that is stored in a data store that includes information identifying the KPIs to be displayed in the service-monitoring dashboard. As discussed above, the dashboard template defines locations for placing the KPI widgets, and can also specify a background image over which the KPI widgets can be placed.


At block 4765, the computing machine causes display of the service-monitoring dashboard with the KPI widgets for the KPIs. Each KPI widget provides a numerical and/or graphical representation of one or more values for a corresponding KPI. Each KPI widget indicates how an aspect of the service is performing at one or more points in time. For example, each KPI widget can display a current KPI value, and indicate the current state of the KPI using a visual characteristic associated with the current state. In one implementation, the service-monitoring dashboard is displayed in a viewing/investigation mode based on a user selection to view the service-monitoring dashboard is displayed in the viewing/investigation mode.


At block 4767, the computing machine optionally receives a request for detailed information for one or more KPIs in the service-monitoring dashboard. The request can be received, for example, from a selection (e.g., user selection) of one or more KPI widgets in the service-monitoring dashboard.


At block 4759, the computing machine causes display of the detailed information for the one or more KPIs. In one implementation, the computing machine causes the display of a deep dive visual interface, which includes detailed information for the one or more KPIs. A deep dive visual interface is described in greater detail below in conjunction with FIG. 50A.


The service-monitoring dashboard may allow a user to change a time range. In response, the computing machine can send the search query and the new time range to an event processing system. The event processing system can access events with time stamps falling within the time period specified by the new time range, identify which of those events should be used, produce the result (e.g., one or more values) using the identified events, and send the result to the computing machine. The computing machine can then cause the service-monitoring dashboard to be updated with new values and modified visual representations of the KPI widgets.



FIG. 47B illustrates an example service-monitoring dashboard GUI 4700 that is displayed based on the dashboard template, in accordance with one or more implementations of the present disclosure. GUI 4700 includes a user selected background image 4702 and one or more KPI widgets for one or more services that are displayed over the background image 4702. GUI 4700 can include other elements, such as, and not limited to text, boxes, connections, and widgets for ad hoc searches. Each KPI widget provides a numerical or graphical representation of one or more values for a corresponding key performance indicator (KPI) indicating how an aspect of a respective service is performing at one or more points in time. For example, GUI 4700 includes a spark line widget 4718 which may be for an aspect of Service-B, and a Noel gauge widget 4708 which may be for another aspect of Service-B. In some implementations, the appearance of the widgets 4718 and 4708 (as well as other widgets in the GUI 4700) can reflect the current state of the respective KPI (e.g., based on color or other visual characteristic).


Each service is provided by one or more entities. Each entity is associated with machine data. The machine data can include for example, and is not limited to, unstructured data, log data, and wire data. The machine data that is associated with an entity can include data collected from an API for software that monitors that entity. The machine data used to derive the one or more values represented by a KPI is identifiable on a per entity basis by referencing entity definitions that are aggregated into a service definition corresponding to the service whose performance is reflected by the KPI.


Each KPI is defined by a search query that derives the one or more values represented by the corresponding KPI widget from the machine data associated with the one or more entities that provide the service whose performance is reflected by the KPI. The search query for a KPI can derive the one or more values for the KPI it defines using a late-binding schema that it applies to machine data.


In one implementation, the GUI 4700 includes one or more search result widgets (e.g., widget 4712) displaying a value produced by a respective search query, as specified by the dashboard template. For example, widget 4712 may represent the results of a search query producing a stats count for a particular entity.


In one implementation, GUI 4700 facilitates user input for displaying detailed information for one or more KPIs. A user can select one or more KPI widgets to request detailed information for the KPIs represented by the selected KPI widgets. The detailed information for each selected KPI can include values for points in time during the period of time. The detailed information can be displayed via one or more deep dive visual interfaces. A deep dive visual interface is described in greater detail below in conjunction with FIG. 50A.


Referring to FIG. 47B, GUI 4700 facilitates user input for changing a time range. The machine data used by a search query to produce a value for a KPI is based on a time range. As described above in conjunction with FIG. 43A, the time range can be a user-defined time range. For example, if the time range “Last 15 minutes” is selected, the last 15 minutes would be an aggregation period for producing the value. GUI 4700 can be updated with one or more KPI widgets that each represent one or more values for a corresponding KPI indicating how a service provided is performing at one or more points in time based on the change to the time range.



FIG. 47C illustrates an example service-monitoring dashboard GUI 4750 that is displayed in view mode based on the dashboard template, in accordance with one or more implementations of the present disclosure. In one implementation, when a service-monitoring dashboard is in view mode, the service-monitoring dashboard cannot be edited. GUI 4750 can include a button 4755, which when selected, can display a dashboard creation GUI (e.g., GUI 4620 in FIG. 46C) for editing a service-monitoring dashboard.


GUI 4750 can display the items 4751 (e.g., service-related KPI widgets, adhoc KPI widgets, images, connectors, text, shapes, line etc.) as specified using the KPI-selection interface, modifiable dashboard template, configuration interface, and customization tool bar.


In one implementation, one or more widgets (e.g., service-related KPI widgets, adhoc KPI widgets) that are presented in view mode can be selected by a user to display one or more GUIs presenting more detailed information, for example, in a deep dive visualization, as described in greater detail below.


For example, a service-related KPI widget for a particular KPI can be displayed in view mode. When the service-related KPI widget is selected, a deep dive visualization can be displayed that presents data pertaining to the service-related KPI. The service-related KPI is related to a particular service and one or more other services based on dependency. The data that is presented in the deep dive visualization can include data for all of the KPIs that are related to the particular service and/or all of the KPIs from one or more dependent services.


When an adhoc KPI widget is displayed in view mode, and is selected, a deep dive visualization can be displayed that presents data pertaining to the adhoc search for the adhoc KPI.


GUI 4750 can include a button 4753 for displaying an interface (e.g., interface 4312 in FIG. 43B) for specifying an end date and time for a time range to use when executing a search query defining a KPI displayed in GUI 4750.


Dashboard with Service Swapping (Variants)


In one embodiment, a service-monitoring dashboard template that has already been defined may serve as a base template from which other dashboard templates and/or displays may be derived. In one embodiment, a service represented in the base dashboard may be swapped for another service, with an automatic process identifying comparable KPIs to use as widget data sources in the derived or variant dashboard. Methods used to identify new dashboard subjects and perform automatic widget adaptations are not limited to services as the dashboard subjects, KPIs as the widget data sources, and data sources as the widget attributes being automatically adjusted, but a discussion of an embodiment in those general terms is valuable to teach inventive aspects which one of skill understands are not so limited.


Service monitoring system (SMS) embodiments practicing inventive aspects of dashboard template variants as described herein may be expected to experience a reduced storage burden as a single copy of a base dashboard template may provide a substantial portion of the definitional information for many derivative dashboards without the need to duplicate that information as many times. Similarly, SMS embodiments practicing inventive aspects of dashboard template variances described herein may be expected to experience a reduced computing resource burden for administrative, management, and control user interface functions, as a relatively small amount of information may be needed from the user to effectuate a complete derivative dashboard, as compared to the amount needed to define the dashboard completely without derivation. Similarly, SMS embodiments practicing inventive aspects of dashboard template variances described herein may be expected to experience a reduced computing resource burden over the life of a service monitoring work session of a user as the sharing and relatedness of multiple dashboards likely to be used in close sequence to one another offers increased exploitation of data caching, for example. Other advantages and improvement of the SMS may be understood by consideration of the teachings that follow.


FIG. 47D1 illustrates a flow diagram for a method of dashboard template service swapping in one embodiment. Method 47000 of this embodiment implements a submethod including blocks 47010 through 47022 that enables the creation, display, and editing of a swapped service dashboard display/template. Method 47000 of this embodiment is further shown to implement a submethod including blocks 47030 through 47034 that principally enables the display of a dashboard with a swapped service, including the dynamic determination of comparable widget data source KPIs. Blocks 47002a and 47002b depict exemplary user interface devices as may be exercised during the performance of the method or its submethods, and may represent the same device in an implementation. Block 47005 depicts a dashboard template information store as may be used to transiently or persistently, locally or remotely, in a consolidated or in a distributed fashion, or otherwise store information defining or otherwise related to base dashboard templates and/or variant dashboard templates. The information of dashboard template information store 47005 may belong to a larger collection of command/control/configuration information that directs the operations of a service monitoring system (SMS) in an embodiment, for example, information that controls certain outputs generated during the ongoing operation of the SMS. Dashboard template information store 47005 may include, for example, dashboard template definitional information stored by the processing of block 3515 of FIG. 35, discussed earlier.


At block 47010, a base dashboard template is identified. The base dashboard template may be identified from among one or more existing dashboard templates that may be found, for example, in dashboard template information store 47005. Information of dashboard template information store 47005 may be used in the processing of block 47010 to present a list of available base dashboard templates via the user interface device such as 47002a, via which a user may also be able to indicate a selection of, or otherwise indicate an identification of, a base dashboard template for a service swap. At block 47012, the computing machine receives a user indication or identification of the service to be swapped for the base dashboard service in the variant dashboard. At block 47014, the computing machine determines KPIs of the swap-in service that may be used as comparable data sources to the KPIs of the widgets for the swap-out service. The processing of block 47014 may utilize information such as service identities and KPI identifiers as may be contained in definitional information of the identified base dashboard template as well as other command/control/configuration information of the SMS, such as service and KPI definitional information of the swap-out and swap-in services (not specifically shown) to determine the comparable, matching, substitute, counterpart, or otherwise corresponding KPIs of the swap-in service as data sources for the widgets of the dashboard template. The processing of block 47014 may be further considered in relation to FIG. 47D2 and the related discussion that appears later in this description.


At block 47016 of FIG. 47D1, the swapped variant dashboard is caused to be displayed, perhaps via a user interface device such as 47002a. The swapped service dashboard display may share the same general appearance of the base dashboard display owing to the portions of the base dashboard template definitional information commonly used by them, for example: the number, location, layout, and/or style of visual elements on the dashboard, such as widgets and images. Service and KPI identifier elements (e.g., text labels) and/or data-driven visual elements or attributes of widgets (e.g., color, visibility, spark lines, and progress/level/proportion/value indicators) can likely differ in their presented appearance as the swapping process will have determined replacement or substitute portions of dashboard template definitional information effecting the swap. The display produced at block 47016 may be interactive and a user may be unabled to indicate various edits, commands, or instructions desired. In one embodiment, for example, a user may be able to indicate a change or override to an automatic KPI swap for a widget data source determined by earlier processing. User input for such indications is received and processed at block 47020. Cyclic arrow 47018 depicts the iterative nature of the processing blocks 47016 and 47018 in one embodiment, where the swapped dashboard is displayed, user action indicators are received and processed, possibly resulting in changes to the dashboard which is then redisplayed. The user input received and processed at block 47020 may implicitly or explicitly indicate that certain identifying or other definitional information for a swapped variant dashboard should be saved, recorded, or stored, and processing proceeds to block 47022. In one embodiment, saved variant dashboard template information may include a dashboard name and an identification of the swap-in service. In one embodiment, saved variant dashboard template information may further include some identification of KPIs of the swap-in service that are determined to be comparable. In one embodiment, saved variant dashboard template information may be recorded in the form of a URL that may be unique to the variant dashboard. Other embodiments are possible. Block 47022 is shown to be the end of processing for a first submethod of 47000.


Block 47030 is shown to begin the processing for a second submethod of 47000. At block 47030, the computing machine receives an identification of a swapped variant dashboard. The identification, in one embodiment, may be a user indication received from user interface device 47002b. In one embodiment, the identification may include parameters to construct definitional information for a variant dashboard on-the-fly or on-demand, such as identifiers for the base dashboard template and the swap-in service. In one embodiment, the identification may include a unique identifier for a swapped variant dashboard template stored in dashboard template information store 47005. These and other embodiments are possible. Processing may then proceed to block 47032 where comparable KPI data sources are determined for the widgets of the dashboard. The processing of block 47032 may be the same processing as that of block 47014 and may be skipped here, at block 47032, in an embodiment or instance where the processing of block 47030 can identify the comparable KPIs by retrieval from the dashboard template information store 47005, for example. Having determined comparable KPIs for the widgets of the dashboard, processing may proceed to block 47034. At block 47034, a display of the swapped variant dashboard is made, possibly via user interface device 47002b. The processing of block 47034 may be the same processing as that of block 47016. Block 47016 shown to conclude the processing of the second submethod of 47000.


FIG. 47D2 illustrates a flowchart of a method for automatically determining comparable widget KPIs in one embodiment. The processing of this method is what might be used for the processing of block 47014 of FIG. 47D1. At block 47050 of FIG. 47D2, the KPI is identified that serves as the data source for a widget of a service/subject of the base dashboard. In one embodiment, this may be accomplished by scanning definitional information of the base dashboard template as may be stored in a dashboard template information store, for example. In an embodiment, identification of the widget data source at block 47050 may further include ascertaining attributes, characteristics, properties, or the like of the KPI, that are perhaps contained in command/control/configuration information data stores of the SMS, and perhaps include the search query and/or other information that defines the KPI.


At block 47052, one or more candidate KPI widget data sources associated with a second, swap-in service/subject is identified. In one embodiment, this may be accomplished by scanning definitional information related to the swap-in service, as may perhaps be contained in command/control/configuration information of the SMS. As a swap-in service may have many KPIs as potential candidates to substitute as a widget data source, the processing of block 47052 may include creating or identifying a list of such potential candidates, in an embodiment. The processing of block 47052, in such an embodiment, may include a first stage screening or filtering of the KPIs of the swap-in-service to reduce the size of the candidate list. For example, where the widget of the base template is a graphical percentage indicator bar, the first stage filtering may limit the candidate list to KPIs whose values are limited to the 0 to 100 range. Such prescreening may reduce the processing burden of evaluating candidates unlikely or incapable of successfully serving as a substitute data source for the widget.


In an embodiment, the processing of block 47052 may identify a KPI to pass on to an evaluation process. The identified KPI in an embodiment may be the only KPI of the swap-in service, one of a number of KPIs associated with the swap-in service, a KPI from a list of KPIs identified or created by the processing of block 47052, or another source. In an embodiment, the processing of block 47052 may pass forward an entire list of candidates into an evaluation process. In an embodiment, a candidate KPI may be considered a matching, comparable, substitute, counterpart, or equivalent KPI/data source, in an embodiment, where it satisfies one or more predefined matching criteria or rules. The matching criteria/rules may be simple, compounded, layered, or the like, in an embodiment. Evaluation of a candidate for a match for the illustrated embodiment begins at block 47060.


At block 47060, computer processing determines whether a candidate KPI/data source is a match to serve for a swap-in widget representation based on user-defined matching criteria or rules. In an embodiment, giving the user-defined determination first or early precedence in the processing order, as shown, may permit the user-defined criteria to supersede or override standard system criteria. In one embodiment, user-defined matching criteria may be provided via a text format configuration file. In one embodiment, user-defined matching criteria may be entered into the command, control and configuration information of the SMS by a user via an SMS user interface provided for that purpose. Other embodiments are, of course, possible. In an embodiment, all candidate KPIs may be evaluated for a user-defined match at block 47060, and multiple matching candidates may be identified. In the illustrated embodiment, if any user-defined match is positively determined at block 47060 processing proceeds to block 47080, otherwise processing proceeds to block 47061.


At block 47061, computer processing determines whether one or more candidate KPIs/data sources is a match to serve as the data source for a swap-in widget based on having the same, or a sufficiently similar, title as the corresponding swap-out KPI/data source (i.e., the widget data source of the base dashboard template). In an embodiment, all or part of the title may be used to determine sameness or sufficient similarity. For example, a portion of the title containing the service name may be disregarded in the comparison. In an embodiment, the titles may be parsed and/or tokenized and the determination of sameness or sufficient similarity may be based on comparing the parsing results such as the number, order, or content of one or more tokens, for example. In the illustrated embodiment, if any candidate KPI matches on title/identifier criteria or rules, processing proceeds to block 47080, otherwise processing proceeds to block 47062. At block 47062, computer processing determines whether one or more candidate KPI/data sources is a match to serve as the data source for a swap-in widget based on having a shared base search with the corresponding swap-out KPI/data source. If such a matching candidate is found, processing proceeds to block 47080, otherwise processing proceeds to block 47063. At block 47063, computer processing determines whether one or more candidate KPI/data source is a match to serve as the data source for a swap-in widget based on having a sufficient relationship to a same common model. In one embodiment, a common model may be a common information model (CIM). In one embodiment, a common model may be a domain add-on facility or module that may include KPI templates. Other common models are possible. In one embodiment, a sufficient relationship exists where there is an identical use of the common model. In one embodiment, a sufficient relationship exists where the swap-in and swap-out KPIs is each a lineal descendent of a common model item. Other criteria may be used to determine the sufficiency of the relationship. If a matching candidate is found at block 47063, processing proceeds to block 47080, otherwise processing proceeds to block 47064. At block 47064, computer processing determines whether one or more candidate KPI/data sources is a match to serve as the data source for a swap-in widget based on having the same, or a sufficiently similar, description as the corresponding swap-out KPI/data source. In an embodiment, all or part of the description may be used to determine sameness or sufficient similarity. For example, a portion of the KPI description containing the service name may be disregarded in the comparison. In an embodiment, the KPI descriptions may be parsed and/or tokenized and the determination of sameness or sufficient similarity may be based on comparing the parsing results such as the number, order, or content of one or more tokens, for example. If a matching candidate is found at block 47064, processing proceeds to block 47080, otherwise processing proceeds to block 47065.


At block 47065, computer processing determines whether one or more candidate KPIs/data sources is a match to serve as the data source for a swap-in widget based on matching some other criteria or rule defined in the system or by a user. If a matching candidate is found at block 47065, processing proceeds to block 47080, otherwise processing proceeds to block 47070. At block 47070, computer processing determines whether other candidates exist that have not been evaluated for a match. Such a condition may arise in an embodiment, for example, where the identification of candidates at block 47052 results in a list of candidates, and the candidates of the list are fed individually through the matching evaluation represented by blocks 47060 through 47065. In such an embodiment, if the processing of block 47070 determines that no other candidates exist, processing proceeds to block 47088 where the process may conclude by signaling to other computer processes or programming that there has been a failure to identify a comparable or matching KPI for the widget data source of the base dashboard template. If the processing of block 47070 determines that other candidates do exist, processing shown to return to block 47052 where the next candidate can be identified and fed into the matching evaluation process.


The processing of block 47080 is entered where one or more matching candidates has been identified. At block 47080, computer processing determines whether more than one matching candidates have been determined. If so, these multiple candidates that are each a potential match for the swap-out data source may be passed on to the processing of block 47082 where a selection of one of the candidates is determined as the single matching candidate. In an embodiment, the processing of block 47082 may determine a selected match using scoring, ranking, prioritizing, precedence, or other criteria existing within the SMS. In an embodiment, the processing of block 47082 may determine a selected match by receiving an indication of a selection made by the user, perhaps via a graphical user interface, and perhaps in response to a displayed interactive selection list of the potential matches via the graphical user interface. Other embodiments are possible. Whether a single match is determined by the processing of block 47080, or a single match is selected as the result of the processing of block 47082, processing proceeds to block 47084 where the illustrated method may terminate by signaling the match, including its identification, to other computer processing or programming.


One of skill will appreciate from the foregoing that a candidate KPI need not be identical to the swap-out KPI in even one criterion to be considered a match. A candidate KPI that is identical to the swap-out KPI for at least one criterion or criteria group may be considered an exact match, in an embodiment. A candidate KPI that is not an exact match but exhibits a high degree of similarity to the swap-out KPI, for example, a similarity of half or more by some measure, may be considered to be a substantial match, in an embodiment. For example, a candidate may be considered a substantial match in an embodiment where two out of three criteria are identical. As another example, a candidate may be considered a substantial match in an embodiment where it achieves a score of 75 points out of 100 in a scoring system that evaluates and weighs a number of factors. A candidate that is less than a substantial match but is nonetheless determined to be a match, in one embodiment, may be considered to be an acceptable match.


The processing of the method illustrated in FIG. 47D2 may be repeated for each widget to be swapped in the base dashboard template, in an embodiment. One of skill will understand that the method illustrated and described in relation to FIG. 47D2 is an example used to illustrate and teach inventive aspects and its details do not dictate the scope of possible embodiments that practice inventive aspects. For example, the processing described in relation to the various distinct blocks appearing in FIG. 47D2 may be omitted, combined, reordered, or the like, in various embodiments.


FIG. 47D3 illustrates a block diagram of a system for dashboard swapping in one embodiment. System 47100 is shown to include swap processor 47102 and dashboard display processor 47104 which, in an embodiment, may each be implemented using hardware and/or software such as a microprocessor with stored program instructions, for example. System 47100 is shown to further have logical data constructs such as Service1 definition 47110, Service2 definition 47114, service monitoring dashboard (SMD) template definition 47120, and SMD Template’ definition 47140, which, in an embodiment, may each be implemented as any logical and/or physical collection, set, group, organization, structure, or the like, of constituent data elements represented uniformly or variously among one or more volatile or persistent, local or remote, consolidated or distributed, or such variations of storage mechanisms accessible to the computer system, and which each may be part of the command/control/configuration information that determines the operation of a service monitoring system (SMS). System 47100 is shown to further have Service1 Dashboard display 47160a and Service2 Dashboard display 47160b which, in an embodiment, may each be implemented with a properly constructed data structure or stream and perhaps communication mechanisms to cause the desired display on a receiving device, such as a user interface device with a display screen.


Each of service definitions 47110 and 47114 is shown to include definitional information about a number of KPIs of the service. (Each service definition may, of course, include information beyond that shown here for sake of illuminating the immediate topic, and further discussion may be considered in relation to service definition 460 of FIG. 4, KPI definitions 406A-406N of FIG. 4, and service definition 1720 of FIG. 17 B, for example.) Service1 definition 47110 is shown to include information for four KPIs (KPI-1, KPI-2, KPI-3, and KPI-4) and is used in this illustrative example to represent a service that is the subject of the base service monitoring dashboard template (the swap-out service). Service2 definition 47114 is shown to include information for three KPIs (KPI-A, KPI-B, and KPI-C) and is used in this illustrative example to represent the service that is the subject of a derived variant service monitoring dashboard template (the swap-in service).


SMD Template 47120 is shown to include information for a service 47122, information for a first widget instance 47124, information for an Nth widget instance 47126, and other information 47128. Nth widget instance information 47126 illustrates that many widgets may be associated with a dashboard template. Other information 47128 illustrates that a dashboard template may have more information than what is specifically shown here to illuminate the immediate topic, and may include information about the dashboard on the whole (e.g., window size), information about other dashboard elements (e.g., picture or drawing items), or other information, for example. Widget1 information 47124 is shown to include Location, Other, Service, and KPI/D[ata]S[ource] information, illustrative of the types of information that may be associated with a widget in a dashboard.


SMD Template’ 47140 illustrates a derived dashboard template that may result from the processing performed by swap processor 47012. SMD Template’ 47140 illustrates a derivative/variant/swapped SMD template that may be actually, virtually, constructively, or putatively produced by processing of swap processor 47102, which may include processing illustrated and discussed in reference to blocks 47014 and 47032 of FIG. 47D1, for example. In one embodiment swap processor 47102 receives as input from internal or external processing an identification for, or a complete copy of, or other information to locate, the definition of the base dashboard template to be swapped 47120, and an identification for, or a complete copy of, or other information to locate, the service definition for a swap-in service 47114. Swap processor 47102 accesses information of base template 47120 to determine a designated KPI data source 47130 for a first widget to be swapped. Swap processor 47102 may access additional information about the KPI that may be useful when finding the matching KPI of the swap-in-service. In one embodiment, the swap processor has sufficient KPI information 47130 from the base template definition 47120 in order to directly access additional information 47112, such as definitional information, about the swap-out KPI (KPI-2 in this example). In one embodiment, the swap processor indirectly accesses additional information 47112 about the swap-out KPI by using an identification of the swap-out service 47132 associated with the widget, itself, or an identification of the swap-out service 47122 associated with the dashboard overall, to access the service definition 47110, and uses KPI information 47130 to locate and access additional information 47112 about the swap-out KPI from the service definition 47110. Other implementations are possible.


Having sufficient information available to characterize the swap-out KPI for the purpose of determining a comparable matching swap-in KPI, one embodiment of swap processor 47102 accesses information of the Service2 definition 47114 to identify potential swap-in KPI candidates, as may be part of the processing of block 47052 of FIG. 47D2, for example. For purposes of this illustration, swap processor 47102 of FIG. 47D3 is assumed to have performed processing such as described for the method of FIG. 47D2 to identify KPI-C 47116 of FIG. 47D3 as the KPI of swap-in Service2 matching the swap-out KPI, KPI-2 47112, defined at 47130 as the data source of Widget 1 of SMD Template 47120. Having identified a swap-in KPI for the widget, swap processor 47102 in one embodiment inserts that information into the derived swap variant dashboard definition designated as SMD Template’ 47140. SMD Template’ 47140 can be seen by consideration of FIG. 47D3 to be a parallel or shadow instance of base SMD template 47120, from which it is derived. SMD template’ 47140 is shown to include a Service’ component 47142, a Widget 1′ component 47144, a Widget N′ component 47146, and an Other’ component 47148 that correspond to the Service 47122, Widget 1 47124, Widget N 47126, and Other 47128 components, respectively, of base SMD template 47120. Variant SMD Template’ 47140 is in accordance with base SMD template 47120, having the same content necessary to produce a dashboard display with the same general layout and appearance, but with differing values for swapped information items to produce a dashboard display reflecting a different service or subject. For example, the Other’ information component 47148 of derived SMD template 47140 may be identical to, copied from, or incorporated by reference to the Other information component 47128 of base SMD template 47120. Similarly, the Widget 1′ Location’ information component of derived SMD template 47140 may be identical to, copied from, or incorporated by reference to the Location information component of Widget 1 of base SMD template 47120, as the widget's location in the displayed template does not change because of a service swap, in an embodiment. As different examples, Service’ 47152 and KPI/DS' 47150 components of the derived template find accord in Service 47132 and KPI/DS 47130 components of the base template, however the values as between the templates are not identical because of the swapping operation performed by swap processor 47102. The storage in computer memory/storage of a derived template by the swap processor in an embodiment may vary with regard to conditions around the creation of the derived template or expectations for its use. For example, a swap processor performing the processing of block 47032 of FIG. 47D1 may utilize volatile and unconsolidated storage for an on-demand, transient dashboard variant, while persistent, consolidated storage may be utilized for a dashboard variant that is resolved once for repeated use over time as might be expected by a swap processor performing the processing of block 47014 of FIG. 47D1, in an embodiment.


Dashboard display processor 47104, in a first instance, may cause the display of a service monitoring dashboard reflecting the base template definition, such as display 47160a, or may cause the display, in a second instance, of a second service monitoring dashboard that mimics the first, reflecting the variant template definition, such as display 47160b. Each of the dashboard displays of an embodiment may include the same elements conforming to the same layout, such as corresponding title text elements 46162a-b, corresponding KPI value-trend widgets 47164a-b, and corresponding KPI sparkline widgets 47166a-b. Displayed values and data-driven visual elements or attributes (such as background color), in contrast, can be expected to differ even for simultaneous displays of the base and variant dashboards because of the swapping performed to produce the derived dashboard template 47140, including the swapping or substitution of KPI's used as widget data sources.


FIG. 47D4 illustrates an example user interface for creating and/or updating a service monitoring dashboard. Much of user interface 47200 may be appreciated and understood by consideration of other interfaces disclosed herein related to service monitoring dashboards. For example, one may consider FIG. 36B and FIG. 46C, and their related discussions. Interface 47200 of FIG. 47D4 is shown to include system title and header bar 47202, application title and header bar 47204, and main display area 47206. Main display area 47206 further includes header and action bar 47208, activity toolbar 47210 (including tool icons such as text tool icon 47212), and dashboard/template display area 47220. 47220 includes a presentation of a dashboard template that may have a modifiable layout in an add/edit mode, or a fixed layout in a display/consumption mode, in one embodiment. A portion of the dashboard template visible in display area 47220 includes four widgets (47222, 47224, 47226, and 47228). Notably, header and action bar 47208 includes “Enable Service Swapping” action button 47230. The enable service swapping action button 47230 of the interface is an interactive action button element that permits a user to indicate to the computing machine a desire to enable service swapping for the present dashboard. In such a case, the present dashboard is used as the base service monitoring dashboard template from which variant swap dashboards are derived. In one embodiment, a user interaction with action button 47230 such as by a mouse click action or touchscreen finger press, may result in the presentation of an interactive user interface display that enables a user to provide certain command, control, and configuration information for the SMS related to service swapping for the dashboard. FIG. 47D5 is an example of such an interface.


FIG. 47D5 illustrates an example user interface used for service swapping in one embodiment. Interface 47250 of the present embodiment enables a user to select one or more services that are eligible as the swap-in service of a variant dashboard derived from the base service monitoring dashboard template. Interface 47250 is shown to include selection list information and control bar 47252, selection list area 47270, cancel action button 47280, and enable action button 47282. Selection list information and control bar 47252 is shown to further include total and selected services count display 47254, presentation options control 47256, filter component 47258, and page navigation control area 47260. Filter component 47258 may display or enable the user entry or modification of criteria that filter the services displayed in the selection list area 47270. Selection list area 47270 is shown to include selection list column header bar 47272 and selection list entry rows, of which 47274 and 47276 are examples. The selection list depicted in area 47270 is shown as a tabular format having checkbox, “Title”, and “KPIs” columns as indicated by column header bar 47272. In the illustrated embodiment, entry rows in the selection list present an interactive checkbox in the checkbox column, the name, title, or other identifier for a service in the “Title” column, and a count of the KPIs associated with the service in the “KPIs” column, moving from left to right. User interaction with the checkbox in an entry row results in a toggling of the checkbox between a checked and an unchecked state for indicating to the computing machine a desire to select or not select, respectively, the corresponding service as a swap-in candidate for a variant dashboard. For example, the checkbox of entry row 47274 as shown is checked indicating that service “aws3.customers.coca-cola” should be enabled as a candidate for service swapping of the dashboard, while the checkbox of entry row 47276 is shown as unchecked indicating that “serviceB.services.itsi.com” should not be enabled as a candidate for service swapping of the dashboard. Interface 47250 may enable a user to freely navigate through the one or more pages of the filtered/unfiltered services list to indicate the selection of the services to be enabled for swapping with the dashboard. In an embodiment, when selection is complete a user may interact with “Enable” button 47282 to indicate the same to the computing machine. In one embodiment, the computing machine may record the list of services selected by the user, may validate the list, and may store it as command, control, and configuration information of the SMS. In one embodiment, the computing machine may record the list of services selected by the user, may validate the list, may generate a variant swapped dashboard template for each of the selected services, and may store the list of services and the generated templates as command, control, and configuration information of the SMS. Other embodiments are possible. Embodiments may vary, too, in how the command, control, and configuration information of the SMS produced by the processing for user interface 47250 is utilized and perhaps surfaced during the operation of the SMS. An example follows.


FIG. 47D6 illustrates an example user interface displaying a base dashboard template with swapping enabled. Interface 47300 largely replicates the layout and content of interface 47200 of FIG. 47D4. Notably, the “Enable Service Swapping” action button 47230 of interface 47200 is replaced in interface 47300 of FIG. 47D6 with service selection drop down box 47302. In an embodiment, when interface 47300 is first displayed, selection drop down box 47302 may show a default selection of the service associated with the base service monitoring dashboard template 47302, here shown as “DB-Primary-Oracle”. User interaction with drop-down button 47306 of selection drop-down box 47302 may result in the appearance of selection list 47310. The selection list in an embodiment may be populated with the name of the service associated with the base SMD template as an entry 47312, and with the names of one or more service selected to be enabled for service swapping such as may be achieved with the use of interface 47250 of FIG. 47D5. The selection list entries for “Service A” through “Service H” of 47310 of FIG. 47D6 are, perhaps, such examples. The selection list in an embodiment may be further populated with an action entry such as 47316 indicating an “Edit Service Swapping” action. User interaction with selection list entry 47316 may result in the presentation of a user interface such as 47250 of FIG. 47D5 whereby a user may modify the selected set of services eligible for swapping of the base dashboard, and it may be readily understood that modifications there may have a downstream impact on the entries in a subsequent appearance of a selection box such as 47310 of FIG. 47D6. User interaction with a selection list entry such as “Service D” 47314 may result in the computing machine causing the display of the variant dashboard template associated with the service identified in the entry as depicted in FIG. 47D7.


FIG. 47D7 illustrates an example user interface displaying a swapped dashboard template in one embodiment. Interface 47330 largely replicates the layout and content of interface 47300 of FIG. 47D6. Notably, the service selection drop down box 47332 is shown with “Service D” as the current selection, and without the attendant drop down selection list. Dashboard template widgets 47342, 47344, 47346, and 47348, are depicted with differences in their appearances from their counterparts in earlier FIGS. 47D4 and 47D6 in order to illustrate effects as might be expected from the substitution or swapping of widget data sources, and particularly where swap-in KPIs were successfully matched for each widget. If, instead, no swap-in KPI could be successfully matched to a widget such as 47348, a modified widget, a placeholder for the widget, or an alert message at the widget position, may in appear in the dashboard presentation of an embodiment. FIG. 47D8 illustrates an example user interface portion indicating a failed data source/KPI match for a dashboard widget in one embodiment. Interface portion 47360 may replace interface portion 47334 of FIG. 47D7 under the condition that Service D has no KPI comparable to the KPI identified as the data source for widget corresponding to widget 47348 in the base dashboard. “Widget” 47368 of FIG. 47D8 may be a widget with appearance modified to reflect the not-matched condition, a generic “Not Matched” image, or the like, in an embodiment.



FIG. 48 describes an example home page GUI 4800 for service-level monitoring, in accordance with one or more implementations of the present disclosure. GUI 4800 can include one or more tiles each representing a service-monitoring dashboard. The GUI 4800 can also include one or more tiles representing a service-related alarm, or the value of a particular KPI. In one implementation, a tile is a thumbnail image of a service-monitoring dashboard. When a service-monitoring dashboard is created, a tile representing the service-monitoring dashboard can be displayed in the GUI 4800. GUI 4800 can facilitate user input for selecting to view a service-monitoring dashboard. For example, a user may select a dashboard tile 4805, and GUI 4700 in FIG. 47 can be displayed in response to the selection. GUI 4800 can include tiles representing the most recently accessed dashboards, and user selected favorites of dashboards.



FIG. 49A describes an example home page GUI 4900 for service-level monitoring, in accordance with one or more implementations of the present disclosure. GUI 4900 can include one or more tiles representing a deep dive. In one implementation, a tile is a thumbnail image of a deep dive. When a deep dive is created, a tile representing the deep dive can be displayed in the GUI 4900. GUI 4900 can facilitate user input for selecting to view a deep dive. For example, a user may select a deep dive tile 4907, and the visual interface 5300 in FIG. 55 can be displayed in response to the selection. GUI 4900 can include tiles representing the most recently accessed deep dives, and user selected favorites of deep dives.


Home Page Interface



FIG. 49B is a flow diagram of an implementation of a method for creating a home page GUI for service-level and KPI-level monitoring, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 4910 is performed by a client computing machine. In another implementation, the method 4910 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 4911, the computing machine receives a request to display a service-monitoring page (also referred to herein as a “service-monitoring home page” or simply as a “home page”). In one implementation, the service monitoring page includes visual representations of the health of a system that can be easily viewed by a user (e.g., a system administrator) with a quick glance. The system may include one or more services. The performance of each service can be monitored using an aggregate KPI characterizing the respective service as a whole. In addition, various aspects (e.g., CPU usage, memory usage, response time, etc.) of a particular service can be monitored using respective aspect KPIs typifying performance of individual aspects of the service. For example, a service may have 10 separate aspect KPIs, each monitoring a various aspect of the service.


As discussed above, each KPI is associated with a service provided by one or more entities, and each KPI is defined by a search query that produces a value derived from machine data pertaining to the one or more entities. A value of each aggregate KPI indicates how the service in whole is performing at a point in time or during a period of time. A value of each aspect KPI indicates how the service in part (with respect to a certain aspect of the service) is performing at a point in time or during a period of time.


At block 4912, the computing machine can determine data associated with one or more aggregate KPI definitions and one or more aspect KPI definitions, useful for creating the home page GUI. In an implementation, determining the data can include referencing service definitions in a data store, and/or referencing KPI definitions is a data store, and/or referencing stored KPI values, and/or executing search queries to produce KPI values. In an implementation, determining the data can include determining KPI-related information for each of a set of aggregate KPI definitions and for each of a set of aspect KPI definitions. The KPI-related information for each aggregate or aspect KPI definition may include a KPI state. At block 4912, the computing machine may determine an order for both the set of aggregate KPI definitions and the set of aspect KPI definitions. (Information related to the KPI definition may vicariously represent the KPI definition in the ordering process such that if the information related to the KPI definition is ordered with respect to the information related to other KPI definitions, the KPI definition is considered equivalently ordered by implication.) Many criteria are possible on which to base the ordering of a set of KPI definitions including, for example, the most recently produced KPI value or the most recently indicated KPI state.


At block 4913, the computing machine causes display of the requested service-monitoring page having a services summary region and a services aspects region. The service summary region contains an ordered plurality of interactive summary tiles. In one implementation, each summary tile corresponds to a respective service and provides a character or graphical representation of at least one value for an aggregate KPI characterizing the respective service as a whole. The services aspects region contains an ordered plurality of interactive aspect tiles. In one implementation, each aspect tile corresponds to a respective aspect KPI and provides a character or graphical representation of one or more values for the respective aspect KPI. Each aspect KPI may have an associated service and may typify performance for an aspect of the associated service.


The requested service-monitoring page may also include a notable events region presenting an indication of one or more correlation searches that generate the highest number of notable events in a given period of time. In one implementation, the notable events region includes the indication of each correlation search, a value representing the number of notable events generated in response to execution of each correlation search, and a graphical representation of the number of notable events generated over the given period of time.


In one implementation, the computing machine is a client device that causes display of the requested service-monitoring page by receiving a service monitoring web page or a service monitoring UI document from a server and rendering the service monitoring web page using a web browser on the client device or rendering the service monitoring UI document using a mobile application (app) on the client device. Alternatively, the computing machine is a server computer that causes display of the requested service-monitoring page by creating a service monitoring web page or a service monitoring UI document, and providing it to a client device for display via a web browser or a mobile application (app) on the client device.


In one implementation, creating a service monitoring web page or a service monitoring UI document includes determining the current and past values of the aggregate and aspect KPIs, determining the states of the aggregate and aspect KPIs, and identifying the most critical aggregate and aspect KPIs. In one implementation, various aspects (e.g., CPU usage, memory usage, response time, etc.) of a particular service can be monitored using a search query defined for an aspect KPI which is executed against raw machine data from entities that make up the service. The values from the raw machine data that are returned as a result of the defined search query represent the values of the aspect KPI. An aggregate KPI can be configured and calculated for a service to represent an overall summary of a service. (The overall summary of a service, in an embodiment, may convey the health of the service, i.e., its sufficiency for meeting, or satisfaction of, operational objectives.) In one example, a service may have multiple separate aspect KPIs. The separate aspect KPIs for a service can be combined (e.g., averaged, weighted averaged, etc.) to create an aggregate KPI whose value is representative of the overall performance of the service. In one implementation, various thresholds can be defined for either aggregate KPIs or aspect KPIs. The defined thresholds correspond to ranges of values that represent the various states of the service. The values of the aggregate KPIs and/or aspect KPIs can be compared to the corresponding thresholds to determine the state of the aggregate or aspect KPI. The various states have an ordered severity that can be used to determine which KPIs should be displayed in service-monitoring page. In one implementation, the states include “critical,” “high,” “medium,” “normal,” and “low,” in order from most severe to least severe. In one implementation, some number of aggregate and aspect KPIs that have the highest severity level according to their determined state may be displayed in the corresponding region of the service-monitoring page. Additional details of thresholding, state determination and severity are described above with respect to FIGS. 31A-G.


At block 4914, the computing machine performs monitoring related to the homepage. Such monitoring may include receiving notification of an operating system event such as a timer pop, or receiving notification of a GUI event such as a user input. Blocks 4915 through 4917 each signify a determination as to whether a particular monitored event has occurred and the processing that should result if it has. In one embodiment, each of blocks 4915-4917 may be associated with the execution of an event handler. At block 4915, a determination is made whether notification has been received indicating that dynamic update or refresh of the homepage should occur. The notification may ensue from a user clicking a refresh button of the GUI, or from the expiration of a refresh interval timer established for the homepage, for example. If so, processing returns to block 4912 in one embodiment. At block 4916, a determination is made whether notification has been received indicating that a display mode for the homepage should be changed. The notification may ensue from a user clicking a display mode button of the GUI, such as one selecting a network operations center display mode over a standard display mode, for example. If so, processing returns to block 4913 where the homepage is caused to be displayed in accordance, presumably, with the user input. At block 4917, a determination is made whether notification has been received indicating some other user interaction or input. If so, processing proceeds to block 4918 where an appropriate response to the user input is executed.



FIG. 49C illustrates an example of a service-monitoring page 4920, in accordance with one or more implementations of the present disclosure. In one implementation, service-monitoring page 4920 includes services summary region 4921 and services aspects region 4924. Each of services summary region 4921 and services aspects region 4924 present dynamic visual representations including character and/or graphical indications of the states of various components in the system, including respective services in the system, as shown in services summary region 4921, and individual aspect KPIs associated with one or more of the services, as shown in services aspects region 4924. The information provided on service-monitoring page 4920 may be dynamically updated over time, so as to provide the user with the most recent available information. In one implementation, the visual representations on service-monitoring page 4920 are updated each time the underlying aggregate KPIs and aspect KPIs are recalculated according to the defined schedule in the corresponding KPI definition. In another implementation, the visual representations can be automatically updated in response to a specific user request, when the aggregate KPIs and aspect KPIs can be recalculated outside of their normal schedules specifically for the purpose of updating service-monitoring page 4920. In yet another implementation, the visual representations can be static such that they do not change once initially displayed. The aggregate KPIs and aspect KPIs can be determined in response to the initial user request to view the service-monitoring page 4920, and then displayed and refreshed at predefined time intervals or in real time once new values are calculated based on KPI monitoring parameters discussed above. Alternatively, the aggregate KPIs and aspect KPIs can be displayed, but not updated until a subsequent request to view the service-monitoring page 4920 is received.


In one implementation, the visual representations in services summary region 4921 contain an ordered plurality of interactive summary tiles 4922. Each of interactive summary tiles 4922 corresponds to a respective service in the system (e.g., Activesync, Outlook, Outlook RPC) and provides a character or graphical representation of at least one value for an aggregate KPI characterizing the respective service as a whole. In one implementation, each of interactive summary tiles 4922 includes an indication of the corresponding service (i.e., the name or other identifier of the service), a numerical value indicating the aggregate KPI, and a sparkline indicating how the value of the aggregate KPI has changed over time. In one implementation, each of interactive summary tiles 4922 has a background color indicative of the state of the service. The state of the service may be determined by comparing the aggregate KPI of the service to one or more defined thresholds, as described above. In addition, each of interactive summary tiles 4922 may include a numerical value representing the state of the aggregate KPI characterizing the service and/or a textual indication of the state of the aggregate KPI (e.g., the name of the current state). In one implementation, only a certain number of interactive summary tiles 4922 may be displayed in services summary region 4921 at one time. For example, some number (e.g., 15, 20, etc.) of the most critical services, as measured by the severity of the states of their aggregate KPIs, may be displayed. In another implementation, tiles for user selected services may be displayed (i.e., the most important services to the user). In one implementation, which services are displayed, as well as the number of services displayed may be configured by the user through menu element 4927.


The interactive summary tiles 4922 of service monitoring page 4920 are depicted as rectangular tiles arranged in an orthogonal array within a region, without appreciable interstices. Another implementation may include tiles that are not rectangular, or arranged in a pattern that is not an orthogonal array, or that has interstitial spaces (grout) between tiles, or some combination. Another implementation may include tiles having no background color such that a tile has no clearly visible delineated shape or boundary. Another implementation may include tiles of more than one size. These and other implementations are possible.


In one implementation, services summary region 4921 further includes a health bar gage 4923. The health bar gage 4923 may indicate distribution of aggregate KPIs of all services across each of the various states, rather than just the most critical services. The length of a portion of the health bar gage 4923, which is colored according to a specific KPI state, depends on the number of services with aggregate KPIs in that state. In addition, the health bar gage 4923 may have numeric indications of the number of services with KPIs in each state, as well as the total number of services in the system being monitored.


In one implementation, the visual representations in services aspects region 4924 contain an ordered plurality of interactive aspect tiles 4925. Each of interactive aspect tiles 4925 corresponds to a respective aspect KPI and provides a character or graphical representation of one or more values for the respective aspect KPI. Each aspect KPI may have an associated service and may typify performance for an aspect of the associated service. In one implementation, each of interactive aspect tiles 4925 includes an indication of the corresponding aspect KPI (i.e., the name or other identifier of the aspect KPI), an indication of the service with which the aspect KPI is associated, a numerical value indicating the current value of the aspect KPI, and a sparkline indicating how the value of the aspect KPI has changed over time. In one implementation, each of interactive aspect tiles 4925 has a background color indicative of the state of the aspect KPI. The state of the aspect KPI may be determined by comparing the aspect KPI to one or more defined thresholds, as described above. In addition, each of interactive aspect tiles 4925 may include a numerical value representing the state of the aspect KPI and/or a textual indication of the state of the aspect KPI (e.g., the name of the current state). In one implementation, only a certain number of interactive aspect tiles 4925 may be displayed in services aspects region 4924 at one time. For example, some number (e.g., 15, 20, etc.) of the most critical aspect KPIs, as measured by the severity of the states of the KPIs, may be displayed. In another implementation, tiles for user selected aspect KPIs may be displayed (i.e., the most important KPIs to the user). In one implementation, which aspect KPIs are displayed, as well as the number of aspect KPIs displayed may be configured by the user through menu element 4928.


In one implementation, services aspects region 4924 further includes an aspects bar gage 4926. The aspects bar gage 4926 may indicate the distribution of all aspect KPIs across each of the various states, rather than just the most critical KPIs. The length of a portion of the aspects bar gage 4926 that is colored according to a specific state depends on the number of aspect KPIs in that state. In addition, the aspects bar gage 4926 may have numeric indications of the number of aspect KPIs in each state, as well as the total number of aspect KPIs in the system being monitored.


The tiles of a region (e.g., 4922 of 4921, 4925 of 4924) each occupy an ordered position within the region. In one embodiment, the order of region tiles proceeds from left-to-right then top-to-bottom, with the first tile located in the leftmost, topmost position. In one embodiment, the order of region tiles proceeds from top-to-bottom then left-to-right. In one embodiment, the order of region tiles proceeds from right-to-left then top-to-bottom. In one embodiment, different regions may have different ordering arrangements. Other ordering is possible. A direct use of the ordered positions of tiles within a region is for making the association between a particular KPI definition and the particular tile for displaying information related to it. For example, a set of aspect KPI definitions with a determined order such as discussed in relation to block 4912 of FIG. 49B can be mapped in order to the successively ordered tiles (4925) of an aspects region (4924).


In one embodiment service-monitoring page 4920 includes a display mode selection GUI element 4929 enabling a user to indicate a selection of a display mode. In one embodiment, display mode selection element 4929 enables the user to select between a network operations center (NOC) display mode and a home display mode. In one embodiment, tiles displaying KPI-related information while in NOC mode are larger (occupy more relative display area) than corresponding tiles displayed while in home mode. In an embodiment, display area is acquired to accommodate the larger tiles by a combination of one or more of reducing the total tile count, reducing or eliminating interstitial space between tiles or between displayed elements of the GUI, generally, reducing or eliminating GUI elements (such as any auxiliary regions area), or other methods. The transformation of the GUI display from home to NOC mode changes the size of tiles relative to one or more other GUI elements and, so, is not a simple zoom function applied to the service-monitoring page 4920. In one embodiment, an indicator within a tile displaying KPI-related information while in NOC mode is larger (occupies more relative display area) than the corresponding indicator displayed while in home mode. For example, a character-type indicator within a tile may display using a larger or bolder font while in NOC mode than while in home mode. In one embodiment, display area is acquired to accommodate the larger indicator by a combination of reducing or eliminating other indicators appearing within the tile. Embodiments with more than two display mode selection options, such as associated with GUI element 4929, are possible.



FIG. 49D illustrates an example of a service-monitoring page 4920 including a notable events region 4930, in accordance with one or more implementations of the present disclosure. Depending on the implementation, notable events region 4930 may be displayed adjacent to, beneath, above or between services summary region 4921 and services aspects region 4924. In another implementation, notable events region 4930 may be displayed on a different page or in a different interface than services summary region 4921 and services aspects region 4924. In one implementation, notable events region 4930 contains an indication (such as a list) of one or more correlation searches (also referred to herein as “rules”) that generate the highest number of notable events in a given period of time. A notable event may be triggered by a correlation search associated with a service. As discussed above, a correlation search may include search criteria pertaining to one or more KPIs (e.g., an aggregate KPI or one or more aspect KPIs) of the service, and a triggering condition to be applied to data produced by a search query using the search criteria. A notable event is generated when the data produced by the search query satisfies the triggering condition. A correlation search may be pre-defined and provided by the system or may be newly created by an analyst or other user of the system. In one implementation, the correlation searches can be run continuously or at regular intervals (e.g., every hour) to generate notable events. Generated notable events can be stored in a dedicated “notable events index,” which can be subsequently accessed to create various visualizations, including notable events region 4930 of service-monitoring page 4920.


In one implementation, the notable events region 4930 includes the indication (e.g., the name) of each correlation search 4931, a value representing the number of notable events generated in response to execution of each correlation search 4932, and a graphical representation (e.g., a sparkline) of the number of notable events generated over the given period of time 4933. In one implementation, the correlation searches shown in notable events region 4930 may be sorted according to the data in each of columns 4931, 4932, and 4933.


In one implementation, only a certain number of correlation searches may be displayed in notable events region 4930 at one time. For example, some number (e.g., 5, 10, etc.) of the correlation searches that generate the most notable events in a given period of time may be displayed. In another implementation, all correlation searches that generated a minimum number of notable events in a given period of time may be displayed. In one implementation, which correlation searches are displayed, as well as the number of correlation searches displayed may be configured by the user.


In an embodiment, notable events region 4930 may be replaced by, or supplemented with, one or more other information regions. For example, one embodiment of an other-information region may display most-recently-used items, such as most-recently-viewed service-monitoring dashboards, or most-recently-used deep dive displays. Each most-recently-used item may contain the item name or some other identifier for the item. Any notable event regions and other information regions in a GUI display may be collectively referred to as auxiliary regions. In one embodiment, items displayed in auxiliary regions support user interaction. User interaction may, for example, provide an indication to the computing machine of a user's desire to navigate to a GUI component related to the item with which the user interacted. For example, a user may click on a notable event name in the notable event region to navigate to a GUI displaying detailed information about the event. For example, a user may click on the name of a most-recently-viewed service-monitoring dashboard in an other-information region to navigate to the dashboard GUI. In one embodiment, auxiliary regions are displayed together in an auxiliary regions area. An auxiliary regions area may be located in a GUI display as described above for the notable events region 4930.



FIGS. 49E-F illustrate an example of a service-monitoring page 4920, in accordance with one or more implementations of the present disclosure. As shown in FIG. 49E, a particular tile 4940 of the plurality of interactive aspect tiles 4925 in services aspects region 4924 has been activated. The user may activate tile 4940, for example, by hovering a cursor over the tile 4940 or tapping the tile 4940 on a touchscreen. Once the tile 4940 is activated, a selectable graphical element 4941, such as a check box, radio button, etc., may be displayed for the chosen tile 4940. Further user interaction with the selectable graphical element 4941, such as a mouse click or additional tap, may activate the selectable graphical element 4941 and cause the corresponding tile 4940 to be selected for further viewing. Upon selection of tile 4940, a similar selectable graphical element may be displayed for each of interactive aspects tiles 4925 in services aspects region 4924, as shown in FIG. 49F. In one implementation, additional white space may be displayed between each of interactive aspect tiles 4925. If the user desires, they may select one or more additional tiles by similarly interacting with the corresponding selectable graphical element of any of the other interactive aspect tiles 4925. In one implementation, the selected tiles may have the selectable graphical element highlighted, or otherwise emphasized, to indicate that the corresponding tile has been selected. In addition, the appearance (e.g., color, shading, etc.) of the selected titles may change to further emphasize that they have been selected.


In response to one or more of interactive aspect tiles 4925 being selected, menu elements 4942 and 4943 may be displayed in service-monitoring page 4920. Menu element 4942 may be used to cancel the selection of any interactive aspects tiles 4925 in services aspects region 4924. Activation of menu element 4942 may cause the selected tiles to be unselected and revert to the non-selected state as shown in FIG. 49C. Menu element 4943 may be used to view the selected aspect KPIs in a deep dive visual interface, which includes detailed information for the one or more selected aspect KPIs. The deep dive visual interface displays time-based graphical visualizations corresponding to the selected aspect KPIs to allow a user to visually correlate the aspect KPIs over a defined period of time. A deep dive visual interface is described in greater detail below in conjunction with FIG. 50A.


Example Deep Dive


Implementations of the present disclosure provide a GUI that provides in-depth information about multiple KPIs of the same service or different services. This GUI referred to herein as a deep dive displays time-based graphical visualizations corresponding to the multiple KPIs to allow a user to visually correlate the KPIs over a defined period of time.



FIG. 50A is a flow diagram of an implementation of a method for creating a visual interface displaying graphical visualizations of KPI values along time-based graph lanes, in accordance with one or more implementations of the present disclosure. The method may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 5000 is performed by a client computing machine. In another implementation, the method 5000 is performed by a server computing machine coupled to the client computing machine over one or more networks.


At block 5001, the computing machine receives a selection of KPIs that each indicates a different aspect of how a service (e.g., a web hosting service, an email service, a database service) provided by one or more entities (e.g., host machines, virtual machines, switches, firewalls, routers, sensors, etc.) is performing. As discussed above, each of these entities produces machine data or has its operation reflected in machine data not produced by that entity (e.g., machine data collected from an API for software that monitors that entity while running on another entity). Each KPI is defined by a different search query that derives one or more values from the machine data pertaining to the entities providing the service. Each of the derived values is associated with a point in time and represents the aspect of how the service is performing at the associated point in time. In one implementation, the KPIs are selected by a user using GUIs described below in connection with FIGS. 51, 52 and 57-63.


At block 5003, the computing machine derives the value(s) for each of the selected KPIs from the machine data pertaining to the entities providing the service. In one implementation, the computing machine executes a search query of a respective KPI to derive the value(s) for that KPI from the machine data.


At block 5005, the computing machine causes display of a graphical visualization of the derived KPI values along a time-based graph lane for each of the selected KPIs. In one implementation, the graph lanes for the selected KPIs are parallel to each other and the graphical visualizations in the graph lanes are all calibrated to the same time scale. In one implementation, the graphical visualizations are displayed in the visual interfaces described below in connection with FIGS. 53-56 and 64A-70.



FIG. 50B is a flow diagram of an implementation of a method for generating a graphical visualization of KPI values along a time-based graph lane, in accordance with one or more implementations of the present disclosure.


At block 5011, the computing machine receives a request to create a graph for a KPI. Depending on the implementation, the request can be made by a user from service-monitoring dashboard GUI 4700 or from a GUI 5100 for creating a visual interface, as described below with respect to FIG. 51. At block 5013, the computing machine displays the available services that are being monitored, and at block 5015, receives a selection of one of the available services. At block 5017, the computing machine displays the KPIs associated with the selected service, and at block 5019, receives a selection of one of the associated KPIs. In one implementation, the KPIs are selected by a user using GUIs described below in connection with FIGS. 51, 52 and 57-63. At block 5021, the computing machine uses a service definition of the selected service to identify a search query corresponding to the selected KPI. At block 5023, the computing machine determines if there are more KPI graphs to create. If the user desires to create additional graphs, the method returns to block 5013 and repeats the operations of blocks 5013-5021 for each additional graph.


If there are no more KPI graphs to create, at block 5025, the computing machine identifies a time range. The time range can be defined based on a user input, which can include, e.g., identification of a relative time or absolute time, perhaps entered through user interface controls. The time range can include a portion (or all) of a time period, where the time period corresponds to one used to indicate which values of the KPI to retrieve from a data store. In one implementation, the time range is selected by a user using GUIs described below in connection with FIGS. 54 and 63. At block 5027, the computing device creates a time axis reflecting the identified time range. The time axis may run parallel to at least one graph lane in the create visual interface and may include an indication of the amount of time represented by a time scale for the visual interface (e.g., “Viewport: 1 h 1 m” indicating that the graphical visualizations in the graph lanes display KPI values for a time range of one hour and one minute).


At block 5029, the computing device executes the search query corresponding to each KPI and stores the resulting KPI dataset values for the selected time range. At block 5031, the computing device determines the maximum and minimum values of the KPI for the selected time range and at block 5033 creates a graph lane in the visual interface for each KPI using the maximum and minimum values as the height of the lane. In one implementation, a vertical scale for each lane may be automatically selected using the maximum and minimum KPI values during the current time range, such that the maximum value appears at or near the top of the lane and the minimum value appears at or near the bottom of the lane. The intermediate values between the maximum and minimum may be scaled accordingly.


At block 5035, the computing device creates a graphical visualization for each lane using the KPI values during the selected time period and selected visual characteristics. In one implementation, the KPI values are plotted over the time range in a time-based graph lane. The graphical visualization may be generated according to an identified graph type and graph color, as well as any other identified visual characteristics. At block 5037, the computing device calibrates the graphical visualizations to a same time scale, such that the graphical visualization in each lane of the visual interface represents KPI data over the same period of time.


Blocks 5025-5037 can be repeated for a new time range. Such repetition can occur, e.g., after detecting an input corresponding to an identification of a new time range. The generation of a new graphical visualization can include modification of an existing graphical visualization.



FIG. 51 illustrates an example GUI 5100 for creating a visual interface displaying graphical visualizations of KPI values along time-based graph lanes, in accordance with one or more implementations of the present disclosure. The GUI 5100 can receive user input for a number of input fields 5102, 5104 and selection of selection buttons 5106. For example, input field 5102 can receive a title for the visual interface being created. Input field 5104 may receive a description of the visual interface. The input to input fields 5102 and 5104 may be optional in one implementation, such that it is not absolutely required for creation of the visual interface. Input to fields 5102 and 5104 may be helpful, however, in identifying the visual interface once it is created. In one implementation, if a title is not received in input fields 5102 and 5104, the computing machine may assign a default title to the created visual interface. Selection buttons 5106 may receive input pertaining to an access permission for the visual interface being created. In one implementation, the user may select an access permission of either “Private,” indicating that the visual interface being created will not be accessible to any other users of the system instead being reserved for private use by the user, or “Shared,” indicating that once created, the visual interface will be accessible to other users of the system. Upon, the optional entering of title and description into fields 5102 and 5104 and the selection of an access permission using buttons 5106, the selection of button 5108 may initiate creation of the visual interface. In one implementation, in addition to “Private” or “Shared” there may be additional or intermediate levels of access permissions. For example, certain individuals or groups of individuals may be granted access or denied access to a given visual interface. There may be a role based access control system where individuals assigned to a certain role are granted access or denied access.



FIG. 52 illustrates an example GUI 5200 for adding a graphical visualization of KPI values along a time-based graph lane to a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, in response to the creation of a visual interface using GUI 5100, the system presents GUI 5200 in order to add graphical visualizations to the visual interface. The graphical visualizations correspond to KPIs and are displayed along a time-based graph lane in the visual interface.


In one example, GUI 5200 can receive user input for a number of input fields 5202, 5204, 5212, selections from drop down menus 5206, 5208, and/or selection of selection buttons 5210 or link 5214. For example, input field 5202 can receive a title for the graphical visualization being added. Input field 5204 may receive a subtitle or description of the graphical visualization. The input to input fields 5202 and 5204 may be optional in one implementation, such that it is not absolutely required for addition of the graphical visualization. Input to fields 5202 and 5204 may be helpful, however, in identifying the graphical visualization once it is added to the visual interface. In one implementation, if a title is not received in input fields 5202 or 5204, the computing machine may assign a default title to the graphical visualization being added.


Drop down menu 5206 can be used to receive a selection of a graph style, and drop down menu 5208 can be used to receive a selection of a graph color for the graphical visualization being added. Additional details with respect to selection of the graph style and the graph color for the graphical visualization are described below in connection with FIGS. 57 and 58.


Selection buttons 5210 may receive input pertaining to a search source for the graphical visualization being added. In one implementation, the user may select search source of “Ad Hoc,” “Data Model” or “KPI.” Additional details with respect to selection of the search source for the graphical visualization are described below in connection with FIGS. 57, 59 and 60. Input field 5212 may receive a user-input search query or display a search query associated with the selected search source 5210. Selection of link 5214 may indicate that the user wants to execute the search query in input field 5212. When a search query is executed, the search query can produce one or more values that satisfy the search criteria for the search query. Upon the entering of data and the selection menu items, the selection of button 5216 may initiate the addition of the graphical visualization to the visual interface.



FIG. 53 illustrates an example of a visual interface 5300 with time-based graph lanes for displaying graphical visualizations, in accordance with one or more implementations of the present disclosure. In one example, the visual interface 5300 includes three time-based graph lanes 5302, 5304, 5306. These graph lanes may correspond to the graphical visualizations of KPI values added to the visual interface using GUI 5200 as described above. Each of the graph lanes 5302, 5304, 5306 can display a graphical visualization for corresponding KPI values over a time range. Initially the lanes 5302, 5304, 5306 may not include the graphical visualizations until a time range is selected using drop down menu 5308. Additional details with respect to selection of the time range from drop down menu 5308 are described below in connection with FIG. 63. In another implementation, a default time range may be automatically selected and the graphical visualizations may be displayed in lanes 5302, 5304, 5306.



FIG. 54 illustrates an example of a visual interface 5300 displaying graphical visualizations of KPI values along time-based graph lanes, in accordance with one or more implementations of the present disclosure. In one implementation, each of the time-based graph lanes 5302, 5304, 5306 include a visual representation of corresponding KPI values. The visual representations in each lane may be of different graph styles and/or colors or have the same graph styles and/or colors. For example, lane 5302 includes a bar chart, lane 5304 includes a line graph and lane 5306 includes a bar chart. The graph type and graph color of the visual representation in each lane may be selected using GUI 5200, as described above. Depending on the implementation, the KPIs represented by the graphical visualizations may correspond to different services or may correspond to the same service. In one implementation, multiple of the KPIs may correspond to the same service, while one or more other KPIs may correspond to a different service.


The graphical visualizations in each lane 5302, 5304, 5306 can all be calibrated to the same time scale. That is, each graphical visualization corresponds to a different KPI reflecting how a service is performing over a given time range. The time range can be reflected by a time axis 5410 for the graphical visualizations that runs parallel to at least one graph lane. The time axis 5410 may include an indication of the amount of time represented by the time scale (e.g., “Viewport: 1 h 1 m” indicating that the graphical visualizations in graph lanes 5302, 5304, 5306 display KPI values for a time range of one hour and one minute), and an indication of the actual time of day represented by the time scale (e.g., “12:30, 12:45, 01 PM, 01:15”). In one implementation, a bar running parallel to the time lanes including the indication of the amount of time represented by the time scale (e.g., “Viewport: 1 h 1 m”) is highlighted for an entire length of time axis 5410 to indicate that the current portion of the time range being viewed includes the entire time range. In other implementations, when only a subset of the time range is being viewed, the bar may be highlighted for a proportional subset of the length of time axis 5410 and only in a location along time axis 5410 corresponding to the subset. In one implementation, at least a portion of the time axis 5410 is displayed both above and below the graph lanes 5302, 5304, 5306. In one implementation, an indicator associated with drop down menu 5308 also indicates the selected time range (e.g., “Last 60 minutes”) for the graphical visualizations.


In one implementation, when one of graph lanes 5302, 5304, 5306 is selected (e.g., by hovering the cursor over the lane), a grab handle 5412 is displayed in association with the selected lane 5302. When user interaction with grab handle 5412 is detected (e.g., by click and hold of a mouse button), the graph lanes may be re-ordered in visual interface 5300. For example, a user may use grab handle 5412 to move lane 5302 to a different location in visual interface 5300 with respect to the other lanes 5304, 5306, such as between lanes 5304 and 5306 or below lanes 5304 and 5306. When another lane is selected, a corresponding grab handle may be displayed for the selected lane and used to detect an interaction of a user indicative of an instruction to re-order the graph lanes. In one implementation, a grab handle 5412 is only displayed when the corresponding lane 5302 is selected, and hidden from view when the lane is not selected.


While the horizontal axis of each lane is scaled according to the selected time range, and may be the same for each of the lanes 5302, 5304, 5306, a scale for the vertical axis of each lane may be determined individually. In one implementation, a scale for the vertical axis of each lane may be automatically selected such that the graphical visualization spans most or all of a width/height of the lane. In one implementation, the scale may be determined using the maximum and minimum values reflected by the graphical visualization for the corresponding KPI during the current time range, such that the maximum value appears at or near the top of the lane and the minimum value appears at or near the bottom of the lane. The intermediate values between the maximum and minimum may be scaled accordingly. In one implementation, a search query associated with the KPI is executed for a selected period of time. The results of the query return a dataset of KPI values, as shown in FIG. 45A. The maximum and minimum values from this dataset can be determined and used to scale the graphical visualization so that most or all of the lane is utilized to display the graphical visualization.



FIG. 55A illustrates an example of a visual interface 5300 with a user manipulable visual indicator 5514 spanning across the time-based graph lanes, in accordance with one or more implementations of the present disclosure. Visual indicator 5514, also referred to herein as a “lane inspector,” may include, for example, a line or other indicator that spans vertically across the graph lanes 5302, 5304, 5306 at a given point in time along time axis 5410. The visual indicator 5514 may be user manipulable such that it may be moved along time axis 5410 to different points. For example, visual indicator 5514 may slide back and forth along the lengths of graph lanes 5302, 5304, 5306 and time axis 5410 in response to user input received with a mouse, touchpad, touchscreen, etc.


In one implementation, visual indicator 5514 includes a display of the point in time at which it is currently located. In the illustrated example, the time associated with visual indicator 5514 is “12:44:43 PM.” In one implementation, visual indicator 5514 further includes a display of a value reflected in each of the graphical visualizations for the different KPIs at the current point in time illustrated by visual indicator 5514. In the illustrated example, the value of the graphical visualization in lane 5302 is “3.65,” the value of the graphical visualization in lane 5304 is “60,” and the value of the graphical visualization in lane 5306 is “0.” In one implementation, units for the values of the KPIs are not displayed. In another implementation, units for the values of the KPIs are displayed. In one implementation, when visual indicator 5514, is located a time between two known data points (i.e., between the vertices of the graphical visualization), a value of the KPI at that point in time may be interpolated using linear interpolation techniques. In one implementation, when one of lanes 5302, 5304, 5306 is selected (e.g., by hovering the cursor over the lane) a maximum and a minimum values reflected by the graphical visualization for a corresponding KPI during the current time range are displayed adjacent to visual indicator 5514. For example, in lane 5304, a maximum value of “200” is displayed and a minimum value of “0” is displayed adjacent to visual indicator 5514. This indicates that the highest value of the KPI corresponding to the graphical visualization in lane 5304 during the time period represented by time axis 5410 is “200” and the lowest value during the same time period is “0.” In other implementations, the maximum and minimum values may be displayed for all lanes, regardless of whether they are selected, or may not be displayed for any lanes.


In one implementation, visual interface 5300 may include an indication when the values for a KPI reach one of the predefined KPI thresholds. As discussed above, during the creation of a KPI, the user may define one or more states for the KPI. The states may have corresponding visual characteristics such as colors (e.g., red, yellow, green). In one implementation, the graph color of the graphical visualization may correspond to the color defined for the various states. For example, if the graphical visualization is a line graph, the line may have different colors for values representing different states of the KPI. In another implementation, the current value of a selected lane displayed by visual indicator 5514 may change color to correspond to the colors defined for the various states of the KPI. In another implementation, the values of all lanes displayed by visual indicator 5514 may change color based on the state, regardless of which lane is currently selected. In another implementation, there may be a line or bar running parallel to at least one of lanes 5302, 5304, 5306 that is colored according to the colors defined for the various KPI states when the value of the corresponding KPI reaches or passes a defined threshold causing the KPI to change states. In yet another implementation, there may be horizontal lines running along the length of at least one lane to indicate where the thresholds defining different KPI states are located on the vertical axis of the lane. In other implementations, the thresholds may be indicated in visual interface 5300 in some other manner.



FIG. 55B is a flow diagram of an implementation of a method for inspecting graphical visualizations of KPI values along a time-based graph lane, in accordance with one or more implementations of the present disclosure. At block 5501, the computing machine determines a point in time corresponding to the current position of lane inspector 5514. The lane inspector 5514 may be user manipulable such that it may be moved along time axis 5410 to different points in time. For each KPI dataset represented by a graphical visualization in the visual interface, at block 5503, the computing machine determines a KPI value corresponding to the determined point in time. In addition, at block 5505, the computing machine determines a state of the KPI at the determined point in time, based on the determined value and the defined KPI thresholds. The determine state may include, for example, a critical state, a warning state, a normal state, etc. At block 5507, the computing device determines the visual characteristics of the determined state, such as a color (e.g., red, yellow, green) associated with the determined state.


At block 5509, the computing machine displays the determined value adjacent to lane inspector 5514 for each of the graphical visualizations in the visual interface. In the example illustrated in FIG. 55A, the value of the graphical visualization in lane 5302 is “3.65,” the value of the graphical visualization in lane 5304 is “60,” and the value of the graphical visualization in lane 5306 is “0.” If the lane inspector 5514 is moved to a new position representing a different time, the operations at blocks 5501-5509 may be repeated.


At block 5511, the computing machine receives a selection of one of the lanes or graphical visualizations within a lane in the visual interface. In one implementation, one of graph lanes 5302, 5304, 5306 is selected by hovering the cursor over the lane. At block 5513, the computing machine determines the maximum and minimum values of the KPI dataset associated with the selected lane. In one implementation, a search query associated with the KPI is executed for a selected period of time. The results of the query return a dataset of KPI values, as shown in FIG. 45A. The maximum and minimum values from this dataset can be determined. At block 5515, the computing machine displays the maximum and minimum values adjacent to lane inspector 5515. For example, in lane 5304, a maximum value of “200” is displayed and a minimum value of “0” is displayed adjacent to lane inspector 5514.



FIG. 55C illustrates an example of a visual interface with a user manipulable visual indicator spanning across multi-series time-based graph lanes, in accordance with one or more implementations of the present disclosure. In one implementation, time-based graph lane 5520 is a multi-series graph lane including visual representations of multiple series of corresponding KPI values. The multiple series may be the result of a search query corresponding to the KPI that is designed to return multiple values at any given point in time. For example, the search could return the processor load on multiple different host machines at a point in time, where load on each individual host is represented by a different one of the multiple series. Each graphical visualization in multi-series lane 5520 can be calibrated to the same time scale.


In one implementation, visual indicator 5525 includes a display of the point in time at which it is currently located. In the illustrated example, the time associated with visual indicator 5514 is “01:26:47 PM.” In one implementation, visual indicator 5525 further includes a display of a value reflected in each of the graphical visualizations, including multi-series lane 5520, at the current point in time illustrated by visual indicator 5525. In one implementation, in multi-series lane 5520, the visual indicator 5525 displays the maximum, minimum, and average values among each of the multiple series at the given point in time. In the illustrated example, the graphical visualizations in lane 5525 have a maximum value of “4260.11” and a minimum value of “58.95.” In one implementation, an indication of the series to which the maximum and minimum values correspond may also be displayed (e.g., the hosts named “vulcan” and “tristanhydra4,” respectively). Further, the visual indicator 5525 may display the average value of the multiple series at the given point in time (e.g., “889.41”).



FIG. 56 illustrates an example of a visual interface 5300 displaying graphical visualizations of KPI values along time-based graph lanes with options for editing the graphical visualizations, in accordance with one or more implementations of the present disclosure. In one implementation, when one of graph lanes 5302, 5304, 5306 is selected (e.g., by hovering the cursor over the lane), a GUI element such as a gear icon 5616 is displayed in association with the selected lane 5306. When user interaction with gear icon 5616 is detected, a drop down menu 5618 may be displayed. Drop down menu 5618 may include one or more user selectable options including, for example, “Edit Lane,” “Delete Lane,” “Open in Search,” or other options. Selection of one of these options may cause display of a graphical interface to allow the user to edit the graphical visualization in the associated lane 5306, delete the lane 5306 from the visual interface 5300, or display the underlying data (e.g., events, machine data) from which the KPI values of the associated graphical visualization are derived. Additional details with respect to editing the graphical visualization are described below in connection with FIG. 57. When another lane is selected, a corresponding gear icon, or other indicator, may be displayed for the selected lane. In one implementation, a gear icon 5616 is only displayed when the corresponding lane 5306 is selected, and hidden from view when the lane is not selected.



FIG. 57 illustrates an example of a GUI 5700 for editing a graphical visualization of KPI values along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, in response to the selection of the “Edit Lane” option in drop down menu 5618, the system presents GUI 5700 in order to edit the corresponding graphical visualization.


In one implementation, GUI 5700 can receive user input for a number of input fields 5702, 5704, 5712, selections from drop down menus 5706, 5708, or selection of selection buttons 5710 or link 5714. In one implementation, input field 5702 can be used to edit the title for the graphical visualization. Input field 5204 may be used to edit the subtitle or description of the graphical visualization. In one implementation drop down menu 5706 can be used to edit the graph style, and drop down menu 5708 can be used to edit the graph color for the graphical visualization. For example, upon selection of drop down menu 5708, a number of available colors may be displayed for selection by the user. Upon selection of a color, the corresponding graphical visualization may be displayed in the selected color. In one implementation, no two graphical visualizations in the same visual interface may have the same color. Accordingly, the available colors displayed for selection may not include any colors already used for other graphical visualizations. In one implementation, the color of a graphical visualization may be determined automatically according to the colors associated with defined thresholds for the corresponding KPI. In such an implementation, the user may not be allowed to edit the graph color in drop down menu 5708.


Selection buttons 5710 may be used to edit a search source for the graphical visualization. In the illustrated implementation, an “Ad Hoc” search source has been selected. In response, an input field 5712 may display a user-input search query. The search query may include search criteria (e.g., keywords, field/value pairs) that produce a dataset or a search result of events or other data that satisfy the search criteria. In one implementation, a user may edit the search query by making changes, additions, or deletions, to the search query displayed in input field 5712. The Ad Hoc search query may be executed to generate a dataset of values that can be plotted over the time range as a graphical visualization (e.g., as shown in visual interface 5300). Selection of link 5714 may indicate that the user wants to execute the search query in input field 5712. Upon the editing of data and/or the selection menu items, the selection of button 5716 may indicate that the editing of the graphical visualization is complete.



FIG. 58 illustrates an example of a GUI 5700 for editing a graph style of a graphical visualization of KPI values along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, drop down menu 5706 can be used to edit the graph style of the graphical visualization. For example, upon selection of drop down menu 5706, a list 5806 of available graph types may be displayed for selection by the user. In one implementation, the available graph types include a line graph, an area graph, or a column graph. In other implementations, additional graph types may include a bar cart, a plot graph, a bubble chart, a heat map, or other graph types. Upon selection of a graph type, the corresponding graphical visualization may be displayed in the selected graph type. In one implementation, each graphical visualization on the visual interface has the same graph type. Accordingly, when the graph type of one graphical visualization is changed, the graph type of each remaining graphical visualization in the visual interface is automatically changed to the same graph type. In another implementation, each graphical visualization in the visual interface may have a different graph type. In one implementation, the graph type of a graphical visualization may be determined automatically based on the corresponding KPI or service. In such an implementation, the user may not be allowed to edit the graph type in drop down menu 5706.



FIG. 59 illustrates an example of a GUI 5700 for selecting the KPI corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, selection buttons 5710 may be used to edit a search source for the graphical visualization. In the illustrated implementation, the “KPI” search source has been selected. In response, drop down menus 5912, 5914 and input field 5916 may be displayed. Drop down menu 5912 may be used to select a service, the performance of which will be represented by the graphical visualization. Upon selection, drop down menu 5912 may display a list of available services. Drop down menu 5914 may be used to select the KPI that indicates an aspect of how the selected service is performing. Upon selection, drop down menu 5914 may display a list of available KPIs. Input field 5916 may display a search query corresponding to the selected KPI. The search query may derive one or more values from machine data pertaining to one or more entities providing a service. In one implementation, a user may edit the search query by making changes, additions, or deletions, to the search displayed in input field 5916. Selection of link 5918 may indicate that the user wants to execute the search query in input field 5916.



FIG. 60 illustrates an example of a GUI 5700 for selecting a data model corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, selection buttons 5710 may be used to edit a search source for the graphical visualization. In the illustrated implementation, the “Data Model” search source has been selected. In response, drop down menus 6012, 6014 and input fields 6016, 6018 may be displayed. Drop down menu 6012 may be used to select a data model on which the graphical visualization will be based. Upon selection, drop down menu 6012 may display a list of available data models. Additional details with respect to selection of a data model are described below in connection with FIG. 61. Drop down menu 6014 may be used to select a statistical function for the data model. Upon selection, drop down menu 6014 may display a list of available functions. Additional details with respect to selection of a data model function are described below in connection with FIG. 62A. Input field 6016 may display a “Where clause” that can be used to further refine the search associated with the selected data model and displayed in input field 6018. The where clause may include, for example the WHERE command followed by a key/value pair (e.g., WHERE host=Vulcan). In one implementation, “host” is a field name and “Vulcan” is a value stored in the field “host.” The WHERE command may further filter the results of the search query associated with the selected data model to only return data that is associated with the host name “Vulcan.” As a result, the search can filter results based on a particular entity or entities that provide a service. In one implementation, a user may also edit the search query by making changes, additions, or deletions, to the search displayed in input field 6018. The data model search query may be executed to generate a dataset of values that can be plotted over the time range as a graphical visualization (e.g., as shown in visual interface 5300). Selection of link 6020 may indicate that the user wants to execute the search query in input field 6018.



FIG. 61 illustrates an example of a GUI 6100 for selecting a data model corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, upon selection of drop down menu 6012, GUI 6100 is displayed. GUI 6100 allows for the selection and configuration of a data model to be used as the search source for the graphical visualization. In GUI 6100, a user may select an existing data model from drop down menu 6102. Additionally, a user may select one of objects 6104 of the data model. In one implementation, an object is a search that defines one or more events. The data model may be a grouping of objects that are related. Furthermore, a user may select one of the fields 6106 to derive one or more values for the graph. Additional details regarding data models are provided below.



FIG. 62A illustrates an example of a GUI 5700 for editing a statistical function for a data model corresponding to a graphical visualization along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, drop down menu 6014 may be used to select statistical function for the data model. For example, upon selection of drop down menu 6014, a list 6214 of available statistical functions may be displayed for selection by the user. In one implementation, the available statistical functions include average, count, distinct count, maximum, minimum, sum, standard deviation, median or other operations. The selected statistical function may be used to produce one or more values for display as the graphical visualization. In one implementation, the available statistical functions may be dependent on the data type of the selected field from fields 6106 in GUI 6100. For example, when the selected field has a numerical data type, any of the above listed statistical functions may be available. When the selected field has a string data type, however, the only available operations may be count and distinct count, as the arithmetic operations cannot be performed on a string data type. In one implementation, the statistical function may be determined automatically based on the corresponding data model. In such an implementation, the user may not be allowed to edit the statistical function in drop down menu 5214.



FIG. 62B illustrates an example of a GUI 6220 for editing a graphical visualization of KPI values along a time-based graph lane in a visual interface, in accordance with one or more implementations of the present disclosure. In one implementation, in response to the selection of the “Edit Lane” option in drop down menu 5618, the system presents GUI 6220 in order to edit the graph rendering options for the corresponding graphical visualization. In one implementation, the graph rendering options include the vertical axis scale 6222 and the vertical axis boundary 6224 for the corresponding lane. Options for the vertical axis scale 6222 include linear and logarithmic. Depending on the selection, the vertical axis of the corresponding lane will be displayed with either a linear or a logarithmic scale. Options for the vertical axis boundary 6224 include data extent, zero extent, and static. When data extent is selected, the range of values shown on the vertical axis of the corresponding lane will be set to include the full range of KPI values during the selected time period (i.e., the vertical axis will range from the maximum to the minimum KPI value). When zero extent is selected, the range of values shown on the vertical axis of the corresponding lane will be set to range from the maximum KPI value to zero (or to a negative value, if such a value exists in the data). When static is selected, the user can enter a custom range of values which will be shown on the vertical axis of the corresponding lane.



FIG. 63 illustrates an example of a GUI 6300 for selecting a time range that graphical visualizations along a time-based graph lane in a visual interface should cover, in accordance with one or more implementations of the present disclosure. In one implementation, drop down menu 5308 may be used to select a time range for the graphical visualizations in the visual interface 5300 of FIG. 53. For example, upon selection of drop down menu 5308, a GUI 6300 for selection of the time range may be displayed. In one implementation, the time range selection options may include a real-time period 6302, a relative time period 6304 or some other time period 6306. For real-time execution, the time range for machine data can be a real-time period 6302 (e.g., 30-second window, 1-minute window, 1-hour window, etc.) from the execution time (e.g., each time the query is executed, the events with timestamps within the specified time window from the query execution time will be used). In real-time execution, a search query associated with the KPI may be continually executed (or periodically executed at a relatively short period (e.g., 1 second)) to continually show a graphical visualization reflecting KPI values from the last one hour (or other real-time period) of time. Thus, if the 1 hour window initially covers from 12 pm to 1 pm, at 1:30, the 1 hour window may cover from 12:30 pm to 1:30 pm. In other words, the time period may be considered a rolling time period, as it constantly changes as time moves forward. For relative execution, the relative time period 6304 can be historical (e.g., yesterday, previous week, etc.) or based on a specified time window from the request time or scheduled time (e.g., last 15 minutes, last 4 hours, etc.). For example, the historical time range “Yesterday” can be selected for relative execution. In another example, the window time range “Last 15 minutes” can be selected for relative execution. In relative execution, the search query associated with the KPI may only be executed upon a request for updated values from the user. Thus, if the 1 hour window covers from 12 pm to 1 pm, that time period will not change until the user requests an update, at which point the most recent 1 hour of values will be displayed. In one implementation, the other time period may include, for example, all of the time where KPI values are available for the corresponding service. Additional time range options may allow the user to specify a particular date or time range over which the KPI values are to be displayed as graphical visualizations.



FIG. 64A illustrates an example of a visual interface 5300 for selecting a subset of a time range that graphical visualizations along a time-based graph lane in a visual interface cover, in accordance with one or more implementations of the present disclosure. In one implementation, visual indicator 5514 may be used to select a subset 6402 of the time range represented by time axis 5410, and the corresponding portions of the graphical visualizations in lanes 5302, 5304, 5306. In one implementation, a user may use a mouse or other pointing device to position visual indicator 5514 at a starting position along time axis 5410, then click and drag to select the desired subset 6402. In one embodiment, the selected subset 6402 is shown as shaded in the visual interface 5300. In another implementation, all areas except the selected subset 6402 are shown as shaded. The selection of subset 6402 may be an indication that the user wishes to more closely inspect the KPI values of the graphical visualizations during the time period represented by the subset 6402. As a result, in response to the selection, the subset 6402 may be emphasized, enlarged, or zoomed in upon to allow closer inspection.



FIG. 64B is a flow diagram of an implementation of a method for enhancing a view of a subset a subset of a time range for a time-based graph lane, in accordance with one or more implementations of the present disclosure. At block 6401, the computing device determines a new time range based on the positions of lane inspector 5514. In one implementation, lane inspector 5514 may be used to select a subset 6402 of the time range represented by time axis 5410, and the corresponding portions of the graphical visualizations in lanes 5302, 5304, 5306. At block 6403, the computing device identifies a subset of values of each KPI that correspond to the new time range. In one embodiment, each value in the KPI dataset may have a corresponding time value or timestamp. Thus, the computing device can filter the dataset to identify values with a timestamp included in the selected subset of the time range.


At block 6405, the computing device determines the maximum and minimum values in the selected subset of values for each KPI, and at block 6407 adjusts the time axis of the lanes in the graphical visualization to reflect the new time range. In one implementation, the subset 6402 is expanded to fill the entire length or nearly the entire length of graph lanes 5302, 5304, 5306. The horizontal axis of each lane may be scaled according to the selected subset 6402. At block 6409, the computing device adjusts the height of the lanes based on the new maximum and minimum values. In one implementation, the vertical axis of each lane is scaled according to the maximum and minimum values reflected by the graphical visualization for a corresponding KPI during the selected subset 6402. At block 6411, the computing device modifies the graphs based on the subsets of values and calibrates the graphs to the same time scale based on the new time range. Additional details are described with respect to FIG. 65.



FIG. 65 illustrates an example of a visual interface displaying graphical visualizations of KPI values along time-based graph lanes for a selected subset of a time range, in accordance with one or more implementations of the present disclosure. In response to the selection of subset 6402 using visual indicator 5514, the system may recalculate the time range that the graphical visualizations in graph lanes 5302, 5304, 5306 should cover. In one implementation, the subset 6402 is expanded to fill the entire length or nearly the entire length of graph lanes 5302, 5304, 5306. The horizontal axis of each lane is scaled according to the selected subset 6402 and the vertical axis of each lane is scaled according to the maximum and minimum values reflected by the graphical visualization for a corresponding KPI during the selected subset 6402. In one implementation, the maximum value appears at or near the top of the lane and the minimum value appears at or near the bottom of the lane. The intermediate values between the maximum and minimum may be scaled accordingly.


In one implementation, time access 5410 is updated according to the selected subset 6402. The time axis 5410 may include an indication of the amount of time represented by the time scale (e.g., “Viewport: 5 m” indicating that the graphical visualizations in graph lanes 5302, 5304, 5306 display KPI values for a time range of five minutes), and an indication of the actual time of day represented by the original time scale (e.g., “12:30, 12:45, 01 PM, 01:15”). In one implementation, a bar running parallel to the time lanes including the indication of the amount of time represented by the time scale (e.g., “Viewport: 1 h 1 m”) is highlighted for a proportional subset of the length of time axis 5410 and only in a location along time axis 5410 corresponding to the subset. In the illustrated embodiment, the highlighted portion of the horizontal bar indicates that the selected subset 6402 occurs sometime between “01 PM” and “01:15.” In one implementation, at least a portion of the time axis 5410 is displayed above the graph lanes 5302, 5304, 5306 as well. This portion of the time axis indicates the actual time of day represented by the selected subset 6402 (e.g., “01:05, 01:06, 01:07, 01:08, 01:09”). In one implementation, a user may return to the un-zoomed view of the original time period by clicking the non-highlighted portion of the horizontal bar in the time axis 5410.



FIG. 66 illustrates an example of a visual interface 5300 displaying twin graphical visualizations of KPI values along time-based graph lanes for different periods of time, in accordance with one or more implementations of the present disclosure. In one implementation, each of graph lanes 5302, 5304, 5306 has a corresponding twin lane 6602, 6604, 6606. The twin lanes 6602, 6604, 6606 may display a second graphical visualization in parallel with the first graphical visualization in graph lanes 5302, 5304, 5306. The KPI values reflected in the second graphical visualization may correspond to the same KPI (or other search source) for a different period of time than the values reflected in the first graphical visualization. In one implementation, a user may add the twin lanes 6602, 6604, 6606 by selecting drop down menu 6608. In one implementation, drop down menu 6608 can be used to select the period of time for the values reflected in the second graphical visualizations. For example, upon selection of drop down menu 6608, a list 6610 of available time periods may be displayed for selection by the user. In one implementation, the available time periods may include periods of time in the past when KPI data is available for one or more of the graphical visualizations. In one implementation, a twin lane may be created for each of the lanes in the visual interface, and a search query of each KPI can be executed using the specified time range to produce one or more time values for the second graphical visualization of a corresponding KPI. Because the new time range is associated with a different point(s) in time, the machine data or events used by the search query for the second graphical visualization will be different than the machine data that was used by the search query for the original graphical visualization, and therefore the values produced for the second graphical visualization are likely to be different from the values that were produced for the original graphical visualization. In another implementation, a twin lane may be created only for one or more selected lanes in the visual interface, and only search queries of those KPIs can be executed. In one implementation, if past KPI data is not available for the selected time range, no second graphical visualization may be displayed in the twin lane 6606.



FIG. 67 illustrates an example of a visual interface with a user manipulable visual indicator 5514 spanning across twin graphical visualizations of KPI values along time-based graph lanes for different periods of time, in accordance with one or more implementations of the present disclosure. Visual indicator 5514, also referred to herein as a “lane inspector,” may include, for example, a line or other indicator that spans across the graph lanes 5302, 6602, 5304, 6604, 5306, 6606 at a given point in time along time axis 5410. The visual indicator 5514 may be user manipulable such that it may be moved along time axis 5410 to different points. For example, visual indicator 5514 may slide back and forth along the lengths of graph lanes and time axis 5410 in response to user input received with a mouse, touchpad, touchscreen, etc.


In one implementation, visual indicator 5514 includes a display of the point in time at which it is currently located both in original lanes 5302, 5304, 5306 and twin lanes 6602, 6604, 6606. In the illustrated example, the times associated with visual indicator 5514 are “Thu Sep 4 01:35:34 PM” for the original lanes and “Wed Sep 3 01:35:34 PM” for the twin lanes. Thus, the twin lanes show values of the same KPI from the same time range on the previous day. In one implementation, visual indicator 5514 further includes a display of a value reflected in each of the graphical visualizations for the different KPIs at the point in time corresponding to the position of visual indicator 5514. In the illustrated example, the value of the graphical visualization in lane 5302 is “0,” the value of the graphical visualization in lane 6302 is “1.52,” the value of the graphical visualization in lane 5304 is “36,” the value of the graphical visualization in lane 6304 is “31,” the value of the graphical visualization in lane 5306 is “0,” and lane 6306 has no data available. In one implementation, the graphical visualizations in twin lanes 6302, 6304, 6306 have the same graph type and a similar graph color as the graphical visualizations in the corresponding graph lanes 5302, 5304, 5306. In another implementation, the second graphical visualizations are configurable such that the user can adjust the graph type and the graph color. In one implementation, rather than being displayed in twin parallel lanes, the second graphical visualizations may be overlaid on top of the original graphical visualizations.



FIG. 68A illustrates an example of a visual interface 5300 displaying a graph lane 6806 with inventory information for a service or entities reflected by KPI values, in accordance with one or more implementations of the present disclosure. In one implementation, an additional lane 6806 is displayed in parallel to at least one of graph lanes 6802 and 6804. Graph lanes 6802 and 6804 may be similar to graph lanes 5302, 5304, 5306 described above, such that they may display graphical visualizations of corresponding KPI values. Additional lane 6806, however, may be a different type of lane, which does not display graphical visualizations. In one implementation, additional lane 6806 may display inventory information for the service or for the one or more entities providing the service reflected by the KPI corresponding to the graphical visualization in the adjacent lane 6804. The additional lane 6806 may include textual information, or other non-graphical information. The inventory information may include information about the service or the entities providing the service, such as an identifier of the entities (e.g., a host name, server name), a location of the entities (e.g., rack number, data center name), etc. In one implementation, the inventory information displayed in lane 6806 may be populated from information provided during the entity definition process. In one embodiment, the inventory information displayed in additional lane 6806 may change according to the position of visual indicator 5514 along time axis 5410. When the inventory information is time stamped, or otherwise is associated with a time value, the inventory information may be different at different points in time. Accordingly, in one implementation, the inventory information available at the time associated with the position of visual indicator 5514 may be displayed in additional lane 6806. In one implementation, additional lane 6806 may be continually associated with an adjacent lane 6804, such that if the lanes in visual interface 5300 are reordered, additional lane 6806 remains adjacent to lane 6804 despite the reordering.



FIG. 68B illustrates an example of a visual interface displaying an event graph lane with event information in an additional lane, in accordance with one or more implementations of the present disclosure. In one implementation, time-based graph lane 6810, is an event lane having a visual representation of the number of events occurring over a given period of time. The visual representation may include a heat map, whereby the entire period of the lane is segmented into smaller equally sized buckets, each representing a subset of the period of time and having a colored rectangle. The color of the rectangle may correspond to the number of events pertaining to a particular entity or service that occurred during the period of time represented by the bucket. In one implementation, darker colors/shades represent a higher number of events, while lighter colors/shades represent a lower number of events. Additional lane 6812 may be a different type of lane, which does not display graphical visualizations. In one implementation, additional lane 6812 may display additional information corresponding to the events represented in the adjacent event lane 6810. The additional lane 6812 may include textual information, or other non-graphical information. In one implementation, when one of the buckets in event lane 6810 is selected, additional lane 6812 may include a listing of each event that is associated with the selected bucket. Information about each event that is displayed in the list may include, for example, an identifier of the event, a timestamp of the event, an identifier of corresponding entities (e.g., a host name, server name), a location of the entities (e.g., rack number, data center name), etc. In one implementation, additional lane 6812 may be continually associated with an adjacent lane 6810, such that if the lanes in visual interface 6800 are reordered, additional lane 6812 remains adjacent to lane 6810 despite the reordering.



FIG. 69 illustrates an example of a visual interface 5300 displaying a graph lane with notable events occurring during a timer period covered by graphical visualization of KPI values, in accordance with one or more implementations of the present disclosure. In one implementation, an additional lane 6908 is displayed in parallel to at least one of graph lanes 6902, 6904, 6906. Graph lanes 6902, 6904, 6906 may be similar to graph lanes 5302, 5304, 5306 described above, such that they may display graphical visualizations of corresponding KPI values. Additional lane 6908, however, may be a different type of lane designed to display indications of the occurrences of notable events. “Notable events” are system occurrences that may be likely to indicate a security threat or operational problem. These notable events can be detected in a number of ways: (1) an analyst can notice a correlation in the data and can manually identify a corresponding group of one or more events as “notable;” or (2) an analyst can define a “correlation search” specifying criteria for a notable event, and every time one or more events satisfy the criteria, the application can indicate that the one or more events are notable. An analyst can alternatively select a pre-defined correlation search provided by the application. Note that correlation searches can be run continuously or at regular intervals (e.g., every hour) to search for notable events. Upon detection, notable events can be stored in a dedicated “notable events index,” which can be subsequently accessed to generate various visualizations containing security-related information.


In one implementation, the notable events occurring during the period of time represented by time axis 5410 are displayed as flags 6910 or bubbles in a bubble chart in additional lane 6908. The flags 6910 may be located at a position along time axis 5410 corresponding to when the notable event occurred. In one implementation, the flags 6910 may be color coded to vindicate the severity or importance of the notable event. In one implementation, when one of the flags 6910 is selected (e.g., by clicking on the flag or hovering the cursor over the flag), a description of the notable event may be displayed. As illustrated in FIG. 69, the description 6912 may be displayed in a horizontal bar along the bottom of lane 6908. In another implementation, as illustrated in FIG. 70, the description 7012 may be displayed adjacent to the selected flag 6910. In one implementation, user-manipulable visual indicator 5514 may be used to select a particular flag 6910. For example, when visual indicator 5514 is slid along the length of lane 6908, a description 7012 of a corresponding notable event at the same time may be displayed.


In some implementations, search queries for KPIs and correlation searches can derive values using a late-binding schema that the search queries apply to machine data. Late-binding schema is described in greater detail below. The systems and methods described herein above may be employed by various data processing systems, e.g., data aggregation and analysis systems. In various illustrative examples, the data processing system may be represented by the SPLUNK® ENTERPRISE system produced by Splunk Inc. of San Francisco, Calif., to store and process performance data.


KPI Entity Breakdown


KPIs are often derived using machine data of a number of entities, possibly a large number. The consolidation of machine data from many entities into a KPI score to reflect an entire service can be extremely useful in system monitoring, diagnostics, and troubleshooting. Some embodiments may include further capability to consolidate, as necessary, the data associated with the individual entities into per-entity KPI values and to accumulate, transform, process, and visualize per-entity KPI values and information derived therefrom. The determination of KPI values on a per-entity basis may be known as entity breakdown or KPI entity breakdown, and aspects of (KPI) entity breakdown include aspects of an embodiment that create, modify, use, derive from, are derived from, support, are supported by, or are otherwise related to, per-entity KPI values. Illustrative embodiments will now be described.



FIG. 70A is a flow diagram of an implementation of a method for generating and using per-entity information. Method 70100 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method 70100 may be performed by a client computing machine. In another implementation, the method 70100 may be performed by a server computing machine coupled to the client computing machine over one or more networks. These and other implementations are possible and within the grasp of one of skill in the art.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks). However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, acts can be subdivided or combined. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.


Method 70100 may begin at block 70110 and proceed to block 70120 representing the production and accumulation of per-entity KPI values. As discussed herein in relation to FIG. 29C a search query that defines a KPI may have a determination component that may derive a value for the KPI from some aggregation of data. When the aggregation of data is representative of all of the relevant entities that perform the service, the value derived by the determination component may be the KPI value for the service overall. When the aggregation of data is representative of a single entity involved in the performance of the service, that value derived by the determination component may be a per-entity KPI value related to the service and may represent the contribution of the entity to the overall service KPI. Accordingly, when the KPI search is executed, perhaps according to a schedule or regular frequency as indicated by 70122, the determination component of the KPI search can process relevant machine data on a per-entity basis to produce per-entity KPI values that are used for, or are incidental to, the production of the overall service KPI. The per-entity KPI values may be stored/represented/reflected/recorded in computer-readable storage 70194, and may be stored in association with a time value. The time value may reflect the approximate time of the production of the per-entity KPI value, the time of a point in time or indicative of a period of time associated with machine data used to derive the value, or another relevant time value. The time value may be reflected directly in storage 70194, indirectly in storage, or may be determinable from other information, for example, the position of the per-entity value in a list or array, or an offset or index value associated with the per-entity value considered in conjunction with a base time or start time. Subsequently produced per-entity KPI values can be similarly stored.


In another embodiment, the per-entity KPI values for the time series of set 70160 are produced by executing a search query once per entity to produce a per-entity KPI value for a single entity each time. The multiple executions of the single-entity search query may be executed in parallel or in rapid succession in some embodiments. In one embodiment the same search query is used for each entity, differing naturally in any selection or filter criteria used to identify the appropriate entity machine data. In another embodiment different search queries may be used for different entities to tailor the production of the per-entity KPI value based on particular entity characteristics or other factors. In one embodiment the per-entity KPI values may be used to derive an overall service KPI value, in another embodiment they are not, and in yet another embodiment and overall service KPI value for a KPI is not produced.


The accumulation of per-entity KPI values in storage 70194 effectively produces a set of per-entity time series of KPI values represented in FIG. 70A by block 70160. (The accumulation may take place over time in some embodiments or, in other embodiments, almost instantly, such as may result from applying KPI value determination to accumulated data at once, such as to a machine data history store.) Two time series 70162, 70164 are shown as examples of time series in set 70160. Per-entity time series 70162 illustrates a time series associated with an illustrative first entity. Time series 70162 by its appearance is suggestive of a time series stored in a highly structured format with each of the per-entity KPI values for the first entity recorded as consecutive stored elements of a data structure. Per-entity time series 70164 illustrates a time series associated with an illustrative second entity. Time series 70164 by its appearance is suggestive of a time series not rigidly stored, with each of the per-entity KPI values for the second entity (the round nodes of 70164) recorded somewhere in storage 70194. The per-entity KPI values stored for the second entity (the round nodes of 70164) may at any point be ordered according to time (as suggested by the straight line segments connecting the round nodes of 70164) into a sequential time series. Such ordering of the individual per-entity KPI values stored for the second entity may occur at the time of storage and be reflected in storage, at the time of use, or at some other time. The difference in storage representation shown in FIG. 70A and discussed for time series 70162 and 70164 is intended to indicate that the storage organization, representation, format and/or structure, is not critical to this embodiment of method 70100, and that the individual per-entity KPI values for a particular entity form a logical time series regardless of their stored representation. In one embodiment, per-entity KPI time series 70162 and 70164 are representative of the same time span. In one embodiment, each per-entity KPI value of time series 70162 has a corresponding per-entity KPI value of time series 70164, and the values of each such pair of corresponding values may be representative of the same point in time or period of time. In one embodiment, a set of time series such as set 70160 includes a time series for each entity included in the KPI. These and other variations are possible.


At block 70130 of the presently described embodiment, one or more statistical metrics are determined from the set of per-entity KPI time series 70160. In one implementation, per-entity KPI values that correspond to a particular sample time, such as a point in time or a period of time, are identified from across the multiple time series of 70160. The identified per-entity KPI values are transformed by the computing machine to one or more statistical metric values associated with the sample time. For example, the identified per-entity KPI values may be transformed into the statistical metric of a mean, median, or mode average value. Similarly, the identified per-entity KPI values may be transformed into the statistical metric of a standard deviation value (for example, the value of the standard deviation, itself, or the mean plus or minus some number of standard deviations). Similarly, the identified per-entity KPI values may be transformed into the statistical metric of a quantile value (for example, the value corresponding to the 5th percentile, the 3rd quartile, or the 9th decile). Similarly, the identified per-entity KPI values may be transformed into the statistical metric of an extreme value, such as the maximum or minimum. Similarly, the identified per-entity KPI values may be transformed into the statistical metric of a range value. These and other statistical metrics are possible in an embodiment and may be implemented as the determination of any meaningful value from the identified per-entity KPI values.


The identification of per-entity KPI values and their transformation to one or more statistical metrics as just described takes place for multiple sample times, and the transformation results are recorded in storage 70194. Recording values for different sample times for a particular statistical metric, such as an average, results in the computer storage containing the data of a time series for the particular statistical metric. This is true regardless of whether the time series is organized in advance in storage as such. FIG. 70A shows storage 70194 having a set 70170 of statistical metric time series, such as 70172 and 70174. Statistical metric time series 70172 may be a time series of maximum values while time series 70174 may contain minimum values, for example. As another example, statistical metric time series 70172 may be a time series of 25th-percentile values while time series 70174 may contain 75th-percentile values. As another example, statistical metric time series 70172 may be a time series of mean-minus-1SD (standard deviation) values while time series 70174 may contain mean-plus-1SD (standard deviation) values. The set of statistical metric time series 70170 may contain time series for any number of statistical metrics.


In one embodiment, time series for different statistical metrics may cover the same period of time. In one embodiment, a particular time reference such as a point in time or a period of time may have a corresponding value in multiple time series representing multiple statistical metrics. In one embodiment, each value in the time series for a particular statistical metric may be synchronized with a corresponding value in the time series for each of one or more different statistical metrics. In one embodiment, times associated with the values of a particular statistical metric time series may be related to a frequency or schedule for the associated KPI, that is, the KPI related to the underlying per-entity data.


In an embodiment, block 70140 represents the visualization of statistical metric time series data. The visualization may include causing the display of a graphical user interface (GUI) on a computer monitor or display, such as the display of computer 70192. The visualization represented by block 70140 may in another embodiment also include certain of the per-entity time series data (such as illustrated and discussed in relation to block 70160). In an embodiment, statistical metric time series data may be visualized in a time-based graph lane. (The related disclosure herein on time-based graph lane visualizations for KPI information as depicted and discussed in relation to FIGS. 50A-59, and 66-77, for example, is useful here.) In an embodiment, statistical metric time series data may be depicted as a line on a graph, or as a boundary/edge of an area in an area graph, or as points plotted individually, or by some other representation, or by some combination of the above. Embodiments may vary not only in the method of displaying statistical metric time series data but also in the number of statistical metric time series displayed, the method of display for each, the number, if any, of graph lanes employed, the number and selection of statistical metric time series per graph lane, the inclusion of per-entity time series data with the statistical metric data, the time frame or time frames employed and their representation, the combination and inclusion of GUI elements to enable user interaction, and other factors. An appreciation will be developed by consideration of FIGS. 70B-701 and the related discussion that appears further below.


Visualization of statistical metrics about per-entity data as disclosed herein provides an analyst with information about a key performance indicator for a service without losing sight of the fact that individual entities perform the service, and without overloading the visualization with all the per-entity detail by instead presenting a statistically digested representation. Such a statistically digested representation reveals much about performance, performance problems, and operational parameters, and provides a meaningful context or backdrop against which to view and assess specific per-entity data.


Viewing the visualization may address the immediate concern of the analyst or may provide the analyst with insights into promising avenues for further investigation. Method 70100 of FIG. 70A provides for handling navigation requests from a user to refine, update, revise, augment or otherwise change a current visualization, or other aspect or component part of a GUI in which it is included; or to engage, invoke, initiate, or otherwise interact with processes, functions, features, or capabilities available from or to a system performing the method.


Navigation processing of method 70100 starts at block 70150. An embodiment may receive inputs for navigation action as the result of user interaction with the GUI. For example, a GUI displaying graphical depictions of statistical metric time series data on the display of computer 70192 may also include GUI elements, components, controls, or the like that enable a user to provide input through a keyboard, mouse, touchpad, touchscreen, microphone, position sensor, or other mechanisms of computer 70192 adapted for human interaction.


Some navigation inputs received at block 70150 may be directed toward updating the existing visualization. For example, a user may click on a GUI button to delete a displayed graph lane, or click a checkbox control to toggle the display of some data. Such inputs, while not requesting navigation away from the current application context, are made to direct specific processing within the current application context and, as such, are navigation inputs in regard to method 70100.


In contrast, some navigation inputs received at block 70150 may be directed toward engaging processing outside of the immediate application context. For example, a user may click on a GUI button to present a blank screen where a new search query can be defined. As another example, a user may click on an icon to navigate to a home screen. The processing for such options outside of the immediate application context are represented in FIG. 70A by navigation Option block 70154.


Some inputs received at block 70150 may direct processing to block 70152 which may cause the display of a navigation GUI component on computer 70192, for example, that permits the user to select from a displayed set of navigation targets. The set displayed at any occurrence may be context-sensitive and customizable through the use of configuration information as may be found in a configuration file 70182, for example. Furthermore, the navigation selection processing of block 70152 may permit navigation to a processing option outside of the immediate application context while carrying forward certain information from the current context. An appreciation will be developed by consideration of FIG. 70K and the related discussion appearing below.



FIGS. 70B-70C illustrate examples of a GUI for editing a graph style of a graphical visualization of KPI-related values along a time-based graph lane in a visual interface, including aspects related to KPI entity breakdown. The illustrated examples may embody an extension to or alternative of GUI examples illustrated and discussed in relation to FIGS. 57-58, for example. FIG. 70B depicts a GUI with a selection list 70220 of available graph types. In operation, the list 70220 may appear in response to a user interaction with drop-down box 70212 used to specify the graph type. Other GUI components may be employed, examples including a list box or a combo box, and the user interaction may be conducted using any of a number of human interface devices such as a keyboard, a mouse, a touchpad, a touchscreen, a microphone, a position sensor, a video camera, or the like. Selection list 70220 is shown presenting four available options for the graph type 70220a-d with a check mark indicating that option 70220d, “Distribution Stream,” has been selected by the user. The distribution stream graph type, as will be shown and discussed below, is an aspect of KPI entity breakdown inasmuch as the distribution stream graph type of present embodiments visualizes statistical metric time series data derived from KPI entity breakdown data. Accordingly, an embodiment of GUI 70200 may inactivate, disable, or omit the “Distribution Stream” option 70220d from selection list 70220 when displayed in respect to a KPI for which the KPI entity breakdown is not active, enabled, or defined. Selection by the user of the distribution stream option 70220d in an implementation may cause the display of the GUI as depicted in FIG. 70C.



FIG. 70C depicts GUI 70300 for editing the Distribution Stream graph style of a graphical visualization of KPI-related values along a time-based graph lane in a visual interface. GUI 70300 includes GUI elements related to a KPI title 70342, the overwriting of the KPI title 70344, a subtitle 70346, a graph type 70212, Distribution Stream mode options 70330a-b, graph color 70352, the associated service 70354, the relevant KPI of the associated service 70356, and the KPI search 70358. In one implementation, an earlier user action to establish Distribution Stream as the graph type results in the display, inclusion, enablement, or activation of Distribution Stream mode options 70330a-b. Distribution Stream mode option buttons 70330a-b enable user interaction to select a particular distribution stream mode for a visualization. User interaction with option buttons 70330a-b may be conducted using any of a number of human interface devices such as a keyboard, a mouse, a touchpad, a touchscreen, a microphone, a position sensor, a video camera, or the like. Clicking, touching, “pressing,” or otherwise activating one of the option buttons 70330a-b in an embodiment may deselect or inactivate the other option buttons. The selection of a Distribution Stream mode made by a user may affect the statistical metric time series data that is created and/or used in relation to a distribution stream visualization. The “Quantile” and the “Standard Deviation” modes indicated for selection buttons 70330a and 70330b, respectively, are representative examples, and other modes are possible for a distribution stream visualization. The effect of selecting the quantile mode or the standard deviation mode for the visualization of data related to a KPI, and more particularly to the entity breakdown data related to a KPI, in one embodiment may be seen in consideration of the visualization depicted in FIG. 70D.



FIGS. 70D-70F illustrate examples of a visual interface displaying graphical visualizations along time-based graph lanes, including aspects related to KPI entity breakdown. FIG. 70D depicts a visual interface displaying graphical visualizations of statistical metric time series data for two KPIs; one displaying statistical metric data associated with a quantile mode and one displaying statistical metric data associated with a standard deviation mode. Visual interface display 70400 includes two graph lane areas 70440 and 70450. Graph lane area 70440 includes information area 70440a and time-based graph lane 70440b. Graph lane area 70450 includes information area 70450a and time-based graph lane 70450b. Manipulable visual indicator 70460 spans across the time-based graph lanes 70440b and 70450b. Time indicator 70468, scale high indicator 70462a, scale low indicator 70462b, and point values 70464 and 70466 appear in conjunction with the manipulable visual indicator 70460, also referred to as a “lane inspector.” Visual interface display 70400 further includes GUI components enabling user interaction to add a lane 70412, enable the display of threshold information 70414, refresh the display 70417, enable the display of comparative data 70418, and navigate to a GUI display such as 70300 of FIG. 70C for editing the definition of GUI display 70400 of FIG. 70D, now under discussion. Timescale area 70492 displays time scale information corresponding to a time dimension, such as the horizontal axis, of time-based graph lanes 70440b and 70450b.


Time-based graph lane 70440b displays a distribution stream from statistical metric time series data associated with a quantile mode. As the name suggests, the statistical metric time series data includes quantile metrics. In addition to displaying data for the 25th and 75th percentiles (quantiles), the minimum and maximum values (the 0th and 100th percentiles) are also displayed. An embodiment may further display the 5th and 95th percentiles, or display fewer, or display more—in these or other combinations. The combinations may be made by the user or selected from a set of predefined combinations. The possible embodiments are not limited by the examples illustrated and discussed.


Curving line 70488 appearing as the dotted line along the bottom of the graphed data in time-based graph lane 70440b corresponds to the statistical time series data for the minimum metric. Curving line 70486 appearing as the dotted line along the top of the graphed data in time-based graph lane 70440b corresponds to the statistical time series data for the maximum metric. Curving line 70484, for example, appearing as the boundary between (or, edge at the juncture of) areas 70474 and 70476 of the graphed data in time-based graph lane 70440b corresponds to the statistical time series data for the 25th percentile metric. Curving line 70482, for example, appearing as the boundary between (or, edge at the juncture of) areas 70474 and 70472 of the graphed data in time-based graph lane 70440b corresponds to the statistical time series data for the 75th percentile metric. Accordingly, graphed data area 70476 is representative of entities with per-entity KPI values in the lowest 25% of entities (0th to 25th percentiles); graphed data area 70474 is representative of entities with per-entity KPI values in the middle 50% of entities (25th to 75th percentiles); and graphed data area 70472 is representative of entities with per-entity KPI values in the highest 25% of entities (75th to 100th percentiles).


Note that curving lines as just referenced may occur for different reasons depending on the implementation. In one implementation, curving lines may be produced to represent the data of a time series by applying a smoothing function during rendering. In another implementation without a smoothing function, the resolution and characteristics of the data may be sufficient to produce a smooth, curving appearance. Where a smooth, curved appearance is desired, an implementation may include applying a smoothing function to time series data to reliably produce that appearance.


In one embodiment, graph data areas (such as 70472, 70474, and 70476) of a distribution stream graph may be interactive. In one embodiment, user interaction with a graph data area, such as clicking, touching, or hovering over it, results in the display of a pop-up or modal GUI component displaying a list of all of the entities represented by the data area. In one embodiment, the GUI component merely displays the list of entities. In another embodiment, the list of entities is interactive and allows the user to select one or more entities from the list. Selecting an entity from the list may result in the addition to a display such as 70400 of an overlay for the entity data (discussed below), or may result in the addition to the display of an additional graph lane area showing only data for the selected entity. In one embodiment, user interaction with a graph data area results in the display of a navigation options GUI component as is discussed below in relation to FIG. 70K. Embodiments may vary as to the actions resulting from user interaction with any interactive graph data areas of the distribution stream display.


Many implementations of a quantile mode option are possible. One implementation, for example, may include the display of an average metric (mean, median, or mode) in the area graph. One implementation, for example, may use quartile or decile metrics in contrast to percentile. Implementations may vary in their inclusion of extreme metric data (minimums and maximums) and in the number of quantile metrics displayed (e.g., two, three, four, five, etc.). These and other variations are possible.


Time-based graph lane 70440b depicts a quantile mode distribution stream display of minimum, maximum, 25th percentile, and 75th percentile statistical metrics. The value in the time series for each of the metrics corresponding to the time 70468 indicated at the position of the lane inspector 70460 appears in display 70400 as text point values 70464. Point values 70464, in this example, indicate “max: 0.31” (the maximum value metric), “perc75: 0.31” (the 75th percentile metric), “perc25: 0.28” (the 25th percentile metric), and “min: 0.28” (the minimum value metric). Similarly, point values 70466 shows the values that correspond to time 70468 in the time series for metrics represented in the standard deviation mode distribution stream graph of the lower time-based graph lane 70450b. Point values 70466, in this example, indicate “max: 3.54” (the maximum value metric), “avg: 3.52” (the mean (average) metric), and “min: 3.49” (the minimum value metric). In one embodiment, a text point values display such as 70464 may represent point values for only a subset of the graphed data, for example, for only the 25th and 75th percentiles, or some other subset. The subset may be determined based on user input and customizations, system defaults and settings, dynamic adaptations to the immediate context (such as by considering available screen space), some combination of these, or by other factors or methods.


In an embodiment, the maximum and minimum may be shown as dotted lines or may be otherwise visually distinguished from the appearance of other statistical metrics. In an embodiment, the space between each of the maximum and minimum statistical metrics and its nearest neighboring metric may not be shaded in the normal fashion for an area graph, but the minimum and maximum statistical metrics are rather each displayed in line graph fashion. For example, while an area above a 5th percentile line may be shaded as an area graph of the distribution stream visualization, the area below the 5th percentile line, in this example, and extending to the minimum value line and beyond may have the normal background color. Similarly, in this example, while an area below a 95th percentile line may be shaded as an area graph of the distribution stream visualization, the area above the 95th percentile line and extending to the maximum value line and beyond may have the normal background color. The area defined between a maximum value line and the upper limit of the shaded area graph of the distribution stream visualization of the statistical metric data may be referred to as an “outlier” area. Similarly, the area defined between a minimum value line and the lower limit of the same shaded area graph of the distribution stream visualization may also be referred to as an “outlier” area. An implementation so displaying the extreme values (the minimum and maximum values) as line graphs offset from a central distribution stream area graph may improve the accuracy with which a user perceives the proportion of entities in a given state relative to the KPI. This is because the outlier areas can often extend across a wide range of metric values despite representing relatively few entities. One of skill recognizes, of course, that the width of an outlier area is data dependent and where the data dictates a very small or nonexistent outlier area, the maximum or minimum value line may appear as an outer edge of the distribution stream area graph. In such a case the use of a different representation type for an extreme metric, such as a representation type having a visually evident difference to its characteristic structure (e.g., solid, dotted, dashed, narrow, and wide lines), can be advantageous.


Time-based graph lane 70450b displays a distribution stream from statistical metric time series data associated with a standard deviation mode. As with a quantile mode distribution stream display, many variations are possible. One implementation, for example, may include the display of an average metric, such as the mean, in the area graph. One implementation, for example, may include a maximum value, a minimum value, or both. One implementation, for example may only include the display of negative standard deviation metrics, and another may only include the display of positive standard deviation metrics, and another may include both. Embodiments may vary, for example, in the number and selection of the standard deviation metrics displayed. These and other variations are possible including, for example, variations in the visual representation of metric data, the area edges, and the outermost area boundaries of the distribution stream display. One embodiment, for example, may distinguish area boundaries with a contrasting color while in other embodiments the area edges appear the same as corresponding area interiors. (In this discussion of distribution stream visualization, area edges and boundaries may be considered to be synonymous.)



FIG. 70E shows display 70400 of FIG. 70D as it may appear after user interaction with manipulable visual indicator 70460. FIG. 70E shows manipulable visual indicator 70460 moved to time 70568 “10:25:35 AM” (from time 70468 “10:05:28 AM” of FIG. 70D). The value in the time series for each of the metrics corresponding to the time 70568 (indicated at the position of the lane inspector 70460) as shown in FIG. 70E appears in display 70500 as text point values 70564. Point values 70564, in this example, indicate “max: 0.63” (the maximum value metric), “perc75: 0.63” (the 75th percentile metric), “perc25: 0.47” (the 25th percentile metric), and “min: 0.47” (the minimum value metric). Similarly, point values 70566 shows the values that correspond to time 70568 in the time series for metrics represented in the standard deviation mode distribution stream graph of the lower time-based graph lane. Point values 70566, in this example, indicate “max: 8.55” (the maximum value metric), “avg: 0.63” (the mean (average) metric), and “min: 3.33” (the minimum value metric).



FIG. 70F shows display 70400 of FIG. 70D as it may appear when augmented with the display of per-entity time series data for a particular entity. Display 70600 of FIG. 70F depicts the augmented display. The augmented display includes the addition of a line plot 70680 as an overlay superimposed in time-based graph lane 70640b of upper graph lane area 70640. (Superimposing the line plot with the distribution flow graph permits aspects of both to be viewed together within a common display space.) Line plot overlay 70680 depicts the time series of per-entity KPI values for a particular entity of the KPI associated with the time-based graph lane 70640b. The augmented display further includes the addition of a line plot 70686 as an overlay superimposed in time-based graph lane 70650b of lower graph lane area 70650. Line plot overlay 70686 depicts the time series of per-entity KPI values for a particular entity for the KPI associated with the time-based graph lane 70650b. Implementations are not limited to the inclusion of per-entity KPI data for a single entity and may include per-entity KPI time series data for multiple entities. In one embodiment, the maximum number of per-entity time series that may be presented simultaneously in conjunction with the distribution stream graph is limited to 10, though other embodiments may impose a different limit or be unlimited. An implementation may differ the appearance of the individual time series data for each of multiple entities by varying line color, line style, or some other visual or formatting attribute. An implementation may present data, of a time series or otherwise, and related to a specific entity or otherwise, in a graph lane overlay. Moreover, an overlay may have the appearance of an overlay, and underlay, or other method of displaying the overlay data in conjunction with a distribution stream graph. For example, a graph lane overlay of per-entity time series data may appear as a line graph behind a semi-transparent distribution stream graph. These and other variations of an augmented distribution stream graph display are possible. Moreover, an overlay may be interactive such that user interaction with the overlay, such as clicking, touching, or hovering over a visible portion of the overlay, results in the display of a pop-up or modal GUI component displaying a navigation options GUI component as is discussed below in relation to FIG. 70K. In another embodiment, user interaction with the overlay results in a display of inventory information about an entity associated with the overlay. These and other interactions are possible. The specification for one or more graph lane overlays in a particular implementation or instance may be facilitated by the use of graphical user interfaces (GUIs), examples of which are next discussed.



FIG. 70G-H illustrate GUI examples for graph lane overlay options, including aspects of KPI entity breakdown. FIG. 70G depicts a GUI for specifying graph lane overlay options in one embodiment. GUI 70700 includes GUI components 70712 related to enabling overlays, 70714 related to overlay graph color, 70720 related to an overlay selection mode, 70792 related to successfully concluding a specification session using the GUI, and 70794 related to canceling a specification session using the GUI.


GUI component 70712 includes a set of option buttons permitting a user to specify whether overlay augmentations are enabled for a distribution stream graph display, as may appear in a time-based graph lane for example. In the example illustrated by GUI component 70712, user interaction selecting a “Yes” button enables overlays while similar interaction selecting a “No” button disables overlays.


GUI component 70714 includes a selection box 70714 enabling user selection of a graph color from a drop-down list of available options (not shown). An “Automatic” option, shown as selected for box 70714, specifies that one or more graph color(s) for overlays are determined automatically in accordance with the programming of the computing device, in this example. An embodiment may automatically determine the color by sequentially selecting from a list of available colors, by incrementally adjusting tone, saturation, or luminance properties of a base color, by selecting a color available in a predefined color scheme, or by another method.


GUI component 70720 includes a set of option buttons (buttons 70720a-b in this example) enabling a user to specify an overlay selection mode. User interaction with option button 70720b to select a “Dynamic” overlay selection mode of one embodiment indicates that a distribution graph stream display should be augmented with overlays for each of the top 3 worst-performing entities included in the relevant KPI. Implementations can vary as to the method for automatically determining the top 3 worst-performing entities and may include, for example, determining the 3 entities with the highest average for the per-entity KPI time series, determining the 3 entities with the lowest average for the per-entity KPI time series, determining the 3 entries with the greatest range of values within the per-entity KPI time series, or some other method. Implementations can also vary as to the entities selected in such a “Dynamic” mode. For example, one implementation may select the top 3 worst-performing entities, one implementation may select the top 3 best-performing entities another implementation may select the top 5 worst-performing entities, and another implementation may select entities using different selection criteria altogether. In one implementation, a user interaction with option button 70720b displays a selection list enabling a user to select a category of entities from a list of categories which may include any from among the N-best, N-worst, N-largest, N-smallest, N-newest, N-oldest, and other categories (where N represents a positive integer value which in some embodiments may be 3 or less, 5 or less, 10 or less, 20 or less, or 50 or less). Other categories are possible and may include categories such as critical state entities (up to 10), or warning state entities (up to 10). These and other categories are possible.


In, one implementation the selection of entities in a dynamic selection mode occurs automatically according to defined dynamic mode selection criteria on a one-time basis when a distribution stream graph is first displayed, or at some other point in time, and the selection does not change absent user intervention. In another embodiment the selection of entities in a dynamic selection mode occurs automatically according to defined dynamic mode selection criteria on an ongoing basis, perhaps when a distribution stream graph is first displayed and each time the distribution stream graph display is refreshed, either automatically or on-demand by user request.


User interaction with option buttons 70720a to select a “Static” overlay selection mode of one embodiment indicates that a distribution graph stream display should be augmented with overlays for data sources specified by the user, for example, the data of per-entity time series for entities contributing to the KPI associated with the distribution stream graph. The specification of the data sources by the user may be facilitated by providing appropriate GUI components. In one implementation, a user interaction with option button 70720a displays a selection list enabling a user to select a category of entities from a list of categories which may include any from among LINUX machines, Windows Machines, Chicago machines, or others. Other categories may include categories reflecting static attributes of entities as may be reflected in their entity definitions, for example, in info fields, and may include static attributes such as operating system, manufacturer, location, and others.


In one embodiment, user interaction with “Static” option button 70720a of GUI display 70700 results in an expanded version of the display to provide the appropriate GUI components. FIG. 70H depicts such an expanded version.



FIG. 70H depicts a version of the GUI display of FIG. 70G expanded to include GUI components enabling the user to indicate a selection of data sources for overlays displayed for a static overlay selection mode. Many of the GUI components of GUI display 70800 appear and operate as their counterparts of GUI display 70700 of FIG. 70G. GUI components 70812, 70814, 70820, 70820a-b, 70894, and 70892 of FIG. 70H generally appear and function as their counterparts depicted and described in relation to GUI 70700 of FIG. 70G: 70712, 70714, 70720, 70720a-b, 70794, and 70792, respectively. The appearance of entity selection area 70830 and selected entity area 70850 of FIG. 70H is new.


Entity selection area 70830 presents a user with information about data sources, such as per-entity time series data in this example, and enables the user to select one or more of the data sources to be presented as overlays in a distribution stream graph display (as exemplified in FIG. 70F). In the illustrated embodiment, entity selection area 70830 presents the data source information in a tabular format and includes a column header row 70832, and a page navigation footer row 70834. Between header row 70832 and footer row 70834 there appear multiple entries, one entry per row, and each row presents information about a particular data source. Data source rows include 70840a, 70840b, 70840c, and 70840f, for example. The row for a data source may include a GUI component, such as a checkbox or option button, that enables the user to select or deselect the data source as in the “Selection” column of this example. The row for a data source may include a title, name, or other identifier for the data source as in the “Entity_Title” column of this example. The row for a data source may include an indication of an alert level associated with the data source as in the “Alert_Level” column of this example, and the alert level may be indicated with a color-coded icon as represented in display 70800, or with some other graphical element. The row for a data source may include a graph of data from the data source as in the “sparkline” column of this example.


In the implementation illustrated using FIG. 70H, a user indicates the selection of a data source for use as an overlay displayed in conjunction with the distribution stream graph by marking the checkbox in the “Selection” column of the row displaying the data source information. Rows 70840a-c are shown in display 70800 as having their checkboxes checked. (In contrast, row 70840f, for example, is shown as having its checkbox unchecked.)


Selected entity area 70850 of this embodiment displays a list of all of the data sources (here, entities) that have been selected by virtue of user interaction with entity selection area 70830. Accordingly, the identifiers appearing in the “Entity_Title” column for each of selected rows 70840a-c appear as entries 70860a-c in the selected entities area 70850. Each entry in the selected entities area 70850 is postfixed with an interactive button enabling the user to delete the entry, and so, the data source or entity, from the selection. In one embodiment, clicking on or otherwise interacting with “Done” button 70892, causes the display of a GUI showing a distribution stream graph in conjunction with overlays for the data sources appearing in selected entity list 70850.



FIG. 70I illustrates an example of a visual interface displaying twin graphical visualizations along time-based graph lanes for different periods of time, including aspects of KPI entity breakdown. Twin graphical visualizations along time-based graph lanes for different periods of time are discussed, as well, in relation to FIG. 66, for example. FIG. 70I specifically relates to twin visualizations of distribution stream graphs from KPI entity breakdown data along time-based graph lanes for different periods of time. Many elements appearing in display 70900 of FIG. 70I have a general correspondence to elements appearing in earlier figures. For example, graph lane areas 70940 and 70950, graph lane information areas 70940a and 70950a, time-based graph lanes 70940ba and 70950ba, and comparative data GUI element 70918 have a general correspondence to respective elements 70440, 70450, 70440a, 70450a, 70440b, 70450b, and 70418 in display 70400 of FIG. 70D. Notable differences between FIG. 70I, presently discussed, and earlier FIG. 70D, include the marked checkbox of comparative data GUI element 70918 and the introduction of twin lanes 70940bb and 70950bb.


In one embodiment, the checkbox of GUI element 70918 is marked by a user clicking on or touching the checkbox when in an unmarked state. The marking of the checkbox by the user indicates that the user desires to introduce twin lanes such as 70940bb and 70950bb to appear in conjunction with the primary time-based graph lanes 70940ba and 70950ba, respectively. The distribution flow graph appearing in the twin lane is based on the same source data as the distribution flow graph appearing in the primary lane (here, statistical metric time series derive from KPI entity breakdown data), but for a different time period. The time period is determined by a default value or a user selection from a drop-down list associated with GUI components 70918, in this example. Here, a user selection of the relative time period “60 minutes ago” is displayed for comparative data GUI element 70918. Accordingly, the distribution stream graph appearing in twin lane 70940bb depicts data from an hour earlier than the data depicted in primary time-based graph lane 70940ba, at a given point along the time axis. Similarly, the distribution stream graph appearing in twin lane 70950bb depicts data from an hour earlier than the data depicted in primary time-based graph lane 70950ba, at a given point along the time axis.


Incidentally, primary time-based graph lane 70940ba includes overlays 70982 and 70984, illustrating a distribution stream graph with multiple overlays. In the present embodiment, overlays appearing in a primary time-based graph lane are not automatically replicated into an associated twin lane. One may also note that an embodiment may support the display of multiple “twin” lanes in association with a single primary time-based graph lane, each possibly displaying data for a different time period.


A display such as 70900 depicted in FIG. 70I, as just one example, may be augmented by the visualization of additional information, including information also related to the associated KPI. FIG. 70J depicts one such example. FIG. 70J illustrates an example of a visual interface displaying graphical visualizations along time-based graph lanes including threshold visualization, and aspects of KPI entity breakdown. Visual display 71000 includes a primary time-based graph lane 71010 and a twin lane 71030, each depicting a distribution stream graph over a background visualization of KPI threshold-related data. Graph lane 71010 visualizes the KPI threshold-related data as a background area graph. Each area of the area graph may correspond to a particular KPI state defined by threshold data. For example, area 71022 may correspond to a “Normal” state with its upper boundary, its lower boundary, or both, defined by threshold data. As another example, area 71024 may correspond to a “Warning” state having one or more boundaries defined by threshold data. As another example, area 71026 may correspond to a “Critical” state having one or more boundaries defined by threshold data. Each of areas 71022, 71024, and 71026 may have a visual attribute, property, style, or the like, to visually distinguish it from adjacent areas including the background. For example, display 71000 may be presented using a color attribute to visually distinguish threshold areas 71022, 71024, and 71026 with area 71022 appearing in a green color, 71024 appearing in a yellow color, and 71026 appearing in a red color, as one example. In another embodiment, a visual style such as fill pattern may be used instead of color. In another embodiment, a combination of color and fill pattern may be used, or one or more visual attributes, properties, or styles, altogether different. The number of areas shown in a lane may be different among lanes, and may be determined by the number of thresholds defined in association with a KPI or by a user selection of particular ones of those thresholds. For example, graph lane 71030 is shown with a single threshold area 71042 representing one threshold value associated with the corresponding KPI. These and other embodiments are possible.


A display such as 71000 of FIG. 70J, or 70900 depicted in FIG. 70I, may provide an analyst with the information sought but also may provide clues about promising areas for further investigation. For example, it may cause an analyst concern that entity overlay 70984 of FIG. 70I projects into the upper quantile area of its underlying distribution stream graph at some point. Flow diagram aspects of FIG. 70A, already discussed, shows that a navigation option may be exercised after a visualization such as the visualization display 70900 of FIG. 70I. Navigation away from visualization 70900 to perform further investigation, for example, can benefit where context information already developed in and around a visualization can be carried forward into a next avenue of investigation or inquiry. Using the example of entity overlay 70984 and the concern it may cause in an analyst, the GUI displaying 70900 may enable user interaction with overlay 70984 or some other GUI component to perform a navigation using the context associated with the overlay or other component. For example, a click or mouse-over action by the user in relation to the overlay or other component may result in the display of a navigation selection GUI component as part of the processing associated with the Navigate 70150, Select 70152, and Option 70154 blocks of FIG. 70A. An embodiment related to such navigation is next discussed.



FIG. 70K is a block diagram illustrating aspects of navigation options in one implementation. Navigation options list 71110 illustrates an example GUI component, such as a popup or dropdown, that may be displayed in an embodiment by a user interaction with a visualization such as display 70900 of FIG. 70I and, by more particular example, an interaction with a specific GUI component of the visualization such as entity overlay 70984. Navigation options list 71110 of FIG. 70K is shown to include the title “Active Drill Down Options” and five selectable options 71120a-e. Options 71120a-b represent built-in or hardcoded options. “Open Overlays as Multiple” option 71120a may remove all current overlays from the lane of the distribution flow graph and place each in its own lane in a display area. It may require implementation as a built-in navigation in one embodiment because of its complexity in handling multiple overlays, performing deletions, and opening multiple lanes where such complexity may not be provided by an interface that can be exercised using the contents of a configuration file entry (71172, and 71174 are examples of configuration file entries). “Delete All Unrelated lanes” option 71120b may remove all graph lane areas from a display area that are not related to the KPI or entities associated with a displayed distribution stream graph. Such an option may require certain procedural logic that again may not be available from an interface exercisable using the contents of a configuration file entry. Various embodiments may have varying capability exposed through the use of a configuration file such as configuration file 71182 and accordingly a navigation option necessarily implemented as a built-in navigation option in one embodiment could possibly be implemented as a customizable option in a different embodiment. Accordingly, the foregoing examples, and the examples that follow, are merely illustrative and do not limit the range of embodiments possible that utilize novel aspects taught here.


Options 71120c-d represent customizable options that can be customized by changing information in a configuration file such as 71182, for example. Options 71120c-d may be navigation options that are defined and/or supplied by a system vendor or third-party provider and are intended for customization by themselves through an action changing information in a configuration file or a distribution of configuration file contents, or for customization by a user through changes to a configuration file, or both. “Open Help Desk Ticket” option 71120c may be provided by an IT Service Management (ITSM) software or software-as-a-service (SaaS) vendor to permit a user to navigate to a new window in a service ticket tracking system where the name of the entity service, status information about the KPI, and time range information is prefilled with contextual information carried forward from the visualization environment. “Search Problem Data Base” option 71120d may be provided by the same or different vendor to permit a user to similarly navigate to a window with a search screen for a problem data base that has certain fields of a GUI prefilled with contextual information carried forward from the visualization environment and may display the results from a search of the problem data base that had been specified in whole or in part with information carried forward from the visualization environment.


Option 71120e represents an option created and maintained by the user. “Open Event Search” option 71120e may have been configured by the user to navigate to a search results page populated with the results of a search query executed using an event processing system where the search query include selection criteria based on information carried forward from the visualization environment, such as KPI, entity, and time range information. The use of a configuration file for configurable navigation options in one embodiment is next discussed.


“Open Help Desk Ticket” 71120c is shown by arrow 71132 to be associated with configuration entry 71172. Similarly, “Open Event Search” 71220e is shown by arrow 71134 to be associated with configuration entry 71174. Configuration entries 71172 and 71174 are shown to be included in configuration file 71182. Configuration file 71182 may not correspond to a particular file as defined within an operating system, and may correspond to any one or a combination of computer readable storage methods and mechanisms. Despite potentially different sources and maintenance mechanisms intended for configuration entries 71172 and 71174, the types of information they contain may be identical or nearly so. In one embodiment, a configuration entry contains a display name that may be a text string for display to the user as an option in navigation options list 71110, for example. The configuration entry may also contain target information that identifies and supports an interface to the target or destination of the navigation. Targets may be a system, a URL, a URI, a process, a code entry point, a subsystem, an application, a search instantiated for a new graph lane in a visualization, or any other destination for computer-implemented processing having a defined, discoverable and/or exercisable interface. In one implementation the target may include a search query processing component of an event processing system.


The configuration entry may also contain property carryforward information. The property carryforward information may specify context information available from a visualization environment that is to be carried forward into the application environment/context of the navigation target, and specify information about how to carry it forward. Such information may include, for example, information about how to format it or where to place it to comply with an interface of the target. Property carryforward information may, for example, include substitution tokens identifying context variables existing in a visualization environment. The property carryforward information may also include, for example, the substitution tokens placed within a statement that is syntactically correct to invoke target processing. Some examples are a tokenized URL directed to a web service, a tokenized search query directed to an event processing system, and a tokenized SQL statement directed to a relational database management system. (Note that each of the tokenized statements has its generic form in the configuration file where substitution tokens appear as placeholders in a template/prototype/model statement used to interface with the navigation target, and its substituted, specialized, or instantiated form for sending to the navigation target where the substitution tokens are replaced by the values of information items or variables from the current context.)


The configuration entry may also contain other information relevant to the navigation option. The other information may include, for example, information about conditions in which the menu options should or should not be displayed. For example, a condition may be included specifying that the menu option not be displayed unless the GUI component, interaction with which caused display of options list 71110, is associated with a specific entity. Referring back to FIG. 70I to illustrate, a menu option having the aforementioned condition would be displayed in a navigation options list (71010 of FIG. 70K) if the navigation options list appeared as a result of user interaction with entity overlay 70984, for example, because the entity overlay is associated with one specific entity. In contrast, that same menu option would not be displayed in a navigation options list if the navigation options list appeared as a result of user interaction with the distribution flow graph of twin lane 70940bb because that distribution flow graph is not associated with one specific entity. Another implementation may include other or additional conditioning options such as whether the GUI component is a lane displaying a statistical metric, whether the GUI component is a lane displaying events, and others.


One of skill may appreciate from study of the foregoing disclosure that while described in the context of distribution flow graph visualizations, an implementation of configurable navigation as described in relation to FIG. 70K is not so limited.


Defining a New Correlation Search Based on Graph Lanes


Implementations of the present disclosure may include a mechanism to generate correlation searches based on information displayed in one or more graph lanes. The graph lanes may be selected by a user and may be customized to cover a desired time period. The graph lanes may allow a user to detect, diagnose or solve a problem (e.g., system malfunction, performance degradation) or identify a performance pattern of interest (e.g., increased usage of one or more services by end users). The graph lanes may allow the user to visually inspect a diverse set of information and may enhance the user's ability to identify patterns amongst the graph lanes. Once a user has identified the graph lanes that relate to a problem or a pattern of interest, the user may submit a request to create a new correlation search. The system may then analyze the information represented by the graph lanes to create a definition for a new correlation search. The new correlation search provided by the created definition may then be run to detect a re-occurrence of the problem or the pattern of interest, and to cause an action (e.g., an alert or a notification of the user) to be performed.



FIG. 71 provides an exemplary GUI 7150 that displays a set of graph lanes 7152A-G and a GUI element 7154 for creating new correlations searches. When a user selects GUI element 7154 (e.g., a button), a user's request to create a new correlation search may be generated. Responsive to the user request, the system may iterate through the set of graph lanes 7152A-G to acquire information pertaining to the graph lanes. The graph lanes may be associated with key performance indicators (KPIs) and the system may analyze fluctuations in each KPI to automatically (without any user interaction) generate KPI criteria for individual KPIs. The KPI criteria may then be aggregated to automatically (without any user interaction) create an aggregate triggering condition for the correlation search. As will be discussed in more detail below, the aggregate triggering condition may be used during the execution of the correlation search to identify when the problem or the pattern of interest re-occurs.



FIGS. 72A-C illustrate multiple flow diagrams of exemplary methods 7210, 7220, and 7230. Method 7210 is an example of a method for assisting a user in initiating the creation of a new correlations search. Method 7220 is an example of a method for creating a correlation search definition based on displayed graph lanes. Method 7230 is an example of a method for running the correlation search to identify a re-occurrence of a performance pattern of interest (e.g., a problem in the performance of one or more services). Each of the methods may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine), or a combination of both. In one implementation, one or more of the methods may be performed by a client computing machine. In another implementation, one or more of the methods may be performed by a server computing machine coupled to the client computing machine over one or more networks.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks). However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.


Referring to FIG. 72, method 7210 may begin at block 7211 when the computing machine causes display of a set of graph lanes corresponding to a plurality of KPIs that each indicate how a service is performing over a period of time. Each KPI may comprise multiple KPI values derived from machine data pertaining to one or more entities providing the service. Each KPI value may indicate how the service is performing at a point in time or over a duration of time. The graph lanes may provide graphical visualizations to illustrate the KPI values and changes in the KPI values over time. As discussed above, the KPI values may correspond to different KPI states that are defined based on KPI thresholds. In some implementations, each graph lane may visually illustrate respective states of the KPI over the period of time using a visual indicator (e.g., color, shading, etc.).


The set of graph lanes may include multiple different graphical visualizations including a line graph, an area graph, a bar chart, a heat map or other visualization. The graphical visualizations may be displayed in parallel such that each graph lane may be stacked above one another. The set of graph lanes may also be calibrated to the same time scale and span the same period of time. The time scale may be presented as a timeline along a horizontal axis of the bottom of the lowest graph lane.


The user may adjust which graph lanes should be displayed and the time period being displayed in order to discover or diagnose a problem. Adjusting which graph lanes should be displayed may involve adding or removing graph lanes from the set of graph lanes. Users may add graph lanes by accessing a library of existing graph lanes or creating their own graph lanes. The library of graph lanes may include graph lanes that are packaged with the application as well as graph lanes that may have been configured by an IT administrator within an organization. Users may also create their own graph lanes by using, for example, a graphical user interface or sequence of GUIs (e.g., wizard) that enables a user to select a service, a KPI and a type of graph lane. The computing machine may also remove graph lanes in response to user input. In one example, the user may select one or more graph lanes in the set of graph lanes and enable an “edit” mode that may present the user with an option to delete the selected graph lanes. The computing machine may then update the set of graph lanes to remove the one or more graph lanes. In one example, a user may add and remove graph lanes and the resulting set of graph lanes may cover multiple services and include at least one or more graph lanes corresponding to a first service and at least one or more graph lanes corresponding to a second service. Displaying multiple graph lanes together may allow the user to identify patterns amongst the graph lanes. For example, the user may see that there is a spike in values in one aspect of a service just prior to another service entering a critical state. This may provide insight when performing problem determination techniques, such as root cause analysis. It should be noted that performance problem determination is only one example of how the correlation search discussed herein can be utilized, and that various other patterns of service performance can be detected via the correlation search without loss of generality.


A user may also want to adjust the time frame while performing the problem determination. The user may begin by reviewing the graph lane over a large period of time to identify when the problem began and may subsequently focus on (e.g., zoom-in to) a portion of the graph lane that includes the beginning of the problem (e.g., system malfunction, performance degradation). Based on user input, the computing machine may adjust the duration of time associated with a graph lane. In one example, the computing machine may receive user input to modify a zoom level of the set of graph lanes and in response, the computing machine may update the duration of time being displayed to correspond with the zoom level. For example, if the graph lane is displaying a 24-hour duration of time, the user may zoom-in to display a 12-hour duration of time. In another example, the user may provide input that identifies a portion of one or more graph lanes. For example, the user may select a portion of the graph lanes that corresponds to a four-hour duration from 4 pm through 8 pm, and the computing machine may update the GUI such that the selected portion of the graph lines occupies the entire GUI area designated for the display of graph lanes. In another example, the GUI may include a graphical element (e.g., button, drop down list, etc.) that presents the user with multiple predefined time durations (e.g., 15 min, 60 min, day, week) and the user input may identify one of the predefined time durations.


At block 7212, the computing machine may receive a user request to create a definition of a correlation search based on the set of graph lanes that have been adjusted by the user. The set of graph lanes may visually illustrate a cause or a symptom of a problem and the correlation search may be defined to detect an occurrence of the problem during another period of time. In one example, the correlation search may detect a re-occurrence of the same problem or a similar problem in the future. In another example, the correlation search may detect an occurrence of the same problem or a similar problem in the past. In yet another example, the correlation search may detect when a similar problem has occurred with a separate set of computing resources.


At block 7213, the computing machine may create the definition of the correlation search in response to the user request. Creating the definition of the correlation search may involve processing (e.g., iterating through) the set of graph lanes to automatically determine KPI criteria for the KPIs associated with the set of graph lanes and combining the KPI criteria into an aggregate triggering condition for the correlation search definition. An exemplary method of creating an aggregate triggering condition for a correlation search definition is discussed in more detail below in conjunction with FIG. 72B.


In addition to the aggregate triggering condition, the correlation search includes a search component for producing results to which the aggregate triggering condition should apply. The search component can be applied to KPI data (KPI values and/or KPI states) of the KPIs to extract the KPI data of the KPIs for a given time period. In some implementations, KPI data (e.g., KPI values and/or KPI states) of each KPI is determined using a KPI search query and stored in a data store in association with a unique identifier of the KPI and relevant points in time or durations of time. In such implementations, the search component can specify information for locating the stored KPI data (e.g., using a unique identifier of each KPI and the location information of the data store), and the given time period (or time window). In other implementations, the search component does not refer to the stored KPI data and instead specifies the actual KPI search query that will produce KPI values of the KPIs for a given time window.


The correlation search definition can also include additional information such as the correlation search name, scheduling information and action information. The scheduling information may specify how often the correlation search should be executed. The action information may define an action to be performed when the aggregate triggering condition is satisfied.


In some implementations, the computing machine may automatically generate the search component and the additional information without requiring any subsequent user interaction. For example, the computing machine may gather the search component information from the graph lanes without requiring a user to provide any input. The computing machine can also use a predefined correlation search name, predefined scheduling information and predefined action information.


In other implementations, the user request may initiate another GUI (e.g., dialog box) that allows the user to provide the search component and/or the additional information for the correlation search, such as the correlation search name, the scheduling information and the action information. The exemplary GUI is discussed in more detail below in regards to FIG. 74.



FIG. 72B is a flow diagram of an implementation of method 7220 for creating a definition of a correlation search based on one or more graph lanes, in accordance with one or more implementations of the present disclosure. As discussed above in regards to FIG. 34C, a correlation search definition may be stored in a service monitoring data store as a record that contains information about one or more characteristics of a correlation search related to KPIs. Various characteristics of a correlation search may include, for example, a name of the correlation search, information for a search component, information for a triggering determination (aggregate triggering condition), a defined action that may be performed based on the triggering determination, one or more services that are related to the correlation search, and other information pertaining to the correlation search such as the frequency of executing the correlation search, and duration information.


The duration information may specify the time period that should be used for the search component to extract relevant KPI data (KPI values and/or KPI states). For example, the duration may be the “Last 60 minutes”, and the search component should extract KPI data produced using the time-stamped events from the last 60 minutes. The search component can either specify information for locating stored KPI data of the KPIs or a search query for producing KPI values of the KPIs.


The trigger determination information may include KPI criteria combined into an aggregate trigger condition for evaluating the KPI data obtained by the search component to determine whether to cause a defined action. Each KPI criterion may include one or more contribution threshold components for respective one or more KPI states. Each contribution threshold component may include an operator (e.g., greater than, greater than or equal to, equal to, less than, and less than or equal to), a threshold value, and a statistical function (e.g., percentage, count). For example, the contribution threshold may be “greater than 29.5%”.


The action component may specify an action to be performed when the aggregate triggering condition is considered to be satisfied. An action can include, and is not limited to, generating a notable event, sending a notification, and displaying information in an incident review interface, as described in greater detail above in conjunction with FIGS. 34O-34Z.


Method 7220 may begin at block 7221 when the computing machine may select a graph lane from the set of graph lanes to gather information for one or more KPIs associated with the selected graph lane. The gathered information may include graph lane data (e.g., displayed data) and KPI configuration information. The graph lane data may be data that is being displayed within the graph lane, such as KPI values and/or KPI states, error messages, counts, value changes, or other displayed data. The KPI configuration information may identify the service and KPI associated with the graph lane. The KPI configuration information may be used to produce the graph lane data. The KPI configuration information may specify how to obtain KPI values and/or states of the one or more KPIs (e.g., by specifying a KPI search query data or information on how to locate and access stored KPI values and/or states).


At block 7222, the computing machine may determine a KPI criterion for the one or more KPIs associated with the graph lane based on fluctuations in the associated one or more KPIs during a specified period of time (e.g., a first period of time). The KPI criterion may be determined based on the KPI configuration information, the graph lane information or a combination of both. To determine the KPI criterion, the computing machine may analyze fluctuations of the associated one or more KPIs to identify patterns. The patterns may be based on fluctuations in the state of the associated one or more KPIs or fluctuations in the values of the associated one or more KPIs. As discussed above in regards to FIGS. 29-34, a KPI state (e.g., normal, warning, critical) may be defined by one or more thresholds that define a range of KPI values and values within the range may be associated with the corresponding state.


In one example, a graph lane may illustrate a plurality of KPI states corresponding to the multiple KPI values of a KPI, and the fluctuations in the KPI may be determined based on a proportion of time the KPI is in any of the plurality of KPI states. The plurality of KPI states may be presented visually in the graph lane. For example, when the KPI values are within a first, second and third range (e.g., normal, warning, critical), the graph may be green, yellow and red respectively. The proportion of time the KPI is in each state may be determined by identifying the first period of time. In one example, the first period of time may be a period of time that is common for the set of graph lines and corresponds to the duration of time represented by the set of graph lanes. In another example, the first period of time may be a user-selected duration of time, which may be a subset of the duration of time represented by the set of graph lanes. The computing machine may then calculate the duration of time the KPI is in any of the multiple states. This may involve calculating the duration of time the KPI was in each of the states and comparing the duration of time to the first period of time to identify a proportion. For example, if the first time period is 10 hours and the KPI was in a low state for a total of 7 hours, a warning state for 1 hour and a critical state for 2 hours, the proportion of times would be 70% normal (7 hr/10 hr), 10% warning (1 hr/10 hr) and 20% critical (2 hr/10 hr). The proportion may be defined with respect to percentages, ratios, total values or other numeric or non-numeric representations.


In another example, fluctuations in the KPI may be based on fluctuations of KPI values, and determining the KPI criterion may be based on a statistical distribution of the KPI values during the first period of time. The statistical distribution of KPI values may identify patterns in the KPI values or in changes to the KPI values over time. The statistical distribution may take into account averages, medians, and deviations in the values. The statistical distribution may also identify trends in the KPI values. For example, the statistical distribution may identify the rate of change of the KPI values over time, which may include both positive rates, in which case the KPI values are increasing in value as well as negative rates where the KPI values are decreasing in value. The statistical distribution may also account for variations in the rates of change (e.g., acceleration of KPI values). This may be useful because some problems may be identified by a steady increase in KPI values, which may represent a change in a linear manner (e.g., constant acceleration), while other problems may be identified by a rapid increase in KPI values, which may represent a change that accelerates. The differences in trends may be used to identify and distinguish fluctuations of the KPIs.


At block 7223, the computing machine may determine if there are other graph lanes in the set of graph lanes. If there are additional graph lanes, the computing machine may branch back to block 7221 to identify the KPI criterion for another graph lane. If there are no more graph lanes, the computing machine may proceed to block 7224 to combine information for the graph lanes into a new correlation search.


At block 7224, the computing machine may generate an aggregate triggering condition using KPI criteria determined for the plurality of KPIs associated with the graph lanes. Once the computing machine has determined the fluctuations of the one or more KPIs associated with the graph lane, the computing machine may convert these fluctuations into logical statements to be used as a KPI criterion. There may be multiple logical statements and the logical statements may be organized into a series or sequence of logical statements that resolve to true or false. A simplified example of a KPI criterion may include logical statements that evaluate to “True” when the latency of a network is between 250 and 350 milliseconds. The KPI criterion derived from each graph lane may be used to generate an aggregate triggering condition. The aggregate triggering condition may contain KPI criteria corresponding to the set of graph lanes. In addition to analyzing the fluctuations in the KPI, the computing machine may also gather KPI configuration information associated with each graph lane.


At block 7225, the computing machine may create a search component for producing KPI values and/or KPI states of the KPIs for a time window defined by the duration of the first period of time. In some implementations, the search component includes search-processing language representing a query to produce KPI values and/or KPI states of the KPIs for a time window defined by the duration of the first period of time.


At block 7226, the computing machine may add the aggregate triggering condition and the search component to the definition of the correlation search. The search component may identify the plurality of KPIs and specify how to obtain KPI data (e.g., KPI values and/or states) of each of the plurality of KPIs for a given time period (e.g., by including a query to search the data store having the KPI data or by including a KPI search query to produce KPI values of each KPI over the given time window). In some examples, the aggregate triggering condition may be associated with a tolerance range editable by a user. The tolerance range may affect how precisely the aggregate triggering conditions are evaluated, for example, a tolerance range of 10% may consider values within 10% of one or more thresholds in the KPI criterion to satisfy the KPI criterion. The tolerance range may allow a new correlation search to account for variations between occurrences of a situation (e.g., problem) to optimize the ability of the correlation search to correctly identify the same or similar situation at another point in time. The tolerance range may be based on relative values, absolute values or percentage values. For example, the tolerance level may be plus or minus N percent, wherein N is 0, 1, 5, 10, or other percentage value.


A user may modify the tolerance level to enhance the ability for the correlation search to identify other similar situations without producing excessive false positives. In one example, the tolerance range may be set to 0 and the correlation search may identify only one instance of the same situation. When the user customizes the tolerance range to a value of 1%, 5% and 10%, the correlation search may respectively identify 2 occurrences, 5 occurrences and 15 occurrences of similar situations within the past year. The user may know from experience that there have been five similar occurrences. Therefore, the use of the 1% tolerance range may not be optimal because it detects only two of the five occurrences. The use of a 10% tolerance range may also not be optimal because it detected 15 occurrences, which means at least 10 are false positives. Therefore, a user may choose the 5% tolerance level because it most closely reflects the actual number of situations that occurred. In one example, the user may be provided with a graphical user interface for displaying the number of similar situations that a correlation search identifies for each user selected tolerance range.



FIG. 72C is a flow diagram of an implementation of a method 7230 for performing a correlation search based on the correlation search definition to identify the same or similar situations (e.g. problems), in accordance with one or more implementations of the present disclosure. Method 7230 may begin at block 7231 when the computing machine may access a correlation search definition. In one example, the correlation search definition may be created by a client machine and may be stored on a remote machine (e.g., database server). Therefore, the computing machine may access the remote machine to access the correlation search definition and proceed to block 7232.


At block 7232, the computing machine may run a search component to obtain KPI values and/or KPI states of the KPIs for a time window defined by the duration of the first period of time. In one example, the search component may access stored KPI data, which may be raw KPI values (e.g., KPI points) or KPI states derived from the KPI values. Alternatively, the search component may obtain the KPI values by executing a KPI search query, which may be a combined search query to produce KPI values of all KPIs associated with the set of graph lanes, or by executing multiple search queries that each can produce KPI values of a distinct KPI associated with a respective graph lane from the set of graph lanes. In either example, the KPI values may be derived from time-stamped events that may each include at least a portion of raw machine data. The set of KPI values may also be derived from machine data at least in part using a late-binding schema.


At block 7233, the computing machine may evaluate the aggregate triggering condition in view of fluctuations in the KPI that occur during a second period of time. The duration of the second period of time is the same as the duration of the first period of time but the second period of time may be before the first period of time or after the first period of time. Evaluating earlier periods of time may allow the correlation search to identify previous problems, which may allow the user to increase their understanding of the problem and assist with identifying symptoms or causes of the problem. Evaluating later periods of time may allow the user to detect a subsequent problem prior to being informed by customers, which may allow the user to take remedial action to address the problem or keep the problem from getting worse.


Evaluating the aggregate triggering condition involves determining whether each of the plurality of KPIs satisfies a respective KPI criterion from the aggregate triggering condition during the second period of time. This determination may involve processing the KPI data obtained for each KPI at block 7232 to determine if the KPI data obtained for each KPI satisfies a KPI criterion of the respective KPI (from the aggregate triggering condition). In one example, the computing machine may iterate through each of the KPIs associated with the correlation search definition. In another example, the computing machine may evaluate the KPIs in parallel or distribute them to other machines to be evaluated in parallel.


If at block 7234, the computing machine determines that the aggregate triggering condition is not satisfied, method 7230 may end. If at block 7234, the computing machine determines that the aggregate triggering condition is satisfied, the method may proceed to block 7235.


At block 7235, the computing machine may perform an action and the action may be responsive to identifying another occurrence of the problem. The action may notify an entity (e.g., system or user) that the problem has occurred. The notification may be in the form of a notable event, which may be viewable in an event viewer (e.g., dashboard) and may be associated with a severity level. The action may also involve sending a message (e.g., email, text, RSS) or creating an incident ticket to alert a user (e.g. system administrator or IT manager). The message or incident ticket may include description information about the problem or contact information for the user that detected or addressed the problem in the past as well as potential root causes. The action may also involve taking remedial action without additional user interaction, such as for example, running a script or executing a command that resolves the problem (e.g., restarting a service, rebooting a machine).



FIGS. 73A-75B include exemplary GUIs for implementing the methods and systems discussed above and are described below using an example use case. The example use case involves a set of graph lanes corresponding to three interrelated services (e.g., a website service, an application service and a database service). Each graph lane may be associated with one or more KPIs and may graphically display KPI values and/or states of the associated one or more KPIs to enable a user to monitor the various aspects of the services. The user may utilize the graph lanes to identify a root cause of a problem, and then the user may select a button to create a correlation search to alert the user if the problem re-occurs.



FIGS. 73A-F depict exemplary GUIs 7350A-E that illustrate how a user may modify the set of graph lanes to diagnose a problem. As shown by FIG. 73A, a user may provide input via GUI 7350A to display a set of graph lanes 7352A-G that correspond to multiple services including a website service, a database service and an application service. The website service may represent an e-commerce website, which may be a customer-facing website tracked by multiple KPIs associated with multiple graph lanes. The multiple graph lanes may include graph lane 7352A corresponding to shopping cart transactions and graph lane 7352B corresponding to a number of unique visitors. The database service may support the website service and the application service and may be tracked using multiple KPIs associated with multiple graph lanes. The graph lanes may include, for example, graph lane 7352C corresponding to database storage space, and graph lane 7352D corresponding to memory usage. The application service may provide business logic to support the transactions of the eCommerce website and may be accessible to the website via an application programming interface (API). The application service may be tracked by multiple KPIs associated with multiple graph lanes. The graph lanes may include graph lane 7352F corresponding to application server latency in milliseconds and graph lane 7352G corresponding to time out errors.


Referring to FIG. 73B, a user may access GUI 7350B in response to receiving a message from a support member of an organization indicating that customers are unable to complete transactions and are complaining via phone. When GUI 7350B is initially invoked, it may only include a few graph lanes (e.g., 7352A-B) that graphically represent the multiple KPIs associated with the website service. The user may visually inspect the graph lanes to confirm that there is a large number of shopping cart transactions. The user may activate a threshold indication feature, which may display the KPI states that correspond to the KPIs over time. This may provide a visual indicator 7353A to illustrate that the KPI values are within a normal state and visual indicator 7353B to illustrate that the KPI values are within a critical state. Activating this feature may indicate that the KPI transitioned from normal state to a warning state in the recent past (e.g., within the past hour).


The user may hypothesize that the problem may have occurred because customers are re-attempting transactions and may test the hypothesis by analyzing graph lane 7352B that displays the number of unique visitors. Both graph lanes 7352A and 7352B may be calibrated to the same time scale and may allow the user to compare the graph lanes side by side to see whether the number of unique users increased during the time that the number of shopping cart transactions increased. As shown, the increase in the number of shopping cart transactions may not have been caused by an increase in the number of visitors because the number of unique visitors remains constant (e.g., substantially horizontal). This may indicate that the same customers are repeatedly performing shopping cart transactions.


Referring to FIG. 73C, the user may add graph lane 7352C corresponding to the database service to investigate whether the problem could be related to the database service. The user may inspect graph lane 7352C to review database storage utilization and may see it is within a normal state.


Referring to FIG. 73D, the user may then add graph lane 7352D-G and see that the KPI displaying the application server latency (e.g., graph lane 7352F) has entered the critical state. The user may also add a graph lane that can display some of the error messages (not shown). This graph lane may be unique compared to the other graph lanes because it may display textual information corresponding to one or more error messages. The user may customize the graph lane to filter the errors by type and to discover they relate to the network time-out messages.


Together the graph lanes may enable the user to determine that the network used by the application service to communicate credit card transactions with the credit card companies has malfunctioned due to a component failure in the network. The user may review the graph lanes displayed to identify which graph lanes relate to the problem and which ones do not and remove the graph lanes that are unrelated. As shown in FIG. 73E, the user may select a time period portion 7355 that includes a portion of the time the problem was occurring. Selecting time period portion 7355 may cause GUI 7350E to zoom-in to the selected portion.


Referring to FIG. 73F, the display may now include the appropriate graph lanes and may be focused (e.g., zoomed-in) on the appropriate time period and the user may wish to configure the system to monitor for similar problems and to perform an action (e.g. alert) if the problem occurs again. This may be accomplished by selecting graphical element 7354, which will initiate the creation of a correlation search definition. The correlation search definition may be based on the selected period of time and currently displayed graph lanes (e.g., graph lanes 7352A-G). The correlation search definition may automatically be configured to generate an alert when the KPI data (e.g., KPI states or KPI values) has a pattern of fluctuations that is similar to that currently displayed.


The method of creating the correlation search definition may be the same or similar to methods 7320 of FIG. 72B. In this example, the method may identify the time period being displayed by the set of the graph lanes and iterate over each of the displayed graph lanes. As the method iterates over the graph lanes, it may identify fluctuations in the KPI by calculating the proportion of the time period that KPI was in a particular state or the statistical distribution of the KPI values during the time period. The method may convert proportions into a series of logical statements resolving to true when the proportion is satisfied. The logical statements may be stored as KPI criteria in an aggregate triggering condition within a correlation search definition.


Referring now to FIG. 74, the system may display GUI 7400 to enable a user to provide identification and configuration information to be associated with the correlation search definition. GUI 7400 may be initiated in response to the user request to create a new correlation search and may be displayed before, after or during the creation of the correlation search definition. GUI 7400 may include a search name field 7401, a description field 7403, a schedule type field 7405, a frequency field 7407, a time period field 7411 and a severity field 7413.


Search name field 7401 and description field 7403 may enable the user to enter a name and a description that may explain how and what the correlation search is used to identify. In the example use case, the user may set the name of the correlation search to “Web Service Down” and add description information describing the problem and potential root causes as well as contact information for the network administrator that may be able to address the problem.


Schedule type field 7405 and frequency field 7407 enable a user to specify when the correlation search should be run to check for the problem. Schedule type field 7405 may allow the user to select between a “Basic” schedule in which the user may provide a repeated cycle, for example, every 30 minutes or select a “Cron” schedule in which case the user may select a specific time within a day, week, year or other duration to run the correlation search.


Time period field 7411 may enable a user to set the time period to a specific duration of time. The time period field 7411 may default to the duration of time viewable in the graph lane and may allow the user to modify the value to a smaller or larger period of time.


Severity field 7413 may enable a user to set the default severity of the alert. This may correspond to the severity of the problem, which the correlation search is configured to detect. The selected severity may be associated with a notable event created as a result of the correlation search.



FIGS. 75A and 75B may display additional GUIs 7510 and 7520 that may be presented to the user during or after the creation of the correlation search definition and may be used to customize portions of the correlations search definition. GUI 7510 and 7520 may be included as part of a correlation search wizard. For example, GUI 7510 may be the first GUI of a correlations search wizard and GUI 7510 may be the last GUI of the correlations search wizard. The correlation search wizard may be pre-populated with statement 7522 (e.g., in search processing language) specifying the KPI criteria 7522, the search component and other information derived from the set of graph lanes and may allow the user to modify the information to be included in the correlation search definition.


As discussed herein, the disclosure describes various mechanisms for creating a correlation search definition based on one or more graph lanes. The disclosure describes graphical user interfaces that enable a user to select specific graph lanes and a specific duration of time on which the correlation search should be based. The disclosure also includes methods for running the correlation search to identify similar problems that occur at other points in time.


Topology Navigator for it Services


Implementations of the present disclosure may include a graphical user interface (GUI) for a topology navigator that enables a user to view multiple services associated with an environment such as multiple IT services associated with an IT environment. The topology navigator GUI (also referred to as a “topology navigator”) may include multiple display components for displaying information about the services. A first display component may be a topology graph component that displays the multiple IT services as service nodes within an interconnected graph. The connections within the graph may represent dependencies between the services and each service node may include one or more visual attributes representing one or more characteristics of the service such as performance characteristics of the service. An in-focus service node refers to a node that is highlighted via one or more visual attributes to indicate that it is a central point of focus within the topology graph component. A second display component may be a details display component that provides information for a service represented by the in-focus service node, which may be selected by the user. The information may include one or more key performance indicators (KPIs) associated with the service represented by the in-focus service node, or one or more historic selections or actions by the user in regard to the in-focus service, or some other information related to the in-focus service.


The topology navigator may enable a user to visually inspect characteristics such as the performance of multiple services to identify one or more dependent services with interesting characteristics (e.g., low performance). A user may navigate through the service nodes by selecting one or more service nodes. When a node is selected, it may become the in-focus node at which point both the topology graph component and the details display component may be updated to correspond to the new in-focus node. The topology graph component may be updated to display the selected node as the in-focus service node, such as by placing it at a certain location, and re-arrange, rebuild, or reformat the graph to display dependencies of the service represented by the new in-focus service node. The details display component may be updated to display the information associated with the service represented by the new in-focus service node. The user may repeatedly select different service nodes to navigate through the dependent services to view their performance and/or other characteristics and information. In one example, the topology navigator may be used in collaboration with the deep dive GUI, discussed in regards to FIGS. 50A-70. The deep dive GUI may include multiple time-based graph lanes visually illustrating how one or more services are performing during a period of time and the topology navigator may enable a user to add more time-based graph lanes to illustrate performance of additional services during the same period of time.


An advantage of the topology navigator as described in a context of service performance is that it may enable the user to investigate the performance of multiple services by navigating through its dependent services to detect or diagnose abnormal activity (e.g., performance degradation, system malfunction) or to identify a performance pattern of interest (e.g., increased usage of one or more services by end users). In one example, the abnormal activity may be a decrease in performance of a service caused by malfunctioning resources, overburdened resources, or non-optimized resource configurations. As the user navigates through the service nodes, the topology navigator may visually illustrate the aggregate performance of the one or more dependent services to enable the user to identify which dependent services have abnormal activity and may be adversely impacting a service of interest to the user.



FIG. 75C illustrates an example of a graphical user interface for a topology navigator 75300 that displays multiple service nodes and information related to the service nodes, in accordance with one or more implementations of the present disclosure. Topology navigator 75300 may include a topology graph component 75310, a details component 75320 and a service control element 75330.


Topology graph component 75310 may include a graphical visualization (e.g., graph) that illustrates the dependencies between the services. Topology graph component 75310 may include service nodes 75312A-F and connections 75314A-E. Service nodes 75312A-F may graphically represent multiple services configured to operate in the IT environment. Service nodes 75312A-F may be any shape, such as a circle, triangle, square or other shape capable of representing a particular service or set of services, and the shape of any node may be considered one of its visual attributes. Each service may be provided by one or more entities and may be defined by a service definition that may associate entity definitions for the entities that provide the service. Service definitions are discussed above in regards to FIGS. 11-17B. As shown in FIG. 75C, the services may include one or more web servers that function to provide IT services such as web hosting services (e.g., “HR Portal”, “Customer Portal”), database management services (e.g., “DB Site 1” and “DB Site 2”), or any other IT services (e.g., “Email”).


Connections 75314A-E represent the dependencies between the services of the IT environment. In one embodiment, a dependency is a relationship in which one service relies on or interacts with another service during its operation and may be bidirectional or unidirectional. A bidirectional dependency relationship is one where two services rely on one another such that an aspect of a first service relies on an aspect of the second service and an aspect of the second service relies on an aspect of the first service. With a bidirectional dependency, a performance problem with the operation of one service may adversely affect the other service. A unidirectional dependency relationship is one in which a first service relies on a second service but the second service may not rely on the first service. With a unidirectional dependency, a performance problem with the operation of second service may adversely affect the first service but a performance problem with the first service may not adversely affect the first service. In one embodiment, only unidirectional dependencies exist between services. In one embodiment, a dependency between services is related to other than performance, such as a sequencing or queuing dependency.


Details display component 75320 may display information related to a service represented by an in-focus service node within topology graph component 75310 and may be configured to update the information when different service nodes are in focus. As shown in FIG. 75C, service node 75312D is in-focus and as a result, details component 75320 displays information related to the corresponding service (e.g., service “Application Servers”). The information specifies multiple KPIs (KPIs 75322A-C) related to the service represented by the in-focus service node and provides details about these KPIs in the form of KPI widgets 75324A-C. Each KPI may indicate how a service is performing at a point in time or over a period of time and may comprise multiple KPI values derived from machine data pertaining to one or more entities providing the service. KPI widgets 75324A-C are graphical visualizations that may illustrate the KPI values and changes in the KPI values over time and will be discussed in more detail in regards to FIG. 75E.


Service control element 75330 may display the service that is represented by the current focal point (e.g., in-focus service node) within the topology navigator and may allow the user to select other service nodes to be the focal point. Service control element 75330 may be a graphical control element that transitions the in-focus service node identification from a current in-focus service node to another service node. In one example, service control element 75330 may visually identify an initial in-focus service node, which may be related to the service identified by a user when invoking the graphical user interface and may enable the user to transition the focus back from a subsequent in-focus service node to the initial in-focus service node without navigating through intermediate service nodes.



FIG. 75D illustrates an exemplary topology graph component 75410 that includes visual attributes that illustrate the aggregate KPI values (e.g., heath scores) of the service nodes, in accordance with one or more implementations of the present disclosure. This may be advantageous because it may enable the user to visually inspect multiple service nodes before, during or after navigation to identify services that are performing in a manner of interest to the user (e.g., low performance). Topology graph component 75410 may provide a graphical view of the dependencies and may enable the user to navigate between dependent nodes similar to topology graph component 75310 but may also include service nodes 75412A-F with visual attributes 75416A-C positioned in a service node arrangement 75417.


Service node arrangement 75417 may be an arrangement of service nodes where the positions or relative positions of service nodes 75412 A-F indicate the direction of dependency relationships of their respective services. As discussed above, the connections between service nodes indicate an existence of a dependency relationship but the connections may not indicate the direction of the dependency. In service node arrangement 75417, each of the connections (e.g., 75414A) indicates a unidirectional dependency and the position of a particular service node relative to the in-focus service node may indicate the direction of the dependency relationship. A particular service node may be positioned in one direction (e.g., above, below, left, right, angled) relative to the in-focus service node to indicate that a service represented by the particular service node depends on (e.g., is impacted by) a service represented by the in-focus service node and may be positioned in a different direction (e.g., opposite direction) to indicate the service represented by the particular service node is depended on by (e.g., impacts) the service represented by the in-focus service node. In one example, a first service node may be positioned above the in-focus service node to indicate that the service represented by the in-focus service node is dependent upon a service represented by the first service node, and a second service node may be positioned below the in-focus service node to indicate that a service represented by the second service node is dependent upon the service represented by the in-focus service node. In other examples, a third service node may be positioned below the in-focus service node to indicate that the service represented by the in-focus service node is dependent upon a service represented by the third service node and a fourth service node may be positioned above the in-focus service node to indicate that a service represented by the fourth service node is dependent upon the service represented by the in-focus service node. Positioning that defines the dependency direction may be predetermined or configurable.


As shown in FIG. 75D, service node arrangement 75417 may base the dependencies on direction and also organize the service nodes into multiple levels, such as first level 75418A, second level 75418B and third level 75418C which may be substantially parallel with one another. In-focus service node 75412D may be located in second level 75418B (e.g., middle level), first level 75418A may be in one direction (e.g., below) relative to the in-focus service node 75412D and the third level 75418C may be in the opposite direction (e.g., above) relative to the in-focus service node 75412D. In one example, the service nodes within the first and third levels may be positioned adjacent to one another along straight lines (e.g., rows) and the straight lines of each level may be parallel to one another so that the distance between a level (e.g., row of service nodes) and the level of the in-focus service node is a constant distance. In other examples, the distance between the service nodes and the in-focus service node may vary depending on the amount of the dependency between a service represented by a particular service node and the service represented by the in-focus service node. The dependency amount may be assessed based on the quantity of interaction between the service represented by the in-focus service node and a service represented by the particular service node relative to its interaction with other services, or other similar relationship metrics.


Visual attributes 75416A-C may affect the appearance of a service node to illustrate information related to the underlying service. Visual attributes 75416A-C may be based on color, shape, size, pattern, shade, overlay or other attribute capable of distinguishing nodes. Visual attribute 75416A may relate to a fill of a service node and may indicate a value of an aggregate KPI for the corresponding service. As discussed above in regards to FIG. 32-34A, the aggregate KPI may characterize the performance of a service by aggregating the values of multiple KPIs associated with the service, and the multiple KPIs may include all or substantially all of the active KPIs for the service. The aggregate KPI may indicate the performance of the service at a point in time or over a period of time. Each service may be associated with an aggregate KPI and the value of the aggregate KPI or its derivative may be illustrated by the fill of the corresponding service node. As shown in FIG. 75D, the service nodes may be different colors (e.g., red, yellow, green) which is represented in the figures as variations in the fill pattern. For example, service node 75412D has a visual attribute 75416A which may be associated with an aggregate KPI value within a middle range (e.g., represented by yellow fill), which may indicate the performance of the corresponding service is in a warning state. By comparison, service nodes 75412E and 75412F may include a visual attribute indicating that aggregate KPI values of their corresponding services are within a low range (e.g., represented by green fill) or high range (e.g., represented by red fill) respectively. This may enable a user in one embodiment to visually identify which of the dependent services has lower performance (e.g., red service node 75412F) and therefore enhances the user's ability to identify which dependent service is adversely affecting the service represented by the in-focus service node 75312D.


Visual attribute 75416B may be any visual attribute that identifies a service node as an initial in-focus service node. The initial in-focus service node may be the service node that is in-focus when the graphical user interface is initially invoked and may have been identified by the user prior to invoking topology navigator 75300. Visual attribute 75412B may be any visual attribute, such as an overlay, that modifies the service node to enable the user to identify the service node as an initial in-focus service node.


Visual attribute 75416C may be any visual attribute indicating that a service node is an in-focus service node (e.g., service node 75412D). Visual attribute 75416C may include associating a halo, highlighting, bolding or any other visual indicator that would signify that the service node is in-focus, for example, that it has been selected by a user.



FIG. 75E illustrates an exemplary details display component 75520 for topology navigator 75300, in accordance with one or more implementations of the present disclosure. Details display component 75520 may be similar to details display component 75320 of FIG. 75C and may display information related to a service represented by an in-focus service node. Details display component may include a title 75521, KPI names 75522A-Z, KPI widgets 75524A-Z and selection element 75525A-Z.


Title 75521 may identify the service and type of information being displayed. As shown, the service may be the “Application Servers” service and the type of information may be related to “KPI” information. Other types of information may be included within details display component 75520, such as any information related to the service represented by the in-focus service node (e.g., information from the service definition, entity definition, KPI definition or other source of related information).


KPI names 75522A-Z may identify one or more KPIs associated with the service represented by the in-focus service node. Each KPI may indicate a different aspect of how a respective service provided by one or more entities is performing at a point in time or during a period of time. In one example, all the KPIs associated with a service may be displayed within details display component 75520. In other examples, only a subset of the KPIs associated with a service may be displayed, such as only those KPIs previously selected by the current user or within a certain state or having a certain range of values. KPIs 75522A-C may be listed within details display component 75520 and may be associated with KPI widgets.


KPI widgets 75524A-Z may illustrate information about KPI 75522A-Z to enable a user to identify one or more relevant KPIs from the multiple KPIs listed in details display component 75520. In one example, KPI widgets 75524A-Z may be spark line widgets as shown. Spark line widgets are discussed in regards to FIG. 44 and may include portion 75526A and portion 75526B. Portion 75526A may include a graph (e.g., line graph, bar chart) that includes multiple data points and may be colored using a color representative of the state (e.g., normal, warning, critical) of which a corresponding data point falls into. Portion 75526B may include a numeric value that is associated with the KPI, where the numeric value may be a specific data point or a statistical value (e.g., average, median) derived from the one or more data points.


KPI widgets 75524A-Z may also include or be adjacent to one of selection elements 75525A-Z. The existence of a selection element may indicate to the user that the KPI may be selected to be added to another display component. In one example, selection element may include a plus sign and in other examples it may include another control element (e.g., radio button, check mark).


Details display component 75520 may also include a tabbed interface 75528 that may provide a user access to multiple tabs having different types or arrangements of information. Tab 75529 may include the KPI information discussed above and other tabs may include other information or similar information in different formats. For example, the KPIs may be displayed (e.g., listed) in a table format with rows and columns that may enable a user to sort or rearrange the information.



FIG. 75F illustrates an exemplary graphical user interface 75600 having a topology navigator 75300 and deep dive component 75640 including multiple time-based graph lanes, in accordance with one or more implementations of the present disclosure. Topology navigator 75300 may enable a user to navigate dependent services and to identify KPIs to be added as time-based graph lanes to the existing time-based graph lanes in deep dive component 75640. Topology navigator 75300 may include a first display component 75610 (e.g., topology graph component) and a second display component 75620 (e.g., details display component)


Deep dive component 75640 may be similar to a deep dive graphical user interface discussed in regards to FIGS. 50A-70 and may include multiple time-based graph lanes 75642A-D. Time-based graph lanes 75642A-D may provide a graphical visualization of KPI values over a time range. Each time-based graph lane 75642A-D may have different graph styles or colors or the same graph styles and colors. For example, some graph lanes may include line graphs whereas others may include bar charts. Time-based graph lanes 75642A-D may correspond to different services or may correspond to the same services and may be calibrated to the same time range and time scale. The time range and scale may be reflected by a time axis that runs parallel to at least one graph lane. The time axis may include an indication of the amount of time represented by the time scale and an indication of the actual time of day represented by the time scale. In one implementation, a bar running parallel to the graph lanes includes an indication of the amount of time represented by the time scale (e.g., 1 hour). Time-based graph lanes 75642A-D may be the same or similar to time-based graph lanes discussed elsewhere in this disclosure.


First and second display components 75610 and 75620 may be collectively referred to as topology navigator 75300 and may interact with deep dive component 75640 to enable a user to affect the content and/or appearance of deep dive component 75640, such as by adding KPIs or other information to deep dive component 75640. For example, a user may navigate through multiple dependent services and select one of the dependent services using first display component 75610. The user may also select a KPI (e.g., KPI 75322A) from a list of KPIs within second display component 75620. In response to selecting a KPI, deep dive component 75640 may be updated to include a time-based graph lane corresponding to the KPI selected by the user.


Topology navigator 75300 may include a control element 75642 that expands (e.g., invokes, maximize, restores) or hides (e.g., minimize, close) topology navigator 75300. As shown, the topology navigator 75300 may be in an expanded mode and control element 75642 may appear as an arrow. After the topology navigator 75300 is expanded, the arrow may point toward the right (e.g., greater-than symbol) and enable the user to select the control element 75642 to close topology navigator 75300. When topology navigator 75300 is minimized or hidden, the arrow may point toward the left (e.g., less-than symbol) and enable the user to expand topology navigator 75300.


Topology navigator 75300 may be positioned near to or adjacent to deep dive component 75640. As shown in FIG. 75F, topology navigator 75300 is located to the right of deep dive component 75640 and therefore in the right portion of graphical user interface 75600. In other examples, it may be above, below, to the left or any other position relative to deep dive component 75640. In an alternative GUI layout, first and second display components 75610 and 75620 may be on opposite sides of deep dive component 75640.



FIGS. 75G and 75H depict flow diagrams of exemplary methods 75700 and 75800 for creating and updating a topology navigator, in accordance with one or more implementations of the present disclosure. Method 75700 is a method of displaying the topology navigator and updating its display components in response to user input, in accordance with some aspects of the present disclosure. Method 75800 is directed to utilizing the topology navigator for adding time-based graph lanes to a deep dive component, in accordance with some aspects of the present disclosure. Methods 75700 and 75800 may be performed by processing devices that may comprise hardware (e.g., circuitry, dedicated logic), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. Methods 75700 and 75800 and each of their individual functions, routines, subroutines, or operations may be performed by one or more processors of a computer device executing the method.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks, steps). Acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media. In one implementation, methods 75700 and 75800 may be performed to produce a GUI as shown in FIGS. 76C-F.


Referring to FIG. 75G, method 75700 may be performed by processing devices of a server device or a client device and may begin at block 75710. At block 75710, the processing device may cause for display a graphical user interface with a first display component depicting multiple service nodes and their dependencies and a second display component depicting information related to a service represented by an in-focus service node. Each service node within the first display component may represent one or more services. A service may be provided by one or more entities and each entity may correspond to an entity definition (e.g., FIG. 10B and FIGS. 5-10A) having an identification of machine data from or about the entity. Each service may correspond to a service definition (e.g., FIG. 17B and FIGS. 11-15) associating the entity definitions for the entities that provide the service and having a key performance indicator (KPI) defined by a search query (e.g., FIG. 34D) that derives a value indicating performance of the service from machine data identified in the associated entity definitions.


The first display component (e.g., topology graph component 75310) may graphically depict the plurality of service nodes as a connected graph of nodes. The graph may be fully connected so that each service node is connected to at least one other service node or it may be partially connected where there may be one or more service nodes that are not connected to another node. In one example, the connected graph may be limited to service nodes having a distance of one from the in-focus service node. In another example, the connected graph may be limited to service nodes having a distance of two or less from the in-focus service node. Accordingly, in such an example, the connected graph may display only a localized portion of a larger, logical graph that includes the many services and interdependencies defined for an environment. Various embodiments could display varying portions or the whole of such a logical graph.


The position of each service node relative to the in-focus service node or another attribute may indicate a direction of a dependency between a service represented by the in-focus service node and a service represented by the respective service node. In one example, a first service node may be positioned above the in-focus service node to indicate that the service represented by the in-focus service node is dependent upon the service represented by the first service node and a second service node may be positioned below the in-focus service node to indicate the service represented by the second service node is dependent upon the service represented by the in-focus service node. In another example, service nodes may be displayed in multiple levels of interconnected service nodes. A first level may include service nodes representing services that depend upon the service represented by the in-focus service node and a second level may include only the in-focus service node. A third level may include service nodes representing services that the service represented by the in-focus service node depends upon.


The service nodes within the first display component may include one or more visual attributes. In one example, a visual attribute may indicate a value of an aggregate key performance indicator (KPI) characterizing activity (e.g., performance, health) of the respective service at a point in time or during a period of time. The value of the aggregate KPI may be calculated in view of multiple KPI values, each of the multiple KPI values may be derived by executing a search query associated with a respective KPI. The visual attribute may be associated with each and every service node displayed in the first display component or may only be associated with a subset of the service nodes, such as only the dependent nodes and the in-focus service node. In another example, a visual attribute may identify one of the service nodes as an initial in-focus service node. The initial in-focus service node may be a service node that is in-focus when the graphical user interface is first invoked. This may be a default service node (e.g., root node) or may be identified by a user prior to invoking the first display component. In yet another example, a visual attribute may distinguish the in-focus service node from other service nodes, wherein the visual attribute comprises a halo around the in-focus service node.


The second display component (e.g., display component 75320) may include information related to the service represented by the in-focus service node. The information may include multiple key performance indicators (KPIs) associated with a service represented by the in-focus service node. In one example, the information within the second display component may include multiple spark line widgets and each spark line widget may include a graph (e.g., line graph, bar chart) of the respective KPI. The spark line widget may be an image (e.g., thumbnail image) that is static or may be an image that is updated continuously or periodically with different or additional information. In other examples, the information within the second display component may include historical information related to the one or more KPIs, such as information that indicates the KPI has been previously selected for inclusion within a display component.


At block 75712, the processing device may receive a user selection of a service node. The user may select a service node using a variety of different methods. A first method may involve the user selecting one of the nodes from the graph. This selection may involve clicking a mouse or tabbing through the nodes until the appropriate node is selected. A second method may involve the user selecting a graphical control element (e.g., service control element 75330 of FIG. 75C) at which point the graphical control element may provide a list of services and enable a user to select one of the services to transition focus to the corresponding service node. Once a selection has been made, the method may proceed to block 75714.


At block 75714, the processing device may transition the in-focus service node identification within the first display component from a first service node to a second service node in response to the user selection. The in-focus service node identification may refer to visually identifying a service node as a central point of focus using the visual attributes of the service node and/or the position of the service node within the first display component. Transitioning the in-focus service node identification may involve the first display component updating the visual attributes and/or the location of the one or more service nodes. For example, the first display component may update the first service node to remove the focus visual attribute and may update the second service node to add the focus visual attribute. In another example, transitioning the in-focus service node identification may involve repositioning the service nodes and adding or removing service nodes. The first display component may reposition the second service node to a middle region of the first display component and reposition the first node to be above or below depending on its dependency relationship. The first display component may remove service nodes that are not within a distance of one and therefore do not have a direct dependency relationship with the second service node. The first display component may add service nodes that are within a distance of one and therefore have a direct dependency relationship with the second service node. In one example, method 75700 may identify which service nodes to add and remove by analyzing one or more service definitions associated with the service represented by the in-focus service node to determine dependencies between this service and other services. For example, the service definition may include links to services that have a dependency relationship with the service represented by the in-focus service node.


At block 75716, in response to the user selection of a new in-focus service node, the processing device may update the second display component with information related to a service represented by the new in-focus service node (e.g., other service node). Updating the second display component may involve replacing the KPIs associated with a previous in-focus service node with the KPIs associated with the new in-focus service node.


At block 75718, the processing device may check to see if it has received a user selection of another service node. If the processing device has received another user selection, the method may branch back to block 75714 and transition to the newly selected service node. Responsive to completing the operations described herein above with references to block 75718, the method may terminate.


Referring to FIG. 75H, method 75800 presents another flow diagram of an exemplary method for using the topology navigator to, for example, investigate abnormal activity of a service and identify a KPI of a dependent service to be added to a list of time-based graph lanes, in accordance with one or more implementations of the present disclosure. Method 75800 may be similar to method 75700 but may include within the graphical user interface a third display component (e.g., a deep dive component 75640) that displays multiple KPIs as time-based graphical visualizations. Method 75800 may be performed by processing devices of a server device or a client device and may begin at block 75810.


At block 75810, the processing device may receive user input identifying a service and requesting a graphical user interface. The user input may be derived from a different graphical user interface or information received from a command line interface (CLI) or configuration file. In one example, the user input to identify the service and to request a GUI may be the same action. For example, a user may select (e.g., click or double click) a control element that represents a service on another graphical user interface (e.g., Glass Table) and the location of the selected action may identify a service and the type of selected (e.g., double click) action may be the request. In another example, the user input identifying the service may be separate from the user input requesting the graphical user interface. For example, the user may identify a service with a first action (e.g., click) and may request the graphical user interface with a second action. The first and second actions may be within different GUIs or portions of GUIs.


At block 75811, the processing device may cause for display a GUI with multiple time-based graph lanes (e.g., Deep Dive GUI) in response to the user input. The GUI may be similar to the graphical user interface discussed in regards to FIG. 75F and may include multiple time-based graph lanes that provide graphical visualizations of multiple KPI values over a time range. The GUI may include a bar along a portion of the GUI, for example, along the right portion of the GUI. The bar may represent the topology navigator in a minimized or hidden mode.


At block 75812, the processing device may receive a user request to display the topology navigator. The user request may be initiated when a user selects a control element (e.g., arrow with appearance of a less-than symbol) and may result in the topology navigator expanding. The user may also select the control element again to close or minimize the topology navigator. The control element may be advantageous because expanding and minimizing the topology navigator may alter the size of the respective display elements and may provide for better use of the available display area.


At block 75813, the processing device may display the first and second display components of the topology navigator, which includes multiple dependent service nodes with visual attributes characterizing the performance of the respective services. This block may be similar to block 75710 of method 75700 and may include displaying the service node identified by the user as well service nodes whose services depend from or are impacted by the service represented by the identified service node. Each service node may include visual attributes that modify the fill of the service node to indicate the value of the respective aggregate KPI value. This may be advantageous because it may allow a user to visually inspect the performance of the service represented by the in-focus service node and its dependent services. It may also enable the user to identify one or more dependent services that may be causing a decrease in performance of the selected service.


At block 75814, the processing device may check if it received a first user selection identifying a dependent service node from the first display component. If the user has not identified another service node, the method may proceed to block 75817 where the processing server may receive a selection of a KPI. If the user has selected another service node, the method may proceed to block 75815 and block 75816.


At blocks 75815 and 75816, the processing device may update the first display component and second display component respectively. Block 75815 and 75816 may be the same or similar to blocks 75714 and 75716 of method 75700 and may be performed in parallel or sequentially. At block 75815, the processing device may update the first display component to transition the in-focus service node identification to the new selected node. At block 75816, the processing device may update the second display component to display multiple KPIs corresponding to the service represented by the current in-focus service node.


At block 75817, the processing device may receive a second user selection identifying a KPI from the second display component to be added to the multiple time-based graph lanes. As discussed in regards to FIG. 75E, the second display component (e.g., details display component 75320) may display multiple KPIs associated with the service of the in-focus service node. Each of the KPIs may be represented by a widget that illustrates the performance of the KPI. The second display component may be advantageous because it may allow a user to visually inspect the performance of the service of the in-focus service node by viewing the constituent KPIs associated with the service and enables the user to identify one of the KPIs to be added as a time-based graph lane for further analysis.


At block 75818, the processing device may prompt the user for configuration information for an additional graph lane. The prompt may be in the form of a lane customization GUI (e.g., dialog window) and may be the same or similar to GUI 5200 of FIG. 52. The lane customization GUI may indicate to the users that they are adding a new lane and may include multiple fields associated with a graph lane such as graph type, graph color, source and search query. The fields may be pre-populated with default values derived from the user-selected KPIs of the second display component and may allow the user to view or change the values. The user may then select to save or create the graph lane at which point the method may proceed to block 75819.


At block 75819, the processing device may add a graph lane for the identified KPI to the multiple time-based graph lanes of the third display component (e.g., Deep Dive display component). The graph lane may provide performance data for the KPI and may help the user to detect or diagnose abnormal activity (e.g., performance degradation, system malfunction) or to identify a performance pattern of interest (e.g., increased usage of one or more services by end users).


The newly added graph lane and the user selected KPI widget may have similarities and differences. Both the graph lane and the KPI widget may represent the same KPI and may include the same or similar data values, but the two may display the data values in different manners. In one example, the graph lanes may continuously or periodically update the visualization to illustrate changes in real time and the KPI widget may be a static image (e.g., thumbnail). In another example, the KPI widget may be configured similar to the corresponding graph lane and present and update the same information in the same manner.


As discussed herein, the disclosure describes a graphical user interface for a topology navigator that may enable a user to view multiple IT services associated with a user's IT environment. The topology navigator may include multiple display components for displaying information about the services. A first display component (e.g., topology graph component) may display multiple services as a graph of interconnected nodes and a second display component (e.g., details display component) may display information about one or more of the services. The topology navigator may enable a user to visually inspect the performance of multiple services and navigate through the multiple services to identify one or more dependent services having performance of interest (e.g., degraded performance) that may adversely affect a service of interest to the user. In particular, the user may navigate through service nodes representing the multiple services and select a service node representing a service of interest to the user, at which point the selected service node may become the in-focus service node. In response to the user selection, both display components may be updated to correspond to the new in-focus node. The second display component may then display KPIs associated with a service represented by the new in-focus node, and one or more of these KPIs can be selected and added to another GUI or to other display components within the same GUI.


KPIs Defined Using a Common Information Model


In certain implementations, in order to create queries, a knowledge of the fields that are included in the events with respect to which such queries are associated and/or a knowledge of the query processing language used for such queries can be advantageous. While certain users may possess domain understanding of underlying data and knowledge of the query processing language, other users (e.g., those who may be responsible for setting up/defining KPI's, for example) may not have such expertise. Accordingly, in certain implementations a common information model (CIM) can be utilized/applied. The referenced CIM can be, for example, a data model that is utilized or applied across multiple data sources. Such a CIM can simplify the creation of KPIs, reports, and other visualizations, thereby assisting end users in utilizing the described technologies.


A KPI associated with a service can be defined by a search query that produces a value derived from machine data, such as may be identified in entity definitions specified in a service definition of the service. Each value can, for example, be indicative of how a particular aspect of a service is performing at a point in time or during a period of time. Additionally, in certain implementations, the referenced KPI can be configured at a more abstract level as well, such as with respect to the overall performance of the service. For example, an aggregate KPI can be configured and calculated for a service to represent the overall health of a service. For example, a service may have 10 KPIs, each monitoring a various aspect of the service. The service may have 7 KPIs in a Normal state, 2 KPIs in a Warning state, and 1 KPI in a Critical state. The aggregate KPI can be a value representative of the overall performance of the service based on the values for the individual KPIs.


As also described herein, implementations of the present disclosure can provide a service-monitoring dashboard that displays one or more KPI widgets. Each KPI widget can provide a numerical or graphical representation of one or more values for a corresponding KPI or service health score (aggregate KPI for a service) indicating how a service or an aspect of a service is performing at one or more points in time. Users can be provided with the ability to design and draw the service-monitoring dashboard and to customize each of the KPI widgets. A dashboard-creation graphical interface can be provided to define a service-monitoring dashboard based on user input allowing different users to each create a customized service-monitoring dashboard. Users can select an image for the service-monitoring dashboard (e.g., image for the background of a service-monitoring dashboard, image for an entity and/or service for service-monitoring dashboard), draw a flow chart or a representation of an environment (e.g., IT environment), specify which KPIs to include in the service-monitoring dashboard, configure a KPI widget for each specified KPI, and add one or more ad hoc KPI searches to the service-monitoring dashboard. Implementations of the present disclosure provide users with service monitoring information that can be continuously and/or periodically updated. Each service-monitoring dashboard can provide a service-level perspective of how one or more services are performing to help users make operating decisions and/or further evaluate the performance of one or more services.


A KPI pertaining to a service (e.g., for monitoring CPU usage for a service provided by one or more entities) can be defined by a search query directed to search machine data. A service definition of the service associates entity definitions of the entities that provide the service with the KPI, and the entity definition includes information that records the association between the entity and its associated machine data.


In certain implementations, input specifying the search processing language for the search query defining the KPI can be provided/received. The input can include a search string defining the search query and/or selection of a data model to define the search query. Data models are described in greater detail herein, such as in conjunction with FIGS. 79B-D and E. The search query can produce, for a corresponding KPI, a value derived from machine data that is identified in the entity definitions that are specified in the service definition. It should also be understood that, as described in detail herein, the referenced search query can include one or more field identifiers to, for example, filter the result of the search query based on specific values included in respective one or more fields in events being searched. For example, the where clause of the search query may include the WHERE command followed by a key/value pair (e.g., WHERE host=Vulcan). In one implementation, “host” is a field identifier and “Vulcan” is a value stored in the field identified as “host.”


In certain implementations, a service monitoring system can define a search query for a KPI using a data model (e.g., via a GUI such as is depicted in FIG. 24). Such a GUI can enable the defining of the search query for the KPI using a data model. A data model refers to one or more objects grouped in a hierarchical manner and can include a root object and, optionally, one or more child objects that can be linked to the root object. A root object can be defined by search criteria for a query to produce a certain set of events, and a set of fields that can be exposed to operate on those events. Each child object can inherit the search criteria of its parent object and can have additional search criteria to further filter out events represented by its parent object. Each child object may also include at least some of the fields of its parent object and optionally additional fields specific to the child object, as described herein in conjunction with FIGS. 79B-D and E.


Referring to FIG. 75I, data model 700 may include a top level data model that may be referred to as a root data model 710. In some embodiments, the root data model 710 may represent a type of event. Each of the data sub-models 720, 730, 740, 750, 760, and 770 may be referred to as child of the data model 710. In some embodiments, the root data model 710 may represent a broader category of events and the data sub-models 720-770 may represent different subsets of the events that are represented by the root data model 710.


In some embodiments, a data sub-model may inherit a subset of the fields of the parent data model and/or may have additional fields, and the events to which the sub-model applies may be determined by adding additional filtering criteria (e.g., relating to field criteria) to the set of events (or search query) defining the parent data model such that the events associated with the sub-model are a subset of the events associated with the parent data model when both the parent and child data models are applied to the same source data. The root data model 710 may be associated with a first criterion for a first field (its search). The data sub-models 720 and 730 may also be associated with the first criterion for the first field. However, in some embodiments, the data sub-model 720 may also be associated with a second criterion for the first field or a second field, and the data sub-model 730 may be associated with a third criterion for the first field or a third field. Accordingly, if the root data model 710 is selected to perform a search, more events may be returned than if one of the data sub-models 720 or 730 is selected to perform a search on the same data.


The data model 700 may be applied to search any data and may define criteria of a search query. For example, if the parent data model 710 is selected to perform a search, then the events that satisfy the search criteria defined by the data model 710 may be returned. However, if the data sub-model 720 is selected to perform a search on the same data, then the events of the data that satisfy the search criteria defined by the data sub-model 720 may be returned. A search that is performed based on the search criteria of the data sub-model may result in fewer returned events than if a parent data model 710 is selected to perform a search on the same data.


Accordingly, a data model may be used to define different hierarchical levels to perform searches on data. The data model may be saved and applied to various different events. In some embodiments, a field module and an associated GUI may be used to generate a data model based on a search of data. For example, in response to an initial search query, a data model may be generated based on the initial search query. In some embodiments, the criteria of the initial search query may be associated with the root data model that is generated in response to the initial search query. Furthermore, when one or more automatically discovered fields are displayed in the GUI, the data model includes those fields as its attributes. A sub-model may be generated by receiving additional filtering criteria for fields through the GUI, and then a narrower search incorporating the initial search query's criteria and the criteria entered through the GUI defines the events associated with the sub-model, and automatically discovered fields determined to be of importance in the set of events generated by the filtered results using the criteria entered through the GUI are the sub-model's fields (attributes).


The data model that is generated based on the initial search query and modified based on values for the fields displayed in the GUI may be saved and used to perform searches of other data. For example, the data model may be generated after an initial search query of source data and may further be modified based on discovered fields of the events of the source data that are returned in response to the initial search query. The data model may be saved and subsequently applied to perform a search of events of different source data.


As discussed above, each of the referenced KPIs can measure an aspect of service performance at a point in time or over a period of time. Each KPI is defined by a search query that derives a KPI value from machine data such as the machine data of events associated with the entities that provide the service. In certain implementations, information in the entity definitions may be used at KPI definition time or execution time to identify the appropriate events. The KPI values derived over time may be stored to build a valuable repository of current and historical performance information for the service, which may itself be queried. Aggregate KPIs may be defined to provide a measure of service performance calculated from a set of service aspect KPI values, possibly across defined timeframes, and possibly across multiple services. A particular service may have an aggregate KPI derived from all of the aspect KPI's for the service for use as an overall health score for the service.


Additionally, in certain implementations, various visualizations can be built on the described service-centric organization of event data and the KPI values generated and collected. Visualizations can be particularly useful for monitoring or investigating service performance. For example, a service monitoring interface can be provided that is suitable as the home page for ongoing IT service monitoring. The interface is appropriate for desktop use or for a wall-mounted display in a network operations center (NOC), for example. The interface may prominently display a services health section with tiles for the aggregate KPI's indicating overall health for defined services, and a general KPI section with tiles for KPI's related to individual service aspects, for example. The tiles of each section may be colored and ordered according to factors such as the KPI state value, and may display KPI information in a variety of ways. The KPI tiles can be interactive so as to provide navigation to visualizations of more detailed KPI information.


Implementations of the present disclosure can enable the filtering down from various system data models to those data models that are populated and are also determined to be relevant, e.g., to a particular application such as a service performance monitoring application (and/or that are user created). For example, a data model specific to Enterprise Security would not be included for selection in a drop-down menu (e.g., in a GUI presented for IT service performance monitoring).



FIGS. 75J and 75K depict flow diagrams of exemplary method 79400 for performing a search query in response to detecting a scheduled time for a KPI, in accordance with one or more implementations of the present disclosure. Method 79400 may be performed by processing devices that may comprise hardware (e.g., circuitry, dedicated logic), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. Method 79400 and each of its functions, routines, subroutines, or operations may be performed by one or more processors of a computer device executing the method.


For simplicity of explanation, the methods of this disclosure are depicted and described as a series of acts (e.g., blocks, steps). Acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methods could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media. In one implementation, method 79400 may be performed to produce a GUI.


Referring to FIG. 75J, method 79400 may be performed by processing devices of a server device or a client device and may begin at block 79410. At block 79410, the processing device may detect a scheduled time for a key performance indicator (KPI). Such a KPI can reflect, for example, how a service provided by one or more entities is performing, such as is described herein. Additionally, in certain implementations, stored entity definition information can record the association between each of the referenced entities with its associated machine data. Moreover, in certain implementations, stored service definition information can associate the entities that provide the referenced service. Moreover, the referenced KPI can be defined by a search query. Such a search query can, for example, derive a value from the referenced associated machine data. The search query can be defined using a data model (e.g., a data model selected by a user from a list of available data models), and can include one or more field identifiers specified in the data model. For example, the WHERE command of the search query may filter the results of the search query associated with the selected data model to only return data that is associated with the host name “Vulcan” (e.g., the field identifier is “host” and the value of the field “host” is “Vulcan”). When defining the KPI, a user may also specify frequency or any other timing parameter(s) for executing the KPI search query (e.g., every 2 minutes, at a scheduled time during the day, etc.).


At block 79412, the processing device can perform a search query (e.g., the search query that defines the referenced KPI). In certain implementations, such a search query can be performed in response to detecting the referenced scheduled time (e.g., detected at block 79410). In certain implementations, the referenced search query can include a field identifier, such as a field identifier specified in a data model. Additionally, in certain implementations, the referenced search query can be defined in response to an input received via a graphical user interface (GUI). Moreover, in certain implementations such a data model can be a common information model (CIM).


Referring to FIG. 75K, various further aspects of block 79412 are described. At block 79412-A, the processing device can associate values in the associated machine data, such as those having disparate field names. In certain implementations, such values can be associated in accordance with disparate schemas (which may include a late-binding schema) with a field identifier, such as a field identifier specified in the referenced data model. For example, the field identifier specified in the referenced data model can be mapped to values in the machine data using the late-binding schema discussed herein.


At block 79412-B, the processing device can process the referenced associated values. In certain implementations, such associated values can be processed as semantically equivalent data instances. Such semantically equivalent data instances can reflect, for example, relationships, similarities, etc., between aspects of machine language and fields in the referenced data model (e.g., machine language corresponding to ‘network address’ and a field corresponding to ‘IP address’). Moreover, in certain implementations each of the referenced associated values can be processed as a value in a statistical calculation.


An example may aid in understanding. In this example, events identified with an Entity A are associated with a schema, such as a late binding schema, capable of extracting a value for a field named “delay_ms” from each event. Events identified with an Entity B are associated with a different schema, such as a late binding schema, capable extracting a value for a field named “tot_delay” from each event. The hypothetical events of Entity A and Entity B come from different respective sources and have different respective contents from one another, leading to the use of the different schemas. In this example, a particular event associated with Entity A has machine data containing “58” that can be extracted by its respective schema as the value for a field named “delay_ms”; and a particular event associated with Entity B has machine data containing “120” that can be extracted by its respective schema as the value for a field named “tot_delay”. The values “58” and “120” both represent individual measurements of the number of milliseconds it took in total to get a response to a ping request. The values have the same semantic but are difficult to use in common because they are associated with disparate field names from disparate schemas. In another example, the value “58” can represent a measurement in milliseconds and the value “120” can represent a measurement in seconds. In such a scenario a conversion (e.g., a unit conversion, such as from seconds to milliseconds) can be performed to ensure that the respective values conform to a common unit of measurement associated with the common field name.


The delay_ms field associated with Entity A events and the tot_delay field associated with Entity B events can be linked together by associating each with a common field name, such as “delay_total”, of a common information data model. The linking may be accomplished, for example, by making a record in storage of the association of each with the common field name. During search query processing, the computing machine can make reference to the stored association to use the common field name as an alternate field name, or alias, for the field values extracted from the events associated with both Entity A and Entity B, and the common information data model can serve as a logical overlay to the field naming of disparate schemas.


Continuing with the example, a KPI is defined by a search query that includes the strings “search . . . where delay_total>0” to select events for processing that have a delay_total field with a value greater than zero, and “|stats avg(delay_total)” to perform a calculation of the average of the delay_total field values for the selected events The “delay_total” field name in the search query string is the field name from the common information data model. Other selection criteria in the KPI search query select events associated with Entity A and Entity B. When the KPI search query is executed, such as when the computing machine detects that a prescribed period has elapsed, the hypothetical events of Entity A and Entity B will be processed in view of their association with the common information data model so that their respective values of 58 and 120 will be processed as semantic equivalents. Both will be used to satisfy the “delay_total>0” selection criteria for their respective events, and both will be used to calculate the average of the delay_total values.


In one embodiment, the stored association between a schema and a data model may be by reference to the corresponding field name of each. In one embodiment, the stored association between a schema and a data model may be by the data model field name referencing extraction information associated with the corresponding field name of the schema. In one embodiment, field values may be extracted from event data using the schema and then associated with the common field name on a field name basis. In one embodiment, field values may be extracted from event data using information from the schema and previously associated with the common field name. Other embodiments are possible. Moreover, field values may not be merely mapped to the common field name, but transformations are possible as part of the process. For example, a scaling factor may be applied to the data value to make it conform to a common unit of measurement associated with the common field name.


Accordingly, data models can be conveniently used to define search queries for KPIs and such search queries can be performed at scheduled time against events using associations between values in the events and field specifiers in data models in accordance with respective schemas. As such, the creation of KPIs is significantly simplified and no longer requires user knowledge of the specific fields that are included in the events being searched, and user extensive knowledge of the query processing language used for KPI search queries.


Control Modules


The present detailed description discusses command/configuration/control (CCC) data that is used to direct the ongoing operation of a service monitoring system (SMS) including, without limitation, storage representations and user interfaces for creating, reading, viewing, updating, and deleting CCC data. The present detailed description describes inventive aspects that may each provide a system user or administrator the ability to leverage their work in establishing the CCC data for an installation, easing their burden. For example, entity association rules such as entity filter criteria (as discussed in relation to FIG. 17C, for example) can reduce the need for constant updates and repetitive entry of definitional/configuration data that associates services with relevant entities.


Apart from any benefit a user may experience, the use of such aforementioned inventive aspects results in an improved SMS computing machine. As one example, by increasing the reusability of certain CCC data, expanding its scope of influence, or reducing the level of interactivity required with a user or administrator to effect the CCC data necessary to implement the service monitoring in an environment, the amount of computing resources expended by the system and the computing resources footprint needed to maintain the system at a point in time may be reduced. A system without such inventive aspects requires more and prolonged user interface processing to implement its CCC-based command and control function. Moreover, such user interface processing can be disproportionately expensive as it consumes valuable computing resources such as memory, cache space, paging space, process and thread representations, and the computer processor work used to implement, manage and maintain those —even when the user is idle—for the duration of the user's interactive session. Accordingly, an SMS practicing such inventive aspects reduces the overhead burden of its command and control function yielding greater capacity for actual service monitoring work.


An SMS with inventive aspects that can be seen to have such advantages is next described, one that enables a system user or administrator to create a collection or module that includes one or more configuration or definitional (CCC) components already defined in an SMS, and to package that module in a portable format that can be easily conveyed, transported, or transmitted for reuse by a different SMS installation, deployment, or instance, or that can be used as an external storage format for CCC information for purposes of checkpointing, backup, archiving, or the like.


FIG. 75L1 illustrates a block diagram of a system implementing control modules in one embodiment. System 91800 of FIG. 75L1 illustrates one possible embodiment where command module processing functionality of a first service monitoring system (SMS) may be used to construct a command module from command/configuration/control information of that SMS, and to package that command module in a portable format for use by a second SMS. System 91800 is shown to include information technology operational analytics (ITOA) system 91802, module packages data store 91804, secondary service monitoring system (SMS) 91806, and user interface device 91818. ITOA system 91802 is shown to further include primary service monitoring system (SMS) 91814 and command/configuration/control (CCC) data store 91810. SMS 91814 is shown to include module management component 91816. CCC data store 91810 is shown to include a variety of types, classes, categories, objects, items, elements, components, or the like, of command/configuration/control data 91820-91830, now generically discussed as control data and control data items inasmuch as the data is useful to effect control and direction of the active operation of a service monitoring system. Secondary SMS 91806 is shown to include module manager 91808.


In this illustrative example, ITOA system 91802 may represent an active and operating ITOA system deployed for an IT environment and may well include components and functionality not specifically shown and discussed here. For example, system 91802 may include a data input and query system or event processing system (DIQ/EPS) that ingests machine data produced by or about components in the IT environment, storing that data, and making it available for use by SMS 91814. Event processing system 205 of FIG. 2 is one example of such a DIQ/EPS. CCC data store 91810 of FIG. 75L1 may include command/configuration/control information exclusively for SMS 91814 or for other components within the ITOA system 91802, as well. In one embodiment, certain CCC data may be used by more than one component of ITOA system 91802. For example, certain CCC data may be used both by SMS 91814 and by a companion DIQ/EPS (not shown). These and other embodiments are possible.


The control data items 91820-91830 shown for command/configuration/control data store 91810 are now described principally in their relationship to SMS 91814 without regard to any usefulness each may have to other components of ITOA system 91802. Each of the control data items is illustrated in FIG. 75L1 as a plurality of three items, instances, occurrences, or the like, of the particular item type. The plurality of instances signifies that an SMS may utilize multiple instances of control items of a particular type but the actual number of any may vary from embodiment to embodiment, deployment to deployment, and from time to time. Each of control data items represented by 91820-91830 may be a singular data item, a collection of data items, a collection of collections, combinations of these, or the like. The control data items are logical constructs susceptible to a wide variety of organizations, formats, representations, or the like, both logically and physically. Moreover, different control data item types and even different instances of control data items of the same type may be organized, formatted, represented, and the like separately and/or differently. For example, in an embodiment, one production instance of a Service control data item as represented by 91820 used to actively direct the instant operation of service monitoring system 91814 may be kept locally in a high performance key-value store, while a proposed or historic instance of a Service control data item as represented by 91820 not in active use to direct the operation of service monitoring system 91814 may be kept remotely on network-accessible storage in a long-form, textual format that is easily readable by a user, developer, or administrator. Similarly, in an embodiment, foreground (active control) data items may be stored in a higher performance representation format and location as compared to background control data items such as metadata or templates for foreground items. These and other embodiments are possible.


The control data items 91820-91830 shown for CCC data store 91810 are illustrative examples of the types of CCC data as might be used by an SMS like 91814. Service control data items as represented by 91820 of FIG. 75L1 of the presently described embodiment may be data items that define or configure services to effect monitoring by SMS 91814. Service control data items as represented by 91820 may include service definitions as wholes, component parts thereof, or collections thereof. Service definitions are illustrated and discussed in relation to FIGS. 4 and 17B, for example.


Entity control data items as represented by 91821 of FIG. 75L1 of the presently described embodiment may be data items that define or configure entities recognized by service monitoring system 91814. Entity control data items as represented by 91821 may include entity definitions as wholes, component parts, or collections. Entity definitions are illustrated and discussed in relation to FIGS. 4, 10B, 10C, and 17C, for example.


KPI control data items as represented by 91822 of FIG. 75L1 of the presently described embodiment may be data items that define or configure KPIs produced by operation of service monitoring system 91814. KPI control data items as represented by 91822 may include KPI definitions as wholes, component parts, or collections. KPI definitions are illustrated and discussed in relation to FIGS. 4, and 17B, for example.


Shared Search control data items as represented by 91823 of FIG. 75L1 of the presently described embodiment may be data items that define or configure a shared base search executed during operation of service monitoring system 91814. Shared Search control data items as represented by 91823 may include shared search definitions as wholes, component parts, or collections. Definition of shared searches is discussed in relation to FIG. 27A1, for example.


Correlation Search control data items as represented by 91824 of FIG. 75L1 of the presently described embodiment may be data items that define or configure Correlation Searches performed by service monitoring system 91814. Correlation Search control data items as represented by 91824 may include correlation search definitions as wholes, component parts, or collections. Correlation searches are illustrated and discussed in relation to FIG. 34D, for example.


Glass Table control data items as represented by 91825 of FIG. 75L1 of the presently described embodiment may be data items that define or configure glass table or dashboard visualizations generated during service monitoring system 91814 operation. Glass Table control data items as represented by 91825 may include glass table definitions as wholes, component parts, or collections. The definition of glass tables is discussed in relation to FIG. 35, et seq, for example.


Deep Dive control data items as represented by 91826 of FIG. 75L1 of the presently described embodiment may be data items that define or configure deep dive or graph lane visualizations generated during service monitoring system 91814 operation. Deep Dive control data items as represented by 91826 may include deep dive or graph lane definitions as wholes, component parts, or collections. The definition of deep dive visualizations is discussed in relation to FIG. 50A, et seq, for example.


Data Model control data items as represented by 91827 of FIG. 75L1 of the presently described embodiment may be data items that define data models utilized during operation of service monitoring system 91814. Data Model control data items as represented by 91827 may include data model definitions as wholes, component parts, or collections. Data models are discussed in relation to FIGS. 25, 751, 79B, and 79C, for example.


Rules/Searches control data items as represented by 91828 of FIG. 75L1 of the presently described embodiment may be data items that define Rules/Searches utilized during operation of service monitoring system 91814, possibly including filter criteria, association indicators, or the like. Rules/Searches control data items as represented by 91828 may include Rule/Search definitions as wholes, component parts, or collections. Rules/Search examples are seen and discussed in relation to FIGS. 17B, 17C, and 17D, for example.


Other control data items as represented by 91829 of FIG. 75L1 of the presently described embodiment may be any variety of data items that are used to effect control over operation of service monitoring system 91814.


Module control data items as represented by 91830 of FIG. 75L1 of the presently described embodiment may be data items that define, configure, represent, or the like, command modules that group, collect, aggregate, or the like, one or more control data items of an SMS. Module control data items may indicate membership of particular control data items in a particular module. Module control data items may include metadata about a module. Module control data items may include information about packaged forms (e.g., portable/external/exchangeable forms) of a module. Module control data items that substantially subsume the entire content of the module may be represented in the same form as a module package or in a different form. Module control data items may include information for a Domain Add-on facility as discussed in relation to FIG. 10AF. One of skill will further appreciate module control data items of FIG. 75L1 as represented by 91830 after consideration of the figures and discussion follow.


FIG. 75L2 is a diagram of methods and process flow for creation, use, and management of control modules and module packages in one embodiment. Process flow 91850 is shown to include creation and export method 91852 and import method 91854. Methods 91852 and 91854 are such as may be performed by the processing of service monitoring system (SMS) 91814 of FIG. 75L1, for example, and more particularly module manager 91816 of FIG. 75L1. Module control items 91830, control module packages 91804, and user interface device 91818 of FIG. 75L1 also appear in FIG. 75L2 as part of the operating context of process flow 91850.


The methods illustrated and discussed in relation to FIG. 75L2 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as the one run on a general purpose computer system or a dedicated machine), or a combination of both. In one implementation, the method may be performed by a client computing machine. In another implementation, the method may be performed by a server computing machine coupled to the client computing machine over one or more networks. These and other embodiments are possible.


The methods illustrated and discussed in relation to FIG. 75L2 are intended to help explain inventive aspects. One of skill will understand that inventive aspects may be practiced in a variety of different embodiments including those that may make additions to, augment, change, omit, reorder, or otherwise modify, the processing described in relation to process flow 91850. In that vein, processing described for a particular process block of FIG. 75L2 may be dispersed or distributed differently in a varying embodiment without departing from the inventive subject matter.


Control module creation and export method 91852 of FIG. 75L2 illustrates the construction of a control module as well as the creation of a control module package for export. In one embodiment, the internal format of a control module used by the module manager processor of an SMS (such as module manager 91816 of SMS 91814 of FIG. 75L1) may be identical to the external format of a control module package used to convey SMS CCC data to other SMS instances, installations, or deployments. In one embodiment, the internal format of a control module is by logical reference to constituent elements, possibly modified representations of constituent elements (such as a templatized version of a constituent element, such as a service definition), and useful metadata, while the external format of a control module is a unified package with contents strictly conforming to a packaging standard or specification for the interchange of SMS control data. Such a standard may be promulgated by a specific vendor of SMS systems or software, by an industry group, a standards body, or by another.


The illustrative processing of control module creation and export method 91852 of FIG. 75L2 begins at block 91860. At block 91860, a user interface is presented or displayed that enables a user, such as a system administrator or developer, to add and possibly, view, update, delete, or otherwise process metadata relating to, and the substantive content of, a control module. In one embodiment, the user interface is a graphical user interface with interactive elements that a user may engage via a user interface device such as 91818. In one embodiment, the user interface substantially relies on a persistently displayed main page over which transient display elements, such as pop-ups or modal windows, appear as needed. In one embodiment, the user interface navigates among a number of full-page displays as processing needs dictate, with or without additional transient elements. In one embodiment, the processing of block 91860 begins with the display or presentation of a list or inventory of the command modules known to the SMS along with interactive elements enabling the user to navigate to subsequent user interface displays that may provide various presentations of command module information, possibly at varying levels of detail, and that enable the user to add, modify, and/or delete entire command modules or certain of their contents. These and other embodiments are possible.


At block 91862, user input resulting from interaction by the user with the user interface displayed at block 91860 is received by the computing machine. Processing may iterate over blocks 91860 and 91862 until the processing of block 91862 determines that processing should proceed to block 91864. In one embodiment, the processing of block 91862 determines that processing is to proceed to block 91864 on the basis of having received from the user indications for the minimum information required to construct a control module. In one embodiment, the processing of block 91862 determines that processing is to proceed to block 91864 on the basis of having received from the user indications for a full complement of information for a control module. In one embodiment, the processing of block 91862 determines that processing is to proceed to block 91864 based on a specific indication by the user that processing should proceed, such as a mouse click on an interactive button labeled “Continue”, “Next”, or “Construct Module”, for example. These and other embodiments are possible.


In one example embodiment, processing iterates over blocks 91860 and 91862. In a first iteration, a user is required to specify certain metadata about a control module to be created, such as providing a name and a description. In a second and subsequent iterations a user is presented with lists of control data items in the CCC data store of an SMS (such as 91820-91829 of FIG. 75L1). The user interacts with the list to select items to be included in the control module. The iteration stops when block 91862 detects user interaction with the user interface element indicating “create module”, and processing proceeds to block 91864.


At block 91864, the computing machine populates the command/configuration/control data store of an SMS (such as CCC data store 91810 of SMS 91814 of FIG. 75L1) with one or more Module control data items, as represented by 91830 in both FIGS. 75L1 and 75L2, that collectively represent the module. One or more module control data items may include module metadata. One or more module control data items may include references to control data items that make up the substance of the control module now being created, such as control data items from among those of 91820-91829 of FIG. 75L1. One or more module control data items may include copies of control data items that make up the substance of the control module now being created, such as control data items from among those of 91820-91829 of FIG. 75L1. These and other embodiments are possible.


One or more module control data items may include adaptations or derivations of control data items that the user indicates should make up the substance of the control module now being created, such as adaptations or derivations of control data items from among those of 91820-91829 of FIG. 75L1. In one embodiment, such an adaptation or derivation may result from processing of block 91864 to templatize a control data item. The templatizing process results in a derivation of a control data item that has more general applicability than the original control data item. In one embodiment, the templatizing process may result in a derivation of a control data item with blanks (i.e., content, possibly for certain fields, that is selectively deleted or omitted) for a downstream user to fill in, and/or a derivation of a control data item with site-specific or privacy data removed. In one embodiment, information that is blanked, deleted, or removed by the templatizing process may be replaced with text for user prompts for replacement information, harmless dummy information, or anonymized information. In one embodiment, the templatizing process may be directed by pattern recognition and substitution rules that may include regular expressions. In one embodiment such pattern recognition and substitution rules may be built-in to the SMS, supplied by the user, or both. As one example, the processing of block 91864 to construct a control module may evaluate Service control data items that specify entity association indicators, looking for any IP V4 addresses and replacing them with the constant “0.0.0.0”. As another example, the processing of block 91864 to construct a control module may evaluate KPI control data items that specify threshold values, replacing any value found with the prompt text “Enter a value between 1 and 100” that a downstream user would see after importing a package of the control module.


In one embodiment, templatizing may include stripping information from a control data item that is specific for a given SMS instance. For example, a control data item that embodies a KPI definition may have some properties that may be relevant to all or many system instances where that KPI definition may be employed (e.g., KPI name, search string or template, etc.) and some properties that are system/deployment/site/installation/instance-specific (e.g., threshold levels, etc.). In one embodiment, a control data item may be templatized by transforming at least one of its instance-specific properties by elimination of the property, altogether, or by changing its representation (e.g., IPAddr property of 192.168.25.100 changed to templatized value of “ ” or “0.0.0.0”). In one embodiment a control data item is templatized by transforming all of its instance-specific properties. In one embodiment a control data item is templatized by transforming some of its instance-specific properties based on built-in criteria. In one embodiment a control data item is templatized by transforming some of its instance-specific properties based on user-supplied criteria. These and other embodiments are possible.


At block 91866, the computing machine performs processing to validate the structure, content, and/or acceptability of the constructed control module. In one embodiment, validation processing may include machine validation using rule sets. Rule sets may be built-in and based on system requirements. For example one SMS may require that for every service definition/template in a control module there is at least one corresponding entity association rule and at least one corresponding KPI definition/template. In one embodiment, validation rule sets may be customized by the user. In one embodiment, built-in and user-created rule sets are used together for validation.


In one embodiment, validation may include enabling a user to signal her own acceptance/validation of the module. One such embodiment may include displaying summary or detail information regarding the module properties and/or contents to the user. One such embodiment may require the user to signal validation/acceptance in an express manner such as by direct interaction with a user interface button labeled “Approve”, “Accept”, “Save Module”, “Commit”, or the like, in contrast to a more passive or implied signal. After the module is validated at block 91866, processing proceeds to block 91868, in the illustrated embodiment.


At block 91868, the control module is exposed and/or exported for external use in the form of a control module package. In one embodiment, a module control data item such as represented by 91830, includes a representation of the control module in the form required by a control module package, or substantively so. In such an embodiment, the processing of block 91868 may include copying the module control data item to a specific location designated for exported control module packages (e.g., 91804), such as to a particular directory in a host file system. In one embodiment, the processing of block 91868 may include assembling the control module package contents from among various control data item types in a CCC data store (such as control data types represented by 91820-91830 of CCC data store 91810 of FIG. 75L1) in accordance with Module control data items for the control module (such as represented by 91830 of CCC data store 91810 of FIG. 75L1). In one embodiment the control module is represented in a data package that conforms to a particular control module packaging standard representation format. Such a data package may be standalone and portable, and useful for archiving or distributing a control module. In one embodiment a standard data package may be represented as a collection of key-value pairs. In one embodiment the standard data package may be represented as a .zip file that includes standardized folder and file names. In one embodiment, a standard data package may conform to a standard representation format promulgated by a standards body. In one embodiment, a standard data package may conform to a standard representation format specified by a SMS provider. These are but a few examples of the types of data packages an embodiment may employ.


In one embodiment, a standard representation format specified by a SMS provider may be generally available to developers, customers, or the world at large, perhaps by publication of its specifications/requirements in printed documentation or via a website with little restriction. In one embodiment, a standard representation format specified by a SMS provider may not be generally available but rather may have restricted access and be revealed on a restrictive basis to certain of its regular and contract employees, development partners, or the like, only after establishing a trust, confidence, or legal relationship to prevent or limit use and dissemination of information about the standard representation format. Such may be the case where protecting the representation format can provide increased system reliability or security, for example. In an embodiment, a process such as method 91852 of FIG. 75L2 to create a control module/package, may be able to be practiced without requiring users who exercise the method (e.g., via an interactive session) to have prior knowledge of, availability of, access to, or a working knowledge of the specification and requirements of a standard representation format for the control module package. In such an embodiment, a computer user who is agnostic of the details of a control module package representation format prior to engaging a method, such as 91852 of FIG. 75L2, may be able to interact with a system implementing the method, perhaps by interaction with user interfaces as illustrated herein, to cause the production of a properly formatted control module package. This may be true whether the user is agnostic because they do not have access to the standard representation format requirements or because they have not yet availed themselves of readily available standard representation format requirements.


Process 91854 of FIG. 75L2 makes downstream use of control module packages, such as represented by 91804. Where a control module package was created for checkpointing or archiving, the processing of 91854 may likely be conducted by the same SMS which utilized a process such as 91852 to create the control module package. Where a control module package was created for export to a different SMS instance/installation/deployment, the processing of 91854 may likely be conducted by a different SMS than that which created the control module package.


At block 91870, a control module package is identified for import. In one embodiment, the computing machine enables a user make the identification, perhaps by selecting a control module package file using a file-open dialog box or a host filesystem browser presented to the user via a user interface device 91818. In one embodiment, the command module package to be imported is identified by a network communication with another computer, perhaps by use of a RESTful interface exchange between the host computer that exported the control module package and the host computer importing the control module package. These and other embodiments are possible. The control module package having been successfully identified by the processing of 91870, processing may proceed to block 91872.


At block 91872, the identified control module package is imported. The processing of block 91872, in an embodiment, may essentially reverse the processing described earlier in relation to block 91868, with the result that the substantive content of the identified control module package takes on a representation in one or more control data items in the command/configuration/control data store of the importing SMS. The control data items may be Module control data items and control data items of a variety of types. In one embodiment, the processing of block 91872 may further result in enabling user interaction to transition control module contents into the active control of the importing SMS system. In one embodiment, such transitioning may include merely “flipping a switch” to activate certain ready-to-go control module contents. In one embodiment, such transitioning may include engaging the user to provide necessary inputs to instantiate certain templatized control data items. These and other embodiments are possible.


The preceding discussion of process flow 91850 of FIG. 75L2 disclosed and illustrated processing mechanisms of a service monitoring system (SMS) to leverage the use of existing command/configuration/control (CCC) data, and to unite groups of CCC data items for common management, and to thereby reduce the computing resource burden imposed for the necessary CCC data creation, maintenance, and management. Such processing mechanisms and further inventive aspects will be appreciated by one of skill in the art by consideration of the additional disclosure that follows including, for example, the instructional user interface display examples of FIGS. 75L3-75L8 and the control module package example of FIG. 75L9.


FIGS. 75L3 to 75L8 illustrate example interface displays and interface display components useful to conduct control module management processing. FIG. 75L3 illustrates an example interface display listing control modules of an SMS and enabling navigation requests to further processing options. Interface 91900 of FIG. 75L3 is shown to include system title bar area 91902, application menu/navigation bar area 91904, application header area 91910, and control module listing area 91920. Control module listing area 91920 is shown to include list management area 91922, and module list area 91930. Module list area 91930 is shown to include column heading area or row 91932, and a list item area 91934 having list item rows or entries 91934a-i.


System title bar area 91902 is comparable to system title bar area 27102 of FIG. 27A2 discussed in detail elsewhere. Application menu/navigation bar area 91904 is comparable to application menu/navigation bar area 27104 of FIG. 27A2 discussed in detail elsewhere. Application header area 91910 shown to include the title “ITSI Modules”, the description “Viewer for all ITSI Modules”, and Create Module action button 91912. List management area 91922 of control module listing area 91920 is shown to include control module count indicator 91926 and filter component 91928. Filter component 91928 is shown as a text box displaying the user prompt “filter.” Filter component 91928 is interactive enabling the user to enter or edit filter criteria for determining the control modules appearing in module list display table 91930.


Module list display table 91930 is shown to include column header row 91932 and control module list entries 91934a-i. Module list display table 91930 displays a possibly filtered list of control module definitions in a tabular format. Column header row 91932 includes an informational-“i” identifier for column 91941, a “Title” identifier for column 91942, a “Current Version” identifier for column 91943, a “Module Developer” identifier for column 91944, a “Last Exported” identifier for column 91945, and an “Actions” identifier shared for columns 91946-91948. A representative control module list entry row 91934a, for example, displays: an interactive token, “>”, in column 91941 enabling a user to navigate to an interface display (perhaps similar to what is shown in tab display area 92020 of interface 92000 of FIG. 75L5) that enable a user to view and/or edit substantive information pertaining to the control module definition represented by the list entry; the text “ITSI Module for Application Servers” in column 91942 as the title of the control module represented by the list entry as may have been initially entered using element 91980 of interface 91960 of FIG. 75L4, for example; the number “2.5.0” in column 91943 as the current version identifier of the control module represented by the list entry as may have been initially entered using element 92044 of interface 92000 of FIG. 75L5, for example; the text “Splunk, Inc.” in column 91944 as the identifier for the module developer of the control module represented by the list entry as may have been initially entered using element 92042 of interface 92000 of FIG. 74 L5, for example; the timestamp “11/21/2016, 11:37:38 AM” in column 91945 as the last exported time of the control module represented by the list entry as may have been updated by the processing of block 91868 of FIG. 75L2, for example; a “Download” interactive element in column 91946 enabling a user to signal the desire to initiate a process to download or export the control module represented by the list entry, which process may include navigating to user interfaces or components (not shown) such as a save-file-as system dialogue, for example; a “Validate Module” interactive element in column 91947 enabling a user to signal the desire to initiate a process to validate the control module represented by the list entry, which process may include navigating to user interfaces or components (not shown) to facilitate such processing and as may be effected by the validate processing block 91866 of FIG. 75L2, for example; and an “Editv” interactive element in column 91948 enabling a user to navigate to an interface display or perform other processing for the control module represented by the list entry, as may be selected from a drop-down list associated with the interactive element.


Each of the control modules represented in a list entry of display table 91920 of FIG. 75L3 may have been introduced to the command/configuration/control data of the service monitoring system through the importation of a control module package, or may have been introduced by module creation processing as described for process flow 91852 of FIG. 75L2. Such module creation processing may be requested by a user through interaction with the “Create Module” command button 91912 of FIG. 75L3, such as a mouse click or finger press on a touchscreen. In response to the user interaction with command button 91912 the computing machine may initiate certain processing such as described in relation to 91852 of FIG. 75L2 which may include causing the display of an initial “Create New Module” user interface as will next be described.


FIG. 75L4 depicts a user interface related to control module information in one embodiment. Such an interface may be used in an embodiment to prompt for and acquire user input indicating the desired content for a control module being defined and created (or edited), and particularly information related to identifying, describing, or characterizing the control module, such as certain properties, attributes, or metadata. Interface 91960 of FIG. 75L4 is such as might be involved in the processing of blocks 91860 and 91862 of FIG. 75L2. Interface 91960 of FIG. 75L4 is shown to include title area 91962, footer area 91966, and main display area 91964. Title area 91962 is shown to include the title “Create New Module” 91972 which may describe the process, subprocess, or function for which the user interface is being displayed. Main display area 91964 is shown to include module title component 91980, module application ID component 91982, description component 91984, and permissions component 91986. Footer area 91966 is shown to include Cancel command button 91974 and Create command button 91976. Module title component 91980 is shown as a text box that is interactive, enabling the user to add or edit text indicating the title of the control module being created. The title supplied by the user through interaction with Module title component 91980 is such as might appear in column 91942 of display table 91920 of FIG. 75L3 at a time when the newly created control module is represented there.


Module application ID component 91982 is shown as a text box that is interactive, enabling the user to add or edit text indicating an application identifier of the control module being created. In one embodiment, an application identifier for a control module may be a secondary identifier for the module used to identify the module within the SMS or to a companion or related system such as a DIQ/EPS utilized during SMS processing. In one embodiment, an application identifier for control module may be the identifier for a group of control modules with which the control module is to be identified. These and other embodiments are possible. Description component 91984 is shown as a text box that is interactive, enabling the user to add or edit text indicating a description of the control module being created. Permissions component 91986 is shown as a pair of mutually exclusive interactive buttons 91986a and 91986b for respectively indicating either a private or shared permission for the control module being created. In one embodiment, control modules having private permission may be viewed and manipulated only by their creating user while control modules having shared permission may be viewed and manipulated by any user. In one embodiment, control modules having private permission may be viewed by everyone but manipulated only by their creating user, while control modules having shared permission may be viewed by everyone and manipulated by any user having a particular privilege level, such as an administrative privilege level. These and other embodiments are possible.


After indicating desired choices and information using interface 91960 of FIG. 75L4, a user may indicate acceptance of the user interface content by interacting with “Create” action button 91976. User interaction with action button 91976 may result in the computing machine populating a portion of a nascent control module definition in computer storage, and may result in the computing machine placing some representation of the nascent control module definition in a region of a command/control/configuration data store for an SMS, such as may contain Module control data items 91830 of FIG. 75L1. Such processing may be performed by processing block 91864 of FIG. 75L2 in an embodiment. Processing responsive to user interaction with “Create” action button 91976 of FIG. 75L4 may include navigating to a subsequent user interface display such as depicted in FIG. 75L5. These and other embodiments are possible.


FIG. 75L5 depicts a user interface related to control module detail information in one embodiment. Such an interface may be used in an embodiment to prompt for and acquire user input indicating the desired content for a control module being defined and created (or edited). Interface 92000 of FIG. 75L5 is such as might be involved in the processing of blocks 91860 and 91862 of FIG. 75L2. Interface 92000 of FIG. 75L5 is shown to include system title bar area 91902, application menu/navigation bar area 91904, header area 92010, and tabbed display area 92020. Header area 92010 is shown to include an interface title “Test Module” 92012, “Add Content” action button 92014, and “Export Module” action button 92016. In one embodiment, interface title 92012 may contain fixed text. In one embodiment, interface title 92012 may correspond to the title for the control module which is the subject of the displayed interface, such as a title as may have been provided using element 91980 of interface 91960 of FIG. 75L4, described earlier. In one embodiment, interface title 92012 of FIG. 75L5 may correspond to the Application ID for the control module which is the subject of the displayed interface, such as an Application ID as may have been provided using element 91982 of interface 91960 of FIG. 75L4, described earlier.


Tabbed display area 92020 of FIG. 75L5 is shown to include tab control area 92022 and tabbed information display area 92024. Tab control area 92022 is shown to include a single tab control, “Module Metadata” 92030 which, accordingly, is by default the selected tab control and has its information appearing in tabbed information display area 92024. Tabbed information display area 92024 is shown to include Application ID component 92040, Author component 92042, Version component 92044, Readme File component 92046, Application Icon component 92048, 2X Application Icon component 92050, and Add Content action button 92052 which duplicates the appearance and functionality of Add Content action button 92014.


Application ID component 92040 of FIG. 75L5 is shown to include the label “App ID” and the text value of the App ID, “TestModule”, which is visible for display but cannot be edited. The value for the App ID may have been initially provided by a user by means of component 91982 of interface 91960 of FIG. 75L4, for example. Author component 92042 of FIG. 75L5 is shown as a text box that is interactive, enabling the user to add or edit text indicating the author of the control module being created. Version component 92044 of FIG. 75L5 is shown as a text box that is interactive, enabling the user to add or edit text indicating a version designation for the control module being created. The version designation supplied by the user through interaction with Version component 92044 is such as might appear in column 91943 of display table 91920 of FIG. 75L3 at a time when the newly created control module may be represented there.


Readme File component 92046 of FIG. 75L5 is shown as a file designation user interface control element with the label “README file” that enables the user to designate a file to be used as the README file of the control module being created. The file designation user interface control element of one illustrative embodiment displays a name and/or location designation (such as a fully qualified path in a host file system) for a file or, if undetermined, displays a prompt text such as “Browse . . . ”. In either case, the user may interact with the file designation user interface control element, such as by a mouse click or a finger press on a touchscreen, to navigate to user interface elements enabling the user to specify a file designation (e.g., file name, fully qualified path, etc.), perhaps by selecting from a browsable list of files in a file system, files in a specific directory, recently used files, or the like. Such a file designation may then appear in the display of the file designation user interface control element.


Application Icon component 92048 of FIG. 75L5 is shown as a file designation user interface control element with the label “App Icon” that enables the user to designate a file to be used as the icon representation of the control module being created. Similarly, 2X Application Icon component 92050 of FIG. 75L5 is shown as a file designation user interface control element with the label “App Icon@2x” that enables the user to designate a file to be used as a large format icon representation of the control module being created.


After indicating desired choices and information using interface 92000 of FIG. 75L5, a user may indicate acceptance of the user interface content by interacting with “Add Content” action button 92052 or 92014. User interaction with the action button may result in the computing machine populating a portion of a nascent control module definition in computer storage, and may result in the computing machine placing some representation of the nascent control module definition in a region of a command/control/configuration data store for an SMS, such as may contain Module control data items 91830 of FIG. 75L1. Such processing may be performed by processing block 91864 of FIG. 75L2 in an embodiment. Processing responsive to user interaction with an “Add Content” action button of FIG. 75L5 may cause the display of a different or modified user interface such as depicted in FIG. 75L6. These and other embodiments are possible.


FIG. 75L6 illustrates an example interface related to control module detail information options in one embodiment. Such an interface may be used in an embodiment to prompt for and acquire user input indicating a type, class, category, section, or the like, of content to be added to the control module. Interface 92100 of FIG. 75L6 is such as might be involved in the processing of blocks 91860 and 91862 of FIG. 75L2.


In one embodiment, the presentation of interface 92100 results from a modification of user interface 92000 of FIG. 75L5. Interface 92100 of FIG. 75L6 is shown to include system title bar area 91902, application menu/navigation bar area 91904, and header area 92010, just as for interface 92000 of FIG. 75L5. Interface 92100 of FIG. 75L6 it is shown to include a tabbed display area 92020 having the same content as tabbed display area 92030 of FIG. 75L5, but narrowed. The user interface area vacated by the narrowing of tabbed display area 92020 of FIG. 75L6 is shown to contain control data content options area 92110. Control data content options area 92110 is shown to include header area 92112 and options list area 92114. Header area 92112 of the control data content options area 92110 of FIG. 75L6 displays the title “Add Content to Module”. Options list area 92114 of the control data content options area 92110 of FIG. 75L6 displays interactive list entries “Services” 92120, “Data Models” 92122, and “Entity Searches” 92124. The user may indicate to the computing machine the type or category of content (such as control data items as represented by 91820-91829 of CCC data store 91810 of FIG. 75L1, for example) he would like to add to the control module by interacting with the corresponding interactive list entry, such as by a mouse click or finger press on a touchscreen. User interaction with one of the interactive list entries may signal the computing machine of the user's desire to add content of a particular type to the control module and the computing machine may undertake processing to effect the same, perhaps by navigating or adapting the user interface. As one illustrative example, user interaction with interactive list entry 92120 of FIG. 75L6, “Services”, may result in processing that causes the presentation of the user interface of FIG. 75L7.


FIG. 75L7 illustrates an example interface for adding content to a control module. Such an interface may be used in an embodiment to prompt for and acquire user input indicating one or more content items to be included in a control module, perhaps by user selection from a list of available items. Interface 92150 of FIG. 75L7 is such as might be involved in the processing of blocks 91860 and 91862 of FIG. 75L2.


In one embodiment, the presentation of interface 92150 of FIG. 75L7 results from a modification of user interface 92100 of FIG. 75L6. Interface 92150 of FIG. 75L7 is shown to include system title bar area 91902, application menu/navigation bar area 91904, and header area 92010, just as for interface 92100 of FIG. 75L6. Interface 92150 of FIG. 75L7 is shown to include a tabbed display area 92020 having the same content as tabbed display area 92020 of FIG. 75L6 but narrowed. The user interface area vacated by the narrowing of tabbed display area 92020 of FIG. 75L7 is shown to contain control data content options area 92110 having the same content as control data content options area 92110 of FIG. 75L6. The remaining user interface area of interface 92150 of FIG. 75L7 is shown to be occupied by control data item area 92160.


Control data item area 92160 of FIG. 75L7 is shown to include header area 92162, search component 92164, and control data item table 92166. Control data item area 92160 may enable a user to indicate to the computing machine an identification of one or more control data items to be included in a control module. The control data item area 92160 of the illustrated embodiment enables a user to indicate the identification of control data items by indicating a selection from a list of available control data items presented in tabular form. In the present example, because the presentation of user interface 92150 is deemed to have occurred because of user interaction with “Services” interactive element 91210 of FIG. 75L6, the list of available data items in control data item area 92160 may only include service control data items (consider, for example, service control data items 91820 of FIG. 75L1).


The header area 92162 of control data item area 92160 of FIG. 75L7 is shown to include the title “Select Services to Add” and an “Add to Module” action button 92163. Search component 92164 is shown as a text box displaying the user prompt “search”. Search component 92164 is interactive enabling the user to enter or edit search/filter criteria for determining the subset of service control data items appearing in control data item table 92166.


Control data item table 92166 is shown to include column header row 92170, and control data item entries 92180, 92182, 92184, down through to 92189. Column header row 92170 is shown to include an interactive checkbox in column 92172, interaction with which by the user may result in indicating the selection of all of the control data items represented in the table. Column header row 92170 is shown to further include “Service” column name in column 92174, and “Description” column name in column 92176. Representative control data item entry 92180 is shown to include an interactive checkbox 92180a in column 92172, service identifier “IT Service” in column 92174, and service description “IT Service” in column 92176. User interaction with checkbox 92180a, such as by a mouse click or finger press on a touchscreen, may toggle checkbox 92180a between a selected and unselected state, with checkbox 92180a showing a check mark (not shown) when selected and showing the empty checkbox when unselected.


After indicating desired choices using interface 92150 of FIG. 75L7, particularly by indicating desired selections using the control data item area 92160, a user may indicate acceptance of the user interface content by interacting with “Add to Module” action button 92663. User interaction with the action button may result in the computing machine populating a portion of a nascent control module definition in computer storage, and may result in the computing machine placing some representation of the nascent control module definition in a region of a command/control/configuration data store for an SMS, such as illustrated by Module control data items 91830 of FIG. 75L1. Such processing may be performed by processing block 91864 of FIG. 75L2 in an embodiment. Processing responsive to user interaction with “Add to Module” action button 92163 of FIG. 75L7 may cause the display of a different or modified user interface such as depicted in FIG. 75L8. These and other embodiments are possible.


As stated earlier, because the presentation of user interface 92150 is deemed to have occurred because of user interaction with “Services” interactive element 91120 of FIG. 75L6, the list of available data items in control data item area 92160 may only include service control data items. If for example, user interaction with “Data Models” interactive element 92122 causes the display of a user interface such as 92150, the list of available data items and control data item area 92160 of FIG. 75L7 may only include data model control data items (consider, for example data model control data items 91827 FIG. 75L1). Further, when the type, class, or category of control data items represented for selection in control data item table 92166 changes, the presentation of the particular control data items may change as well. For example, the number and designation of columns in the control data item table may change. In one embodiment, the control data item area 92160 may not contain a pre-populated selection list but rather may include a number of empty list entries into which a user can enter identifying information for control data items to be included in the module being created. These and other embodiments are possible.


FIG. 75L8 illustrates an example interface related to the creation of a control module after certain content has been added. One of skill may appreciate the similarity between interface 92200 of FIG. 75L8 and interface 92100 of FIG. 75L6. In one embodiment, SMS module manager functionality that produced user interface 92100 of FIG. 75L6 is the same functionality that produces user interface 92200 of FIG. 75L8, albeit at a different point in time in this illustrative example, after certain content has been added to the subject control module.


Interface 92200 of FIG. 75L8 is shown to include system title bar area 91902, application menu/navigation bar area 91904, and header area 92010, just as for interface 92100 of FIG. 75L6. Interface 92200 of FIG. 75L8 is shown to include a tabbed display area 92020 corresponding to the tabbed display area 92020 of FIG. 75L6. Interface 92200 of FIG. 75L8 is shown to include control data content options area 92110 having the same content as control data content options area 92110 of FIG. 75L6. Notably, the content and appearance of the tabbed display areas 92020 of FIGS. 75L6 and 75L8 differ. Tab control area 92022 of FIG. 75L8 is shown to include “Module Metadata” tab control 92030 as does FIG. 75L6. Tab control area 92022 of FIG. 75L8 further includes “Services” tab control 92210 which appears as the selected or active tab control in interface 92200. Accordingly, tabbed information display area 92024 of FIG. 75L8 is shown to present Services-related content. Tabbed information display area 92024 of FIG. 75L8 is shown to include header area 92220, service items list management area 92230, and a service items list display table including column header row 92240 and service item entry rows 92251-92258. Header area 92220 is shown to include the title “Services”. Service items list management area 92230 is shown to include service items count component 92232, “Remove Selected” action component 92234, and filter component 92236. Service items count component 92232 is shown to include a count of the service-type control data items represented in the list appearing in tabular fashion beneath (“11 Services”) and a count of the number of those control data items presently in the selected state (“(0 selected)”). User interaction with “Remove Selected” action component 92234 enables a user to indicate to the computing machine a desire to remove from the control module content items represented by entries in the list beneath that are in the selected state. The computing machine in response to such user interaction may perform such removal and cause an update to the display of interface 92200 to reflect such removal. Such processing may be variously performed by processing blocks 91860, 91862, 91864 of FIG. 75L2, in one embodiment. Filter component 92236 is shown as a text box displaying the user prompt “Filter.” Filter component 92236 is interactive enabling the user to enter or edit filter criteria for determining the service items appearing in the table beneath.


The service items list display table displays in a tabular format a possibly filtered list of service control data items included in the subject control module. In one embodiment each service control data item is a service definition. Column header row 92240 includes an informational-“i” identifier for column 92242, an checkbox for column 92244 which may be interactive enabling a user to indicate selection or non-selection of all entries in the service items list display table at once, a “Service” identifier for column 92246, and a “Service Description” identifier for column 92248. A representative service item entry row 92251, for example, displays: an interactive token, “>”, in column 92242 enabling a user to navigate to an interface display (not shown) presenting additional information about the service item represented in the entry row; an interactive checkbox in column 92244 enabling a user to toggle the selected state of the service item represented by the list entry in the row; the service identifier “IT Service” in column 92246, and the service description “IT Service” in column 92248.


From earlier discussion, one of skill will appreciate how a user may utilize interactive elements 92122 and/or 92124 to request processing to add additional content to the subject control module represented in interface 92200, and how the computing machine may revise the presentation of user interface 92200 to reflect such additional content, for example, by adding a control tab and corresponding listing for Data Models and/or adding a control tab and corresponding listing for Entity Searches.


In one embodiment, a user may interact, such as by a mouse click or touch screen press, with “Export Module” action button 92016 to indicate to the computing machine a request to engage processing to export the subject control module as displayed, displayable, and revisable using an interface such as interface 92200 of FIG. 75L8. User interaction with the action button in an embodiment may result in the computing machine populating a portion of a nascent control module definition in computer storage, and may result in the computing machine placing some representation of the nascent control module definition in a region of a command/control/configuration data store for an SMS, such as may contain Module control data items 91830 of FIG. 75L1. Such processing may be performed by processing block 91864 of FIG. 75L2 in an embodiment. User interaction with the action button in an embodiment may result in the computing machine engaging the processing of block 91868 of FIG. 75L2 to create a control module package as represented by 91804. User interaction with the action button 92016 of FIG. 75L8 in an embodiment may also indicate an implicit user validation action and engage certain processing of block 91866 of FIG. 75L2. These and other embodiments are possible.


FIG. 75L9 illustrates packaging of a particular control module in one embodiment. The example illustrated in FIG. 75L9 is illustrative and details about structure and content in this example should not be construed as limiting the practice of inventive aspects disclosed herein. The packaging illustrated and discussed in relation to FIG. 75L9 may be useful, for example, in the processing of blocks 91864 and 91868 of FIG. 75L2 where a control module representation may be saved or otherwise stored. Similarly, the packaging illustrated and discussed in relation to FIG. 75L9 may be useful, for example, in the processing of block 91872 of FIG. 75L2 where a control module representation is imported.


FIG. 75L9 illustrates packaging of a control module in one embodiment that organizes module content into a hierarchical arrangement of directories and files such as commonly available in a file system of a computer operating system. The root directory or node “MySmsControlModule” 92280 subsumes the collection of control and configuration data making up the control module. Root directory 92280 is shown to directly include subdirectories “default” 92282 and “data” 92292, and readme.txt file 92298.


Subdirectory “default” 92282 of FIG. 75L9 of the illustrated embodiment may be used to contain information that relates to directing the operations of the service monitoring system (SMS) to monitor a particular IT environment in a particular way. The example files shown for subdirectory 92282 each illustrate a different class of configuration and control information as might be used by an SMS in one embodiment. Example file “smsServiceTemplate.conf” 92284 is illustrative of SMS control data that may specify a service definition or templatized version thereof. Example file “smsKpiTemplate.conf” 92286 is illustrative of SMS control data that may specify a KPI definition or templatized version thereof. Example file “smsKpiBaseSearch.conf” 92288 is illustrative of SMS control data that may specify a KPI base search definition or templatized version thereof. Example file “Other” 92290 it is a broad placeholder to illustrate that a wide variety of types, classes, categories, and the like of SMS control data items may be included in a control module and that they need not all share a common representation format such as may be indicated by a .conf file name extension.


In one embodiment, SMS control data items such as the .conf files shown and discussed for the “default” subdirectory 92282 may be simple text files containing key-value pairs, ordered parameter lists, statements written in a proprietary configuration language, CSV-formatted tabular data, to name but a few possible examples. In one embodiment, the SMS configuration and control information files may be represented in a preprocessed or precompiled format. In one embodiment, configuration and control information such as illustrated and discussed in regards to the contents of subdirectory “default” 92282 may be maintained in a single file. In one embodiment, configuration and control information such as illustrated and discussed in regards to the contents of subdirectory “default” 92282 may be variously distributed among the same or a different set of files/collections than those illustrated and discussed in relation to FIG. 75L9. Accordingly, one of skill again appreciates that FIG. 75L9 is a teaching example that does not limit the embodiments possible that employ inventive aspects disclosed herein.


Subdirectory “data” 92292 of FIG. 75L9 of the illustrated embodiment may be used to contain information that relates to data, possibly ingested machine data, that may be processed by a service monitoring system (SMS) to monitor a particular IT environment in a particular way. One class of such information may be data models which occupy subdirectory “models” 92294 within subdirectory “data” 92292. The example file shown for subdirectory 92294, “myDataModel.json” helps to illustrate that packaged control module data items of an SMS may employ standard, generalized data representation formats, such as JSON (JavaScript Object Notation).


Example file “readme.txt” 92298 may contain user-readable text conveying any desired information about the control module/package to users receiving the control module. In one embodiment, a readme.txt file also automatically includes certain metadata about the control module, such as its title, author, and version number.


The example directories and files subsumed under root node “MySmsControlModule” 92280 may, in their native format, implement a control module package 92272 in one embodiment. In an embodiment, the control module content represented by 92272 may be processed to form package 92274. Package 92274 may represent a form for the control module data that is compacted, compressed, certified, authenticated, encoded, encrypted, secured, more portable, or otherwise altered or processed from its starting form. In an embodiment, packages 92272 and 92274 may both represent control module formats acceptable to a targeted SMS. In an embodiment, packaging formats may be nested to many levels. In an embodiment, packaging formats may not be nested but may exist as alternatives. In an embodiment, a packaging format such as illustrated by 92274 may not be directly usable by a target SMS without pre-processing, such as by decompression or unpacking, possibly by widely known and available utilities. Such utilities may include, for example, tar, gzip, 7-zip, and WinRAR. In an embodiment, a target SMS may enable the direct import or use of control module packages in native, compressed, archived, and other formats.


In an embodiment where control module content may be usefully organized as the hierarchical collections/containers paradigm of one or more files within one or more filesystem directories, advantage may be taken of known and available filesystem archiving formats, utilities, and tools to create control module packages. Known archive formats/tools ar, cpio, shar, tar, LBR, BagIt, and WAD, for example, may be utilized to create control module packages in an embodiment where compression of the control module content is not desired. Known archive formats/tools 7z, ACE, ARC, AU, B1, Cabinet, cfs, cpt, DGCA, .dmg, .egg, kgb, LHA, LZX, MPQ, PEA, qda, RAR, rzip, sit, SQX, UDA, UHARC, Xar, zoo, ZIP, and ZPAQ, for example, may be utilized to create control module packages in an embodiment where compression of the control module content is desired. In an embodiment where control module content is paradigmatically represented in a single file, perhaps an XML file, known compression formats/tools bzip2, gzip, lzip, LZMA, lzop, xz, SQ, and compress, for example, may be utilized to create compressed control module packages without archiving aspects (e.g., file concatenations and/or directory representations). In an embodiment, known archive formats/tools and known compression format/tools may be combined to produce a control module package including compression and archiving aspects. A package in the known .tar.gz format, sometimes referred to as a “tarball,” may be viewed as one such example, where an archive created in .tar format is compressed using gzip. An embodiment may additionally or alternatively rely on custom, private, or proprietary control module package formats, utilities, tools, and functions. Such control module packaging may or may not utilize compression or archival aspects (e.g., unification of multiple parts, portions, or components into a single container or construct (e.g., a file); representation of relationships among multiple components in a container or construct (e.g., directory structure)) for some or all of the total control module package content.


One of skill appreciates that the packaging shown and discussed for FIG. 75L9 represent illustrative examples to aid an understanding of inventive aspects. While this illustration has been made in terms of a hierarchical arrangement of the data/containers, and often in terms of a hierarchical arrangement of file folders/directories and files, the practice of inventive aspects disclosed herein is not so limited. control module data and/or containers may, in one embodiment, be represented as a hierarchical tree construct in eXtensible Markup Language (XML). In an embodiment, control module data and/or containers may not use a hierarchical organization. These and other variations and alternatives are possible without departing from the inventive aspects taught herein.


1.1 Overview


Modern data centers often comprise thousands of host computer systems that operate collectively to service requests from even larger numbers of remote clients. During operation, these data centers generate significant volumes of performance data and diagnostic information that can be analyzed to quickly diagnose performance problems. In order to reduce the size of this performance data, the data is typically pre-processed prior to being stored based on anticipated data-analysis needs. For example, pre-specified data items can be extracted from the performance data and stored in a database to facilitate efficient retrieval and analysis at search time. However, the rest of the performance data is not saved and is essentially discarded during pre-processing. As storage capacity becomes progressively cheaper and more plentiful, there are fewer incentives to discard this performance data and many reasons to keep it.


This plentiful storage capacity is presently making it feasible to store massive quantities of minimally processed performance data at “ingestion time” for later retrieval and analysis at “search time.” Note that performing the analysis operations at search time provides greater flexibility because it enables an analyst to search all of the performance data, instead of searching pre-specified data items that were stored at ingestion time. This enables the analyst to investigate different implementations of the performance data instead of being confined to the pre-specified set of data items that were selected at ingestion time.


However, analyzing massive quantities of heterogeneous performance data at search time can be a challenging task. A data center may generate heterogeneous performance data from thousands of different components, which can collectively generate tremendous volumes of performance data that can be time-consuming to analyze. For example, this performance data can include data from system logs, network packet data, sensor data, and data generated by various applications. Also, the unstructured nature of much of this performance data can pose additional challenges because of the difficulty of applying semantic meaning to unstructured data, and the difficulty of indexing and querying unstructured data using traditional database systems.


These challenges can be addressed by using an event-based system, such as the SPLUNK® ENTERPRISE system produced by Splunk Inc. of San Francisco, Calif., to store and process performance data. The SPLUNK® ENTERPRISE system is the leading platform for providing real-time operational intelligence that enables organizations to collect, index, and harness machine-generated data from various web sites, applications, servers, networks, and mobile devices that power their businesses. The SPLUNK® ENTERPRISE system is particularly useful for analyzing unstructured performance data, which is commonly found in system log files. Although many of the techniques described herein are explained with reference to the SPLUNK® ENTERPRISE system, the techniques are also applicable to other types of data server systems.


In the SPLUNK® ENTERPRISE system, performance data is stored as “events,” wherein each event comprises a collection of performance data and/or diagnostic information that is generated by a computer system and is correlated with a specific point in time. Events can be derived from “time series data,” wherein time series data comprises a sequence of data points (e.g., performance measurements from a computer system) that are associated with successive points in time and are typically spaced at uniform time intervals. Events can also be derived from “structured” or “unstructured” data. Structured data has a predefined format, wherein specific data items with specific data formats reside at predefined locations in the data. For example, structured data can include data items stored in fields in a database table. In contrast, unstructured data does not have a predefined format. This means that unstructured data can comprise various data items having different data types that can reside at different locations. For example, when the data source is an operating system log, an event can include one or more lines from the operating system log containing raw data that includes different types of performance and diagnostic information associated with a specific point in time. Examples of data sources from which an event may be derived include, but are not limited to: web servers; application servers; databases; firewalls; routers; operating systems; and software applications that execute on computer systems, mobile devices, and sensors. The data generated by such data sources can be produced in various forms including, for example and without limitation, server log files, activity log files, configuration files, messages, network packet data, performance measurements and sensor measurements. An event typically includes a timestamp that may be derived from the raw data in the event, or may be determined through interpolation between temporally proximate events having known timestamps.


The SPLUNK® ENTERPRISE system also facilitates using a flexible schema to specify how to extract information from the event data, wherein the flexible schema may be developed and redefined as needed. Note that a flexible schema may be applied to event data “on the fly,” when it is needed (e.g., at search time), rather than at ingestion time of the data as in traditional database systems. Because the schema is not applied to event data until it is needed (e.g., at search time), it is referred to as a “late-binding schema.”


During operation, the SPLUNK® ENTERPRISE system starts with raw data, which can include unstructured data, machine data, performance measurements or other time-series data, such as data obtained from weblogs, syslogs, or sensor readings. It divides this raw data into “portions,” and optionally transforms the data to produce timestamped events. The system stores the timestamped events in a data store, and enables a user to run queries against the data store to retrieve events that meet specified criteria, such as containing certain keywords or having specific values in defined fields. Note that the term “field” refers to a location in the event data containing a value for a specific data item.


As noted above, the SPLUNK® ENTERPRISE system facilitates using a late-binding schema while performing queries on events. A late-binding schema specifies “extraction rules” that are applied to data in the events to extract values for specific fields. More specifically, the extraction rules for a field can include one or more instructions that specify how to extract a value for the field from the event data. An extraction rule can generally include any type of instruction for extracting values from data in events. In some cases, an extraction rule comprises a regular expression, in which case the rule is referred to as a “regex rule.”


In contrast to a conventional schema for a database system, a late-binding schema is not defined at data ingestion time. Instead, the late-binding schema can be developed on an ongoing basis until the time a query is actually executed. This means that extraction rules for the fields in a query may be provided in the query itself, or may be located during execution of the query. Hence, as an analyst learns more about the data in the events, the analyst can continue to refine the late-binding schema by adding new fields, deleting fields, or changing the field extraction rules until the next time the schema is used by a query. Because the SPLUNK® ENTERPRISE system maintains the underlying raw data and provides a late-binding schema for searching the raw data, it enables an analyst to investigate questions that arise as the analyst learns more about the events.


In the SPLUNK® ENTERPRISE system, a field extractor may be configured to automatically generate extraction rules for certain fields in the events when the events are being created, indexed, or stored, or possibly at a later time. Alternatively, a user may manually define extraction rules for fields using a variety of techniques.


Also, a number of “default fields” that specify metadata about the events rather than data in the events themselves can be created automatically. For example, such default fields can specify: a timestamp for the event data; a host from which the event data originated; a source of the event data; and a source type for the event data. These default fields may be determined automatically when the events are created, indexed or stored.


In some embodiments, a common field name may be used to reference two or more fields containing equivalent data items, even though the fields may be associated with different types of events that possibly have different data formats and different extraction rules. By enabling a common field name to be used to identify equivalent fields from different types of events generated by different data sources, the system facilitates use of a “common information model” (CIM) across the different data sources.


1.2 Data Server System



FIG. 76 presents a block diagram of an exemplary event-processing system 7100, similar to the SPLUNK® ENTERPRISE system. System 7100 includes one or more forwarders 7101 that collect data obtained from a variety of different data sources 7105, and one or more indexers 7102 that store, process, and/or perform operations on this data, wherein each indexer operates on data contained in a specific data store 7103. These forwarders and indexers can comprise separate computer systems in a data center, or may alternatively comprise separate processes executing on various computer systems in a data center.


During operation, the forwarders 7101 identify which indexers 7102 will receive the collected data and then forward the data to the identified indexers. Forwarders 7101 can also perform operations to strip out extraneous data and detect timestamps in the data. The forwarders next determine which indexers 7102 will receive each data item and then forward the data items to the determined indexers 7102.


Note that distributing data across different indexers facilitates parallel processing. This parallel processing can take place at data ingestion time, because multiple indexers can process the incoming data in parallel. The parallel processing can also take place at search time, because multiple indexers can search through the data in parallel.


System 7100 and the processes described below with respect to FIGS. 71-5 are further described in “Exploring Splunk Search Processing Language (SPL) Primer and Cookbook” by David Carasso, CITO Research, 2012, and in “Optimizing Data Analysis With a Semi-Structured Time Series Database” by Ledion Bitincka, Archana Ganapathi, Stephen Sorkin, and Steve Zhang, SLAML, 2010, each of which is hereby incorporated herein by reference in its entirety for all purposes.


1.3 Data Ingestion



FIG. 77 presents a flowchart illustrating how an indexer processes, indexes, and stores data received from forwarders in accordance with the disclosed embodiments. At block 7201, the indexer receives the data from the forwarder. Next, at block 7202, the indexer apportions the data into events. Note that the data can include lines of text that are separated by carriage returns or line breaks and an event may include one or more of these lines. During the apportioning process, the indexer can use heuristic rules to automatically determine the boundaries of the events, which for example coincide with line boundaries. These heuristic rules may be determined based on the source of the data, wherein the indexer can be explicitly informed about the source of the data or can infer the source of the data by examining the data. These heuristic rules can include regular expression-based rules or delimiter-based rules for determining event boundaries, wherein the event boundaries may be indicated by predefined characters or character strings. These predefined characters may include punctuation marks or other special characters including, for example, carriage returns, tabs, spaces or line breaks. In some cases, a user can fine-tune or configure the rules that the indexers use to determine event boundaries in order to adapt the rules to the user's specific requirements.


Next, the indexer determines a timestamp for each event at block 7203. As mentioned above, these timestamps can be determined by extracting the time directly from data in the event, or by interpolating the time based on timestamps from temporally proximate events. In some cases, a timestamp can be determined based on the time the data was received or generated. The indexer subsequently associates the determined timestamp with each event at block 7204, for example by storing the timestamp as metadata for each event.


Then, the system can apply transformations to data to be included in events at block 7205. For log data, such transformations can include removing a portion of an event (e.g., a portion used to define event boundaries, extraneous text, characters, etc.) or removing redundant portions of an event. Note that a user can specify portions to be removed using a regular expression or any other possible technique.


Next, a keyword index can optionally be generated to facilitate fast keyword searching for events. To build a keyword index, the indexer first identifies a set of keywords in block 7206. Then, at block 7207 the indexer includes the identified keywords in an index, which associates each stored keyword with references to events containing that keyword (or to locations within events where that keyword is located). When an indexer subsequently receives a keyword-based query, the indexer can access the keyword index to quickly identify events containing the keyword.


In some embodiments, the keyword index may include entries for name-value pairs found in events, wherein a name-value pair can include a pair of keywords connected by a symbol, such as an equals sign or colon. In this way, events containing these name-value pairs can be quickly located. In some embodiments, fields can automatically be generated for some or all of the name-value pairs at the time of indexing. For example, if the string “dest=10.0.1.2” is found in an event, a field named “dest” may be created for the event, and assigned a value of “10.0.1.2.”


Finally, the indexer stores the events in a data store at block 7208, wherein a timestamp can be stored with each event to facilitate searching for events based on a time range. In some cases, the stored events are organized into a plurality of buckets, wherein each bucket stores events associated with a specific time range. This not only improves time-based searches, but it also allows events with recent timestamps that may have a higher likelihood of being accessed to be stored in faster memory to facilitate faster retrieval. For example, a bucket containing the most recent events can be stored as flash memory instead of on hard disk.


Each indexer 7102 is responsible for storing and searching a subset of the events contained in a corresponding data store 7103. By distributing events among the indexers and data stores, the indexers can analyze events for a query in parallel, for example using map-reduce techniques, wherein each indexer returns partial responses for a subset of events to a search head that combines the results to produce an answer for the query. By storing events in buckets for specific time ranges, an indexer may further optimize searching by looking only in buckets for time ranges that are relevant to a query.


Moreover, events and buckets can also be replicated across different indexers and data stores to facilitate high availability and disaster recovery as is described in U.S. patent application Ser. No. 14/266,812 filed on 30 Apr. 2014, and in U.S. patent application Ser. No. 14/266,817 also filed on 30 Apr. 2014.


1.4 Query Processing



FIG. 78 presents a flowchart illustrating how a search head and indexers perform a search query in accordance with the disclosed embodiments. At the start of this process, a search head receives a search query from a client at block 7301. Next, at block 7302, the search head analyzes the search query to determine what portions can be delegated to indexers and what portions need to be executed locally by the search head. At block 7303, the search head distributes the determined portions of the query to the indexers. Note that commands that operate on single events can be trivially delegated to the indexers, while commands that involve events from multiple indexers are harder to delegate.


Then, at block 7304, the indexers to which the query was distributed search their data stores for events that are responsive to the query. To determine which events are responsive to the query, the indexer searches for events that match the criteria specified in the query. This criteria can include matching keywords or specific values for certain fields. In a query that uses a late-binding schema, the searching operations in block 7304 may involve using the late-binding scheme to extract values for specified fields from events at the time the query is processed. Next, the indexers can either send the relevant events back to the search head, or use the events to calculate a partial result, and send the partial result back to the search head.


Finally, at block 7305, the search head combines the partial results and/or events received from the indexers to produce a final result for the query. This final result can comprise different types of data depending upon what the query is asking for. For example, the final results can include a listing of matching events returned by the query, or some type of visualization of data from the returned events. In another example, the final result can include one or more calculated values derived from the matching events.


Moreover, the results generated by system 7100 can be returned to a client using different techniques. For example, one technique streams results back to a client in real-time as they are identified. Another technique waits to report results to the client until a complete set of results is ready to return to the client. Yet another technique streams interim results back to the client in real-time until a complete set of results is ready, and then returns the complete set of results to the client. In another technique, certain results are stored as “search jobs,” and the client may subsequently retrieve the results by referencing the search jobs.


The search head can also perform various operations to make the search more efficient. For example, before the search head starts executing a query, the search head can determine a time range for the query and a set of common keywords that all matching events must include. Next, the search head can use these parameters to query the indexers to obtain a superset of the eventual results. Then, during a filtering stage, the search head can perform field-extraction operations on the superset to produce a reduced set of search results.


1.5 Field Extraction



FIG. 79A presents a block diagram illustrating how fields can be extracted during query processing in accordance with the disclosed embodiments. At the start of this process, a search query 7402 is received at a query processor 7404. Query processor 7404 includes various mechanisms for processing a query, wherein these mechanisms can reside in a search head 7104 and/or an indexer 7102. Note that the exemplary search query 7402 illustrated in FIG. 79A is expressed in Search Processing Language (SPL), which is used in conjunction with the SPLUNK® ENTERPRISE system. SPL is a pipelined search language in which a set of inputs is operated on by a first command in a command line, and then a subsequent command following the pipe symbol “|” operates on the results produced by the first command, and so on for additional commands. Search query 7402 can also be expressed in other query languages, such as the Structured Query Language (“SQL”) or any suitable query language.


Upon receiving search query 7402, query processor 7404 sees that search query 7402 includes two fields “IP” and “target.” Query processor 7404 also determines that the values for the “IP” and “target” fields have not already been extracted from events in data store 7414, and consequently determines that query processor 7404 needs to use extraction rules to extract values for the fields. Hence, query processor 7404 performs a lookup for the extraction rules in a rule base 7406, wherein rule base 7406 maps field names to corresponding extraction rules and obtains extraction rules 7408-7409, wherein extraction rule 7408 specifies how to extract a value for the “IP” field from an event, and extraction rule 7409 specifies how to extract a value for the “target” field from an event. As is illustrated in FIG. 79A, extraction rules 7408-7409 can comprise regular expressions that specify how to extract values for the relevant fields. Such regular-expression-based extraction rules are also referred to as “regex rules.” In addition to specifying how to extract field values, the extraction rules may also include instructions for deriving a field value by performing a function on a character string or value retrieved by the extraction rule. For example, a transformation rule may truncate a character string, or convert the character string into a different data format. In some cases, the query itself can specify one or more extraction rules.


Next, query processor 7404 sends extraction rules 7408-7409 to a field extractor 7412, which applies extraction rules 7408-7409 to events 7416-7418 in a data store 7414. Note that data store 7414 can include one or more data stores, and extraction rules 7408-7409 can be applied to large numbers of events in data store 7414, and are not meant to be limited to the three events 7416-7418 illustrated in FIG. 79A. Moreover, the query processor 7404 can instruct field extractor 7412 to apply the extraction rules to all the events in a data store 7414, or to a subset of the events that have been filtered based on some criteria.


Next, field extractor 7412 applies extraction rule 7408 for the first command “Search IP=“10*” to events in data store 7414 including events 7416-7418. Extraction rule 7408 is used to extract values for the IP address field from events in data store 7414 by looking for a pattern of one or more digits, followed by a period, followed again by one or more digits, followed by another period, followed again by one or more digits, followed by another period, and followed again by one or more digits. Next, field extractor 7412 returns field values 7420 to query processor 7404, which uses the criterion IP=“10*” to look for IP addresses that start with “10”. Note that events 7416 and 7417 match this criterion, but event 7418 does not, so the result set for the first command is events 7416-7417.


Query processor 7404 then sends events 7416-717 to the next command “stats count target.” To process this command, query processor 7404 causes field extractor 7412 to apply extraction rule 7409 to events 7416-7417. Extraction rule 7409 is used to extract values for the target field for events 7416-7417 by skipping the first four commas in events 7416-7417, and then extracting all of the following characters until a comma or period is reached. Next, field extractor 7412 returns field values 7421 to query processor 7404, which executes the command “stats count target” to count the number of unique values contained in the target fields, which in this example produces the value “2” that is returned as a final result 7422 for the query.


Note that query results can be returned to a client, a search head, or any other system component for further processing. In general, query results may include: a set of one or more events; a set of one or more values obtained from the events; a subset of the values; statistics calculated based on the values; a report containing the values; or a visualization, such as a graph or chart, generated from the values.


1.5.1 Data Models


Creating queries requires knowledge of the fields that are included in the events being searched, as well as knowledge of the query processing language used for the queries. While a data analyst may possess domain understanding of underlying data and knowledge of the query processing language, an end user responsible for creating reports at a company (e.g., a marketing specialist) may not have such expertise. In order to assist end users, implementations of the event-processing system described herein provide data models that simplify the creation of reports and other visualizations.


A data model encapsulates semantic knowledge about certain events. A data model can be composed of one or more objects grouped in a hierarchical manner. In general, the objects included in a data model may be related to each other in some way. In particular, a data model can include a root object and, optionally, one or more child objects that can be linked (either directly or indirectly) to the root object. A root object can be defined by search criteria for a query to produce a certain set of events, and a set of fields that can be exposed to operate on those events. A root object can be a parent of one or more child objects, and any of those child objects can optionally be a parent of one or more additional child objects. Each child object can inherit the search criteria of its parent object and have additional search criteria to further filter out events represented by its parent object. Each child object may also include at least some of the fields of its parent object and optionally additional fields specific to the child object.



FIG. 79B illustrates an example data model structure 7428, in accordance with some implementations. As shown, example data model “Buttercup Games” 7430 includes root object “Purchase Requests” 7432, and child objects “Successful Purchases” 7434 and “Unsuccessful Purchases” 7436.



FIG. 79C illustrates an example definition 7440 of root object 7432 of data model 7430, in accordance with some implementations. As shown, definition 7440 of root object 7432 includes search criteria 7442 and a set of fields 7444. Search criteria 7442 require that a search query produce web access requests that qualify as purchase events. Fields 7444 include inherited fields 7446 which are default fields that specify metadata about the events of the root object 7432. In addition, fields 7444 include extracted fields 7448, whose values can be automatically extracted from the events during search using extraction rules of the late binding schema, and calculated fields 7450, whose values can be automatically determined based on values of other fields extracted from the events. For example, the value of the productName field can be determined based on the value in the productID field (e.g., by searching a lookup table for a product name matching the value of the productID field). In another example, the value of the price field can be calculated based on values of other fields (e.g., by multiplying the price per unit by the number of units).



FIG. 79D illustrates example definitions 7458 and 7460 of child objects 7434 and 7436 respectively, in accordance with some implementations. Definition 7458 of child object 7434 includes search criteria 7462 and a set of fields 7464. Search criteria 7462 inherits search criteria 7442 of the parent object 7432 and includes an additional criterion of “status=200,” which indicates that the search query should produce web access requests that qualify as successful purchase events. Fields 7464 consist of the fields inherited from the parent object 7432.


Definition 7460 of child object 7436 includes search criteria 7470 and a set of fields 7474. Search criteria 7470 inherits search criteria 7442 of the parent object 7432 and includes an additional criterion of “status!=200,” which indicates that the search query should produce web access requests that qualify as unsuccessful purchase events. Fields 7474 consist of the fields inherited from the parent object 7432. As shown, child objects 7434 and 7436 include all the fields inherited from the parent object 7432. In other implementations, child objects may only include some of the fields of the parent object and/or may include additional fields that are not exposed by the parent object.


When creating a report, a user can select an object of a data model to focus on the events represented by the selected object. The user can then view the fields of the data object and request the event-processing system to structure the report based on those fields. For example, the user can request the event-processing system to add some fields to the report, to add calculations based on some fields to the report, to group data in the report based on some fields, etc. The user can also input additional constraints (e.g., specific values and/or mathematical expressions) for some of the fields to further filter out events on which the report should be focused.


1.6 Exemplary Search Screen



FIG. 81A illustrates an exemplary search screen 7600 in accordance with the disclosed embodiments. Search screen 7600 includes a search bar 7602 that accepts user input in the form of a search string. It also includes a time range picker 7612 that enables the user to specify a time range for the search. For “historical searches” the user can select a specific time range, or alternatively a relative time range, such as “today,” “yesterday” or “last week.” For “real-time searches,” the user can select the size of a preceding time window to search for real-time events. Search screen 7600 also initially displays a “data summary” dialog as is illustrated in FIG. 81B that enables the user to select different sources for the event data, for example by selecting specific hosts and log files.


After the search is executed, the search screen 7600 can display the results through search results tabs 7604, wherein search results tabs 7604 includes: an “events tab” that displays various information about events returned by the search; a “statistics tab” that displays statistics about the search results; and a “visualization tab” that displays various visualizations of the search results. The events tab illustrated in FIG. 81A displays a timeline graph 7605 that graphically illustrates the number of events that occurred in one-hour intervals over the selected time range. It also displays an events list 7608 that enables a user to view the raw data in each of the returned events. It additionally displays a fields sidebar 7606 that includes statistics about occurrences of specific fields in the returned events, including “selected fields” that are pre-selected by the user, and “interesting fields” that are automatically selected by the system based on pre-specified criteria.


1.7 Acceleration Techniques


The above-described system provides significant flexibility by enabling a user to analyze massive quantities of minimally processed performance data “on the fly” at search time instead of storing pre-specified portions of the performance data in a database at ingestion time. This flexibility enables a user to see correlations in the performance data and perform subsequent queries to examine interesting implementations of the performance data that may not have been apparent at ingestion time.


However, performing extraction and analysis operations at search time can involve a large amount of data and require a large number of computational operations, which can cause considerable delays while processing the queries. Fortunately, a number of acceleration techniques have been developed to speed up analysis operations performed at search time. These techniques include: (1) performing search operations in parallel by formulating a search as a map-reduce computation; (2) using a keyword index; (3) using a high performance analytics store; and (4) accelerating the process of generating reports. These techniques are described in more detail below.


1.7.1 Map-Reduce Technique


To facilitate faster query processing, a query can be structured as a map-reduce computation, wherein the “map” operations are delegated to the indexers, while the corresponding “reduce” operations are performed locally at the search head. For example, FIG. 80 illustrates how a search query 7501 received from a client at search head 7104 can split into two phases, including: (1) a “map phase” comprising subtasks 7502 (e.g., data retrieval or simple filtering) that may be performed in parallel and are “mapped” to indexers 7102 for execution, and (2) a “reduce phase” comprising a merging operation 7503 to be executed by the search head when the results are ultimately collected from the indexers.


During operation, upon receiving search query 7501, search head 7104 modifies search query 7501 by substituting “stats” with “prestats” to produce search query 7502, and then distributes search query 7502 to one or more distributed indexers, which are also referred to as “search peers.” Note that search queries may generally specify search criteria or operations to be performed on events that meet the search criteria. Search queries may also specify field names, as well as search criteria for the values in the fields or operations to be performed on the values in the fields. Moreover, the search head may distribute the full search query to the search peers as is illustrated in FIG. 78, or may alternatively distribute a modified version (e.g., a more restricted version) of the search query to the search peers. In this example, the indexers are responsible for producing the results and sending them to the search head. After the indexers return the results to the search head, the search head performs the merging operations 7503 on the results. Note that by executing the computation in this way, the system effectively distributes the computational operations while minimizing data transfers.


1.7.2 Keyword Index


As described above with reference to the flow charts in FIGS. 77 and 78, event-processing system 7100 can construct and maintain one or more keyword indices to facilitate rapidly identifying events containing specific keywords. This can greatly speed up the processing of queries involving specific keywords. As mentioned above, to build a keyword index, an indexer first identifies a set of keywords. Then, the indexer includes the identified keywords in an index, which associates each stored keyword with references to events containing that keyword, or to locations within events where that keyword is located. When an indexer subsequently receives a keyword-based query, the indexer can access the keyword index to quickly identify events containing the keyword.


1.7.3 High Performance Analytics Store


To speed up certain types of queries, some embodiments of system 7100 make use of a high performance analytics store, which is referred to as a “summarization table,” that contains entries for specific field-value pairs. Each of these entries keeps track of instances of a specific value in a specific field in the event data and includes references to events containing the specific value in the specific field. For example, an exemplary entry in a summarization table can keep track of occurrences of the value “94107” in a “ZIP code” field of a set of events, wherein the entry includes references to all of the events that contain the value “94107” in the ZIP code field. This enables the system to quickly process queries that seek to determine how many events have a particular value for a particular field, because the system can examine the entry in the summarization table to count instances of the specific value in the field without having to go through the individual events or do extractions at search time. Also, if the system needs to process all events that have a specific field-value combination, the system can use the references in the summarization table entry to directly access the events to extract further information without having to search all of the events to find the specific field-value combination at search time.


In some embodiments, the system maintains a separate summarization table for each of the above-described time-specific buckets that stores events for a specific time range, wherein a bucket-specific summarization table includes entries for specific field-value combinations that occur in events in the specific bucket. Alternatively, the system can maintain a separate summarization table for each indexer, wherein the indexer-specific summarization table only includes entries for the events in a data store that is managed by the specific indexer.


The summarization table can be populated by running a “collection query” that scans a set of events to find instances of a specific field-value combination, or alternatively instances of all field-value combinations for a specific field. A collection query can be initiated by a user, or can be scheduled to occur automatically at specific time intervals. A collection query can also be automatically launched in response to a query that asks for a specific field-value combination.


In some cases, the summarization tables may not cover all of the events that are relevant to a query. In this case, the system can use the summarization tables to obtain partial results for the events that are covered by summarization tables, but may also have to search through other events that are not covered by the summarization tables to produce additional results. These additional results can then be combined with the partial results to produce a final set of results for the query. This summarization table and associated techniques are described in more detail in U.S. Pat. No. 8,682,925, issued on Mar. 25, 2014.


1.7.4 Accelerating Report Generation


In some embodiments, a data server system such as the SPLUNK® ENTERPRISE system can accelerate the process of periodically generating updated reports based on query results. To accelerate this process, a summarization engine automatically examines the query to determine whether generation of updated reports can be accelerated by creating intermediate summaries. (This is possible if results from preceding time periods can be computed separately and combined to generate an updated report. In some cases, it is not possible to combine such incremental results, for example where a value in the report depends on relationships between events from different time periods.) If reports can be accelerated, the summarization engine periodically generates a summary covering data obtained during a latest non-overlapping time period. For example, where the query seeks events meeting a specified criteria, a summary for the time period includes only events within the time period that meet the specified criteria. Similarly, if the query seeks statistics calculated from the events, such as the number of events that match the specified criteria, then the summary for the time period includes the number of events in the period that match the specified criteria.


In parallel with the creation of the summaries, the summarization engine schedules the periodic updating of the report associated with the query. During each scheduled report update, the query engine determines whether intermediate summaries have been generated covering portions of the time period covered by the report update. If so, then the report is generated based on the information contained in the summaries. Also, if additional event data has been received and has not yet been summarized, and is required to generate the complete report, the query can be run on this additional event data. Then, the results returned by this query on the additional event data, along with the partial results obtained from the intermediate summaries, can be combined to generate the updated report. This process is repeated each time the report is updated. Alternatively, if the system stores events in buckets covering specific time ranges, then the summaries can be generated on a bucket-by-bucket basis. Note that producing intermediate summaries can save the work involved in re-running the query for previous time periods, so only the newer event data needs to be processed while generating an updated report. These report acceleration techniques are described in more detail in U.S. Pat. No. 8,589,403, issued on Nov. 19, 2013, and U.S. Pat. No. 8,412,696, issued on Apr. 2, 2011.


1.8 Security Features


The SPLUNK® ENTERPRISE platform provides various schemas, dashboards and visualizations that make it easy for developers to create applications to provide additional capabilities. One such application is the SPLUNK® APP FOR ENTERPRISE SECURITY, which performs monitoring and alerting operations and includes analytics to facilitate identifying both known and unknown security threats based on large volumes of data stored by the SPLUNK® ENTERPRISE system. This differs significantly from conventional Security Information and Event Management (SIEM) systems that lack the infrastructure to effectively store and analyze large volumes of security-related event data. Traditional SIEM systems typically use fixed schemas to extract data from pre-defined security-related fields at data ingestion time, wherein the extracted data is typically stored in a relational database. This data extraction process (and associated reduction in data size) that occurs at data ingestion time inevitably hampers future incident investigations, when all of the original data may be needed to determine the root cause of a security issue, or to detect the tiny fingerprints of an impending security threat.


In contrast, the SPLUNK® APP FOR ENTERPRISE SECURITY system stores large volumes of minimally processed security-related data at ingestion time for later retrieval and analysis at search time when a live security threat is being investigated. To facilitate this data retrieval process, the SPLUNK® APP FOR ENTERPRISE SECURITY provides pre-specified schemas for extracting relevant values from the different types of security-related event data, and also enables a user to define such schemas.


The SPLUNK® APP FOR ENTERPRISE SECURITY can process many types of security-related information. In general, this security-related information can include any information that can be used to identify security threats. For example, the security-related information can include network-related information, such as IP addresses, domain names, asset identifiers, network traffic volume, uniform resource locator strings, and source addresses. (The process of detecting security threats for network-related information is further described in U.S. patent application Ser. Nos. 13/956,252, and 13/956,262.) Security-related information can also include endpoint information, such as malware infection data and system configuration information, as well as access control information, such as login/logout information and access failure notifications. The security-related information can originate from various sources within a data center, such as hosts, virtual machines, storage devices and sensors. The security-related information can also originate from various sources in a network, such as routers, switches, email servers, proxy servers, gateways, firewalls and intrusion-detection systems.


During operation, the SPLUNK® APP FOR ENTERPRISE SECURITY facilitates detecting so-called “notable events” that are likely to indicate a security threat. These notable events can be detected in a number of ways: (1) an analyst can notice a correlation in the data and can manually identify a corresponding group of one or more events as “notable;” or (2) an analyst can define a “correlation search” specifying criteria for a notable event, and every time one or more events satisfy the criteria, the application can indicate that the one or more events are notable. An analyst can alternatively select a pre-defined correlation search provided by the application. Note that correlation searches can be run continuously or at regular intervals (e.g., every hour) to search for notable events. Upon detection, notable events can be stored in a dedicated “notable events index,” which can be subsequently accessed to generate various visualizations containing security-related information. Also, alerts can be generated to notify system operators when important notable events are discovered.


The SPLUNK® APP FOR ENTERPRISE SECURITY provides various visualizations to aid in discovering security threats, such as a “key indicators view” that enables a user to view security metrics of interest, such as counts of different types of notable events. For example, FIG. 82A illustrates an exemplary key indicators view 7700 that comprises a dashboard, which can display a value 7701, for various security-related metrics, such as malware infections 7702. It can also display a change in a metric value 7703, which indicates that the number of malware infections increased by 63 during the preceding interval. Key indicators view 7700 additionally displays a histogram panel 7704 that displays a histogram of notable events organized by urgency values, and a histogram of notable events organized by time intervals. This key indicators view is described in further detail in pending U.S. patent application Ser. No. 13/956,338 filed Jul. 31, 2013.


These visualizations can also include an “incident review dashboard” that enables a user to view and act on “notable events.” These notable events can include: (1) a single event of high importance, such as any activity from a known web attacker; or (2) multiple events that collectively warrant review, such as a large number of authentication failures on a host followed by a successful authentication. For example, FIG. 82B illustrates an exemplary incident review dashboard 7710 that includes a set of incident attribute fields 7711 that, for example, enables a user to specify a time range field 7712 for the displayed events. It also includes a timeline 7713 that graphically illustrates the number of incidents that occurred in one-hour time intervals over the selected time range. It additionally displays an events list 7714 that enables a user to view a list of all of the notable events that match the criteria in the incident attributes fields 7711. To facilitate identifying patterns among the notable events, each notable event can be associated with an urgency value (e.g., low, medium, high, critical), which is indicated in the incident review dashboard. The urgency value for a detected event can be determined based on the severity of the event and the priority of the system component associated with the event. The incident review dashboard is described further in “http://docs.splunk.com/Documentation/PCI/2.1.1/User/IncidentReviewdashboard.”


1.9 Data Center Monitoring


As mentioned above, the SPLUNK® ENTERPRISE platform provides various features that make it easy for developers to create various applications. One such application is the SPLUNK® APP FOR VMWARE®, which performs monitoring operations and includes analytics to facilitate diagnosing the root cause of performance problems in a data center based on large volumes of data stored by the SPLUNK® ENTERPRISE system.


This differs from conventional data-center-monitoring systems that lack the infrastructure to effectively store and analyze large volumes of performance information and log data obtained from the data center. In conventional data-center-monitoring systems, this performance data is typically pre-processed prior to being stored, for example by extracting pre-specified data items from the performance data and storing them in a database to facilitate subsequent retrieval and analysis at search time. However, the rest of the performance data is not saved and is essentially discarded during pre-processing. In contrast, the SPLUNK® APP FOR VMWARE® stores large volumes of minimally processed performance information and log data at ingestion time for later retrieval and analysis at search time when a live performance issue is being investigated.


The SPLUNK® APP FOR VMWARE® can process many types of performance-related information. In general, this performance-related information can include any type of performance-related data and log data produced by virtual machines and host computer systems in a data center. In addition to data obtained from various log files, this performance-related information can include values for performance metrics obtained through an application programming interface (API) provided as part of the vSphere Hypervisor™ system distributed by VMware, Inc. of Palo Alto, Calif. For example, these performance metrics can include: (1) CPU-related performance metrics; (2) disk-related performance metrics; (3) memory-related performance metrics; (4) network-related performance metrics; (5) energy-usage statistics; (6) data-traffic-related performance metrics; (7) overall system availability performance metrics; (8) cluster-related performance metrics; and (9) virtual machine performance statistics. For more details about such performance metrics, please see U.S. patent Ser. No. 14/167,316 filed 29 Jan. 2014, which is hereby incorporated herein by reference. Also, see “vSphere Monitoring and Performance,” Update 1, vSphere 5.5, EN-001357-00, http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-551-monitoring-performance-guide.pdf.


To facilitate retrieving information of interest from performance data and log files, the SPLUNK® APP FOR VMWARE® provides pre-specified schemas for extracting relevant values from different types of performance-related event data, and also enables a user to define such schemas.


The SPLUNK® APP FOR VMWARE® additionally provides various visualizations to facilitate detecting and diagnosing the root cause of performance problems. For example, one such visualization is a “proactive monitoring tree” that enables a user to easily view and understand relationships among various factors that affect the performance of a hierarchically structured computing system. This proactive monitoring tree enables a user to easily navigate the hierarchy by selectively expanding nodes representing various entities (e.g., virtual centers or computing clusters) to view performance information for lower-level nodes associated with lower-level entities (e.g., virtual machines or host systems). Exemplary node-expansion operations are illustrated in FIG. 82C, wherein nodes 7733 and 7734 are selectively expanded. Note that nodes 7731-7739 can be displayed using different patterns or colors to represent different performance states, such as a critical state, a warning state, a normal state or an unknown/offline state. The ease of navigation provided by selective expansion in combination with the associated performance-state information enables a user to quickly diagnose the root cause of a performance problem. The proactive monitoring tree is described in further detail in U.S. patent application Ser. No. 14/235,490 filed on 15 Apr. 2014, which is hereby incorporated herein by reference for all possible purposes.


The SPLUNK® APP FOR VMWARE® also provides a user interface that enables a user to select a specific time range and then view heterogeneous data, comprising events, log data and associated performance metrics, for the selected time range. For example, the screen illustrated in FIG. 82D displays a listing of recent “tasks and events” and a listing of recent “log entries” for a selected time range above a performance-metric graph for “average CPU core utilization” for the selected time range. Note that a user is able to operate pull-down menus 7742 to selectively display different performance metric graphs for the selected time range. This enables the user to correlate trends in the performance-metric graph with corresponding event and log data to quickly determine the root cause of a performance problem. This user interface is described in more detail in U.S. patent application Ser. No. 14/167,316 filed on 29 Jan. 2014, which is hereby incorporated herein by reference for all possible purposes.



FIG. 83 illustrates a diagrammatic representation of a machine in the exemplary form of a computer system 7800 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. The system 7800 may be in the form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server machine in client-server network environment. The machine may be a personal computer (PC), a set-top box (STB), a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. In one embodiment, computer system 7800 may represent system 210 of FIG. 2.


The exemplary computer system 7800 includes a processing device (processor) 7802, a main memory 7804 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a static memory 7806 (e.g., flash memory, static random access memory (SRAM)), and a data storage device 7818, which communicate with each other via a bus 7830.


Processing device 7802 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processing device 7802 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processing device 7802 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 7802 is configured to execute the notification manager 210 for performing the operations and steps discussed herein.


The computer system 7800 may further include a network interface device 7808. The computer system 7800 also may include a video display unit 7810 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 7812 (e.g., a keyboard), a cursor control device 7814 (e.g., a mouse), and a signal generation device 7816 (e.g., a speaker).


The data storage device 7818 may include a computer-readable medium 7828 on which is stored one or more sets of instructions 7822 (e.g., instructions for search term generation) embodying any one or more of the methodologies or functions described herein. The instructions 7822 may also reside, completely or at least partially, within the main memory 7804 and/or within processing logic 7826 of the processing device 7802 during execution thereof by the computer system 7800, the main memory 7804 and the processing device 7802 also constituting computer-readable media. The instructions may further be transmitted or received over a network 7820 via the network interface device 7808.


While the computer-readable storage medium 7828 is shown in an exemplary embodiment to be a single medium, the term “computer-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.


The preceding description sets forth numerous specific details such as examples of specific systems, components, methods, and so forth, in order to provide a good understanding of several embodiments of the present invention. It will be apparent to one skilled in the art, however, that at least some embodiments of the present invention may be practiced without these specific details. In other instances, well-known components or methods are not described in detail or are presented in simple block diagram format in order to avoid unnecessarily obscuring the present invention. Thus, the specific details set forth are merely exemplary. Particular implementations may vary from these exemplary details and still be contemplated to be within the scope of the present invention.


In the above description, numerous details are set forth. It will be apparent, however, to one of ordinary skill in the art having the benefit of this disclosure, that embodiments of the invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the description.


Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.


It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “determining”, “identifying”, “adding”, “selecting” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.


Embodiments of the invention also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.


The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.


Implementations that are described may include graphical user interfaces (GUIs). Frequently, an element that appears in a GUI display is associated or bound to particular data in the underlying computer system. The GUI element may be used to indicate the particular data by displaying the data in some fashion, and may possibly enable the user to interact to indicate the data in a desired, changed form or value. In such cases, where a GUI element is associated or bound to particular data, it is a common shorthand to refer to the data indications of the GUI element as the GUI element, itself, and vice versa. The reader is reminded of such shorthand and that the context renders the intended meaning clear to one of skill in the art where a distinction between a GUI element and the data to which it is bound is meaningful.


It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.


The preceding point may be elaborated with a few examples. Many details have been discussed and disclosed in regards to user interfaces including graphical user interfaces (GUIs). While it is convenient to describe inventive subject matter in terms of embodiments that include familiar technologies, components, and elements, the inventive subject matter should not be considered to be constrained to these, and the ready availability and appropriateness of substitutes, alternatives, extensions, and the like is to be recognized. What may be shown or described as a single GUI or interface component should liberally be understood to embrace combinations, groupings, collections, substitutions, and subdivisions in an embodiment. What may be shown or described as a single GUI or interface component may be embodied as an atomic or truly elemental interface component, or may readily be embodied as a complex or compound component or element having multiple constituent parts. What may be shown, described, or suggested to be a uniformly shaped and contiguous GUI or interface component, such as an interface region, area, space, or the like, may be readily subject to implementation with non-uniformly shaped or noncontiguous display real estate.


As yet one more example, apparatus that perform methods, processes, procedures, operations, or the like, disclosed herein may be referred to as a computer, computer system, computing machine, or the like. Any such terminology used herein should be reasonably understood as embracing any collection of temporarily or permanently connected hardware devices in combination with any software each requires to operate and perform operations and functions necessary to an implementation of an inventive aspect. Adopting such an understanding is consistent with modern computing practices and eliminates the need to obscure the disclosure of inventive aspects with catalogs of implementation options and alternatives.


As one final example, methods, procedures, or processes may be described herein by reference to flow charts or block diagrams and possibly in terms of sequences of steps or operations. It should be understood, however, that the practice of an inventive aspect is generally not limited to the number, ordering, or combination of operations as may be described for an illustrative embodiment used to teach and convey an understanding of inventive aspects possibly within a broader context. Accordingly, not all operations or steps described are illustrated may be required to practice of an inventive aspect. Different embodiments may variously omit, augment, combine, separate, reorder, or reorganize the performance of operations, steps, methods, procedures, functions, and the like disclosed or suggested herein without departing from an inventive aspect. Further, where sequences of operations may be illustrated, suggested, expressed, or implied, an embodiment practicing inventive aspects may perform one or more of those operations or sets of operations in parallel rather than sequentially.


Accordingly, inventive aspects disclosed herein should be considered broadly without unnecessary limitation by the details of the disclosure, and should be considered as limited only by accompanying claims or where reason demands it.

Claims
  • 1. A method comprising: causing display of a service-monitoring user interface on a client machine, the service-monitoring user interface comprising a region having an interactive element;receiving at a server machine over a network connection a client machine transmission indicative of a user interaction with the interactive element to indicate a by-service mode;creating a set of by-service seed group definitions in response to the client machine transmission and updating a record of seed group membership in computer storage by automatically classifying a plurality of received notable events into seed group membership based on the seed group definitions;detecting a pattern of predictor event type occurrences among the plurality of received notable events classified into the seed group membership associated with a particular one of the seed group definitions;transmitting, in response to detecting the pattern, an action message to at least one target action device over a network to cause the performance of a preventive action;wherein at least one of the plurality of received notable events results from a correlation search over at least one key performance indicator (KPI), each KPI relating to a service having a stored service definition that identifies one or more entities that provide the service, each entity having stored entity definition information that identifies machine data produced by or about the entity from one or more sources, and each KPI being defined by a search query that produces a value derived from at least some of the machine data identified by the entity definition information of the one or more entities, the value indicative of a measure of the service relating to that KPI at a point in time or during a period of time;wherein the machine data is produced by one or more components within an information technology environment and reflects activity within the information technology environment; andwherein the method is performed by one or more processing devices.
  • 2. The method of claim 1 wherein the at least one target action device is a machine in the information technology environment.
  • 3. The method of claim 1 wherein the action message is directed to a command interface of the at least one target action device and the at least one target action device is a machine in the information technology environment.
  • 4. The method of claim 1 wherein each of the received notable events includes an identification of an associated service.
  • 5. The method of claim 1 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type.
  • 6. The method of claim 1 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the response action type is related to the preventive action.
  • 7. The method of claim 1 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the predictor rule includes information about an occurrence timeframe.
  • 8. The method of claim 1 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the predictor rule includes information about an occurrence timeframe and a prediction action.
  • 9. The method of claim 1 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the predictor rule includes information about an occurrence timeframe, a prediction action, and matching criteria.
  • 10. The method of claim 1 wherein automatically classifying the plurality of received notable events into seed group membership occurs in or near realtime.
  • 11. The method of claim 1 wherein automatically classifying the plurality of received notable events into seed group membership and detecting the pattern of predictor event type occurrences occur in or near realtime.
  • 12. The method of claim 1 wherein detecting the pattern of predictor event type occurrences includes searching a subset of notable events limited to a particular time period.
  • 13. The method of claim 1 wherein detecting the pattern of predictor event type occurrences includes searching a subset of notable events limited to a particular time period based at least in part on a time parameter of a prediction rule.
  • 14. The method of claim 1 wherein the pattern of predictor event type occurrences is based on predictor event types of a predictor rule.
  • 15. A system comprising: a memory; anda processing device coupled with the memory to perform operations comprising: causing display of a service-monitoring user interface on a client machine, the service-monitoring user interface comprising a region having an interactive element;receiving at a server machine over a network connection a client machine transmission indicative of a user interaction with the interactive element to indicate a by-service mode;creating a set of by-service seed group definitions in response to the client machine transmission and updating a record of seed group membership in computer storage by automatically classifying a plurality of received notable events into seed group membership based on the seed group definitions;detecting a pattern of predictor event type occurrences among the plurality of received notable events classified into the seed group membership associated with a particular one of the seed group definitions; andtransmitting, in response to detecting the pattern, an action message to at least one target action device over a network to cause the performance of a preventive action;wherein at least one of the plurality of received notable events results from a correlation search over at least one key performance indicator (KPI), each KPI relating to a service having a stored service definition that identifies one or more entities that provide the service, each entity having stored entity definition information that identifies machine data produced by or about the entity from one or more sources, and each KPI being defined by a search query that produces a value derived from at least some of the machine data identified by the entity definition information of the one or more entities, the value indicative of a measure of the service relating to that KPI at a point in time or during a period of time;wherein the machine data is produced by one or more components within an information technology environment and reflects activity within the information technology environment.
  • 16. The system of claim 15 wherein the at least one target action device is a machine in the information technology environment.
  • 17. The system of claim 15 wherein the action message is directed to a command interface of the at least one target action device and the at least one target action device is a machine in the information technology environment.
  • 18. The system of claim 15 wherein each of the received notable events includes an identification of an associated service.
  • 19. The system of claim 15 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type.
  • 20. The system of claim 15 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the response action type is related to the preventive action.
  • 21. The system of claim 15 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the predictor rule includes information about an occurrence timeframe.
  • 22. The system of claim 15 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the predictor rule includes information about an occurrence timeframe and a prediction action.
  • 23. The system of claim 15 wherein the pattern of predictor event type occurrences is based on a predictor rule associating one or more predictor event types with a response action type, and the predictor rule includes information about an occurrence timeframe, a prediction action, and matching criteria.
  • 24. The system of claim 15 wherein automatically classifying the plurality of received notable events into seed group membership occurs in or near realtime.
  • 25. The system of claim 15 wherein automatically classifying the plurality of received notable events into seed group membership and detecting the pattern of predictor event type occurrences occur in or near realtime.
  • 26. The system of claim 15 wherein detecting the pattern of predictor event type occurrences includes searching a subset of notable events limited to a particular time period.
  • 27. The system of claim 15 wherein detecting the pattern of predictor event type occurrences includes searching a subset of notable events limited to a particular time period based at least in part on a time parameter of a prediction rule.
  • 28. The system of claim 15 wherein the pattern of predictor event type occurrences is based on predictor event types of a predictor rule.
  • 29. A non-transitory computer readable storage medium encoding instructions thereon that, in response to execution by one or more processing devices, cause the one or more processing devices to perform operations comprising: causing display of a service-monitoring user interface on a client machine, the service-monitoring user interface comprising a region having an interactive element;receiving at a server machine over a network connection a client machine transmission indicative of a user interaction with the interactive element to indicate a by-service mode;creating a set of by-service seed group definitions in response to the client machine transmission and updating a record of seed group membership in computer storage by automatically classifying a plurality of received notable events into seed group membership based on the seed group definitions;detecting a pattern of predictor event type occurrences among the plurality of received notable events classified into the seed group membership associated with a particular one of the seed group definitions; andtransmitting, in response to detecting the pattern, an action message to at least one target action device over a network to cause the performance of a preventive action;wherein at least one of the plurality of received notable events results from a correlation search over at least one key performance indicator (KPI), each KPI relating to a service having a stored service definition that identifies one or more entities that provide the service, each entity having stored entity definition information that identifies machine data produced by or about the entity from one or more sources, and each KPI being defined by a search query that produces a value derived from at least some of the machine data identified by the entity definition information of the one or more entities, the value indicative of a measure of the service relating to that KPI at a point in time or during a period of time;wherein the machine data is produced by one or more components within an information technology environment and reflects activity within the information technology environment.
  • 30. The non-transitory computer readable storage medium of claim 29 wherein the action message is directed to a command interface of the at least one target action device and the at least one target action device is a machine in the information technology environment.
RELATED APPLICATIONS

THIS APPLICATION is a continuation-in-part of U.S. Nonprovisional application Ser. No. 15/418,766, filed Jan. 29, 2017, entitled “Service Monitoring Interface,” which is a continuation of U.S. Nonprovisional application Ser. No. 14/933,919, filed Nov. 5, 2015, entitled “Service Monitoring Interface,” issued as U.S. Pat. No. 9,590,877 on Mar. 7, 2017, which is a continuation of U.S. Nonprovisional application Ser. No. 14/611,216, filed Jan. 31, 2015, entitled “Service Monitoring Interface,” issued as U.S. Pat. No. 9,210,056 on Dec. 8, 2015, which is a continuation-in-part of U.S. Nonprovisional application Ser. No. 14/528,858, filed Oct. 30, 2014, entitled “Monitoring Service-Level Performance Using Key Performance Indicators Derived from Machine Data,” issued as U.S. Pat. No. 9,130,860 on Sep. 8, 2015, which claims the benefit of U.S. Provisional Patent Application No. 62/062,104 filed Oct. 9, 2014, entitled “Monitoring Service-Level Performance Using Key Performance Indicators Derived from Machine Data,” each and all of which are incorporated herein by reference for all valid purposes; AND THIS APPLICATION is a continuation-in-part of U.S. Nonprovisional application Ser. No. 15/485,222, filed Apr. 11, 2017, entitled “Automatic Creation of Related Event Groups for IT Service Monitoring,” which is a continuation-in-part of U.S. Nonprovisional application Ser. No. 15/276,776, filed Sep. 26, 2016, entitled “Automatic Event Group Actions,” which is continuation-in-part of U.S. Nonprovisional application Ser. No. 15/014,017, filed Feb. 3, 2016, entitled “Monitoring Service-Level Performance Using a Key Performance Indicator (KPI) Correlation Search,” which is a continuation in part of U.S. Nonprovisional application Ser. No. 14/611,200, filed Jan. 31, 2015, entitled “Monitoring Service-Level Performance Using a Key Performance Indicator (KPI) Correlation Search,” issued on Mar. 22, 2016 as U.S. Pat. No. 9,294,361, which is a continuation-in-part of U.S. Nonprovisional application Ser. No. 14/528,858, filed Oct. 30, 2014, entitled “Monitoring Service-Level Performance Using Key Performance Indicators Derived from Machine Data,” issued on Sep. 8, 2015 as U.S. Pat. No. 9,130,860, which claims the benefit of U.S. Provisional Patent Application No. 62/062,104 filed Oct. 9, 2014, entitled “Monitoring Service-Level Performance Using Key Performance Indicators Derived from Machine Data,” each and all of which are incorporated herein by reference for all valid purposes; AND THE AFOREREFERENCED U.S. Nonprovisional application Ser. No. 15/485,222, filed Apr. 11, 2017, entitled “Automatic Creation of Related Event Groups for IT Service Monitoring,” is furthermore a continuation-in-part of U.S. Nonprovisional application Ser. No. 15/376,516, filed Dec. 12, 2016, entitled “Machine Data-Derived Key Performance Indicators with Per-Entity States” which is a is a continuation of U.S. Nonprovisional application Ser. No. 15/012,848, filed Feb. 1, 2016, entitled “Machine Data-Derived Key Performance Indicators with Per-Entity States,” issued as U.S. Pat. No. 9,521,047 on Aug. 4, 2016, which is a continuation of U.S. Nonprovisional application Ser. No. 14/611,200, filed Jan. 31, 2015, entitled “Monitoring Service-Level Performance Using a Key Performance Indicator (KPI) Correlation Search,” issued as U.S. Pat. No. 9,294,361 on Mar. 22, 2016, which is a continuation-in-part of U.S. Nonprovisional application Ser. No. 14/528,858, filed Oct. 30, 2014, entitled “Monitoring Service-Level Performance Using Key Performance Indicators Derived from Machine Data,” issued as U.S. Pat. No. 9,130,860 on Sep. 9, 2015, which claims the benefit of U.S. Provisional Patent Application No. 62/062,104 filed Oct. 9, 2014, entitled “Monitoring Service-Level Performance Using Key Performance Indicators Derived from Machine Data,” each and all of which are incorporated herein by reference for all valid purposes.

US Referenced Citations (227)
Number Name Date Kind
6611867 Bowman-Amuah Aug 2003 B1
6704012 Lefave Mar 2004 B1
7124101 Mikurak Oct 2006 B1
7299358 Chateau et al. Nov 2007 B2
7315826 Guheen et al. Jan 2008 B1
7321894 Degtyar Jan 2008 B2
7444342 Hall et al. Oct 2008 B1
7461334 Lu et al. Dec 2008 B1
7613801 Pierre Cote et al. Nov 2009 B2
7680721 Cutler Mar 2010 B2
7711670 Roediger May 2010 B2
7716253 Netz et al. May 2010 B2
7778952 Vespe et al. Aug 2010 B2
7792784 Gupta Sep 2010 B2
7800613 Hanrahan et al. Sep 2010 B2
7822662 Guzik et al. Oct 2010 B2
7848260 Cowan Dec 2010 B2
7877359 Kodama et al. Jan 2011 B2
7945596 Anonsen May 2011 B2
8050921 Mark et al. Nov 2011 B2
8056130 Njemanze et al. Nov 2011 B1
8095417 Handy et al. Jan 2012 B2
8099400 Haub et al. Jan 2012 B2
8176069 Timm et al. May 2012 B2
8234308 Brunswig Jul 2012 B2
8266148 Guha et al. Sep 2012 B2
8320261 Vasamsetti Nov 2012 B2
8327335 Noble et al. Dec 2012 B2
8356047 Narayanan et al. Jan 2013 B2
8364460 Ostermeyer et al. Jan 2013 B2
8370324 Kiyama et al. Feb 2013 B2
8412696 Zhang et al. Apr 2013 B2
8488460 Holm-Oste Jul 2013 B2
8538787 Braun et al. Sep 2013 B2
8543527 Bates et al. Sep 2013 B2
8560366 Mikurak Oct 2013 B2
8589403 Marquardt et al. Nov 2013 B2
8613083 Njemanze et al. Dec 2013 B1
8682925 Marquardt et al. Mar 2014 B1
8712953 Beringer et al. Apr 2014 B2
8732213 Sowell et al. May 2014 B2
8738414 Nagar et al. May 2014 B1
8762313 Lahav et al. Jun 2014 B2
8806361 Noel et al. Aug 2014 B1
8825752 Madhavan Sep 2014 B1
8898277 Chen Nov 2014 B2
8948369 Shaffer et al. Feb 2015 B2
9003062 Nampally et al. Apr 2015 B1
9031889 Basu et al. May 2015 B1
9069788 Dutta et al. Jun 2015 B2
9130832 Boe Sep 2015 B1
9130860 Boe Sep 2015 B1
9146954 Boe Sep 2015 B1
9146962 Boe Sep 2015 B1
9158811 Choudhary et al. Oct 2015 B1
9204319 Ouyang Dec 2015 B2
9208463 Bhide Dec 2015 B1
9210056 Choudhary et al. Dec 2015 B1
9218676 Brugler et al. Dec 2015 B2
9274668 Powers Mar 2016 B2
9294361 Choudhary et al. Mar 2016 B1
9413890 McCormack Aug 2016 B2
9443015 Eischeid et al. Sep 2016 B1
9521052 Nandyalam Dec 2016 B1
9584374 Bingham et al. Feb 2017 B2
20010049682 Vincent et al. Dec 2001 A1
20010051998 Henderson Dec 2001 A1
20020091689 Wiens et al. Jul 2002 A1
20020099578 Eicher, Jr. Jul 2002 A1
20030033288 Shanahan Feb 2003 A1
20030083846 Curtin et al. May 2003 A1
20030120764 Laye et al. Jun 2003 A1
20030174173 Nishiyama et al. Sep 2003 A1
20030182310 Charnock et al. Sep 2003 A1
20030225549 Shay et al. Dec 2003 A1
20040006556 Kwoh Jan 2004 A1
20040024770 Cardno Feb 2004 A1
20040030668 Pawlowski et al. Feb 2004 A1
20040260566 King Dec 2004 A1
20050021733 Clinton et al. Jan 2005 A1
20050060048 Pierre et al. Mar 2005 A1
20050081157 Clark et al. Apr 2005 A1
20050086207 Heuer et al. Apr 2005 A1
20050171833 Jost et al. Aug 2005 A1
20050181835 Lau et al. Aug 2005 A1
20050216831 Guzik et al. Sep 2005 A1
20050256766 Garcia et al. Nov 2005 A1
20050289138 Cheng et al. Dec 2005 A1
20060004624 Melara et al. Jan 2006 A1
20060156250 Chaudhri et al. Jul 2006 A1
20060159017 Mun et al. Jul 2006 A1
20060221387 Swift Oct 2006 A1
20060288072 Knapp et al. Dec 2006 A1
20070005388 Busch et al. Jan 2007 A1
20070150480 Hwang et al. Jun 2007 A1
20070192150 Belkin et al. Aug 2007 A1
20070208601 Pulianda Sep 2007 A1
20070234198 Tien et al. Oct 2007 A1
20070260625 Tien et al. Nov 2007 A1
20070276815 Naibo et al. Nov 2007 A1
20080046414 Haub et al. Feb 2008 A1
20080046457 Haub et al. Feb 2008 A1
20080059258 Lee Mar 2008 A1
20080081632 Malik Apr 2008 A1
20080097807 Chang et al. Apr 2008 A1
20080120129 Seubert et al. May 2008 A1
20080126417 Mazurik May 2008 A1
20080140514 Stenger Jun 2008 A1
20080163015 Kagan et al. Jul 2008 A1
20080168376 Tien et al. Jul 2008 A1
20080172629 Tien et al. Jul 2008 A1
20080177595 Wu et al. Jul 2008 A1
20080201397 Peng et al. Aug 2008 A1
20080244453 Cafer Oct 2008 A1
20080256516 Chaar et al. Oct 2008 A1
20080317217 Bemardini et al. Dec 2008 A1
20090013246 Cudich et al. Jan 2009 A1
20090018996 Hunt et al. Jan 2009 A1
20090064025 Christ et al. Mar 2009 A1
20090112932 Skierkowski et al. Apr 2009 A1
20090125577 Kodama et al. May 2009 A1
20090222749 Marinescu et al. Sep 2009 A1
20090265637 Lee et al. Oct 2009 A1
20090313503 Atluri et al. Dec 2009 A1
20090319320 Daughtrey et al. Dec 2009 A1
20100023362 Nguyen et al. Jan 2010 A1
20100031234 Chaar et al. Feb 2010 A1
20100042680 Czyzewicz et al. Feb 2010 A1
20100094676 Perra et al. Apr 2010 A1
20100115389 Gautestad May 2010 A1
20100123575 Mittal May 2010 A1
20100185710 Kiyama et al. Jul 2010 A1
20100185961 Fisher et al. Jul 2010 A1
20100229096 Maiocco Sep 2010 A1
20100306229 Timm et al. Dec 2010 A1
20100324927 Tinsley Dec 2010 A1
20100324962 Nester et al. Dec 2010 A1
20100332466 White et al. Dec 2010 A1
20110106453 Krieftewirth May 2011 A1
20110113341 Liensberger et al. May 2011 A1
20110178977 Drees Jul 2011 A1
20110191314 Howes et al. Aug 2011 A1
20110214081 Dobrin et al. Sep 2011 A1
20110219045 Yanai et al. Sep 2011 A1
20110261055 Wong et al. Oct 2011 A1
20110264663 Verkasalo Oct 2011 A1
20110313817 Wang Dec 2011 A1
20120005581 Turner Jan 2012 A1
20120005593 Redpath Jan 2012 A1
20120029977 Alcorn Feb 2012 A1
20120089650 Gibbs et al. Apr 2012 A1
20120102024 Campbell et al. Apr 2012 A1
20120131187 Cancel et al. May 2012 A1
20120158521 McCullen Jun 2012 A1
20120162265 Heinrich et al. Jun 2012 A1
20120197934 Zhang et al. Aug 2012 A1
20120259583 Noboa et al. Oct 2012 A1
20120260306 Njemanze et al. Oct 2012 A1
20130018686 Wright et al. Jan 2013 A1
20130018703 Majeed Jan 2013 A1
20130142322 Grasso et al. Jun 2013 A1
20130155873 Berg et al. Jun 2013 A1
20130157616 Berg et al. Jun 2013 A1
20130166490 Elkins et al. Jun 2013 A1
20130182700 Figura et al. Jul 2013 A1
20130185306 Botros Jul 2013 A1
20130185693 Chaar et al. Jul 2013 A1
20130205023 Bemardini et al. Aug 2013 A1
20130238403 Benson Sep 2013 A1
20130262279 Finley et al. Oct 2013 A1
20130318236 Coates et al. Nov 2013 A1
20130318589 Ford et al. Nov 2013 A1
20130318603 Merza Nov 2013 A1
20130325147 Kamouskos Dec 2013 A1
20130326620 Merza et al. Dec 2013 A1
20130332472 Vogel et al. Dec 2013 A1
20140012983 Brown et al. Jan 2014 A1
20140038583 Berg et al. Feb 2014 A1
20140040285 Rubinstein et al. Feb 2014 A1
20140040306 Gluzman Peregrine et al. Feb 2014 A1
20140067836 Holmes et al. Mar 2014 A1
20140072115 Makagon et al. Mar 2014 A1
20140122176 Olsen et al. May 2014 A1
20140122711 Lientz May 2014 A1
20140129298 Hulen et al. May 2014 A1
20140146648 Alber et al. May 2014 A1
20140156323 Prieto Jun 2014 A1
20140157142 Heinrich et al. Jun 2014 A1
20140160238 Yim et al. Jun 2014 A1
20140177819 Vymenets et al. Jun 2014 A1
20140181087 Wu Jun 2014 A1
20140229462 Lo Aug 2014 A1
20140236889 Vasan et al. Aug 2014 A1
20140236890 Vasan et al. Aug 2014 A1
20140279942 Siepmann et al. Sep 2014 A1
20140280175 Gelfand Sep 2014 A1
20140282586 Shear et al. Sep 2014 A1
20140282600 Siepmann et al. Sep 2014 A1
20140324448 Lacy et al. Oct 2014 A1
20140336984 Starr Nov 2014 A1
20140337871 Garcia De Bias et al. Nov 2014 A1
20140337938 Abhyanker Nov 2014 A1
20140364114 Zhao Dec 2014 A1
20140375650 Grundstrom et al. Dec 2014 A1
20140376710 Shaffer et al. Dec 2014 A1
20150026156 Meek et al. Jan 2015 A1
20150026167 Neels et al. Jan 2015 A1
20150050637 James-Hatter et al. Feb 2015 A1
20150085681 Bowdery et al. Mar 2015 A1
20150112700 Sublett et al. Apr 2015 A1
20150200824 Sadovsky et al. Jul 2015 A1
20150254955 Fields et al. Sep 2015 A1
20150269617 Mikurak Sep 2015 A1
20150310061 Hengstler et al. Oct 2015 A1
20150310371 Byrne et al. Oct 2015 A1
20150314681 Riley et al. Nov 2015 A1
20150356182 Ivershen et al. Dec 2015 A1
20160057020 Halmstad et al. Feb 2016 A1
20160087856 Groenendijk Mar 2016 A1
20160093226 Machluf et al. Mar 2016 A1
20160094411 Brennan et al. Mar 2016 A1
20160105329 Coates Apr 2016 A1
20160105334 Boe Apr 2016 A1
20160105335 Choudhary Apr 2016 A1
20160132575 Fletcher May 2016 A1
20160147380 Coates May 2016 A1
20160267420 Budic Sep 2016 A1
Foreign Referenced Citations (1)
Number Date Country
2013119200 Aug 2013 WO
Non-Patent Literature Citations (14)
Entry
Bitincka, Ledion, et al., “Optimizing Data Analysis with a Semi-Structured Time Series Database”, Splunk Inc., 2010 pp. 1-9.
Carasso, David, “Exploring Splunk Search Processing Language (SPL) Primer and Cookbook”, Splunk Inc., 2012 , CITO Research, New York, 154 Pages.
http://docs.splunk.com/Documentation/PCI/2.1.1/ [000119] User/IncidentReviewdashboard, 2 Pages. (Last accessed Aug. 5, 2014).
Jack Coates, Cognitive Splunking, Sep. 17, 2012; Splunk-blogs, Blogs-Security, 1-3.
Title: The OI Difference: Two Major Differences between OI & BI. Publisher: Vitria. Alleged Publication Date: Feb. 8, 2011. URL: http://www.vitria.com/blog/The-OI-Difference-Two-Major-Differences-Between-OI-BI/.
U.S. Appl. No. 114/326,459, filed Jul. 8, 2014.
U.S. Appl. No. 14/167,316, filed Jan. 29, 2014.
U.S. Appl. No. 14/326,459, filed Jul. 31, 2014.
U.S. Appl. No. 14/326,459, filed Jul. 8, 2014.
U.S. Appl. No. 14/448,995, filed Jul. 31, 2014.
U.S. Appl. No. 14/523,661, filed Oct. 24, 2014.
U.S. Appl. No. 14/528,926, filed Oct. 30, 2014.
U.S. Appl. No. 14/611,214, filed Jan. 31, 2015.
vSphere Monitoring and Performance, VMware, Inc., Update 1, vSphere 5.5, EN-001357-02, 2010-2014, pp. 1-174 http://pubs.vmware.com/ vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-551-mo- nitoring-performance-guide.pdf.
Provisional Applications (1)
Number Date Country
62062104 Oct 2014 US
Continuations (2)
Number Date Country
Parent 14933919 Nov 2015 US
Child 15418766 US
Parent 14611216 Jan 2015 US
Child 14933919 US
Continuation in Parts (5)
Number Date Country
Parent 15418766 Jan 2017 US
Child 15713592 US
Parent 14528858 Oct 2014 US
Child 14611216 US
Parent 15713592 Oct 2014 US
Child 14611216 US
Parent 15485222 Apr 2017 US
Child 15713592 US
Parent 15276776 Sep 2016 US
Child 15485222 US