The following description relates in general to monitoring systems, and more particularly to systems and methods for programmatically controlling measurements in monitoring sources.
Computing systems of various types are widely employed today. Data centers, grid environments, servers, routers, switches, personal computers (PCs), laptop computers, workstations, devices, handhelds, sensors, and various other types of information processing devices are relied upon for performance of tasks. Monitoring systems are also often employed to monitor these computing systems. For instance, monitoring systems may be employed to observe whether a monitored computing system is functioning properly (or at all), the amount of utilization of resources of such monitored computing system (e.g., CPU utilization, memory utilization, I/O utilization, etc.), and/or other aspects of the monitored computing system. In general, monitoring instrumentation (e.g., software and/or hardware) is often employed at the monitored system to collect information, such as information regarding utilization of its resources, etc. The collected information, which may be referred to as “raw metric data,” may be stored to a data store (e.g., database or other suitable data structure) that is either local to or remote from the monitored computing system, and monitoring tools may then access the stored information. In some instances, tasks may be triggered by the monitoring tools based on the stored information. For example, a monitoring tool may generate utilization charts to display to a user the amount of utilization of resources of a monitored system over a period of time. As another example, alerts may be generated by the monitoring tool to alert a user to a problem with the monitored computing system (e.g., that the computing system is failing to respond). As still another example, the monitoring tool may take action to re-balance workloads among various monitored computing systems (e.g., nodes of a cluster) based on the utilization information observed for each monitored computing system.
Today, monitoring data is collected in the form of metrics that are defined and observed for a monitored computing system. In general, instrumentation and/or monitoring sources are manually configured regarding the metrics that are reported in the monitoring data collected for a given monitored computing system. Such reporting configuration may include manually configuring which metrics are to be reported (e.g., CPU utilization, memory utilization, I/O utilization, etc.), the rate at which the metrics are reported, the destination to which the metrics are to be reported (e.g., a distribution list), and the format of the reported metrics. If a change is desired in the metric reporting configuration, the monitoring source must be manually re-configured. Further, if multiple monitoring sources are implemented and a change is desired across all of such monitoring sources, each monitoring source must be individually manually re-configured. Such manual configuration or re-configuration of a monitoring source generally involves a user editing the configuration file of each monitoring source and then restarting the monitoring source. This process is not only time-consuming but also error-prone, and it limits the rate at which changes can be applied.
For improved efficiency and flexibility, we have recognized a desire to provide improved control over metric reporting configuration in a monitoring source. More specifically, we have recognized a desire for an interface to a monitoring source that allows for programmatic configuration (and re-configuration) of metric reporting configurations, rather than requiring the above-described manual configuration.
Embodiments of the present invention provide an interface for programmatically configuring metrics reported in monitoring data collected for a monitored component.
System 100 further includes a monitoring source 107. In general, a monitoring source 107 is a component that gathers or stores monitoring data about monitored components, such as monitored component 102, in an environment. Monitoring sources commonly include a monitoring data store 104 for storing monitoring data collected for monitored component 102. This exemplary embodiment further includes metric reporting configuration interface 101, which enables the reporting of metrics to be programmatically configured, as described further herein. In certain embodiments, metric reporting configuration operations are supported for configuring one or more of the following configuration parameters: metric selection 10, metric delivery rate 11, reporting format definition 12, reporting distribution list 13, priority or utility (as a notion of the “value” of monitoring data), and/or metric collection rate 14, as illustrated by the optional dashed-line boxes shown in
The monitoring data for monitored component 102 collected by monitoring instrumentation 103 is stored to monitoring data store 104. Such data store 104 may reside on any suitable form of computer-readable storage medium, such as memory (e.g., RAM), a hard drive, optical disk, floppy disk, tape drive, etc., and may store the monitoring data in the form of a database or any other suitable data structure. In certain implementations, a given monitoring data store 104 may store monitoring data for a plurality of different monitored components. In certain embodiments, the monitoring data is communicated by monitoring instrumentation 103 to monitoring data store 104 via a communication network (not shown), such as the Internet or other wide-area network (WAN), a local area network (LAN), a telephony network, a wireless network, or any other communication network that enables two or more information processing devices to communicate data. The monitoring data stored therein may comprise any number of metrics collected for monitored component 102, such as CPU utilization, memory utilization, I/O utilization, etc. In certain embodiments, the monitoring data stored to monitoring data store 104 is configured in accordance with metric reporting configurations defined for such monitoring data. That is, the metrics that are included in the monitoring data for monitored component 102, the metric delivery rate (how often such metrics are reported for monitored component 102), the reporting format, and/or other aspects of the metrics of the monitoring data are defined by reporting configurations. As described further below, such metric reporting configurations may be defined (e.g., dynamically changed) via metric reporting configuration interface 101.
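To make the notion of a metric reporting configuration concrete, the configuration parameters named above (metric selection, delivery rate, collection rate, format definition, distribution list, and priority) can be grouped into a single record. The following Python sketch is illustrative only; the class `MetricReportingConfig` and all of its field names are hypothetical and are not defined by the described system.

```python
from dataclasses import dataclass, field


@dataclass
class MetricReportingConfig:
    """Hypothetical grouping of the metric reporting configuration parameters."""
    selected_metrics: set[str] = field(default_factory=set)               # metric selection
    delivery_rate_hz: dict[str, float] = field(default_factory=dict)      # metric delivery rate
    collection_rate_hz: dict[str, float] = field(default_factory=dict)    # metric collection rate
    format_definition: str = "$metric=$value"                              # reporting format definition
    distribution_list: dict[str, set[str]] = field(default_factory=dict)  # metric -> destinations
    priority: dict[str, int] = field(default_factory=dict)                # priority/utility per metric

    def is_reported(self, metric: str) -> bool:
        """A metric appears in the monitoring data only if it has been selected."""
        return metric in self.selected_metrics


config = MetricReportingConfig()
config.selected_metrics.add("cpu_utilization")
config.delivery_rate_hz["cpu_utilization"] = 0.2   # one report every 5 seconds
config.distribution_list["cpu_utilization"] = {"monitoring-tool-105"}
```

Keeping all parameters in one object means that a programmatic change (for example, a new delivery rate) only has to update this record, rather than an on-disk configuration file followed by a restart.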
A monitoring tool 105 is further implemented in system 100, which is operable to access (e.g., via a communication network) the collected monitoring data in monitoring data store 104. As used herein, a “monitoring tool” refers to any device that is operable to access the collected monitoring data for at least one monitored component. Monitoring tool 105 may comprise a server, PC, laptop, or other suitable information processing device, which may have one or more software applications executing thereon for accessing the monitoring data in monitoring data store 104 for one or more monitored components, such as monitored component 102. Monitoring tool 105 may be implemented, for example, to take responsive actions based on such monitoring data. As described further herein, monitoring data may be pushed from monitoring source 107 to monitoring tool 105 in certain embodiments, and monitoring data may be pulled from monitoring source 107 by monitoring tool 105 in other embodiments.
In accordance with embodiments of the present invention, the metrics reported (e.g., in monitoring data store 104 and/or to a monitoring tool 105) for monitored component 102 can be programmatically configured via metric reporting configuration interface 101. As described further herein, metric reporting configuration interface 101 may support operations for defining such configuration parameters as a) selecting metrics to be reported in the monitoring data (block 10 of
Further, in certain embodiments, the configuration parameters may be autonomously changed (e.g., by monitoring source 107, monitoring controller 106, and/or monitoring tool 105) in response to certain changes occurring in the monitored environment. For instance, a monitored component 102 may migrate within a monitored environment (e.g., from one data center to another), such that the migrated monitored component 102 is monitored by a different monitoring source 107. The new monitoring source 107 that is associated with the monitored component may then enable the configuration of data collection for that component via metric reporting configuration interface 101. A monitoring tool may become aware that monitoring source 107 supports configuration of data collection for the component and request the desired configuration.
In operational block 22, the metric reporting configuration interface 101 supports defining configuration parameters of at least one metric to be reported in monitoring data collected for the at least one monitored component 102. As shown in sub-operational block 202, in certain embodiments the metric reporting configuration interface 101 supports defining the following metric reporting configuration parameters: a) selecting metrics to be reported in the monitoring data (block 10 of
In operational block 23, monitoring data is collected for the at least one monitored component. As shown in sub-operational block 203, in certain embodiments this comprises receiving at a monitoring source 107 raw metrics from instrumentation 103 coupled to the at least one monitored component 102. In operational block 24, the monitoring data is reported in accordance with the defined configuration. For instance, in certain embodiments, the monitoring data, having metrics according to the defined configuration, is reported to a monitoring data store 104 that is accessible by a monitoring tool 105, as shown by sub-operational block 204. Thus, the monitoring data having metrics configured according to the defined configuration parameters may be stored for access by a monitoring tool 105 in certain embodiments. Additionally or alternatively, in certain embodiments the monitoring data having metrics configured according to the defined configuration parameters may be communicated to a monitoring tool 105, as shown by sub-operational block 205. That is, instead of or in addition to storing such monitoring data, the monitoring data may be communicated from a monitoring source 107 to a monitoring tool 105 or an event may be sent to the monitoring tool 105 signaling that the data is available. Thus, monitoring data may be communicated to the monitoring tool via pushing such monitoring data from the monitoring source 107 to the monitoring tool 105 or via notifying (e.g., by an event) the monitoring tool 105 that the monitoring data is available at the monitoring source 107 so that the monitoring tool 105 may then pull the monitoring data from the monitoring source 107 when desired. Alternatively, in another embodiment, the monitoring tool may poll the monitoring source 107 to learn whether the monitoring data is available. This polling is done via the control interface. 
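The three delivery styles described above (pushing data to the tool, sending an event so that the tool can pull the data, and polling by the tool) can be sketched as follows. This is a minimal Python sketch under assumed names: `MonitoringSource`, `register_push_target`, `register_listener`, `has_data`, and `pull` are hypothetical stand-ins for monitoring source 107 and its interfaces, not an API defined by the text.

```python
from collections import deque
from typing import Callable, Optional


class MonitoringSource:
    """Hypothetical sketch: delivery by push, by event plus pull, or by polling."""

    def __init__(self) -> None:
        self._available: deque = deque()                      # records waiting to be pulled
        self._push_targets: list[Callable[[dict], None]] = []
        self._listeners: list[Callable[[], None]] = []

    def register_push_target(self, target: Callable[[dict], None]) -> None:
        self._push_targets.append(target)

    def register_listener(self, listener: Callable[[], None]) -> None:
        self._listeners.append(listener)

    def report(self, record: dict) -> None:
        """Report one record produced according to the defined configuration."""
        if self._push_targets:
            for push in self._push_targets:
                push(record)                 # push model: the source sends the data
            return
        self._available.append(record)       # pull model: keep the record at the source
        for notify in self._listeners:
            notify()                         # event: signal that data is available

    def has_data(self) -> bool:
        """Polling query (the text places polling on the control interface)."""
        return bool(self._available)

    def pull(self) -> Optional[dict]:
        """The tool reads the next available record, if any."""
        return self._available.popleft() if self._available else None


# Push: the tool receives the record immediately.
pushed: list[dict] = []
push_src = MonitoringSource()
push_src.register_push_target(pushed.append)
push_src.report({"metric": "cpu", "value": 0.42})

# Event plus pull (or polling): the record stays at the source until read.
events: list[str] = []
pull_src = MonitoringSource()
pull_src.register_listener(lambda: events.append("data available"))
pull_src.report({"metric": "cpu", "value": 0.40})
record = pull_src.pull() if pull_src.has_data() else None
```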
Data is either pushed to the monitoring tool 105 or read by the monitoring tool using the monitoring data interface. In either case, the data may be delivered via a reporting network, such as the exemplary reporting network described further in co-pending and commonly assigned U.S. Patent Application Serial No. [Attorney Docket No. 200404993-1] titled “SYSTEM AND METHOD FOR AUTONOMOUSLY CONFIGURING A REPORTING NETWORK”, the disclosure of which is hereby incorporated herein by reference.
In general, monitoring source 1071 is a component that gathers or stores monitoring data about monitored components, such as monitored component 102 (of
Raw monitoring data is received for various metrics at the raw metrics ports 301 from raw metric sources, such as from instrumentation 103 coupled to monitored component 102 in
The subsequent metric selector 303 refers to the metric descriptors 310 and selects the metrics that have been enabled. That is, from the incoming raw metric data, metric selector 303 passes on only those metrics whose metric descriptors are enabled. Metrics that are not referenced by any metric descriptor need not be stored by monitoring source 1071 or processed further.
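The filtering performed by metric selector 303 can be sketched as a generator over incoming (metric, value) pairs. In this Python sketch, `MetricDescriptor` and `select_metrics` are hypothetical names, and a descriptor is reduced to a name and an `enabled` flag for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class MetricDescriptor:
    """Hypothetical, minimal metric descriptor: only the enabled flag matters here."""
    name: str
    enabled: bool = True


def select_metrics(raw_values: Iterable[tuple[str, float]],
                   descriptors: dict[str, MetricDescriptor]) -> Iterator[tuple[str, float]]:
    """Pass on only raw values whose metric has an enabled descriptor."""
    for name, value in raw_values:
        descriptor = descriptors.get(name)
        if descriptor is not None and descriptor.enabled:
            yield name, value
        # Otherwise the value is dropped: no descriptor, or the descriptor is disabled.


descriptors = {"cpu": MetricDescriptor("cpu"), "mem": MetricDescriptor("mem", enabled=False)}
raw = [("cpu", 0.5), ("mem", 0.7), ("disk", 0.1)]
selected = list(select_metrics(raw, descriptors))  # [('cpu', 0.5)]
```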
In certain embodiments, monitoring source 1071 provides an inward-facing programmable interface (i.e., control interface 1011). Control interface 1011 may be used by the monitoring tool or by a monitored component of monitoring source 107. For example, interface 1011 is used by the monitored environment to register meta-data, models, and their corresponding meta-models for the monitored environment, such as described in various ones of the related applications incorporated herein by reference above. Furthermore, as the monitored component changes, the interface is used to reflect these changes in the information registered with the monitoring source. The registered information is used to support the implementation of the metric reporting configuration interface 101.
Metrics that have been enabled are then passed on to the raw data collector 304, an intermediate store that exists for each enabled metric. The purpose of the raw data collector 304 is to reconcile the rate at which metric values are received with the desired delivery rate. In one embodiment, when a delivery rate higher than the actual receiving rate is chosen, metric values are repeatedly reported from the intermediate store in the raw data collector 304. When the delivery rate is lower than the receiving rate, metric values may simply be overwritten (and lost) when new raw metric values arrive before the current values could be delivered. The raw data collector 304 may also apply a policy other than overwriting; for example, it may hold a queue of values and interpolate between them to deliver a metric value when it is due.
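The overwrite policy described for the raw data collector 304 can be sketched as a thread-safe "latest value" store. In this Python sketch, the class `RawDataCollector` and its `put`/`get` methods are hypothetical; a newer value replaces an undelivered one (delivery slower than arrival), and repeated reads return the same value (delivery faster than arrival).

```python
import threading
from typing import Optional


class RawDataCollector:
    """Hypothetical per-metric intermediate store using the overwrite policy."""

    def __init__(self) -> None:
        self._lock = threading.Lock()        # receiving and delivery may run concurrently
        self._latest: dict[str, float] = {}

    def put(self, metric: str, value: float) -> None:
        with self._lock:
            self._latest[metric] = value     # a newer value overwrites an undelivered one

    def get(self, metric: str) -> Optional[float]:
        with self._lock:
            return self._latest.get(metric)  # reading does not consume the value


collector = RawDataCollector()
collector.put("cpu", 0.30)
collector.put("cpu", 0.35)        # arrives before delivery, so 0.30 is lost
first = collector.get("cpu")      # 0.35
second = collector.get("cpu")     # no new value yet, so 0.35 is reported again
```

A queue-and-interpolate policy, also mentioned above, would replace the single stored value with a short history per metric.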
The next stage, delivery rate controller 305, applies the delivery rate that has been defined for a metric by accessing metric values from the intermediate store in the raw data collector 304 at that rate. Various alternatives exist for determining when a monitored metric is to be delivered, including as examples the following:
After the delivery rate controller 305 has triggered the delivery of a metric value obtained from the intermediate store in the raw data collector 304, the subsequent format processor 306 applies a transformation to the raw metric value(s) in order to generate their final representation, as expected by the destinations of the monitoring data record. The transformation performed by the format processor 306 is described and controlled by the format definition document 311, which defines the final representation of the monitoring data record that will be sent out. This document 311 is machine readable.
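The role of the format definition document 311 can be sketched with a simple text template. The text mentions XSLT documents as one form of format definition; Python's standard library has no XSLT processor, so this sketch swaps in `string.Template` as a stand-in. The function `format_record` and the `$metric`, `$value`, and `$timestamp` placeholders are hypothetical.

```python
from string import Template


def format_record(metric: str, value: float, timestamp: int, format_definition: str) -> str:
    """Apply a format definition (here a string.Template) to one metric value."""
    return Template(format_definition).substitute(metric=metric, value=value,
                                                  timestamp=timestamp)


# Two destinations may expect different representations of the same value.
xml_like = '<metric name="$metric" t="$timestamp">$value</metric>'
csv_like = "$metric,$value"
xml_out = format_record("cpu", 0.35, 1700000000, xml_like)
csv_out = format_record("cpu", 0.35, 1700000000, csv_like)
```

Because the definition is plain data, uploading a new template changes the output format without changing the format processor itself.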
The final processing step is performed in the distributor component 307 that disseminates the monitoring data record to all destinations that have subscribed to the corresponding metric. Subscribers, such as various monitoring tools, are described in the automatically maintained distribution list document 312. In certain embodiments, in addition to or instead of distributing the monitoring data to recipients (e.g., monitoring tools), it may be stored to a data store, such as monitoring data store 104 of
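The distributor 307 can be sketched as a mapping from metric names to subscribed destinations, with an optional data store that receives every record in addition to the subscribers. In this Python sketch, `Distributor`, `subscribe`, and `distribute` are hypothetical names, and destinations are modeled as callables that accept a formatted record.

```python
from typing import Callable, Optional

Destination = Callable[[str], None]


class Distributor:
    """Hypothetical distributor: sends each record to the subscribers of its metric."""

    def __init__(self, data_store: Optional[list[str]] = None) -> None:
        self._subscribers: dict[str, list[Destination]] = {}  # role of distribution list 312
        self._data_store = data_store                          # e.g. monitoring data store 104

    def subscribe(self, metric: str, destination: Destination) -> None:
        targets = self._subscribers.setdefault(metric, [])
        if destination not in targets:
            targets.append(destination)

    def distribute(self, metric: str, record: str) -> int:
        """Deliver one record; return the number of subscribers it was sent to."""
        targets = self._subscribers.get(metric, [])
        for send in targets:
            send(record)
        if self._data_store is not None:
            self._data_store.append(record)  # store in addition to distributing
        return len(targets)


store: list[str] = []
tool_a: list[str] = []
tool_b: list[str] = []
dist = Distributor(data_store=store)
dist.subscribe("cpu", tool_a.append)
dist.subscribe("cpu", tool_b.append)
dist.subscribe("mem", tool_b.append)
sent = dist.distribute("cpu", "cpu=0.35")
```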
The various control functions performed by the exemplary components of monitoring source 1071 can be controlled through the control interface 1011 linked to the control processor 309. The task of the control processor 309 is to translate control instructions, received in the form of invocations of control methods, into respective changes in internal control data, such as the metric descriptors 310, the format definition document 311, and the distribution list 312. Changes in such configuration parameters made via control interface 1011 thus affect the processing of monitoring data in the monitoring source 1071.
The following methods are examples of operations that may be supported by control interface 1011 according to an embodiment of monitoring source 1071:
selectRawMetricByName(metric), which is an instruction for selecting a raw metric for processing;
selectDeliveryRateForMetric(metric, rate), which is an instruction for selecting the delivery rate for a metric;
selectCollectionRateForMetric(metric, rate), which is an instruction for selecting the collection rate for a metric;
selectDeliveryLatencyForMetric(metric, latency), which is an instruction for selecting the latency for metric delivery;
selectCollectionLatencyForMetric(metric, latency), which is an instruction for selecting the latency for metric collection;
defineMetricFormat(metric, formatDef), which is an instruction for uploading the format definition document (such as an XSLT transformation document) that is applied to the specified metric. The document formatDef describes a transformation that is applied to the metric data specified by metric. This allows control of the output format of monitoring data;
subscribe(metric, destDesc), which is an instruction for subscribing the destination defined by the destination descriptor to receive the specified metrics;
unsubscribe(metric, destDesc), which is an instruction for unsubscribing the destination defined by the destination descriptor from receiving the specified metrics.
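The control methods listed above can be rendered as an abstract interface, with a trivial implementation that records the requested settings in the way control processor 309 translates invocations into changes of internal control data. This Python sketch keeps the method and parameter names from the list for easy comparison; the classes `ControlInterface` and `InMemoryControlInterface`, and the setting keys, are hypothetical.

```python
from abc import ABC, abstractmethod


class ControlInterface(ABC):
    """The listed control methods; camelCase names kept to match the text."""

    @abstractmethod
    def selectRawMetricByName(self, metric: str) -> None: ...
    @abstractmethod
    def selectDeliveryRateForMetric(self, metric: str, rate: float) -> None: ...
    @abstractmethod
    def selectCollectionRateForMetric(self, metric: str, rate: float) -> None: ...
    @abstractmethod
    def selectDeliveryLatencyForMetric(self, metric: str, latency: float) -> None: ...
    @abstractmethod
    def selectCollectionLatencyForMetric(self, metric: str, latency: float) -> None: ...
    @abstractmethod
    def defineMetricFormat(self, metric: str, formatDef: str) -> None: ...
    @abstractmethod
    def subscribe(self, metric: str, destDesc: str) -> None: ...
    @abstractmethod
    def unsubscribe(self, metric: str, destDesc: str) -> None: ...


class InMemoryControlInterface(ControlInterface):
    """Hypothetical implementation that only records the requested settings."""

    def __init__(self) -> None:
        self.settings: dict[str, dict[str, object]] = {}   # per-metric control data
        self.subscribers: dict[str, set[str]] = {}          # role of the distribution list

    def _set(self, metric: str, key: str, value: object) -> None:
        self.settings.setdefault(metric, {})[key] = value

    def selectRawMetricByName(self, metric):
        self._set(metric, "selected", True)

    def selectDeliveryRateForMetric(self, metric, rate):
        self._set(metric, "delivery_rate", rate)

    def selectCollectionRateForMetric(self, metric, rate):
        self._set(metric, "collection_rate", rate)

    def selectDeliveryLatencyForMetric(self, metric, latency):
        self._set(metric, "delivery_latency", latency)

    def selectCollectionLatencyForMetric(self, metric, latency):
        self._set(metric, "collection_latency", latency)

    def defineMetricFormat(self, metric, formatDef):
        self._set(metric, "format", formatDef)

    def subscribe(self, metric, destDesc):
        self.subscribers.setdefault(metric, set()).add(destDesc)

    def unsubscribe(self, metric, destDesc):
        self.subscribers.get(metric, set()).discard(destDesc)


ctl = InMemoryControlInterface()
ctl.selectRawMetricByName("cpu")
ctl.selectDeliveryRateForMetric("cpu", 1.0)
ctl.subscribe("cpu", "tool-105")
ctl.selectDeliveryRateForMetric("cpu", 0.1)   # changed at run time, no restart
ctl.unsubscribe("cpu", "tool-105")
```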
Thus, the above instructions and/or others that may be supported by a given implementation of control interface 1011 may be used for programmatically defining (e.g., changing) the metric reporting configuration parameters of monitoring source 1071. For instance, a process (e.g., as on monitoring controller 106 and/or monitoring tool 105 of
Operation of exemplary monitoring source 1071 is described further with the operational flow of
In operational block 42, raw metric data is received for a monitored component (such as monitored component 102 of
In operational block 46, format processor 306 controls the delivery format for the selected metric data to be delivered to recipients on the distribution list 312 in accordance with the format definition document 311. In operational block 47, distributor 307 controls the destinations to which the selected metric data is delivered from monitoring source 1071 (via delivery port 308) in accordance with the distribution list 312.
Certain embodiments of the present invention decouple the receipt of raw metric data from the delivery of processed data, which allows the reporting rate for monitoring data to be adjusted. Decoupling, in this regard, means that receiving raw metric data and processing and delivering the requested monitoring data may be split into separate processing threads that may be initiated independently and that may operate concurrently.
In the exemplary embodiment of
Thread 503 is initiated and controlled by the delivery rate controller 305, which obtains raw data from the queue in the raw data collector 304 and executes the format processor 306 and distributor 307 further down the processing pipeline.
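The decoupling of receiving and delivery can be sketched with two Python threads sharing a locked "latest value" store: one thread accepts raw values whenever they arrive, and the other wakes once per delivery interval, in the role of delivery rate controller 305. The function names `receive_raw` and `deliver` and the `"metric=value"` output are hypothetical; the `emit` callback stands in for format processor 306 and distributor 307.

```python
import queue
import threading
import time
from typing import Callable


def receive_raw(raw_in: queue.Queue, latest: dict[str, float],
                lock: threading.Lock, stop: threading.Event) -> None:
    """Receiving thread: stores raw values as fast as the instrumentation sends them."""
    while not stop.is_set():
        try:
            metric, value = raw_in.get(timeout=0.05)
        except queue.Empty:
            continue
        with lock:
            latest[metric] = value           # overwrite policy of the raw data collector


def deliver(metric: str, interval_s: float, latest: dict[str, float],
            lock: threading.Lock, emit: Callable[[str], None],
            stop: threading.Event) -> None:
    """Delivery thread: wakes once per delivery interval, independently of arrivals."""
    while not stop.wait(interval_s):         # wait() returns True only once stop is set
        with lock:
            value = latest.get(metric)
        if value is not None:
            emit(f"{metric}={value}")        # stands in for format processor + distributor


raw_in: queue.Queue = queue.Queue()
latest: dict[str, float] = {}
lock = threading.Lock()
stop = threading.Event()
delivered: list[str] = []

threads = [
    threading.Thread(target=receive_raw, args=(raw_in, latest, lock, stop)),
    threading.Thread(target=deliver,
                     args=("cpu", 0.1, latest, lock, delivered.append, stop)),
]
for t in threads:
    t.start()
raw_in.put(("cpu", 0.42))   # a single raw value arrives
time.sleep(0.35)            # the delivery thread reports it on each tick
stop.set()
for t in threads:
    t.join()
```

Since the delivery thread follows its own schedule, the number of delivered records depends on the delivery interval rather than on how often raw values arrive.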
Accordingly, system 600 provides a monitoring environment comprising a number of monitoring sources 107A-107C and a number of tools 105t1-105t2 that access, process, store, and report monitoring data. The monitoring sources are implemented for different data center locations 602A-602C, respectively, in this example.
Further, in this example, application 601 initially resides in data center 602C and is monitored by instrumentation 103C3. Monitoring data is delivered from instrumentation 103C3 to monitoring source 107C. Reporting tool chain 105t1 receives monitoring data for application 601 through monitoring source 107C.
Assume that application 601 is migrated to data center 602A at the initiative of a system administrator or a monitoring tool. In one embodiment, this event causes a notification to be sent to both reporting tool chains, alerting them that application 601 has moved and, hence, that its monitoring data is no longer available from monitoring source 107C. Consequently, tool chains 105t1 and 105t2 reconfigure themselves to obtain the data about application 601 from monitoring source 107A, which, in turn, obtains data about application 601 from its new instrumentation 103A2. Each tool chain uses metric introspection interface 603 to retrieve the new definitions for the metrics of interest and uses this information to subsequently retrieve the monitoring data for application 601 from monitoring data interface 308A. In an alternate embodiment, following the migration of application 601, the monitoring data for application 601 may be migrated to monitoring source 107A.
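The tool-chain reaction to a migration notification can be sketched as re-binding the application to its new monitoring source and refreshing the metric definitions through introspection. In this Python sketch, `SourceStub`, `ToolChain`, `bind`, `introspect`, and `read` are hypothetical stand-ins for monitoring sources 107A/107C, metric introspection interface 603, and monitoring data interface 308A; the values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceStub:
    """Hypothetical stand-in for a monitoring source such as 107A or 107C."""
    name: str
    definitions: dict[str, dict[str, str]] = field(default_factory=dict)  # app -> {metric: unit}
    values: dict[tuple[str, str], float] = field(default_factory=dict)    # (app, metric) -> value

    def introspect(self, app: str) -> dict[str, str]:
        """Stand-in for the metric introspection interface."""
        return self.definitions.get(app, {})

    def read(self, app: str, metric: str) -> Optional[float]:
        """Stand-in for the monitoring data interface."""
        return self.values.get((app, metric))


class ToolChain:
    """Hypothetical tool chain that follows an application across sources."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.source_for: dict[str, SourceStub] = {}
        self.metric_defs: dict[str, dict[str, str]] = {}

    def bind(self, app: str, source: SourceStub) -> None:
        """Initial binding, or handling of a migration notification."""
        self.source_for[app] = source
        self.metric_defs[app] = source.introspect(app)   # refresh metric definitions

    def read(self, app: str, metric: str) -> Optional[float]:
        return self.source_for[app].read(app, metric)


source_c = SourceStub("107C", {"app601": {"cpu": "percent"}}, {("app601", "cpu"): 40.0})
source_a = SourceStub("107A", {"app601": {"cpu": "percent"}}, {("app601", "cpu"): 55.0})
chain = ToolChain("105t1")
chain.bind("app601", source_c)
before = chain.read("app601", "cpu")
chain.bind("app601", source_a)      # migration notification received
after = chain.read("app601", "cpu")
```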
Comprehensive control or “programmability” of monitoring sources (instrumentations) in terms of:
Thus, embodiments of the invention enable programmatic control of the collection of monitoring data in monitoring sources (instrumentations) provided in a system. In many instances, the volume of monitoring data can be large, and transmitting and storing it can consume significant resources. This may affect the monitored system, since the transmission and collection of monitoring data also take place in, and consume resources of, the monitored environment. Transmitting monitoring data also consumes bandwidth in the shared networking infrastructure.
Providing the capability to control when, where, which, and at what rate monitoring data is gathered, processed, and stored in a system is thus advantageous. Making this control capability available through programmable interfaces (APIs) further allows automated adjustment of where, when, and at what rate monitoring data is collected in the system.
Embodiments of the invention provide a mechanism to control the resources used to transmit and store monitoring data. Embodiments of the invention allow monitoring services to programmatically actuate the collection, transmission, and storage of monitoring data according to a purpose. Embodiments of the invention also enable flow control of monitoring data flows. This is desirable, for instance, when transmission resources are needed for other purposes in a system or when destinations of monitoring data cannot process data at the arrival rate.
Thus, in view of the above, exemplary embodiments of the present invention provide one or more monitoring sources that provide a control interface (a set of methods) that allows defining metric reporting configuration parameters, such as selecting which metrics are reported by the source, at which rate (minimum and maximum numbers of records or data points per metric per time), to which destination, and in which format. All these parameters can be changed and redefined during run-time by invoking methods at the control interface. In certain embodiments, the monitoring source also provides a distribution capability in which clients to monitoring data can subscribe for receiving monitoring data from the monitoring source.
In certain embodiments, per-subscriber customization of metric reporting configuration parameters may be supported. For instance, per-subscriber customization of metric reporting formats and/or delivery rates may be supported. Such an implementation may be supported by using per-subscriber instances of metric selector 303, raw data collectors 304, delivery rate controllers 305, format processors 306, and/or distributors 307.
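Per-subscriber customization can be sketched by giving each subscriber its own small pipeline with its own metric selection, delivery interval, and format definition. In this Python sketch, `SubscriberPipeline` and `offer` are hypothetical; rate control is simplified to dropping values that arrive before a subscriber's interval has elapsed, which is one possible policy and not necessarily the one used by delivery rate controller 305, and `string.Template` stands in for a format definition document.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable


@dataclass
class SubscriberPipeline:
    """Hypothetical per-subscriber selector, rate control, formatter, and sink."""
    metrics: set[str]
    delivery_interval_s: float
    format_definition: str
    sink: Callable[[str], None]
    _last_delivery: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def offer(self, metric: str, value: float, now: float) -> None:
        if metric not in self.metrics:                      # per-subscriber metric selection
            return
        last = self._last_delivery.get(metric)
        if last is not None and now - last < self.delivery_interval_s:
            return                                          # too early for this subscriber
        self._last_delivery[metric] = now
        record = Template(self.format_definition).substitute(metric=metric, value=value)
        self.sink(record)                                   # per-subscriber format and delivery


fast_out: list[str] = []
slow_out: list[str] = []
pipelines = [
    SubscriberPipeline({"cpu"}, 1.0, "$metric=$value", fast_out.append),
    SubscriberPipeline({"cpu", "mem"}, 5.0, "<m n='$metric'>$value</m>", slow_out.append),
]
for t, (m, v) in enumerate([("cpu", 10), ("mem", 20), ("cpu", 11), ("cpu", 12)]):
    for p in pipelines:
        p.offer(m, v, now=float(t))
```

Here the first subscriber receives every `cpu` value in its compact format, while the second receives `cpu` and `mem` in an XML-like format but at most once per five time units per metric.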
This application is related to concurrently filed and commonly assigned U.S. patent application Ser. Nos. [Attorney Docket No. 200404993-1] entitled “SYSTEM AND METHOD FOR AUTONOMOUSLY CONFIGURING A REPORTING NETWORK”; [Attorney Docket No. 200404992-1] entitled “A MODEL-DRIVEN MONITORING ARCHITECTURE”; [Attorney Docket No. 200404994-1] entitled “SYSTEM FOR METRIC INTROSPECTION IN MONITORING SOURCES”; and [Attorney Docket No. 200405195-1] entitled “SYSTEM AND METHOD FOR USING MACHINE-READABLE META-MODELS FOR INTERPRETING DATA MODELS IN A COMPUTING ENVIRONMENT”, the disclosures of which are hereby incorporated herein by reference.