The present invention relates generally to the field of systems management.
Computer networks are becoming larger and more complex. Network management of computer networks often involves monitoring deployed nodes on the network (e.g., computers, servers, routers, sub-networks, network enabled devices, and the like). This monitoring process may involve a variety of parameters that are important to the system manager and the health of the network.
Monitoring performed by a client network management system can include measuring and collecting performance data of servers and other computer systems in the network. Performance measurement and system health monitors can measure and collect data needed to diagnose a problem with a system on the network. Performance measurement and system health monitors can use a measurement engine that acquires desired system metrics (e.g., CPU utilization, percentage of memory used, and the like). This data can then be used for generating performance reports and for aiding system operators in diagnosing system problems such as a memory bottleneck. Those skilled in the art will appreciate that a significant amount of data may be necessary to diagnose potential system problems.
Examples of known performance measurement and system health monitors can include commercially available software systems, such as MeasureWare available from Hewlett-Packard Company and Patrol available from BMC Software, Inc. Known performance measurement and system health monitors typically require the customer to define performance thresholds. When performance crosses the defined thresholds, an alert is generated to notify system administrators or support personnel, perhaps accompanied by a static set of recommendations or corrective actions.
Threshold-based performance monitoring is reactive in the sense that customers are not made aware of an emerging problem until a threshold is reached. Experts can be assigned to customers with performance problems; however, experts are few in number and can help only a limited number of customers. Known systems and methods do not causally link performance improvement or degradation with configuration changes that may be a factor. Such systems and methods require expertise in the customer's information technology (IT) staff to evaluate the relative merits of static advice sets and to determine which course of action should be attempted first. Moreover, one customer does not automatically benefit from learning at other customer sites, because prior threshold-based performance monitoring is localized to a single customer site.
Briefly summarized, an exemplary embodiment of the invention relates to a method for adjusting the relative value of system configuration recommendations. The method can include identifying system configuration changes in a system, obtaining performance metrics for the system before and after system configuration changes are implemented, and assessing the effectiveness of system configuration changes based on the obtained performance metrics.
Another embodiment relates to a system including hardware components in a computer system, installed software in the computer system, configuration settings indicating configuration conditions for the hardware components and the installed software, and programmed instructions. The programmed instructions are configured to identify implemented configuration changes in the computer system, collect performance metrics associated with the computer system having the identified implemented configuration changes, and weight effectiveness of the identified implemented configuration changes.
Another embodiment relates to a system for adjusting the relative value of implemented configuration changes on computer systems in a network. The system includes means for obtaining configuration information for the computer systems in the network, means for obtaining performance data for the computer systems in the network, means for recommending configuration changes to one of the computer systems in the network, means for obtaining performance data for that one of the computer systems after implementation of the recommended configuration changes, and means for adjusting the relative value of the recommended configuration changes based on an evaluation of the performance data obtained after implementation of the recommended configuration changes.
In the system, performance data is obtained for the computer systems on the network. Performance information can be obtained using a monitoring program (such as the HAO software described below) that automatically polls computer system performance. Such automatic polling can be performed periodically, randomly, or when configuration changes are made. Alternatively, performance information can be obtained and/or entered manually. Once obtained, performance information is preferably stored in a database. When configuration changes are made, new configuration information and new performance information are obtained, and the effectiveness of the configuration changes is assessed. In general, effective configuration changes result in improved performance.
In an exemplary embodiment, rules are developed from knowledge obtained through interviews with systems performance experts and stored in a database 602. These rules define configuration settings that are symptomatic of system performance problems along with recommended corrective actions. These rules also define recommended configuration changes to optimize or maximize system performance. Preferably, the system performance is keyed off of specific configuration parameters.
Performance monitoring software, hereafter referred to as a performance monitor 608, is installed on each monitored system. The performance monitor 608 can be the HP MeasureWare product available from Hewlett-Packard Company. In an alternative embodiment, the performance monitor 608 is located at an enterprise level instead of on each monitored system. A configuration-tracking infrastructure, hereafter referred to as configuration tracker 606, is also installed. The configuration tracker 606 is capable of monitoring configuration changes on the systems in a customer environment and delivering those changes to a central repository where the rules are implemented. The configuration tracker 606 can be the HP Configuration Tracker product, which is part of the High Availability Observatory (“HAO”) software that collects configuration changes and transmits them from the customer site to a central site.
A performance data collection utility, hereafter referred to as a performance collector 609, is installed on the monitored systems (e.g., computers 612, 614, 616, and 620). The configuration tracker 606 can invoke the performance collector 609 on a daily basis. The performance collector 609 queries the performance monitor 608 and extracts various performance metrics. In addition, the configuration tracker 606 collects other configuration parameters on a daily basis. The performance metrics and configuration parameters are sent to a central repository 604 via the configuration tracker 606. At the central repository 604, the data is accumulated in the database 602 or other memory structure. Over time, the performance metrics describe a performance baseline for the monitored system.
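As a rough illustration of how daily samples could accumulate into a baseline, the following minimal sketch keeps every sample of one metric and summarizes it by mean and sample standard deviation. The `Baseline` class, the metric name, and the sample values are hypothetical; the source does not specify how the baseline is represented.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Accumulated daily samples of one metric for one monitored system (hypothetical)."""

    metric: str
    samples: list[float] = field(default_factory=list)

    def add(self, value: float) -> None:
        # Called once per daily collection run.
        self.samples.append(value)

    def mean(self) -> float:
        return statistics.fmean(self.samples)

    def stdev(self) -> float:
        # Sample standard deviation; requires at least two samples.
        return statistics.stdev(self.samples)


# Five days of CPU utilization (%) accumulated at the central repository.
cpu = Baseline("cpu_utilization_pct")
for daily_value in [41.0, 43.5, 39.8, 42.2, 40.5]:
    cpu.add(daily_value)
```

Keeping the raw samples, rather than only a running mean, leaves room for other statistics (for example percentiles) to be computed later.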
Newly collected performance metrics can be compared to the baseline. If the new metrics deviate statistically from the performance baseline, the configuration tracker 606 is queried to identify configuration changes that occurred prior to the performance change and may have contributed to it. Exemplary methods of identifying configuration changes are described in U.S. Pat. No. 6,282,175 entitled “Method for Tracking Configuration Changes in Networks of Computer Systems Through Historical Monitoring of Configuration Status of Devices on the Network,” which is assigned to the same assignee as the present application and is incorporated herein by reference in its entirety.
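The source does not define "statistically deviate". One common choice, assumed here purely for illustration, is a z-score test against the baseline. The sketch below flags a deviating metric and then selects configuration changes on the same node that fall within a lookback window before the observation. The `ConfigChange` record, the 3.0 threshold, the 7-day window, and the parameter names and values are all hypothetical.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ConfigChange:
    node: str
    parameter: str
    old_value: str
    new_value: str
    changed_at: datetime


def deviates(baseline, new_value, z_threshold=3.0):
    """Assumed test: flag a value whose z-score against the baseline exceeds the threshold."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return new_value != mean
    return abs(new_value - mean) / stdev > z_threshold


def antecedent_changes(changes, node, observed_at, lookback=timedelta(days=7)):
    """Changes on `node` made within `lookback` before the observed performance change."""
    return [
        c for c in changes
        if c.node == node and observed_at - lookback <= c.changed_at < observed_at
    ]


baseline = [41.0, 43.5, 39.8, 42.2, 40.5]
changes = [
    ConfigChange("node-a", "shmmax", "64MB", "256MB", datetime(2024, 3, 4)),
    ConfigChange("node-a", "maxuprc", "256", "512", datetime(2024, 1, 2)),
    ConfigChange("node-b", "shmmax", "64MB", "128MB", datetime(2024, 3, 4)),
]
observed = datetime(2024, 3, 6)
suspects = antecedent_changes(changes, "node-a", observed) if deviates(baseline, 58.0) else []
```

Only the `node-a` change made two days before the observation is selected; the older change and the change on another node are ignored.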
Advantageously, whether or not the observed configuration changes are, in fact, relevant to the performance change is initially unimportant. If the performance change is an improvement, the contributing factors (the antecedent configuration changes) are tagged as likely to elicit better performance on a customer's system. Likewise, if performance degrades following the configuration change, the contributing factors are tagged as likely to diminish performance. The repository of such tagged factors is linked to the rule base through the configuration parameters common to both. Various kinds of automated or manual analysis can be applied to the collected configuration and performance data.
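A minimal sketch of the tagging step, assuming each antecedent change is reduced to a `(parameter, new_value)` pair and that a tally of improvements and degradations is kept per pair. The join to the rule base on the shared configuration parameter is likewise an assumed representation; `tag_changes`, `rules_with_evidence`, and the sample rule are hypothetical.

```python
from collections import defaultdict

# Tally of observed outcomes per (configuration parameter, new value).
tags = defaultdict(lambda: {"improved": 0, "degraded": 0})


def tag_changes(changes, performance_delta, higher_is_better=False):
    """Tag each antecedent change by the direction of the performance change.

    `performance_delta` is the new metric value minus the baseline mean. By default a
    lower value (e.g. response time) counts as an improvement.
    """
    if performance_delta == 0:
        return  # no direction to record
    improved = performance_delta > 0 if higher_is_better else performance_delta < 0
    label = "improved" if improved else "degraded"
    for parameter, new_value in changes:
        tags[(parameter, new_value)][label] += 1


def rules_with_evidence(rule_base):
    """Join rules to tag tallies through the configuration parameter they share."""
    return {
        parameter: (
            advice,
            {value: dict(tally) for (p, value), tally in tags.items() if p == parameter},
        )
        for parameter, advice in rule_base.items()
    }


# Illustrative response-time observations (lower is better).
tag_changes([("shmmax", "256MB")], performance_delta=-12.0)
tag_changes([("shmmax", "256MB"), ("maxuprc", "512")], performance_delta=-4.0)
tag_changes([("maxuprc", "512")], performance_delta=6.0)

rule_base = {"shmmax": "Raise the shared memory segment limit for database workloads."}
evidence = rules_with_evidence(rule_base)
```

As the text notes, a change is tagged whether or not it actually caused the performance shift; the tallies only become meaningful as observations accumulate.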
One example embodiment automatically compares the configuration settings for a system against the norms for other similar systems hosting similar applications and notifies support personnel or customers of settings that deviate statistically from the norms along with the normal settings. Another example embodiment compares newly received performance metrics against the performance baselines and queries the data warehouse for antecedent configuration changes. Keyed off these configuration changes, one or more corrective actions from the rule base can be proposed to the customer, incorporating configuration changes that were tagged as resulting in improved system performance.
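The first embodiment compares a system's settings against the norms of similar systems. A minimal sketch, assuming numeric settings, a z-score test with a threshold of 2.0, and the peer median as the reported "normal setting". All of these choices, and the parameter names and values, are illustrative assumptions.

```python
import statistics


def deviating_settings(system, peers, z_threshold=2.0):
    """Return {parameter: (value, peer_median)} for settings that deviate from peers.

    `system` and each entry of `peers` map parameter names to numeric settings;
    peers are assumed to be similar systems hosting similar applications.
    """
    flagged = {}
    for parameter, value in system.items():
        peer_values = [p[parameter] for p in peers if parameter in p]
        if len(peer_values) < 2:
            continue  # too few peers to define a norm
        mean = statistics.fmean(peer_values)
        stdev = statistics.stdev(peer_values)
        if stdev == 0:
            deviates = value != mean
        else:
            deviates = abs(value - mean) / stdev > z_threshold
        if deviates:
            flagged[parameter] = (value, statistics.median(peer_values))
    return flagged


peers = [
    {"nfile": 4096, "maxuprc": 512},
    {"nfile": 4200, "maxuprc": 512},
    {"nfile": 3900, "maxuprc": 512},
    {"nfile": 4100, "maxuprc": 512},
]
flagged = deviating_settings({"nfile": 1024, "maxuprc": 512}, peers)
```

Here only `nfile` is flagged, and the peer median is returned alongside it so the notification can state the normal setting.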
After analysis is completed and recommendations are delivered, the configuration tracker 606 identifies the configuration changes that are actually implemented and collects new performance metrics. The new performance metrics define new baselines that can be compared to the performance baselines from before the corrective action was implemented. The results are used to weight the effectiveness of the implemented advice. Heavily weighted recommendations (i.e., those that were implemented and resulted in performance improvements) are prioritized in future recommendation sets. Less effective recommendations are offered only secondarily or are dropped altogether, thereby improving the value of the automatically generated performance recommendations.
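One way to turn a before/after baseline comparison into a weight is sketched below. It assumes the effectiveness of a single implementation is the relative change in the baseline mean, with the sign chosen so that improvement is positive, and that a recommendation's weight is the mean of its scores across every implementation. The scoring formula, the class, and the recommendation names are hypothetical.

```python
import statistics


def effectiveness(before, after, higher_is_better=False):
    """Relative improvement of the post-change baseline over the pre-change baseline.

    Assumes a nonzero pre-change mean. Positive means performance improved.
    """
    old, new = statistics.fmean(before), statistics.fmean(after)
    change = (new - old) / old
    return change if higher_is_better else -change


class RecommendationWeights:
    """Accumulates effectiveness scores per recommendation across implementations."""

    def __init__(self):
        self._scores = {}

    def record(self, recommendation, before, after, higher_is_better=False):
        score = effectiveness(before, after, higher_is_better)
        self._scores.setdefault(recommendation, []).append(score)

    def weight(self, recommendation):
        # Mean score over every implementation; never-implemented advice has no weight.
        scores = self._scores.get(recommendation)
        return statistics.fmean(scores) if scores else 0.0


# Illustrative response-time baselines (ms) before and after each implementation.
weights = RecommendationWeights()
weights.record("raise shmmax", before=[200, 210, 190], after=[150, 160, 140])
weights.record("raise shmmax", before=[100, 100], after=[90, 90])
weights.record("enable async io", before=[200, 200], after=[190, 198])
```

Averaging across implementations lets results from many customer sites feed a single weight, which is what allows one customer to benefit from another's experience.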
The advice and corrective actions recommended to customers are automatically reprioritized by the self-correcting, self-adjusting system described above, so that the most effective recommendations, and those most likely to be implemented, are identified and offered first, thereby improving the value of the advice offered to customers. The advice is also improved in that it is tied to specific, observed configuration changes. In addition, the system and method quantify the effectiveness of advice using performance metrics. Gathering these metrics and presenting them to customers as a measure of the effectiveness of the advice given provides the system's self-correcting and self-tuning features.
In another exemplary embodiment, a reporting capability summarizes the recommended actions that were identified for a customer, the configuration changes that were implemented and the resulting change in performance. The reporting can describe performance trends that have continued on systems where none of the recommended actions were implemented.
Referring to
In an operation 102, enterprise configuration information and performance data are collected from field nodes. The collection operation uses a set of collectors 104 to gather the desired configuration and performance information. The collectors 104 are commands or programs stored on a support node in the enterprise and can be run at periodic intervals on each node of the enterprise. The collectors 104 store the information they gather in the tracker database 106. Specifically, for each node, the tracker database 106 holds the configuration and performance information produced by every collector that generates such information.
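The per-node, per-collector layout of the tracker database can be sketched with an in-memory SQLite table. In this sketch the collectors are Python callables standing in for the commands or programs the text describes, and the table name, columns, and collector names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical collectors: name -> callable(node) returning the collected text.
COLLECTORS = {
    "kernel_params": lambda node: f"{node}: shmmax=268435456",
    "cpu_util": lambda node: f"{node}: cpu=41.0",
}


def open_tracker_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tracker ("
        " node TEXT, collector TEXT, collected_at TEXT, output TEXT)"
    )
    return db


def run_collectors(db, nodes, collectors=COLLECTORS):
    """One collection pass: run every collector on every node and store each output."""
    now = datetime.now(timezone.utc).isoformat()
    with db:  # commit the whole pass as one transaction
        for node in nodes:
            for name, collect in collectors.items():
                db.execute(
                    "INSERT INTO tracker VALUES (?, ?, ?, ?)",
                    (node, name, now, collect(node)),
                )


db = open_tracker_db()
run_collectors(db, ["node-a", "node-b"])
rows = db.execute(
    "SELECT node, collector, output FROM tracker ORDER BY node, collector"
).fetchall()
```

Scheduling `run_collectors` at periodic intervals (for example with cron) would give the repeated collections the text describes; the scheduling mechanism itself is not specified in the source.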
In an operation 108, configuration information and performance information are analyzed by an analyzer harness 806 (
The analyzer harness 806 executes the desired analyzer 110 against the configuration information and/or performance information stored in the tracker database 106. The analyzer harness 806 can generate a report identifying issues relating to the field nodes. This issue report is stored in an issues database 112. If an issue has not arisen for any node, that issue is absent from the report.
At this stage, the report generated by operation 108 may be used to generate a full report along with text descriptions of the issues. The report from the analyzer harness 806 is sent to a report generator 206 (
In an operation 114, issues are analyzed using rules written by the experts, and a report is generated. Generally speaking, the reports are generated from templates stored in the report templates and rules database 204. The reports may be presented in an operation 118 with recommendations to improve system performance based on analysis of field node configuration, performance baselines, current performance metrics, and prioritized advice.
Software known as High Availability Observatory (“HAO”), available from Hewlett-Packard Company, is stored on the support node 308 and manages the collectors 104 that gather configuration and performance information. In the enterprise 300, the support node 308 is connected to the nodes 302 and 304 by a network that enables the collectors 104 to gather configuration and performance information. Use of a support node in this manner is one of many ways in which configuration and performance information may be collected and subjected to analysis.
The analyzer server 800 includes an analyzer database 804 which stores the analyzers 110 and an analyzer harness 806 for wrapping the analyzers 110 retrieved from the analyzer database 804 with the configuration information and performance information retrieved from the tracker database 106. The analyzer harness 806 generates an issue report file which is placed into the issues database 112.
The legacy server 210 includes a legacy database 212. The legacy database 212 stores configuration and performance information files obtained during prior manual collections or by other means. The legacy database 212 can be linked to the analyzer harness 806. If the HAO collectors 104 are unavailable or not present to gather configuration and performance information automatically and continually, that information can be retrieved from the legacy database 212 instead. Such information, however, is only as current as the most recent collection, although other automated techniques may be used to place node configuration and performance information into the legacy database 212. The HAO collectors 104 are one way in which configuration and performance information may be gathered from an enterprise.
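The fallback from live collector data to the legacy database can be expressed as a simple ordered lookup. This minimal sketch assumes both stores map a node to a `(collected_at, files)` pair and returns the source and collection date, so a caller can see how current the information is. The function name and data layout are hypothetical.

```python
def node_information(node, tracker, legacy):
    """Return (source, collected_at, files) for a node, preferring live collector data.

    `tracker` and `legacy` map node -> (collected_at, files).
    """
    for source, store in (("tracker", tracker), ("legacy", legacy)):
        if node in store:
            collected_at, files = store[node]
            return source, collected_at, files
    raise LookupError(f"no configuration or performance information for {node}")


tracker = {"node-a": ("2024-03-06", {"swapinfo": "4096"})}
legacy = {
    "node-a": ("2023-11-20", {"swapinfo": "2048"}),
    "node-c": ("2023-11-20", {"swapinfo": "1024"}),
}
```

For `node-c` the lookup falls back to the legacy record, and the returned date makes its age explicit.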
The report generator server 202 is also a part of the central site. The report generator server 202 is linked to the analyzer server 800 through the issues database 112. The report generator server 202 includes a report generator 206 for receiving the issue reports from the database 112. As discussed above with respect to
A report templates and rules database 204 is part of the report generator server 202. The server 202 stores various report templates and rules developed by subject matter experts, who can be field personnel or product development personnel. Applying these rules helps to determine the overall performance of the enterprise 300. The report generator 206 can retrieve the rules from the report templates and rules database 204 and the issues stored in the issues database 112. The report generator 206 generates a report 208 using the templates from the report templates and rules database 204. The report may be in any desired format, such as Microsoft Word, Excel, PowerPoint, or HTML, or in a special format designed for delivery to another computer or node for use in automatic control. The report may identify issues with the enterprise 300 and thereby point to specific opportunities to improve its overall performance.
With reference once again to
Report generation rules and templates 116 generate the reports 208. As shown in
Once the analyzers 110 are created and installed and the report templates and rules 116 are in place, the system may be called upon to perform an assessment of the enterprise 300. At 814, an assessment task A is shown. The definition of the assessment task 814 includes a list of the enterprises that are to be analyzed, a list of the nodes at each enterprise that are to be subjected to analysis, and a list of the analyses that are to be performed, in the form of the actual names of the analyzers to be executed. In addition, the assessment task 814 includes a list of the reports that are to be generated following the analysis. Report generation may be done at the time of the analysis, or the reports may be generated later in a separate session.
Once a task 814 is defined and initiated, the lists of enterprises, nodes, and analyzers are passed to the analyzer harness 806. The analyzer harness 806 then retrieves the analyzers 110 from the database 804 one at a time and, for each analyzer 110, proceeds through the nodes 302, etc., one at a time. For each node, the harness 806 creates a framework linking the analyzer 110 to the configuration and performance information files retrieved from the tracker database 106. Using this framework, the harness 806 wraps the analyzer 110 in this environment and causes it to be executed in the context of the configuration and performance information files gathered from the node 302 currently being analyzed.
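The harness's nested iteration (each analyzer in turn, and for each analyzer each node in turn) can be sketched as follows. The sketch assumes an analyzer is a callable that takes a node's files as a name-to-content mapping and returns raw `(issue_id, detail)` pairs. For brevity it omits the enterprise list. The `AssessmentTask` fields, `run_task`, and the `swap_check` analyzer are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed analyzer signature: node files in, raw (issue_id, detail) pairs out.
Analyzer = Callable[[dict[str, str]], list[tuple[str, str]]]


@dataclass
class AssessmentTask:
    name: str
    nodes: list[str]
    analyzers: list[str]


def run_task(task: AssessmentTask, analyzer_db: dict[str, Analyzer], tracker_files):
    """Run each named analyzer against each node, one at a time, collecting raw issues."""
    raw_issues = []
    for analyzer_name in task.analyzers:
        analyzer = analyzer_db[analyzer_name]
        for node in task.nodes:
            files = tracker_files.get(node, {})  # the node's collected files
            for issue_id, detail in analyzer(files):
                raw_issues.append((task.name, node, analyzer_name, issue_id, detail))
    return raw_issues


def swap_check(files):
    # Hypothetical analyzer: flag nodes with less than 2 GB of swap configured.
    swap_mb = int(files.get("swapinfo", "0"))
    return [("LOW_SWAP", f"{swap_mb} MB configured")] if swap_mb < 2048 else []


task = AssessmentTask("A", nodes=["node-a", "node-b"], analyzers=["swap_check"])
issues = run_task(
    task,
    {"swap_check": swap_check},
    {"node-a": {"swapinfo": "1024"}, "node-b": {"swapinfo": "4096"}},
)
```

Recording the task, node, and analyzer with each raw issue preserves the identities that the later expansion step needs.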
During its execution, the analyzer 110 calls upon special subroutines that generate reports of any issue that warrants management attention and of any error condition that may arise. After the analyzer 110 terminates, the analyzer harness 806 expands these issue reports using issue text templates retrieved from the analyzer database 804, together with the identity of the node and the identity of the assessment task. The resulting expanded report is stored in the issues database 112 after the analyses have been run against all of the nodes 302, etc. In this manner, an expanded issue report is generated in a format that is both human readable and suited to incorporation into a database for automated retrieval and manipulation.
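Issue expansion can be sketched with Python's `string.Template`. Each raw issue is combined with a text template keyed by issue id plus the node and task identities, producing a record with a readable sentence and discrete fields that suit a database row. The template texts, the fallback template, and `expand_issue` are hypothetical.

```python
from string import Template

# Hypothetical issue text templates, keyed by issue id, as kept in the analyzer database.
ISSUE_TEXT = {
    "LOW_SWAP": Template("$node: low swap space ($detail)."),
}
FALLBACK = Template("$node: issue $issue_id ($detail).")


def expand_issue(task, node, issue_id, detail, templates=ISSUE_TEXT):
    """Expand a raw analyzer issue into a record that is readable and easy to store."""
    template = templates.get(issue_id, FALLBACK)
    return {
        "task": task,
        "node": node,
        "issue_id": issue_id,
        # substitute() ignores keyword arguments that the template does not use.
        "text": template.substitute(node=node, issue_id=issue_id, detail=detail),
    }


record = expand_issue("A", "node-a", "LOW_SWAP", "1024 MB configured")
```

Keeping `task`, `node`, and `issue_id` as separate fields, rather than only inside the text, is what makes the expanded report queryable.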
The list of reports from the task definition 814 is passed to the report generator 206. The report generator 206 also has access to the report templates and rules database 204 and to the issue report which can be retrieved from the issues database 112. Using all of these materials, an expert system engine within, or supplementing, the report generator 206 evaluates the rules and, under their guidance, examines the issue information, generating high-level conclusions for management concerning the general state of the enterprise. Then, using the report templates, the report generator 206 prepares a variety of reports, as has been explained, setting forth the status of the enterprise 300 and its nodes 302, etc. These are then fed to various recipients of the reports 817.
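A toy version of the rule evaluation and template step is sketched below. It assumes each expert rule is a predicate over the expanded issue records paired with a conclusion sentence, and that the report is filled from a plain-text template. This is only a stand-in for an expert system engine, and the rules, template, and `generate_report` are hypothetical.

```python
from collections import Counter
from string import Template

# Hypothetical expert rules: (predicate over issue records, conclusion text).
RULES = [
    (lambda issues: sum(i["issue_id"] == "LOW_SWAP" for i in issues) >= 2,
     "Swap configuration is a recurring weakness across the enterprise."),
    (lambda issues: not issues,
     "No issues requiring management attention were found."),
]

REPORT = Template("Assessment $task\nIssues by node: $by_node\nConclusions:\n$conclusions")


def generate_report(task, issues, rules=RULES):
    """Evaluate the rules against the issues and fill the report template."""
    conclusions = [text for predicate, text in rules if predicate(issues)]
    by_node = Counter(i["node"] for i in issues)
    return REPORT.substitute(
        task=task,
        by_node=dict(sorted(by_node.items())),
        conclusions="\n".join(f"- {c}" for c in conclusions) or "- (none)",
    )


issues = [
    {"node": "node-a", "issue_id": "LOW_SWAP"},
    {"node": "node-b", "issue_id": "LOW_SWAP"},
]
report = generate_report("A", issues)
```

A real deployment would likely render several templates (for example HTML and spreadsheet formats) from the same conclusions; only plain text is shown here.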
The results of the performance measurements are used in an operation 706 to weight the effectiveness of an implemented configuration change. Weighting configuration changes involves providing a weighting value or number that represents the relative value of the performance improvement from implementing a configuration change compared to the performance improvements from other configuration changes. In an operation 710, configuration changes with heavy weighting are prioritized above less heavily weighted configuration changes. During an operation 712, less effective configuration changes are removed from the set of possible configuration changes or dropped in priority relative to others in that set.
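Operations 706, 710, and 712 can be illustrated together. The minimal sketch below assumes raw effectiveness scores are already available per configuration change. It expresses each score relative to the best one, so the most effective change has weight 1.0 (an assumed normalization). It then drops changes at or below a cutoff and sorts the rest by weight. The function name, cutoff, and sample scores are hypothetical.

```python
def prioritize(weights, drop_below=0.0):
    """Return (change, relative_weight) pairs, most effective first.

    `weights` maps each configuration change to its raw effectiveness score.
    """
    if not weights:
        return []
    best = max(weights.values())
    if best <= 0:
        return []  # no change improved performance
    # Operation 706: weight each change relative to the most effective one.
    relative = {change: score / best for change, score in weights.items()}
    # Operation 712: drop changes whose relative weight is at or below the cutoff.
    kept = [(change, w) for change, w in relative.items() if w > drop_below]
    # Operation 710: heavily weighted changes come first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


ranked = prioritize(
    {"raise shmmax": 0.2, "enable async io": 0.05, "lower timeslice": -0.1},
    drop_below=0.1,
)
```

Lowering the cutoff keeps weaker changes as secondary suggestions instead of dropping them, which corresponds to the "dropped in priority" alternative in operation 712.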
The foregoing description of an embodiment of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the embodiment disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto, and their equivalents.