Modern computer hardware and software are highly complex, and faults are not uncommon. Many vendors encourage end-users to send reports of errors to the vendor, both for the vendor's information and to enable the vendor to recommend to the user measures that may solve, mitigate, or avoid the problem in question. Many computer systems are arranged to generate automatically an error report, or a report of the state of the computer at the time when the error is detected, and to send such a report to the vendor either automatically or with only minimal intervention by the user. In order to provide a quick response, at least to comparatively straightforward errors, a computerized diagnostic system operated by the vendor that receives the report from the user's computer may automatically carry out a standard review of the data in the report, and may generate a recommendation for the user in appropriate cases.
However, previously proposed diagnostic systems have typically carried out a standard review on all incoming user reports, using a largely hard-coded procedure. That procedure in some cases results in the diagnostic system running analyses that are clearly inapplicable to a particular report, which is inefficient even if the inapplicable analysis is validated and skipped as soon as it starts. In addition, the need to sequence the analyses correctly, especially if one analysis is dependent on the result of another analysis, may render that procedure somewhat inflexible.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
In the drawings:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings.
Referring initially to
The diagnostic software 26 may passively receive data 32 from other parts of the computer system 24 or, if the part of the computer system 24 affected by error is still at least partly operational, the diagnostic software 26 may send requests 34 for specific further data.
The diagnostic software 26 is in communication with a vendor's system 40. The link between the client system 24 and the vendor system 40 may be, for example, an internet link to a vendor website 41, as described in commonly-assigned U.S. Patent Application Ser. No. 2004/0249914 (Flocken et al.), which is incorporated herein by reference in its entirety.
At the vendor system 40, error reports comprising diagnostic data sets 28 are received by a validation unit 42. The validation unit 42 may determine whether an incoming message is in fact a valid error report from a computer system 24 that the vendor system 40 supports, and may also determine whether the specific client computer system 24 is eligible for support from the vendor system 40. If the client system 24 is not eligible for support, the vendor may wish to analyze error reports for the vendor's own information, or may not consider analyzing such reports to be worthwhile. Alternatively, the vendor may consider the risk of spurious error reports to be sufficiently high that it is prudent to discard all messages not emanating from client sites 20 eligible for support.
From the validation unit 42, the validated diagnostic data set 28 passes to a coordination unit 44. In an embodiment, the coordination unit 44 maintains a registry of a plurality of analysis modules or analysis agents 46. Each analysis module 46 is arranged to carry out a specific part of the analysis procedure that is required by a diagnostic data set 28. Some analysis modules 46 may analyze every diagnostic data set 28. Some analysis modules 46 may analyze only diagnostic data sets 28 having specific characteristics. For example, the client computer system 24 may monitor different applications 61 simultaneously, and some analysis may be specific to a particular application whereas other analysis may be more generalized and applicable to multiple software products; therefore, each analysis agent or module 46 subscribes to a type of fault data that is appropriate for the module 46 in question to analyze. Some analysis modules 46 may require the output from other analysis modules 46 before they start, for example, because a later analysis module 46 uses an output from an earlier analysis module 46 as part of the data that the later analysis module 46 analyzes. Some analysis modules 46 may require a specific output from an earlier analysis module 46 as part of the criteria to determine whether the later analysis module 46 will analyze the specific data set at all. Other analysis modules 46 that are not interdependent may analyze the data set 28 in parallel.
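By way of illustration only, the subscription and prerequisite relationships described above might be sketched as follows. The names `Subscription` and `plan_batches`, the `"*"` wildcard, and the example module names are hypothetical and are not taken from the described embodiments. The sketch groups the subscribed modules into batches, where the modules within one batch do not depend on one another and so could run in parallel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Subscription:
    """Hypothetical registry entry for one analysis module 46."""
    fault_types: Set[str]                             # "*" means every data set
    requires: Set[str] = field(default_factory=set)   # modules whose output is needed first


def plan_batches(registry: Dict[str, Subscription], fault_type: str) -> List[List[str]]:
    """Group the subscribed modules into batches that respect prerequisites."""
    selected = {
        name: sub
        for name, sub in registry.items()
        if "*" in sub.fault_types or fault_type in sub.fault_types
    }
    done: Set[str] = set()
    batches: List[List[str]] = []
    while len(done) < len(selected):
        # A module is ready once every module it requires has finished.
        batch = sorted(
            name for name, sub in selected.items()
            if name not in done and sub.requires <= done
        )
        if not batch:
            break  # remaining modules wait on output that will never arrive
        batches.append(batch)
        done.update(batch)
    return batches


# Hypothetical registry contents, for illustration only.
registry = {
    "general": Subscription({"*"}),
    "db_diag": Subscription({"database"}),
    "db_patch": Subscription({"database"}, requires={"db_diag"}),
    "mail_diag": Subscription({"mail"}),
}
print(plan_batches(registry, "database"))
```

In this sketch a module whose prerequisite did not subscribe to the data set is simply never scheduled, which is one possible reading of the behavior described above.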
Once all of the analysis modules 46 have completed their processing of the data set 28, an analysis report 48 is generated. The analysis report 48 may be sent to the vendor website 41, and a message 50 may be sent to the client site 20 instructing the client personnel 22 or the client computer system 24 how to access the report. As described in the above-mentioned U.S. Patent Application Ser. No. 2004/0249914 of Flocken et al., the analysis report 48 may include recommendations to personnel 22, patches to software in client system 24, or other measures to correct or prevent the future occurrence of an error or otherwise to improve the operation of the client system 24.
Referring now to
The customer site 20 is connected to the vendor system 40 by the transport layer or transport service 100 on the local computer system 24 communicating with a data retrieval service 105 on the vendor system 40.
The vendor system 40 comprises a server services coordination engine 152, which is in communication with associated infrastructure components such as a web user interface 154, a set of configuration files and policy settings 156, a data store 158, a context map 160, and a subscription registry 162. The server services coordination engine 152 communicates with attached service modules through a service oriented interface layer, which in an embodiment is a web services interface 164.
As shown in
Referring now to
As an example, the log files monitor 62 may have a configuration file that includes the paths to error log files of the applications 61 that the log files monitor 62 is monitoring, instructions on how often to run, and a list of error patterns to check for in each error log file. The error patterns may be regular string expressions. If the log files monitor 62 does not have permission to delete error messages from the error log files of the applications 61, the log files monitor 62 may also maintain a record of error messages previously reported, so that the log files monitor 62 does not report the same error message twice. The list of error patterns may be different for each application, and may be determined partly by the registered analysis modules 46 in the vendor system 40. When the log files monitor 62 finds in one of the error log files an error message matching one of the error patterns listed for that error log file, the log files monitor 62 passes the error message to the fault detection service 60. The log files monitor 62 may also pass the error pattern that the error message matched, and details about the application 61, such as the application name and version.
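A minimal sketch of such a monitor is given below, assuming that the error patterns are regular expressions and that the record of previously reported messages is kept as an in-memory set. The configuration layout, the log file path, and the function name `scan_log` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical configuration: error patterns listed per error log file.
CONFIG: Dict[str, List[str]] = {
    "/var/log/app_a/error.log": [r"ERROR \d+", r"FATAL"],
}


def scan_log(
    path: str,
    lines: Iterable[str],
    patterns: List[str],
    reported: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (message, matched pattern) for each message not reported before."""
    hits: List[Tuple[str, str]] = []
    for line in lines:
        for pattern in patterns:
            if re.search(pattern, line):
                key = (path, line)
                if key not in reported:   # skip messages already reported
                    reported.add(key)
                    hits.append((line, pattern))
                break                     # report each message once at most
    return hits
```

The sketch identifies a repeated message by its full text, so it relies on each log line carrying a timestamp or similar distinguishing content. A persistent record, for example a file offset per log, would be needed to survive a restart of the monitor.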
The configuration file for the log files monitor 62 may be updated manually when an application 61 is added, removed, or upgraded, or may be updated automatically from time to time. Automatic updates may be generated by the vendor system 40 and downloaded to the client system 24.
The remote trigger 70 enables a diagnostic data set 28 to be generated at the request of an entity remote from the client site 20. For example, if the vendor system 40 is monitoring the operation of the client system 24, the vendor system may use the remote trigger 70 to request a data set 28 at a time when no triggering event has occurred within the client system 24. User personnel 22 may use the manual trigger 68 to generate a data set 28 generally at any time. The logging API 64 or the log files monitor 62 may generate a trigger 30 if specific occurrences are logged, or regularly, for example, when a certain quantity of transactions has been logged.
When the fault detection service 60 receives a trigger 30 in the form of a report from any of the modules 62 to 72, the fault detection service 60 registers the report as a “fault detection” data type, and publishes the report to the coordination engine 52. The actual report may be stored in a database maintained by the coordination engine 52, or the report may be stored in publicly accessible memory elsewhere on the client system 24, with a pointer to the report in a database maintained by the coordination engine 52. The coordination engine 52 includes a continuously running coordination service that detects the newly published reports from the services 60, 74, 86, 100 attached to the coordination engine 52. The configuration files 54 contain a list of available services, and of the data types each of those services is interested in processing. The coordination engine 52 thus determines from the configuration files 54 which service to pass the newly registered report to. Where more than one available service has declared an interest in processing the same data type, the configuration files 54 also specify in what order those services should process the data, or specify that two or more services may process the same data in parallel.
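The routing described above might be sketched as follows. The sketch assumes, purely for illustration, that the configuration maps each data type to an ordered list of stages, and that the services within one stage may process the report in parallel. The `ROUTES` layout, the service names, and the function `dispatch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Service = Callable[[dict], None]

# Hypothetical configuration: data type -> ordered stages of service names.
ROUTES: Dict[str, List[List[str]]] = {
    "fault detection": [["data_collection"], ["local_notification", "transport"]],
}


def dispatch(
    report: dict,
    routes: Dict[str, List[List[str]]],
    services: Dict[str, Service],
) -> List[List[str]]:
    """Pass the report to each interested service, one stage after another."""
    completed: List[List[str]] = []
    for stage in routes.get(report["data_type"], []):
        if not stage:
            continue
        # Services in the same stage run concurrently; the "with" block
        # waits for all of them before the next stage starts.
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            list(pool.map(lambda name: services[name](report), stage))
        completed.append(list(stage))
    return completed
```

A report whose data type has no entry in the configuration is passed to no service in this sketch. An actual coordination engine might instead log or hold such reports.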
By way of example, a report of “fault detection” data type, or a specific subtype, may be passed to the data collection service or data collection layer 74 of the framework. The data collection layer 74 comprises data collection programs that may include one or more of operating system data collection 76, application data collection 78, or custom written data collection processes 80. The data collection layer 74 may include the local analysis service 82 and the inventory service 84. The data collection layer 74 may include an agent for collecting data identifying the vendor product or products involved in the diagnosis. The identifying information may include a product or license serial number or other information that directly or indirectly identifies the customer site 20.
The data collection service 74 may select one or more specific data collection programs 76, 78, 80 that are appropriate to the specific error message or other data that the coordination engine 52 has passed to the data collection service 74. The data collection service 74 may select a system level collection agent 76 that collects configuration information about the client system 24. This configuration information may be compared with a previous system collection, to assist in determining whether a fault may be related to a recent change in system configuration.
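As a simple illustration of the comparison with a previous system collection, the sketch below assumes that each collection is a flat mapping of configuration item names to values. The function name `diff_configuration` and the example keys are hypothetical.

```python
from typing import Dict, Optional, Tuple


def diff_configuration(
    previous: Dict[str, object],
    current: Dict[str, object],
) -> Dict[str, Tuple[Optional[object], Optional[object]]]:
    """Return {item: (old value, new value)} for every item that changed."""
    changes = {}
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)  # None marks an added or removed item
    return changes


# Hypothetical collections, for illustration only.
previous = {"os": "11.0", "patch": "A1", "ram_gb": 8}
current = {"os": "11.0", "patch": "A2", "ram_gb": 8, "driver": "x"}
print(diff_configuration(previous, current))
```

An analysis module could then treat the changed items as candidate causes when a fault first appears after the change.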
Updated data collection agents 76, 78, and 80 may be downloaded to the client system 24 from the vendor site 40, as may be other agents and services. Customers may be permitted to add or modify data collection agents. However, the vendor system 40 may then verify that the data collected complies with a specified protocol, and may generate an error message if a non-compliant data set 28 is received at the vendor system 40.
A local notification layer 86, which may generate one or more of an external software application fault event 88, an external software application error message 90, and an extensible notification 92, may inform customer personnel 22 that the trigger 30 has been generated and/or that a data set 28 has been generated. The local notification layer 86 may provide additional information such as the nature of the trigger 30, where the framework supports distinct triggers, or the result of any analysis in the local analysis module 82. The local notification layer 86 may communicate with the personnel 22 through a local console 94, which may also be the input point for the manual trigger 68. Local routing devices 96 may transmit commands and data between the sensors 60, the data collection module 74, the local notification module 86, and the local console 94.
Other services available on the local computer system 24 may include the local incident analysis service 82. The local incident analysis service 82 may be part of the data collection service 74, as shown in
Updated local analysis modules 46 may be downloaded or otherwise shipped to the client system 24 from the vendor site 40. When an analysis module 46 is shipped to the client system 24, the analysis module 46 is generally accompanied by a registration instruction set to generate the correct entries in the subscription registry 57, so that the local incident analysis service 82 uses the new analysis module 46 correctly.
The local computer system 24 may also have the incident lifecycle service 98, which controls archiving or disposal of old fault and analysis data. The incident lifecycle service 98 may also control handling of new fault data, for example, by instructing the coordination engine 52 or the other services to ignore certain faults and/or suppress duplicate faults, and/or by controlling which faults are analyzed by the local analysis service 82 and which faults are passed to the vendor system 40.
Other services may be added to the coordination engine 52, and other agents or modules may be added to the services, either by the vendor or by the client personnel 22. For example, at a client site 20 where the client personnel 22 have a high level of expertise, a service may be provided to log faults to a client case management system (not shown), and faults may be passed to the vendor system 40 only after a review by the client personnel 22.
The transport service 100 on the local computer system 24 and the data retrieval service 105 (see
Referring now to
The analysis layer or analysis service 112 includes the coordination unit 44 and the analysis modules 46. The analysis may include one or more of diagnosis 114, configuration assessment 116, patch selection or generation 118, predictive maintenance 120, decision support 121, and license analysis 122. Decision support 121 and license analysis 122 are appropriate when the diagnosis suggests, or is carried out in contemplation of, an expansion or upgrade of the client system 24.
The result 48 of the completed analysis passes to the delivery service 123, which may comprise the publication layer or publication service 127 and the notification layer or notification service 128. The publication service 127 may include publication of the result to one or more of a customer portal 124 such as the website 41, from which the client's personnel 22 can access the result, or an engineer interface 126. The notification service 128, which may include an e-mailer 130 to send a message 50 (
Alternatively, or in addition, the delivery service 123 may send the result of the analysis directly to the client site 20, where the change and configuration agent 125 (see
Where only a message, or a script or other small file transfer, is required, a transport module 131 of the delivery service 123, acting similarly to the transport service 100 and the data retrieval service 105 but in the opposite direction, may deliver the result directly to the client site 20.
Where the result 48 generated by the analysis layer 112 is not sufficient, which may be determined by the analysis layer 112 or by the client's personnel 22, the diagnostic data set 28 passes to the escalation service or escalation layer 132. The escalation service 132 may include one or more of a case logging module 134, an escalation assessment module 136, a remote real-time diagnostic examination module 138 of the client system 24 by the vendor's personnel, a live discussion 140 between vendor's support personnel and client's personnel 22, or an interactive trouble-shooting module 142.
The case logging module 134 may automatically generate a log entry in the workflow management system of the vendor system 40. The escalation assessment module 136 may assess how urgent and how critical the problem is, and may influence the log entry generated by the case logging module 134. The escalation assessment module 136 and the case logging module 134 may assign the case to a specific support person or to a general queue, and may in critically urgent cases generate an e-mail, pager, or other alert to request immediate attention. The remote real-time diagnostic examination module 138 or the interactive trouble-shooting module 142 may set up a connection enabling the vendor's personnel and/or the analysis service 112 to obtain additional information from the client system 24. These modules 138 and 142 may communicate directly with the client system 24 through the client services coordination engine 52, or may communicate with or through client personnel 22. The interactive trouble-shooting module 142 may include questioning the client personnel 22 and/or instructing the client personnel 22 to perform actions on the client system 24 that can be monitored by the data collection service 74.
The incident lifecycle service 166, similar to the client incident lifecycle service 98, may allow the vendor system 40 to ignore certain current faults, bypass certain other services for certain faults, and/or may handle archiving and disposal of old fault records.
The reporting service 168 provides the vendor with reports on the faults analyzed by the system 40, and on whether, and if so how, they have been resolved.
In each of the local computer system 24 and the vendor system 40 shown in
Referring now to
Referring now to
In step S228, the data set 28 may be passed to and analyzed locally by a local analysis module 82, and depending on the results of the local analysis the coordination engine 52 may pass the process back to the data collection service 74, and the process may return to step S224 to collect further data. The process may return to step S226 to generate further notifications to client personnel 22 at any appropriate stage.
In step S230, it is decided whether the analysis data set 28 is to be sent to the vendor system 40 for analysis. The decision may depend on the results of the local analysis. For example, the local analysis may have solved whatever problem is under consideration, so that analysis by the vendor is unnecessary, or the client personnel 22 may deny permission for the analysis data set 28 to be forwarded to the vendor system 40. The fault may be one that the incident lifecycle service 98 specifies should be forwarded to the vendor system 40, or should not be forwarded to the vendor system 40. The decision may be notified to the client personnel 22.
When the analysis data set 28 is forwarded to the vendor system 40, in step S232 the entitlement module 106 verifies that the data set 28 is a valid diagnostic data set and is eligible for analysis. If restrictions are to be imposed on the analysis to be carried out because of the identity of the client site 20, the identity of the client system 24, or the relationship between the client system 24 and the vendor system 40, such restrictions may be determined in step S232 and forwarded along with the data set 28 for use in subsequent steps. As discussed above, the data set 28 is forwarded from each relevant service to the next by the coordination engine 152.
In step S234, the coordination unit 44 of the analysis layer 112 receives the diagnostic data set 28, and one or more analysis modules 46 of the analysis layer are selected to analyze data in the data set 28 by a comparison of the data set with criteria established for selection of each available analysis module. In step S236, the one or more selected analysis modules 46 analyze the data, and in step S238 the original data set 28 is supplemented with results from the analysis. The process then returns to step S234, where one or more further analysis modules may be selected. In an embodiment, the criteria established for selection of some analysis modules 46 may include the fact that results are available from other modules, or that other modules have returned specific results.
The loop through steps S234, S236, and S238 may be repeated until all analysis modules applicable to the diagnostic data set 28 have been applied, or until it is otherwise determined that the analysis is complete, or until the analysis is otherwise terminated. When it is determined in step S234 that further analysis is not appropriate, the process proceeds to step S240 to generate a recommendation based on the data set 28 and the result of the analysis in step S236. At step S242, it may be decided whether or not there is a problem that merits escalation to layer 132. This determination may be based on the results of the analysis and/or the entitlements determined in step S232. In step S244, the recommendation generated in step S240 is made available by the publication service 127, and in step S246 the notification service 128 informs the client 20 how to access the recommendation. Where escalation is chosen in step S242, step S244 may be omitted. Escalation is then initiated in step S248, and the notification in step S246 may consist of a communication initiating the communication between the vendor system 40 and the client 20 that is necessary for escalation.
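The loop through steps S234, S236, and S238 might be sketched as follows, assuming that each analysis module is represented by a pair of functions: a selection criterion that inspects the data set, including any results already added, and an analyzer. The function `run_analysis`, the `results` key, the `max_rounds` limit, and the example modules are hypothetical.

```python
from typing import Callable, Dict, Tuple

Criterion = Callable[[dict], bool]
Analyzer = Callable[[dict], object]


def run_analysis(
    data_set: dict,
    modules: Dict[str, Tuple[Criterion, Analyzer]],
    max_rounds: int = 10,
) -> dict:
    """Repeat selection and analysis until no further module applies."""
    data_set = dict(data_set, results={})
    for _ in range(max_rounds):  # guard against a loop that never settles
        # Step S234: select modules whose criteria the data set now meets.
        selected = [
            name for name, (criterion, _analyzer) in modules.items()
            if name not in data_set["results"] and criterion(data_set)
        ]
        if not selected:
            break
        # Step S236: the selected modules analyze the data.
        new_results = {name: modules[name][1](data_set) for name in selected}
        # Step S238: the data set is supplemented with the results.
        data_set["results"].update(new_results)
    return data_set


# Hypothetical modules: the second runs only on a specific earlier result.
modules = {
    "diagnosis": (
        lambda ds: True,
        lambda ds: "disk full" if ds["free_mb"] < 100 else "no fault",
    ),
    "patch_selection": (
        lambda ds: ds["results"].get("diagnosis") == "disk full",
        lambda ds: "enable log rotation",
    ),
}
print(run_analysis({"free_mb": 12}, modules)["results"])
```

The supplemented data set returned by such a loop would then be the input to the recommendation of step S240 and to the escalation decision of step S242.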
Various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. For example, although in the described embodiments the party responsible for system 40 is referred to as the “vendor,” the systems and methods described are not limited to situations where a vendor-purchaser relationship exists. The system 40 may be a system that provides the desired support and analysis functions for a “client” system 24, and the system 40 may be operated by the same entity that operates the client system 24.
As described above, the coordination engines 52, 152 use databases such as the subscription registries 57, 162 and the configuration files 54, 156 to determine what messages and data sets 28 to pass to which services. The various services use databases of the available agents to determine what messages and data sets 28 to pass to which agents. Depending on the particular system, such information may be more or less divided into multiple databases or consolidated into fewer databases. The coordination engines 52, 152 may to a greater or lesser extent be able to see specific agents in at least some of the attached services, and to route messages and data sets 28 directly to specific agents in at least some of the attached services. The coordination engines 52, 152 may then be able to enable simultaneous processing of a data set 28 by agents or modules in different services, where other agents in the same services are not permitted to process similar data simultaneously.
Although
Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.