The present invention relates in general to the field of computers and similar technologies, and in particular to software utilized in this field.
Service Oriented Architecture (SOA) and reusable services are quickly becoming common in computer and business enterprises. SOA is an approach to software implementation in which systems are composed of reusable components (referred to as “services”). A service is a software building block that performs a distinct function, such as retrieving customer information from a database, through a well-defined interface.
SOA organizes information resources as substantially independent, reusable services that create an inherently adaptable environment. Business and technical services may be published using open, standard protocols that create self-describing services, which can be used independently of the underlying technology. Technical independence allows services to be more easily used in different contexts to achieve standardization of business processes, rules and policies. Collaborations, internal and external to an enterprise, can more easily be established, enabling improvements in process and information consistency.
The present invention includes, but is not limited to, a method, apparatus and computer-usable medium for dynamically and deterministically evaluating the priority to assign to fixing each failed service in a business process comprising multiple independent services. A connected monitoring service of a computer system monitors the process and dynamically detects one or more failed services among the multiple existing services of the business process. When one or more failed services are detected, a failure prioritization utility executing on the computer system automatically determines the level of importance of each failed service within the business process and then prioritizes the failed services relative to each other based on the determined level of importance. Finally, the failure prioritization utility generates and issues a signal informing a system administrator of the priority order in which to address/fix the failed service(s), so as to minimize the negative impact of the failed services on the business process.
The above, as well as additional purposes, features, and advantages of the present invention will become apparent in the following detailed written description.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further purposes and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, where:
With reference now to the figures, and in particular to
In addition to the above described hardware components of computer system 100, several software and firmware components are also provided within computer system 100 to enable computer system 100 to complete the process of monitoring various services and calculating priority of failed services, as described below. Among these software/firmware components are operating system (OS) 117 and Failure Prioritization (FP) algorithm/utility 119. FP utility 119 is illustrated as a separate component from memory 115. However, it is understood that, in alternate embodiments, FP utility 119 may be located on a removable computer readable medium or provided as a sub-component part of OS 117. When executed by processor 105, FP utility 119 executes a series of processes, which provide the various functions described below (referencing
The present invention provides an automated process that includes collection of services data and application of an algorithmic function/formula to the collected data, to automatically prioritize the order of repair for services within a service oriented architecture (SOA) when multiple services fail. A brief discussion of SOA and its failure risks is now provided to establish the need for the present invention. As previously described, SOA provides a modular approach to computing. There is, however, a need for some form of centralized control over the various services, which have varying degrees of importance to the overall SOA. When multiple services provide different levels of functionality to an overall process, some services are typically more critical (or essential) to the process than others. The degree to which each service is essential to a particular process, relative to the other services, ranges from least essential/critical to most essential/critical. Each process defines the critical nature of a service differently. Thus, a service may be critical (essential) in a first business process but non-critical (non-essential) in another.
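Because each process defines criticality differently, one way to picture the data a prioritization utility would need is a per-process table of criticality weights. The following is a minimal Python sketch under stated assumptions: the process names, service names, and the 0.0 to 1.0 scale are hypothetical and are not taken from the invention.

```python
# Hypothetical per-process criticality weights: 0.0 = least essential,
# 1.0 = most essential. All names and values are illustrative.
CRITICALITY: dict[str, dict[str, float]] = {
    "order_entry": {"customer_lookup": 1.0, "audit_log": 0.2},
    "nightly_report": {"customer_lookup": 0.1, "audit_log": 0.9},
}


def criticality(process: str, service: str) -> float:
    """Return how essential `service` is to `process` (0.0 if it is not used)."""
    return CRITICALITY.get(process, {}).get(service, 0.0)


def processes_where_critical(service: str, threshold: float = 0.5) -> list[str]:
    """List, in name order, the processes in which `service` meets `threshold`."""
    return sorted(
        process
        for process, weights in CRITICALITY.items()
        if weights.get(service, 0.0) >= threshold
    )
```

In this table, `customer_lookup` is critical to `order_entry` but not to `nightly_report`, and `audit_log` is the reverse, so the same service carries a different weight depending on the process that uses it.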
According to the invention, these failures are signaled to computer system 100 via a network (not shown) to which the services (S1-S7) and computer system 100 are communicatively connected. Those skilled in the art are familiar with SOAs and with communication amongst services in an Internet-based SOA, which uses the SOAP/HTTP protocol (i.e., the SOAP message protocol with an HTTP transport binding, e.g., making remote procedure calls (RPCs) on a service provider by sending one message for each call).
As utilized within the illustrative embodiments, computer system 100 provides a centralized control point for managing the various services within a business process. The computer system (and the system administrators who receive, analyze and respond to data therefrom) is also responsible for ensuring that essential services are adequately maintained and administered.
When a failure occurs in any one or more of the services contributing to completion of a business process, each failure has some impact on the overall business process(es), and some failures are more critical than others. When multiple services fail simultaneously/concurrently, the end user or system administrator conventionally addresses each failure in the order of occurrence or in some user-determined/random order. This is because, with conventional failure response methods, the administrator does not know whether any of the failures is more critical to the business process(es) than another. When multiple failures occur simultaneously/concurrently, however, a substantial amount of time can be spent handling failures of non-critical or non-essential services while more critical services remain in the failed state, hindering the forward progress of the business process(es).
With conventional methods, the business impact is evaluated from transaction failures at an edge point, and the user has to define the edge point in order to define a failure. When the same services are utilized by different applications, failure of a service might affect one application but not the other. Defining the edges requires the user to understand each edge and configure events for its failures, and even then it is impossible to prioritize the services.
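The shared-service problem can be illustrated with a small dependency table. In this minimal sketch, each application is assumed to mark every service it uses as either required or optional; the application names, service names, and the required/optional flag are hypothetical. A failed service impacts only the applications that require it, so the same failure registers for one application and not for the other without any hand-configured edge events.

```python
# Hypothetical usage table: application -> {service: required?}.
# True means the application cannot proceed without the service;
# False means it can degrade gracefully if the service fails.
USES: dict[str, dict[str, bool]] = {
    "app_checkout": {"svc_pricing": True, "svc_recommend": False},
    "app_catalog": {"svc_pricing": False, "svc_recommend": True},
}


def impacted_applications(failed_service: str) -> list[str]:
    """Return the applications that require `failed_service`, sorted by name."""
    return sorted(
        app for app, deps in USES.items() if deps.get(failed_service, False)
    )


if __name__ == "__main__":
    # The shared service svc_pricing affects checkout but not the catalog.
    print(impacted_applications("svc_pricing"))  # ['app_checkout']
```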
The methods provided by the embodiments of the invention enable the FP utility to (1) automatically determine which of the one or more failures needs to be first addressed, and/or the order in which the failed services should be fixed and (2) signal the administrator (or end-user) of that order.
With reference now to the flow chart of
When multiple failures are detected, the FP utility analyzes each failure using the priority function described below, together with stored data collected while monitoring the system, as indicated at block 212. The priority function utilized in the illustrative embodiment is as follows:
$$I(s) = R(s) \cdot F_s(s) \cdot \sum f_p(R_S)$$
The following legend applies to the above function:
Thus, the priority of a service failure is calculated based on the overall impact of that failure on the business process. The higher the calculated value, the greater the impact on the business, and the sooner the service failure should be addressed. Notably, by utilizing the above priority function, the system administrator does not need to define an edge point in order to define a failure, or configure events for the failure, whenever the same service is utilized by different applications.
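The priority function can be sketched directly in Python. Because the legend for the function is not reproduced in this text, the meanings assigned to each factor below are assumptions: R(s) is taken to be the request rate of service s, Fs(s) a 0/1 indicator of whether s has failed, and the fp terms per-parent impact values that are summed. The parameter names are hypothetical. With the factors from the S5/S4 example in this description, the sketch yields 0 for S5 and 1 for S4.

```python
from collections.abc import Iterable


def impact(request_rate: float, failed: bool, parent_impacts: Iterable[float]) -> float:
    """Sketch of I(s) = R(s) * Fs(s) * sum of fp terms.

    Assumed meanings (hypothetical, since the legend is not reproduced here):
      request_rate   -> R(s), e.g. requests per second handled by service s
      failed         -> Fs(s), treated as 1 when s has failed and 0 otherwise
      parent_impacts -> the fp terms, one per parent service that depends on s
    """
    failure_factor = 1 if failed else 0
    return request_rate * failure_factor * sum(parent_impacts)


if __name__ == "__main__":
    # Factors from the S5/S4 example; the parent sum is passed as a one-element list.
    print(impact(3, True, [0]))  # 0
    print(impact(1, True, [1]))  # 1
```

Because the factors are multiplied, any single zero factor (a healthy service, or a failure with no parent impact) drives the priority to zero, which matches the S5 result.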
The above analysis determines the relevance/criticality of each failure and prioritizes the multiple failures relative to each other (i.e., calculates the business impact of each failure). The FP utility then assigns the calculated priority to the associated failed services at block 213.
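Assigning priorities from the calculated impacts amounts to a descending sort. The following minimal sketch assumes the impacts have already been computed and are keyed by service name; breaking ties alphabetically is an illustrative choice that keeps the ordering deterministic, not something the invention specifies.

```python
def assign_priorities(impacts: dict[str, float]) -> dict[str, int]:
    """Map each failed service to a priority rank, where 1 means fix first.

    Services are ordered by descending impact; equal impacts are ordered
    by service name so that repeated runs give the same result.
    """
    ordered = sorted(impacts, key=lambda service: (-impacts[service], service))
    return {service: rank for rank, service in enumerate(ordered, start=1)}


if __name__ == "__main__":
    print(assign_priorities({"S5": 0, "S4": 1}))  # {'S4': 1, 'S5': 2}
```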
According to the illustrative embodiment, the FP utility then determines, at block 214, whether any relevant or critical failures have been identified. If not, the FP utility signals the priority order of the failed services to the system administrator, identifying them as non-critical. In the illustrative embodiment, the system administrator defines a threshold impact value that determines when a failure is critical: if the calculated impact is above this threshold value, the failure is critical. Returning to the figure, if critical failures are identified, the FP utility signals the critical failures to the system administrator, at block 216, with an urgent message indicating the priority status of the particular services whose failures are determined to be critical. Again, the order of priority of these critical failures is provided to the system administrator. According to the illustrative embodiment, receipt of a signal indicating a critical failure initiates a service/system fix/response that is pre-ordered based on the priority of the particular critical failures, as shown at block 217. The process then ends at terminator block 218.
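The decision at blocks 214 through 216 can be sketched as ranking the failures and flagging those whose impact exceeds the administrator's threshold. In this minimal sketch the `FailureSignal` type, the function names, the message format, and the sample impacts are hypothetical; only the rule that a failure is critical when its impact is above the threshold comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureSignal:
    rank: int        # 1 = address first
    service: str
    impact: float
    critical: bool   # True when impact is above the administrator's threshold


def classify_failures(impacts: dict[str, float], threshold: float) -> list[FailureSignal]:
    """Rank failures by descending impact and flag the critical ones."""
    ordered = sorted(impacts.items(), key=lambda item: (-item[1], item[0]))
    return [
        FailureSignal(rank, service, value, value > threshold)
        for rank, (service, value) in enumerate(ordered, start=1)
    ]


def render_messages(signals: list[FailureSignal]) -> list[str]:
    """Format one message per failure; critical failures are tagged URGENT."""
    messages = []
    for sig in signals:
        tag = "URGENT" if sig.critical else "non-critical"
        messages.append(f"[{tag}] #{sig.rank} {sig.service} (impact={sig.impact})")
    return messages


if __name__ == "__main__":
    # Hypothetical impacts and threshold, for illustration only.
    signals = classify_failures({"S4": 1, "S5": 0, "S2": 7.5}, threshold=2)
    for line in render_messages(signals):
        print(line)
```

When no failure exceeds the threshold, every message is tagged non-critical but the priority order is still reported, mirroring the non-critical branch described above.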
I(S5)=3 * 1 * 0=0
I(S4)=1 * 1 * 1=1
The higher the calculated value, the greater the impact of the failure on the business. Thus, applying the above formula to the above example shows that S4's failure is more important to fix than S5's. One advantage of applying this formula to determine which failure should be prioritized is that, even though S5 is receiving more requests per second than S4, the impact of S5's failure on any of the parent services is less than the impact of S4's failure.
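The point of the example, that request volume alone would rank the failures the wrong way, can be checked with a few lines of Python. The factor triples are copied from the example above, with the first factor read as the request rate (consistent with S5 receiving more requests per second); the names and everything else are an illustrative sketch.

```python
# (R, Fs, sum of fp terms) for each failed service, from the worked example.
FACTORS: dict[str, tuple[int, int, int]] = {
    "S5": (3, 1, 0),
    "S4": (1, 1, 1),
}


def impact(factors: tuple[int, int, int]) -> int:
    """Multiply the three factors of I(s)."""
    request_rate, failed, parent_sum = factors
    return request_rate * failed * parent_sum


# Ordering by request volume alone would put S5 first...
by_request_rate = sorted(FACTORS, key=lambda s: -FACTORS[s][0])
# ...whereas ordering by calculated impact puts S4 first.
by_impact = sorted(FACTORS, key=lambda s: -impact(FACTORS[s]))

print(by_request_rate)  # ['S5', 'S4']
print(by_impact)        # ['S4', 'S5']
```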
The embodiments of the invention are particularly effective and useful in SOA. With SOA, software applications may be extensively reused (an area in which the SOA technique is extremely powerful), and new software is built only when necessary. Furthermore, in an SOA environment, services come in many forms and shapes, and the implementation platforms and protocols utilized may differ.
It should be understood that at least some aspects of the present invention may alternatively be implemented in a computer-usable medium that contains a program product. Programs defining functions of the present invention can be delivered to a data storage system or a computer system via a variety of signal-bearing media, which include, without limitation, non-writable storage media (e.g., CD-ROM), writable storage media (e.g., a floppy diskette, hard disk drive, read/write CD-ROM, optical media), and communication media, such as computer and telephone networks including Ethernet, the Internet, wireless networks, and like network systems. It should be understood, therefore, that such signal-bearing media, when carrying or encoding computer readable instructions that direct method functions in the present invention, represent alternative embodiments of the present invention. Further, it is understood that the present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent. Thus, the method described herein, and in particular as shown and described in
While the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. Furthermore, as used in the specification and the appended claims, the term “computer” or “system” or “computer system” or “computing device” includes any data processing system including, but not limited to, personal computers, servers, workstations, network computers, mainframe computers, routers, switches, Personal Digital Assistants (PDAs), telephones, and any other system capable of processing, transmitting, receiving, capturing and/or storing data.