Some embodiments of the invention are described with respect to the following figures:
A tool according to some embodiments extracts events associated with business processes from various sources to enable reporting about such business processes. Examples of business processes include invoicing, shipping goods, paying bills, approving expenses or purchases, and so forth. To reduce the complexity and detail associated with the reporting of business processes, users of the tool can provide abstract process definitions that identify a high-level, simplified (or otherwise modified) version of the business process that is of interest for the purpose of reporting. The high-level, simplified (or otherwise modified) version focuses on interesting (or business-relevant) aspects of the actual business process and abstracts out unnecessary details. Also, process mapping definitions are provided to map events (which have been extracted from various systems that support the execution of the various steps of the actual process) to the steps of interest in the abstract business processes. An “abstract” business process refers to the business process with unnecessary details left out. Using the process mapping definitions and abstract process definitions, the tool according to some embodiments is able to group events into sets of related events, which sets of related events are then mapped to steps of the abstract business process. The sets of related events are also used to produce output information according to a predefined format (e.g., tables), which is then used to provide business process reporting. The output information is stored in a data warehouse for subsequent retrieval and/or manipulation. More generally, the output information is stored in a repository (which can be any storage location).
Although reference is made to business processes, it is noted that techniques according to some embodiments can be applied to other types of processes associated with other types of organizations, such as educational organizations, government agencies, and so forth. A process can be considered a set of one or more linked steps that collectively realize an objective (e.g., a business objective, an educational objective, a government objective, etc.) or a policy goal. An event represents an activity associated with a start or completion of a step in a process. The event also specifies one or more correlation parameters to correlate the event to other events. A data warehouse refers to a collection of one or more databases, implemented on one or more nodes, for storing information.
In some embodiments, the ETL tool 100 is a software tool executable on a central processing unit (CPU) (or multiple CPUs) 111 that is part of a computer 114. The computer 114 also includes a storage subsystem 116 that contains various files (e.g., databases, tables, etc.) for storing information usable by the ETL tool 100. In
Although the logs 118-126 are depicted as being stored in a storage subsystem 116 in the same computer 114 as the ETL tool 100, it is noted that one or more of the logs 118-126 can be located at a remote storage location on a node that is separate and distinct from the computer 114. In
The various sources of events that are coupled to the computer 114 over a data network 132 include, as examples, a web server 102, an application server 104, an enterprise resource planning (ERP) system 106, a message broker 108, and one or more other sources 110. In other implementations, many other types of sources can be provided. Examples of the data network 132 include a local area network (LAN), a wide area network (WAN), or the Internet.
Some of the sources 102-110 can be workflow engines that execute corresponding business processes. A source may itself provide an event log (as a workflow engine does); otherwise, probes (e.g., probes 134, 136, and 138) may have to be provided to monitor the information exchanged by the source system and to collect event information. For example, probes (in the form of a software application, for example) can be implemented as part of the ERP system 106 and message broker 108 to collect event information. The collected event information can be provided to respective logs 122 and 124.
The events collected into the logs 118-126 can represent invocation of application programs, invocation of software methods (e.g., Java routines), communication of data, action by a user, and so forth. Each event can be associated with one or more parameters. For example, an approval message may have the approver's name and the approval result as parameters. As discussed further below, the one or more parameters are used to correlate events to each other.
Each of the abstract process definitions 128 provides an identification of steps of a process that are of interest for reporting. Normally, to reduce the complexity and detail of information in reporting about execution of a process, the respective abstract process definition includes just a relatively small number of steps.
In
A data extraction module 140 in the ETL tool 100 extracts events from the logs 118-126, and provides the extracted events to an events staging area 142. The data extraction module 140 extracts just events that are of interest according to the abstract process definitions 128. The data extraction module 140 uses process mapping definitions 146 to identify events corresponding to subsets of steps that are of interest. Note that the logs 118-126 can contain events for all steps of each execution of a process. To reduce complexity and enhance efficiency (in terms of storage and processing), not all of the events are extracted by the data extraction module 140 from the logs.
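The selective extraction described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a process mapping definition can be reduced to a dictionary from event names to step names and that each log is a list of dictionaries. The function name `extract_events_of_interest`, the record layout, the `approve` step label, and the `pageView` and `heartbeat` events are hypothetical and are not part of the described tool.

```python
# Hypothetical process mapping definition: event name -> abstract step name.
PROCESS_MAPPING = {
    "workItemSelection": "approve",
    "approval": "approve",
}


def extract_events_of_interest(logs, mapping):
    """Copy into a staging list only the events that the mapping refers to."""
    staged = []
    for log in logs:
        for event in log:
            if event["name"] in mapping:
                # Keep the original parameters and note the mapped step.
                staged.append({**event, "step": mapping[event["name"]]})
    return staged


# Two made-up logs; events absent from the mapping are ignored.
logs = [
    [{"name": "workItemSelection", "approvalRequestID": 7},
     {"name": "pageView", "url": "/home"}],
    [{"name": "approval", "approvalRequestID": 7},
     {"name": "heartbeat"}],
]
staged = extract_events_of_interest(logs, PROCESS_MAPPING)
```

Filtering at extraction time keeps the staging area small, which is the storage and processing benefit noted above.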
The events staging area 142 is a temporary storage location, which can be part of the storage subsystem 116, for temporarily storing information pertaining to extracted events. A process mapping module 144 in the ETL tool 100 then retrieves information about the events from the staging area 142 and scans for events of interest for each particular execution of a process (the events that are mapped to steps identified by the abstract process definition). The process mapping module 144 uses process mapping definitions 146 that map events to corresponding process steps.
In the embodiment depicted in
The process mapping module 144 maps the events into respective execution sets, where each execution set contains events that are part of a particular execution of a process. The events in each execution set are related to each other according to one or more correlation parameters of the events and correlation conditions specified for those correlation parameters. The parameters and conditions are defined by the process mapping definitions 146. In some embodiments, events are correlated in a pairwise fashion. In other words, each given event is correlated to one other event based on some condition specified on a parameter (or plural parameters) of the events in the pair. Each pair of correlated events can then be correlated to one or more other pairs of events such that a chain of events can be defined for a particular execution of a process.
For example, if a given execution of a process has events A, B, C, D, E, and so forth, then the following pairs of correlated events may be specified {A, B}, {B, D}, {D, C}, {C, E}, and so forth. Note that pair {A, B} is correlated to pair {B, D} by event B, pair {B, D} is correlated to pair {D, C} by event D, and so forth. This chain of pairs of events allows all events for a particular execution set (associated with a particular execution of a process) to be identified.
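One way to turn such a chain of pairs into execution sets is to treat each correlated pair as an edge and collect connected groups. The sketch below uses a union-find structure for this; that choice is an illustrative assumption, not a technique stated in this description. The function name `group_by_chained_pairs` and the extra pair {X, Y}, which stands for a second, unrelated execution, are hypothetical.

```python
def group_by_chained_pairs(pairs):
    """Group events so that events linked by a chain of pairs share a set."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each correlated pair merges the groups of its two events.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for event in list(parent):
        groups.setdefault(find(event), set()).add(event)
    return list(groups.values())


# Pairs from the example above, plus a hypothetical unrelated pair.
pairs = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "E"), ("X", "Y")]
execution_sets = group_by_chained_pairs(pairs)
```

With these inputs, events A through E end up in one group because each pair shares an event with the next, while X and Y form a separate group.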
In alternative embodiments, other techniques for correlating events can be utilized.
The abstract process definition includes the specification of which events correspond to the start or completion of each process step. The abstract process definition also specifies correlation parameters (and correlation conditions) of the corresponding events. For purposes of example, a business process can be an approval process (such as for approving a request for an expense, a purchase request, and so forth).
The abstract process definition 128 for the approval process can identify a subset (less than all) of the steps that are of interest for purposes of reporting, or the abstract process definition 128 can identify steps that correspond to a collection of steps in a lower level process. As an example, the abstract process definition 128 for the approval process 200 can identify the submit step 202, validate step 204, and approve step 206 as being the steps of interest for reporting. By omitting the remaining steps (208, 210, 212, 214, 216) in the abstract process definition for the approval process, information associated with such other steps is not extracted for the purpose of developing a report regarding execution of the approval process.
Events that correspond to the start and/or completion of a step can be specified by the process mapping definition 146. For example, for the approval process 200 of
In addition to specifying events (such as the workItemSelection and approval events above), the definer of the abstract process definition also specifies correlation parameters and conditions that allow events that belong to the same execution of a process to be matched (correlated). For example, assume the workItemSelection event has an example parameter approvalRequestID, and the approval event also has the same parameter. This parameter can then be used for matching the events by using the following correlation condition: workItemSelection.approvalRequestID=approval.approvalRequestID. The events can have other parameters.
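The correlation condition above can be expressed as a simple predicate over two events. The following is a minimal sketch that assumes events are represented as dictionaries of parameters; the function name `correlated`, the `result` parameter name, and the sample values are hypothetical.

```python
def correlated(work_item_selection, approval):
    """workItemSelection.approvalRequestID = approval.approvalRequestID"""
    return (work_item_selection["approvalRequestID"]
            == approval["approvalRequestID"])


# Made-up events for two different executions of the approval process.
selections = [
    {"name": "workItemSelection", "approvalRequestID": 101},
    {"name": "workItemSelection", "approvalRequestID": 102},
]
approvals = [
    {"name": "approval", "approvalRequestID": 102, "result": "approved"},
    {"name": "approval", "approvalRequestID": 101, "result": "rejected"},
]

# Pair each selection event with the approval event of the same request.
matches = [(s, a) for s in selections for a in approvals if correlated(s, a)]
```

Parameters that are not named in the condition, such as `result` here, play no part in the matching.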
In the example of
Initially, abstract process definitions 128 (and associated process mapping definitions 146) are defined (at 302) and received and stored by the ETL tool 100 in the computer 114 (
Next, events are extracted (at 304) from the various sources by the data extraction module 140 (
Using the process mapping definitions 146, all execution sets E of events are generated (at 312), where each execution set E contains all events for a particular instance (execution) of a process. If only one execution of one process is being evaluated by the tool 100, then just one execution set E is generated. More precisely, for each event e in a particular execution set E, there is another event e_i such that a correlation condition between these two events is defined and is true for the pair {e, e_i}. As noted above, pairs of events {e_j, e_k} are correlated to each other such that a chain of events can be derived for inclusion in the execution set E. The chain is extended until there is no event in the staging area 142 that is not in the particular execution set E and that is correlated to an event in E.
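The termination condition described above amounts to a fixed-point computation: starting from one event, correlated events are added until a full pass over the staging area adds nothing. The sketch below is one possible reading under stated assumptions. Events are dictionaries, two events are taken to be correlated when they share an equal value for any listed correlation parameter, and the names `build_execution_set`, `is_correlated`, and `orderID` are hypothetical.

```python
# Hypothetical correlation parameters; approvalRequestID is from the example
# in this description, orderID is invented for illustration.
CORRELATION_KEYS = ("approvalRequestID", "orderID")


def is_correlated(a, b):
    """True if both events carry the same value for some correlation key."""
    return any(k in a and k in b and a[k] == b[k] for k in CORRELATION_KEYS)


def build_execution_set(seed_index, staging_area, correlated):
    """Grow a set from one seed event until no correlated event is left out."""
    members = {seed_index}
    changed = True
    while changed:
        changed = False
        for i, candidate in enumerate(staging_area):
            if i in members:
                continue
            if any(correlated(candidate, staging_area[j]) for j in members):
                members.add(i)
                changed = True
    return [staging_area[i] for i in sorted(members)]


staging_area = [
    {"name": "A", "approvalRequestID": 1},
    {"name": "C", "orderID": 9},
    {"name": "B", "approvalRequestID": 1, "orderID": 9},
    {"name": "D", "approvalRequestID": 2},
]
execution_set = build_execution_set(0, staging_area, is_correlated)
```

In this made-up staging area, C shares no parameter with A and joins the set only through B, which illustrates why pairwise conditions have to be chained. D belongs to a different request and is left out.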
In some cases, an event may belong to multiple execution sets. Events that belong to more than one execution set are duplicated (or copied multiple times as appropriate) (at 314). Each execution set is assigned (at 316) an execution ID (which is unique to each execution set). Also, all events within a particular execution set are marked (at 316) with the same execution ID. If an event is copied multiple times because the event exists in multiple execution sets, the multiple copies of the event will have different execution IDs.
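The duplication and marking steps can be sketched as follows, assuming execution sets are lists of event dictionaries and execution IDs are consecutive integers. Both assumptions, along with the function name `tag_execution_sets` and the `batchApproval` event, are hypothetical; only the `ExecutionID` attribute name is taken from this description.

```python
from itertools import count


def tag_execution_sets(execution_sets):
    """Give each set a unique ID and mark a copy of every event with it."""
    next_id = count(1)
    tagged = []
    for events in execution_sets:
        execution_id = next(next_id)
        for event in events:
            # Copying means a shared event yields one record per set.
            marked = dict(event)
            marked["ExecutionID"] = execution_id
            tagged.append(marked)
    return tagged


# A made-up event that belongs to two execution sets.
shared = {"name": "batchApproval"}
execution_sets = [
    [{"name": "workItemSelection", "approvalRequestID": 1}, shared],
    [{"name": "workItemSelection", "approvalRequestID": 2}, shared],
]
tagged = tag_execution_sets(execution_sets)
```

Because each event is copied before it is marked, the shared event appears twice in the result with different execution IDs, and the original record is left unmodified.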
Next, the events are loaded (at 318) into the data warehouse 112. The events are loaded as output information in a format that is amenable to process reporting. As part of the loading process, the output information is converted from the execution sets. In one example embodiment, the format of the output information is in the form of various tables, such as the tables depicted in
In the example embodiment of
The process data table 402 according to the example of
There may be multiple event parameters tables 404 corresponding to different event types. Different types of events may have different parameters (and different numbers of parameters) that map to different data structures. For example, an approval request event may have the following parameters: requester name, expense item, and approval amount. The attributes of the event parameters table 404 include: StepName (the name of the step that the particular event is associated with); Time (which indicates the time of the event); StartOrEnd (to indicate whether the event is the start event or end event of a step); ExecutionID; and one or more Parameters (which are the parameters of the event).
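As an illustration of how one event parameters table might be laid out, the sketch below creates a table for the approval request event in an in-memory SQLite database. The use of SQLite, the table name `ApprovalRequestEventParameters`, the column types, and the sample rows are all assumptions made for this example; only the attribute names and the three event parameters come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ApprovalRequestEventParameters (
        StepName       TEXT NOT NULL,
        "Time"         TEXT NOT NULL,
        StartOrEnd     TEXT NOT NULL CHECK (StartOrEnd IN ('start', 'end')),
        ExecutionID    INTEGER NOT NULL,
        RequesterName  TEXT,
        ExpenseItem    TEXT,
        ApprovalAmount REAL
    )
""")

# Made-up rows for two executions of the approval process.
rows = [
    ("submit", "2024-01-05T09:00:00", "start", 1, "Alice", "laptop", 1200.0),
    ("submit", "2024-01-05T09:05:00", "end", 1, "Alice", "laptop", 1200.0),
    ("submit", "2024-01-05T10:00:00", "start", 2, "Bob", "travel", 300.0),
]
conn.executemany(
    "INSERT INTO ApprovalRequestEventParameters VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# All events of one execution, in time order, selected by ExecutionID.
by_execution = conn.execute(
    'SELECT StepName, StartOrEnd FROM ApprovalRequestEventParameters '
    'WHERE ExecutionID = ? ORDER BY "Time"',
    (1,),
).fetchall()
```

A different event type with different parameters would get its own table with the same four leading attributes and its own parameter columns.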
Note that the execution ID value is what correlates the step data table 400, process data table 402, and event parameters tables 404. Moreover, the StepName attribute is used to correlate entries of the step data table 400 and the entries of one or more event parameters tables 404, and to denote that the step data refers to the value of the parameter after the step has been completed.
The data in the tables stored in the data warehouse 112 can be subsequently retrieved and presented as output to users. Alternatively, the tables can be manipulated to provide an output in a different form, such as in tables of different forms, charts, bar graphs, and so forth.
Instructions of software described above (including the ETL tool 100 and other software in
Data and instructions (of the software) are stored in respective storage devices (such as storage subsystem 116 in
In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.