The field relates generally to enterprise information systems (EIS). More specifically, the field relates to generating a process model from plain event logs and analyzing weak-spots in the generated process model.
Monitoring and improving business performance is an important area of business management activity. In order to monitor an organization's performance as a whole, business-related aspects of the organization are represented as business processes in a business process model. The business process model is a model of one or more business processes, and defines the ways in which operations are carried out to accomplish the intended objectives of an organization. Techniques to model business processes include flow charts, functional flow block diagrams, control flow diagrams, Integration Definition (IDEF), etc. The business process model typically shows business data and information flow associated with the business processes. By comparing and contrasting the business process model representing the actual performance of a business with an a priori process model, business analysts can define, understand, and validate their business performance.
However, the challenge of monitoring and improving business performance in an enterprise information system (EIS) lies in performing business analysis for built-in business processes that do not have a priori process models to facilitate analysis. Typically, in an EIS with explicitly modeled process logic, key performance indicators (KPIs) can easily be defined on the process level. Then, the data context attached to the corresponding process instance logs is examined to find patterns which may be reasonably explainable causes for KPI violations. However, in an EIS such as a Business Suite, such a layer of explicit process logic is not available, as these systems normally run built-in processes. As used herein, built-in refers to the execution of business logic without using an explicit process engine. Rather, the processes evolve according to the actual usage of a system, leading to an individual “implicit process logic” for almost every system operator.
Various embodiments of systems and methods for generating a process model from plain event logs and analyzing weak-spots in the generated process model are described herein. In an aspect, the method involves obtaining an event log that includes events grouped by process instances. Based on analyzing the event log, a process graph is generated. In a further aspect, weak-spots within the event log are determined based on analyzing statistical information in the event log. In another aspect, one or more visual representations of the generated process graph, indicating the weak-spots, are generated. At least one of the one or more visual representations of the process model is rendered in response to receiving a selection of the at least one visual representation. In yet another aspect, the weak-spots are transformed into a data structure and provided as input to a rule mining algorithm for generating a set of rules defining the weak-spots. The set of rules received from the rule mining algorithm is rendered on a graphical user interface (GUI).
These and other benefits and features of embodiments will be apparent upon consideration of the following detailed description of preferred embodiments thereof, presented in connection with the following drawings.
The claims set forth the embodiments with particularity. The embodiments are illustrated by way of examples and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. The embodiments, together with their advantages, may be best understood from the following detailed description taken in conjunction with the accompanying drawings.
Embodiments of techniques for generating a process model from plain event logs and analyzing weak-spots in the generated process model are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail.
Reference throughout this specification to “one embodiment”, “this embodiment” and similar phrases, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one of the one or more embodiments. Thus, the appearances of these phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The method 100 includes at least the following process illustrated with reference to process blocks 110-190. In an aspect, at block 110, an event log composed of events grouped by process instances is obtained. The term “event log” refers to a chronological record of computer system transactions, called events, which are persisted to a log file on the system. The event log includes meta-data regarding the circumstances under which an entity performed the transactions, including time stamp, transaction history, originator, event-entity relationship, etc. The log file can be reviewed to identify or audit an entity's actions on the system or processes occurring within the system. In an aspect, the transactions are recorded automatically and independently of the entity whose behavior is the subject of the transaction. In an example, the event log is the audit trail of a workflow management system or the transaction logs of an enterprise resource planning system. The event log is structured by grouping a set of identified activities/events under a process instance. The process instance is the subject that undergoes the activities associated with the recorded events. For example, activities such as ‘create purchase order,’ ‘change of line item,’ ‘sign,’ ‘release,’ ‘input of invoice receipt,’ and ‘pay’ may be grouped under a process instance purchase order line item. The event logs can be received from one or more data source systems such as a data warehouse, an integrated ERP system, CRM system, workflow system, legacy system, external feed, web service, etc. An example XML format similar to a standard format used in the field of process mining would be:
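For illustration only, the following Python sketch parses a small hypothetical log laid out in the style of MXML, the event log format named later in this description. It is not the listing referred to above. The sample instance identifiers, originators, and timestamps, as well as the function name read_event_log, are assumptions introduced for this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample data. The element names follow the MXML layout
# (WorkflowLog / Process / ProcessInstance / AuditTrailEntry).
SAMPLE_LOG = """
<WorkflowLog>
  <Process id="purchase_order_line_item">
    <ProcessInstance id="PO-1/10">
      <AuditTrailEntry>
        <WorkflowModelElement>create purchase order</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2011-03-01T09:00:00</Timestamp>
        <Originator>alice</Originator>
      </AuditTrailEntry>
      <AuditTrailEntry>
        <WorkflowModelElement>sign</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2011-03-01T11:30:00</Timestamp>
        <Originator>bob</Originator>
      </AuditTrailEntry>
    </ProcessInstance>
    <ProcessInstance id="PO-2/10">
      <AuditTrailEntry>
        <WorkflowModelElement>pay</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2011-03-05T10:00:00</Timestamp>
        <Originator>carol</Originator>
      </AuditTrailEntry>
      <AuditTrailEntry>
        <WorkflowModelElement>create purchase order</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2011-03-02T08:00:00</Timestamp>
        <Originator>alice</Originator>
      </AuditTrailEntry>
    </ProcessInstance>
  </Process>
</WorkflowLog>
"""


def read_event_log(xml_text):
    """Return {process instance id: events sorted chronologically}."""
    root = ET.fromstring(xml_text.strip())
    log = {}
    for instance in root.iter("ProcessInstance"):
        events = []
        for entry in instance.iter("AuditTrailEntry"):
            events.append({
                "activity": entry.findtext("WorkflowModelElement"),
                "timestamp": datetime.fromisoformat(entry.findtext("Timestamp")),
                "originator": entry.findtext("Originator"),
            })
        # The event log is a chronological record, so order by time stamp.
        events.sort(key=lambda event: event["timestamp"])
        log[instance.get("id")] = events
    return log


if __name__ == "__main__":
    for instance_id, events in read_event_log(SAMPLE_LOG).items():
        print(instance_id, [event["activity"] for event in events])
```

Grouping the events under their process instance at import time keeps each trace available as one ordered list, which is the shape the later mining and weak-spot steps work on.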
At process block 120, a process graph is generated using a process mining algorithm by analyzing the event log. The process graph represents a sequence of related, structured activities and tasks that serve a particular goal. The process mining algorithm builds the process graph showing process start tasks, process end tasks, successor relations, and various routing constructs such as mutual exclusivity, parallelisms, etc. In an aspect, the process graph is expressed in a visual form, for example, by using Petri Nets or event-driven process chains (EPC). The process graph comprises nodes representing activities and paths between nodes representing the transition flow between the activities. In an example, time and location stamps in the event logs may be used to determine how a process was undertaken. The data in the “originator” field in the event log may be used to determine which entity was involved in the process. The transaction history and the relationship of entities involved in a particular transaction may be used to determine what happened in the process. Based on the transaction information contained in the event logs, a process model or process graph is deduced to reveal paths within a business process, without an a priori process model to guide the process modeling function. In an aspect, the process graph is deduced and built by applying heuristics to the transactions recorded in the event log. Applying heuristics to the transactions recorded in the event log yields a prediction of the flow of process that is likely to follow a particular task or process instance. In an aspect, the process graph is generated to include only core process instances by neglecting exceptional behavior within the event log. Such exceptional behavior is detected using statistical data from the event log. For example, the process mining algorithm may neglect low frequency behavior observed from the statistics derived from the event logs.
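The idea of deriving successor relations from the log while neglecting low frequency behavior can be sketched as follows. This is a deliberately simplified directly-follows count with a frequency cut-off, not the particular process mining algorithm of the embodiments, and it does not detect routing constructs such as parallelism. The function name mine_process_graph, the min_frequency parameter, and the sample traces are assumptions made for this sketch.

```python
from collections import Counter


def mine_process_graph(traces, min_frequency=2):
    """Build a simple process graph from activity sequences.

    traces: one list of activity names per process instance,
    in chronological order.
    """
    starts, ends, follows = Counter(), Counter(), Counter()
    for trace in traces:
        if not trace:
            continue
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Count how often activity b directly follows activity a.
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1

    # Neglect exceptional (low frequency) behavior.
    keep = lambda counter: {k: n for k, n in counter.items() if n >= min_frequency}
    return {
        "start_tasks": set(keep(starts)),
        "end_tasks": set(keep(ends)),
        "edges": keep(follows),
    }


if __name__ == "__main__":
    # Hypothetical traces; the last one is a rare shortcut.
    traces = [
        ["create", "sign", "release", "pay"],
        ["create", "sign", "release", "pay"],
        ["create", "change", "sign", "release", "pay"],
        ["create", "change", "sign", "release", "pay"],
        ["create", "pay"],
    ]
    graph = mine_process_graph(traces)
    for (a, b), count in sorted(graph["edges"].items()):
        print(f"{a} -> {b}: {count}")
```

With the cut-off at two observations, the single ‘create’ to ‘pay’ shortcut is dropped from the graph, while the start task, end task, and the frequently observed paths remain.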
At process block 130, the process graph is analyzed to determine weak-spots in the process. In an aspect, the weak-spots refer to activities that deviate from a standard order of activities. Since the process model is generated without an a priori process model, the standard order of activities is derived using statistical information extracted from the event log. In an example, the statistical information may be derived from transaction history and optionally by applying heuristics on the transaction history. For example, a standard order of activity may require the activity of placing an order with a supplier to be performed upon receiving an approval from the finance team. A weak-spot may be identified if an order is placed before getting the approval. Also, weak-spots encompass deviations in the time taken to perform an activity when compared to an average transaction time for a particular task. The average transaction time may be derived based on analyzing statistical information collected from the event log. In an aspect, the generated process graph is annotated with visual indicators indicating the weak-spots in the process.
At process block 140, a high-level representation and a low-level representation of the process graph are generated. The generated high-level and low-level representations include weak-spot indicators indicating weak-spots in the generated process graph. In an aspect, the high-level representation of the process graph provides a visual indication of frequently used paths and process behavior in the process graph and abstracts the process graph from semantics relating to process nodes and process paths in the process graph. The term “semantics” as used herein refers to specific modeling details such as the relationships between the nodes in the process graph, the routing constructs associated with a node such as parallelisms and mutual exclusivity, synchronization, exclusive choice, and loops.
On the other hand, the low-level representation of the process graph includes semantics information relating to process nodes and process paths in the process graph. For example, the low-level representation shows process transition patterns between the process nodes. In an aspect, the low-level view may not include all paths and exceptional behavior; rather, only those paths essential to form an overview of the process behavior are represented. In an aspect, the process transition patterns are illustrated using gateway nodes in the process paths. The gateway nodes clarify the semantics of incoming and outgoing process paths associated with a task.
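As a minimal sketch of the time-based case, the following Python code flags transitions whose duration in a given process instance exceeds a multiple of the average duration observed for that transition across the log. The multiplier factor, the function name find_slow_transitions, and the sample values are assumptions for illustration; the embodiments do not prescribe a particular threshold, and order deviations are not covered by this sketch.

```python
from collections import defaultdict
from statistics import mean


def find_slow_transitions(log, factor=2.0):
    """Flag transitions that take much longer than their average.

    log: {instance id: [(activity, hours since instance start), ...]}
    with each list in chronological order.
    Returns (instance id, transition, duration, average) tuples.
    """
    durations = defaultdict(list)
    for instance_id, events in log.items():
        for (a, t_a), (b, t_b) in zip(events, events[1:]):
            durations[(a, b)].append((instance_id, t_b - t_a))

    weak_spots = []
    for transition, samples in durations.items():
        # Average transaction time derived from the log itself.
        average = mean([duration for _, duration in samples])
        for instance_id, duration in samples:
            if duration > factor * average:
                weak_spots.append((instance_id, transition, duration, average))
    return weak_spots


if __name__ == "__main__":
    # Hypothetical sales order instances.
    log = {
        "SO-1": [("create sales order", 0), ("update sales order", 1), ("release", 2)],
        "SO-2": [("create sales order", 0), ("update sales order", 1), ("release", 2)],
        "SO-3": [("create sales order", 0), ("update sales order", 2), ("release", 3)],
        "SO-4": [("create sales order", 0), ("update sales order", 12), ("release", 13)],
    }
    for weak_spot in find_slow_transitions(log):
        print(weak_spot)
```

Because the average is computed from the same log, no a priori process model or externally defined target time is needed; the trade-off is that a single extreme instance also raises the average it is compared against.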
Since the low-level representation including detailed process information boosts the overall size of the visualized process model, a means for browsing the model in a step-by-step manner is provided. In an aspect, the model is structurally clustered and nodes which are in close proximity to each other are collapsed into groups. The groups constitute coherent segments of the process graph and can be explored step-by-step by expanding the collapsed group.
The high-level representation is rendered on a graphical user interface (GUI) at process block 155 based on determining that a selection for the high-level representation is received at process block 150. The selection for the high-level representation may be provided by a technical domain expert, a process analyst or any other user with limited or no technical background. Alternatively, if a selection for a low-level representation is received at process block 150, a low-level representation of the process graph is rendered on the GUI. In an aspect, the low-level representation may be initially rendered with collapsed process nodes, at process block 160, and the collapsed nodes may be rendered in expanded mode in response to receiving an input, at process block 165. Irrespective of the visualization type selected, the process graph is rendered on the GUI with visually marked weak-spots assigned to one or more nodes (tasks) or paths (transitions) within the process graph. The visually marked weak-spots are rendered selectable for further analysis. For example, a user may select a weak-spot for further analysis by simply clicking the weak-spot using an input means such as a mouse. Alternatively, the weak-spot may be selected for analysis by simply hovering a cursor over the weak-spot in the process graph rendered on the GUI.
At process block 170, the selected one or more weak-spots are subject to further analysis by transforming the selected weak-spots into a tabular data structure suitable for rule mining. In an aspect, the tabular data structure is generated by creating a data variable column for each data context variable and a target class column for each selected weak-spot. In an aspect, the target class column receives binary values to indicate whether a weak-spot has occurred in a process instance or not, the process instance defined by a combination of the data context variables. For each process instance in the event log, a row in the tabular data structure is filled with actual data context values and the target class column is filled with information regarding the presence or absence of a weak-spot.
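The transformation described above can be sketched in Python as follows. The function name build_rule_mining_table, the dictionary layout of the process instances, and the weak-spot label are assumptions made for this sketch; the context values are taken from the example given later in this description.

```python
def build_rule_mining_table(instances, context_variables, weak_spots):
    """Return (header, rows) with one row per process instance.

    instances: list of dicts with a "context" mapping and a
    "weak_spots" set naming the weak-spots seen in that instance.
    """
    # One data variable column per context variable,
    # one target class column per selected weak-spot.
    header = list(context_variables) + list(weak_spots)
    rows = []
    for instance in instances:
        row = [instance["context"].get(name) for name in context_variables]
        # Binary target class: 1 if the weak-spot occurred, else 0.
        row += [1 if spot in instance["weak_spots"] else 0 for spot in weak_spots]
        rows.append(row)
    return header, rows


if __name__ == "__main__":
    instances = [
        {"context": {"industry": "automotive", "country": "Germany",
                     "request_type": "internal", "project_duration_days": 20},
         "weak_spots": {"A_to_B_too_slow"}},
        {"context": {"industry": "retail", "country": "France",
                     "request_type": "external", "project_duration_days": 5},
         "weak_spots": set()},
    ]
    header, rows = build_rule_mining_table(
        instances,
        ["industry", "country", "request_type", "project_duration_days"],
        ["A_to_B_too_slow"],
    )
    print(header)
    for row in rows:
        print(row)
```

Keeping one target class column per weak-spot lets each weak-spot be mined separately against the same set of data context columns.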
At process block 180, the generated tabular data structure is provided as input to a rule mining algorithm such as C5.0 or Fuzzy Unordered Rule Induction Algorithm (FURIA) or any other rule mining algorithm that provides unordered rules for the weak-spots, i.e., rules that can each be interpreted on their own, independently of the others. The set of rules generated by the rule mining algorithm is then rendered on the GUI, at process block 190.
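To make the notion of unordered rules concrete, the following Python sketch scores single-condition rules of the form “variable = value implies weak-spot” by support and confidence. It is a plain stand-in chosen for brevity and is neither C5.0 nor FURIA; the function name, the thresholds, and the sample rows are assumptions made for this sketch.

```python
from collections import Counter


def mine_single_condition_rules(header, rows, features, target,
                                min_support=2, min_confidence=0.8):
    """Return (feature, value, support, confidence) rules for one weak-spot.

    Each rule is evaluated on its own, so the resulting set is unordered:
    no rule depends on another rule having been tried first.
    """
    target_col = header.index(target)
    rules = []
    for feature in features:
        col = header.index(feature)
        covered, hits = Counter(), Counter()
        for row in rows:
            covered[row[col]] += 1            # instances matching the condition
            hits[row[col]] += row[target_col]  # of those, instances with the weak-spot
        for value, support in covered.items():
            confidence = hits[value] / support
            if support >= min_support and confidence >= min_confidence:
                rules.append((feature, value, support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical table: three data variable columns, one target class column.
    header = ["industry", "country", "request_type", "weak_spot"]
    rows = [
        ["automotive", "Germany", "internal", 1],
        ["automotive", "Germany", "external", 1],
        ["automotive", "France", "internal", 1],
        ["retail", "Germany", "internal", 0],
        ["retail", "France", "external", 0],
        ["retail", "Germany", "external", 0],
    ]
    features = ["industry", "country", "request_type"]
    for rule in mine_single_condition_rules(header, rows, features, "weak_spot"):
        print(rule)
```

In this sample, only the condition on the industry column reaches the confidence threshold, which is the kind of compact “explanation” a rule mining algorithm is expected to surface for a weak-spot. A production rule learner would additionally combine conditions and handle numeric variables.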
Further, as shown in the figure, the process model includes visually marked weak-spots 252 and 255. In the given example, the path between the nodes ‘create sales order’ 210 and ‘update sales order’ 230 and the looping path associated with the ‘update sales order’ node 230 are indicated as weak-spots. The paths are marked as weak-spots to indicate a high transition time between the tasks ‘create sales order’ and ‘update sales order.’ For example, the transition time between the tasks may be detected as a deviation based on applying heuristics to the event log.
In an embodiment, upon selecting the option for viewing a low-level representation, the process model visualization tool renders the low-level representation 300 of the process model on the GUI as shown in
Irrespective of the type of view selected, the weak-spots 336 to 340 rendered in the visual representation of the process model are selectable from the GUI using any input means including but not limited to mouse, keypad, keyboard, and touch display. In an aspect, upon selecting a weak-spot 336 to 339 or 340, an analysis 267 of the weak-spot is presented and a rule mining algorithm is triggered to discover business rules defining the weak-spots. The discovered business rules 268 are rendered on the GUI as part of the dashboard 260. The rules generated by the rule mining algorithm embody “explanations” for the weak-spots 336 to 339 and/or 340. In an example, based on the rule set, the most characteristic data values among all the data values within the process instance logs, namely those that are potentially “causing” the weak-spot, are identified and displayed to the user.
Upon selecting one or more weak-spots for further analysis, a tabular data structure 400 suitable for rule mining is internally generated as illustrated in
The rows 445, 450, 455, or 460 under each of the data variable fields are populated with actual data context values as recorded in the process instance logs. Also, the rows 445, 450, 455, or 460 under the target class fields indicate the presence or absence of a corresponding weak-spot. In an example, referring to row 445, it may be derived from a process log that, for an automotive industry in Germany, the project duration for an internal request type is 20 days and has an associated weak-spot in the process execution, in that a rule that task A must follow task B after two hours is violated. The generated data structure is provided as input to a rule mining algorithm such as C5.0 or Fuzzy Unordered Rule Induction Algorithm (FURIA) to deliver unordered rules for the weak-spots, i.e., rules that can each be interpreted on their own without knowing the others. For example, the generated rules predict for which combination of data variables the weak-spot occurs or does not occur. The mined rules can then be utilized to alter the information system to prevent problematic process instances by counteracting (only) in specific data contexts.
Further, the graphical user interface (GUI) 640 includes an interface for user interaction 645 and an interface for model exploration 650. The model exploration interface 650 enables the navigation of the process model that is rendered on the GUI 640. For example, the model exploration interface 650 enables a user to select one or more options including selecting a view (low-level/high-level) of the process model, selecting a weak-spot, performing rule mining, etc. The user interaction interface 645 enables a user to set threshold values related to events in the process model.
Examples of the data source systems 615 include a data warehouse, an integrated ERP system, a CRM system, a workflow system, a legacy system, an external feed, a web service, and an MXML event log file. In an embodiment, a business suite transfers its events into a data structure and provides it to the log import module 620. The log import module 620 abstracts (622) the event log and provides the abstracted event logs to the mining algorithm 625 for model generation (627). The process model generated by the mining algorithm 625 is provided to the export factory 630 for rendering on the GUI 640. The event log abstracted at 622 is also provided to the rule factory 635 for transforming the data context variables and weak-spots detected in the event logs into a data format for rule mining. The data structure is then provided to an appropriate rule mining algorithm, in this example implemented by a BOBJ predictive workbench 637. The ruleset from the rule mining algorithm is received by the rule factory 635 and provided to the GUI 640 for display. In an aspect, the analysis of the event log and the production of the ruleset are initiated in response to receiving a selection of one or more weak-spots in the process model that is rendered on the GUI 640. For example, when a weak-spot indicated in the process model is selected, a request for the rules defining the weak-spot is sent to the processor 610, where the input data structure for the rule mining algorithm is generated by the rule factory 635 and provided to the rule mining algorithm 637. The ruleset generated by the rule mining algorithm is received by the rule factory 635 and sent back to the GUI 640 where the ruleset is displayed.
Therefore, this work aims at enabling a model-driven analysis layer also for built-in business processes, requiring only correlated process instance logs and no further a priori knowledge of the process behavior. The innovation of the method presented here consists in utilizing suitable chaining and data mining components so as to presuppose a minimal amount of user intervention, making it a wizard-like or guided approach to quickly get from a plain event log to an overview of the “main” process behavior with individual weak-spots and their explanations.
Some embodiments may include the above-described methods being written as one or more software components. These components, and the functionality associated with each, may be used by client, server, distributed, or peer computer systems. These components may be written in a computer language corresponding to one or more programming languages such as functional, declarative, procedural, object-oriented, lower level languages and the like. They may be linked to other components via various application programming interfaces and then compiled into one complete application for a server or a client. Alternatively, the components may be implemented in server and client applications. Further, these components may be linked together via various distributed programming protocols. Some example embodiments may include remote procedure calls being used to implement one or more of these components across a distributed programming environment. For example, a logic level may reside on a first computer system that is remotely located from a second computer system containing an interface level (e.g., a graphical user interface). These first and second computer systems can be configured in a server-client, peer-to-peer, or some other configuration. The clients can vary in complexity from mobile and handheld devices, to thin clients and on to thick clients or even other servers.
The above-illustrated software components are tangibly stored on a computer readable storage medium as instructions. The term “computer readable storage medium” should be taken to include a single medium or multiple media that stores one or more sets of instructions. The term “computer readable storage medium” should be taken to include any physical article that is capable of undergoing a set of physical changes to physically store, encode, or otherwise carry a set of instructions for execution by a computer system which causes the computer system to perform any of the methods or process steps described, represented, or illustrated herein. Examples of computer readable storage media include, but are not limited to: magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer readable instructions include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment may be implemented using Java, C++, or other object-oriented programming language and development tools. Another embodiment may be implemented in hard-wired circuitry in place of, or in combination with machine readable software instructions.
A data source is an information resource. Data sources include sources of data that enable data storage and retrieval. Data sources may include databases, such as, relational, transactional, hierarchical, multi-dimensional (e.g., OLAP), object oriented databases, and the like. Further data sources include tabular data (e.g., spreadsheets, delimited text files), data tagged with a markup language (e.g., XML data), transactional data, unstructured data (e.g., text files, screen scrapings), hierarchical data (e.g., data in a file system, XML data), files, a plurality of reports, and any other data source accessible through an established protocol, such as, Open DataBase Connectivity (ODBC), produced by an underlying software system (e.g., ERP system), and the like. Data sources may also include a data source where the data is not tangibly stored or otherwise ephemeral such as data streams, broadcast data, and the like. These data sources can include associated data foundations, semantic layers, management systems, security systems and so on.
In the above description, numerous specific details are set forth to provide a thorough understanding of embodiments. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details or with other methods, components, techniques, etc. In other instances, well-known operations or structures are not shown or described in detail.
Although the processes illustrated and described herein include series of steps, it will be appreciated that the different embodiments are not limited by the illustrated ordering of steps, as some steps may occur in different orders, some concurrently with other steps apart from that shown and described herein. In addition, not all illustrated steps may be required to implement a methodology in accordance with the one or more embodiments. Moreover, it will be appreciated that the processes may be implemented in association with the apparatus and systems illustrated and described herein as well as in association with other systems not illustrated.
The above descriptions and illustrations of embodiments, including what is described in the Abstract, are not intended to be exhaustive or to limit the one or more embodiments to the precise forms disclosed. While specific embodiments of, and examples for, the one or more embodiments are described herein for illustrative purposes, various equivalent modifications are possible within the scope, as those skilled in the relevant art will recognize. These modifications can be made in light of the above detailed description. Rather, the scope is to be determined by the following claims, which are to be interpreted in accordance with established doctrines of claim construction.