The invention relates to a method for generating, in a computer system, an event log for raw data stored in a source system. The generated event log may be provided to a process mining system.
To achieve business goals, most companies and institutions have predefined business processes which have to be followed by their employees. These processes are designed so that, when carried out as defined, they reach the defined goals efficiently; adherence to them is therefore vital to a company's efficiency. Unfortunately, monitoring and analyzing processes and checking them for irregularities can be time-consuming and complex. Process mining systems address this by helping to analyze the as-is processes.
Most tasks that represent steps of a process are carried out in an IT-driven environment and leave traces in an IT system. Picking up these traces and reconstructing the as-is process from this data is the goal of process mining.
From a business process perspective a process involves the following process components:
In order for process mining systems to reconstruct the as-is processes, at least one event log has to be provided to them. Unfortunately, almost no IT system is prepared in such a way that an event log can be retrieved directly from the raw data comprising the traces of executed processes. Once such an event log is available, the technical requirements for applying process mining techniques are fulfilled.
To address business process questions, the technical representation as an event log, combined with process mining techniques, is on its own mostly not sufficient for an end user in the role of a business process professional. Such a user requires a non-technical approach to a process mining system, with prepared analyses based on both the event log and adjacent tables/files which contain process information.
It is an object of the invention to provide solutions to sense process steps and build up event logs based on raw data stored in a source system, i.e., to transform the source data into an event log format for further process mining analysis.
This object is solved according to the invention by a method as well as a system according to the independent claims. Preferred embodiments and further developments of the invention are specified in the respective dependent claims.
In one aspect of the invention, a method for generating, in a computer system having a processor and a storage means operatively coupled to the processor, an event log from raw data stored in a source system, the method comprising
Advantageous implementations can include one or more of the following features.
The identifier of a single process step and the unique identifier of the process element assigned to the single process step and the order assigned to the single process step may form a single data record which is stored with the predetermined data structure.
The order of the process step may comprise at least one of a timestamp and a time interval.
Advantageously, a number of process sensors may be provided to the processor. The number of process sensors may be executed by the processor in order to create a complete event log.
The number of process sensors may be combined according to a number of rules in order to create different event logs. The rules may be provided to the processor. Alternatively, the rules may be derived, by the processor, from the process sensors provided.
An event log package may be provided to the processor, the event log package containing the number of process sensors.
Advantageously, the at least one process sensor may comprise at least one sensor statement. The at least one process sensor may be executed by executing at least one sensor statement of the at least one process sensor.
The at least one sensor statement may be provided in an independent representation, where the processor converts the independent representation of the sensor statement into an executable representation for being executed with a predetermined execution environment.
The execution environment may comprise a database system, a data processing platform and/or an execution environment for data processing tasks.
Further, a set of parameters may be assigned to the at least one process sensor. The values of the parameters may control the behavior of the process sensor when executed by the processor.
Yet further, at least one value of the parameters assigned to a first process sensor may control the behavior of at least one second process sensor.
The generated data stored with the storage means may be provided to a process mining system.
Furthermore, the method may further comprise
Advantageously, the at least one process sensor is adapted to create the set of tables and/or files.
Further, the at least one process sensor may be adapted
The event log package may be provided to a package store for being downloaded by a user of a process mining system. The event log package may contain all process sensors which are necessary for the analysis of one specific business process within the process mining system.
The package store may provide functionality to the user of the process mining system to
The execution of the at least one process sensor may be triggered according to a time schedule.
Furthermore, the invention comprises a computer program product, comprising a computer readable storage means, comprising program code for performing the inventive method, when loaded into a computer system.
Further provided is a computer-based system, comprising:
The following advantages apply when using process sensors to create event logs instead of using database query scripts:
Details and features of the invention as well as specific embodiments of the invention can be derived from the subsequent description in connection with the drawing, in which:
The analysis of business process by a process mining application is for the most part conducted in two stages:
The relation and the content of both the event log package and the analytics package are depicted in
The above-mentioned process components are mapped to technical representations within the source system by executing one or more process sensors. Based on the technical representations the process can be reconstructed solely from the traces which have been left on the source system. The mapped representations are:
These technical components (i.e. the mapped representations) lead then to a data structure of an event log which is the input for process mining systems. The creation of the data structure is performed by the process sensors.
Thus, a process sensor is adapted to derive process data from the raw data comprising the digital traces of the executed processes, and to generate from the derived process data the event log.
The data structure consists in the simplest structure of three columns which reflect the aforementioned process components:
The order may be an attribute that allows sorting, preferably a single timestamp or multiple timestamps representing one or more time spans.
An example of an event log with a single timestamp is given in the following table.
An example of an event log with multiple timestamps is given in the following table.
Given such an event log, the as-is process may be reconstructed by process mining algorithms.
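The reconstruction step can be illustrated with a minimal sketch. It assumes an event log with the three columns described above and a single timestamp per event. The identifiers, step names and the helper `reconstruct_traces` are hypothetical and are not taken from the described system. The sketch groups the events by their unique identifier and sorts each group by the order attribute, which yields one sequence of process steps per process element.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log rows: (unique identifier, process step, order attribute).
event_log = [
    ("INV-1", "Invoice created", datetime(2020, 1, 2, 9, 0)),
    ("INV-2", "Invoice created", datetime(2020, 1, 2, 9, 30)),
    ("INV-1", "Invoice paid", datetime(2020, 1, 9, 14, 0)),
    ("INV-2", "Change of: amount", datetime(2020, 1, 3, 11, 0)),
    ("INV-2", "Invoice paid", datetime(2020, 1, 10, 8, 0)),
]


def reconstruct_traces(rows):
    """Group events by identifier and sort each group by its order attribute."""
    grouped = defaultdict(list)
    for case_id, step, order in rows:
        grouped[case_id].append((order, step))
    return {
        case_id: [step for _, step in sorted(events)]
        for case_id, events in grouped.items()
    }


traces = reconstruct_traces(event_log)
for case_id, steps in traces.items():
    print(case_id, " -> ".join(steps))
```

Actual process mining algorithms go further, for example by aggregating such sequences into a process graph; the sketch only shows why the three columns are sufficient as input.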
In the following, the components “process sensor” and “event log package” are explained in further detail.
A process sensor is able to sense the necessary data for a process step out of the source data, i.e., raw data stored in the source systems.
Thus, a process sensor is a unit which can be applied to the raw data of the source system to sense one or multiple process steps. The respective changes which were triggered in the raw data are then converted into the aforementioned structure of an event log. The minimal set of data which is sensed is the one mentioned before:
One process sensor is preferably independent of the data storage layer.
In a source system which handles processes the data is usually saved into tables which reside in a data storage, e.g. in a database system. The different data fields which are sensed by the process sensor(s) are often scattered among different tables. The relations between the different tables have to be defined within the process sensor(s).
With this requirement a process sensor S can be defined as a tuple of the data set mentioned before with a unique identifier I, a process step P, an order attribute T and further attributes V:
S = ⟨I, P, T, V⟩
This n-tuple is evaluated during the execution of the sensor S, and the result consists of an event log with these columns.
The unique identifier I can further be defined as:
I = ⟨Δ ⋈ … ⋈ Γ, λ⟩
where Δ and Γ stand for the tables containing the fields necessary for the unique identifier λ corresponding to the process object. The unique identifier λ can consist of any transformation of the fields contained in the consecutive join over (possibly) multiple tables from Δ to Γ.
The process step P can further be defined as:
P = ⟨Π, π, d, Δ ⋈ … ⋈ Γ ⋈ … ⋈ Π⟩
where Π corresponds to the table holding the process step description in the field π, and d stands for a fixed prefix for the process step. The consecutive join over (possibly) multiple tables from the unique identifier table Δ to the process step table Π allows the process step name π to be retrieved from a transformation using the fields of all joined tables.
Similarly, the order T can be defined as:
T = ⟨Ω, ω, Δ ⋈ … ⋈ Γ ⋈ … ⋈ Ω⟩
where Ω corresponds to the table holding the order attribute in the field ω. ω can represent either a single value (e.g. a single timestamp) or a pair of values (e.g. timestamps indicating the start and end time of an event). It can be retrieved from any transformation of the fields given by all joined tables.
The further attributes V correspond to data which can be added directly to the event log data structure or to a different table or file that is related to the event log. They can be defined as:
V = ⟨Σ, σ, Δ ⋈ … ⋈ Γ ⋈ … ⋈ Σ⟩
where Σ corresponds to the table holding the additional attributes in the field σ. σ need not be a fixed field; it can also be the result of a transformation based on the fields given by all joined tables.
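The tuple of I, P, T and V can be pictured as four transformations applied to each row of the joined tables. The following is a minimal sketch under that assumption. The class `ProcessSensor`, its method `sense` and the sample row are hypothetical and are not part of the described system. Each element of the tuple is modelled as a function of one joined row, and evaluating the sensor yields event log records with the corresponding columns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one row of the joined tables, keyed by field name


def no_attributes(row: Row) -> Dict[str, Any]:
    """Default for V: the sensor contributes no further attributes."""
    return {}


@dataclass
class ProcessSensor:
    identifier: Callable[[Row], str]  # I: transformation yielding the unique identifier
    step: Callable[[Row], str]        # P: transformation yielding the prefixed step name
    order: Callable[[Row], Any]       # T: transformation yielding the order attribute
    attributes: Callable[[Row], Dict[str, Any]] = no_attributes  # V: further attributes

    def sense(self, joined_rows: List[Row]) -> List[Row]:
        """Evaluate the tuple on every joined row and emit event log records."""
        return [
            {
                "id": self.identifier(r),
                "step": self.step(r),
                "order": self.order(r),
                **self.attributes(r),
            }
            for r in joined_rows
        ]


# Hypothetical sensor and a single already-joined row.
change_sensor = ProcessSensor(
    identifier=lambda r: f"{r['BELNR']}-{r['BUZEI']}",
    step=lambda r: "Change of: " + r["FNAME"],
    order=lambda r: r["UDATE"],
    attributes=lambda r: {"user": r["USERNAME"]},
)
joined = [
    {"BELNR": "1900000001", "BUZEI": "001", "FNAME": "ZTERM",
     "UDATE": "2020-01-03", "USERNAME": "SMITH"},
]
events = change_sensor.sense(joined)
print(events)
```

In this sketch the join itself is assumed to have been performed beforehand; in the definition above the join path is part of each tuple element.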
Since not every detail in the source systems is the same, process sensors can be configured. Every process sensor can have multiple parameters which can be adjusted. These adjustments are later necessary when creating an event log.
The configuration provides a set of values for parameters which were predefined in the process sensors. These values are then substituted into the process sensors, so that the sensors are configured for a particular working environment.
Besides the simple configuration with a search/replace technique, the process sensor also provides more complex configuration options to:
The process sensor also provides the possibility to maintain variables which can be set in a first process sensor and affect the execution of at least a second sensor.
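The replacement of predefined parameters by configured values can be sketched with Python's standard `string.Template`. The sensor statement, the parameter names and the values below are assumptions chosen for illustration. They do not reproduce the statement format of the described system.

```python
from string import Template

# Hypothetical sensor statement with predefined parameters ($name placeholders).
sensor_statement = Template(
    "SELECT $id_field AS id, '$step_name' AS step, $date_field AS ts "
    "FROM $schema.$table WHERE $date_field IS NOT NULL"
)

# Hypothetical configuration for one particular working environment.
configuration = {
    "id_field": "BELNR",
    "step_name": "Invoice paid",
    "date_field": "AUGDT",
    "schema": "erp_prod",
    "table": "BSEG",
}

# substitute() raises KeyError if a predefined parameter has no configured value.
configured_statement = sensor_statement.substitute(configuration)
print(configured_statement)
```

Failing on a missing parameter, rather than leaving the placeholder in place, is a design choice of this sketch: an incompletely configured sensor is detected before it is executed.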
With these configuration options complex use cases can be configured.
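One way to picture a variable that is set in a first sensor and affects a second sensor is a shared dictionary passed to both. This is a sketch under assumptions: the sensor functions, the variable name `known_ids` and the sample rows are hypothetical. Here the first sensor records which identifiers it has sensed, and the second sensor only emits events for those identifiers.

```python
def created_sensor(rows, variables):
    """First sensor: emits 'Invoice created' events and publishes a variable."""
    events = [(r["id"], "Invoice created", r["created"]) for r in rows if r["created"]]
    variables["known_ids"] = {identifier for identifier, _, _ in events}
    return events


def paid_sensor(rows, variables):
    """Second sensor: its behavior depends on the variable set by the first sensor."""
    known = variables.get("known_ids", set())
    return [
        (r["id"], "Invoice paid", r["paid"])
        for r in rows
        if r["paid"] and r["id"] in known
    ]


rows = [
    {"id": "A", "created": "2020-01-01", "paid": "2020-01-05"},
    {"id": "B", "created": None, "paid": "2020-01-07"},
]
variables = {}
log = created_sensor(rows, variables) + paid_sensor(rows, variables)
print(log)
```

In this sketch the payment of "B" is not emitted, because the first sensor never sensed a creation event for it.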
In the following the creation of an event log is described in further detail.
The above-described components allow the creation of an event log based on a set of process sensors. In a common approach the event log can be created by a monolithic sequence of data query language commands (mostly SQL-Script). Such a schema is shown in
Multiple different process sensors can be used in order to generate a complete event log. In this case each process sensor senses a subset of process steps and therefore contributes its steps to the complete event log.
This makes it possible to combine multiple process sensors in a modular fashion, where each event log line can be the result of a different process sensor. The user then specifies the parameter settings to configure the execution of the process steps. This configuration is then applied to all process sensors. Furthermore, the user has to provide a data representation scheme (e.g. a database) where the source system data (i.e. the raw data) can be sensed. The sensors then pick up the process steps and form the event log. This sensing can be carried out not only as a one-time run but also in a scheduled manner. Thus, the sensing may for example take place every day at a fixed point in time.
Once the digital traces have been sensed and the event log data structure has been created, the foundation for applying process mining algorithms is laid.
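The modular combination described above can be sketched as follows. The sketch assumes that each sensor is a function of the source data and of a shared configuration; the function names, the `prefix` parameter and the sample data are hypothetical. Every sensor contributes its own lines, and the lines are merged into one event log.

```python
def created_sensor(source, config):
    """Senses one process step for every invoice."""
    return [
        (r["id"], config["prefix"] + "Invoice created", r["created"])
        for r in source["invoices"]
    ]


def paid_sensor(source, config):
    """Senses a process step only for invoices that have a payment date."""
    return [
        (r["id"], config["prefix"] + "Invoice paid", r["paid"])
        for r in source["invoices"]
        if r["paid"] is not None
    ]


def build_event_log(sensors, source, config):
    """Apply the same configuration to all sensors and merge their output."""
    event_log = []
    for sensor in sensors:
        event_log.extend(sensor(source, config))
    # Sort by identifier, then by the order attribute (ISO dates sort as text).
    return sorted(event_log, key=lambda event: (event[0], event[2]))


source = {
    "invoices": [
        {"id": "INV-2", "created": "2020-01-02", "paid": None},
        {"id": "INV-1", "created": "2020-01-01", "paid": "2020-01-08"},
    ]
}
event_log = build_event_log([created_sensor, paid_sensor], source, {"prefix": "P2P: "})
for line in event_log:
    print(line)
```

A scheduled execution would call `build_event_log` again at each scheduled point in time; the scheduler itself is outside the scope of this sketch.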
While the preconfigured process sensors are set up for an immediate usage by a non-expert user, an extensive development framework is also provided. This framework supports both interaction by a scripting language and with a graphical user interface.
The creation of the process sensors with all elements mentioned before can either be carried out in a scripted/programmed fashion or through a graphical user interface.
In the scripted interface the relations, source tables, target tables and other attributes are scripted in a programming language.
Besides the scripted/programmed interface, a graphical user interface is provided to create process sensors solely via a graphical interface. The interface provides an overview of all tables, where the user can then choose by mouse operations which fields and relations make up the process sensor.
Both the scripted interface and the graphical user interface are independent of the underlying execution environment. The executable code for the underlying execution platform is generated at runtime from the graphical user interface input as well as from the scripted programming language.
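The generation of executable code from an environment-independent description can be sketched as follows. The class `SensorSpec`, the function `to_sql` and the two dialect names are assumptions made for illustration. The sketch relies only on the fact that SQLite concatenates strings with the `||` operator while MySQL provides the `CONCAT()` function, so the same description yields different executable statements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SensorSpec:
    """Hypothetical environment-independent description of a simple sensor."""
    table: str
    id_fields: Tuple[str, ...]
    step: str
    order_field: str


def to_sql(spec: SensorSpec, dialect: str) -> str:
    """Generate an executable statement for the chosen execution environment."""
    if dialect == "sqlite":
        id_expr = " || '-' || ".join(spec.id_fields)
    elif dialect == "mysql":
        id_expr = "CONCAT(" + ", '-', ".join(spec.id_fields) + ")"
    else:
        raise ValueError(f"unsupported dialect: {dialect}")
    return (
        f"SELECT {id_expr} AS id, '{spec.step}' AS step, {spec.order_field} AS ts "
        f"FROM {spec.table} WHERE {spec.order_field} IS NOT NULL"
    )


spec = SensorSpec(table="BSEG", id_fields=("BUKRS", "BELNR"),
                  step="Invoice paid", order_field="AUGDT")
print(to_sql(spec, "sqlite"))
print(to_sql(spec, "mysql"))
```

The sketch builds statements by plain string formatting, which is acceptable only because the description is authored by the sensor developer and not taken from untrusted input.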
One event log process package contains all process sensors which are necessary for the analysis of one or more specific business processes. Business processes consist of multiple process steps which cannot necessarily be combined into one process sensor. To account for multiple sensors, multiple process sensors can be combined within one event log process package. An example of an event log process package is depicted in
The process sensors themselves only sense the process data and create the event log data structure. Based on this data structure, process mining is already possible.
Besides the event log as a data structure suitable for process mining, the package also contains a data model which describes the structure of all elements which have been created by the package's process sensors. This data model need not contain only data structures created by the process sensors; it may also contain additional tables which have been linked to the event log.
In the following two examples for process sensors are given: one simple process sensor (Invoice paid sensor) and one complex sensor (Change of content sensor).
The “Invoice paid sensor” senses the process step when an invoice has been paid in an ERP-system. To sense this process step all invoices are considered and as soon as an invoice has a valid date in the field “clear date” this process step is successfully sensed. The data which is then sent to the event log is:
Assuming this invoice paid sensor is used in a standard SAP FI environment, the above-mentioned formal definition I, P, T, V would be:
I = ⟨Δ, λ⟩ = ⟨BSEG, {MANDT, BUKRS, BELNR, GJAHR, BUZEI}⟩
P = ⟨Π, π, d, Δ ⋈ … ⋈ Π⟩ = ⟨BSEG, −, “Invoice paid”, −⟩
T = ⟨Ω, ω, Δ ⋈ … ⋈ Ω⟩ = ⟨BSEG, AUGDT, −⟩
V = −
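The definition above can be sketched as a single query, here run against an in-memory SQLite database. This is an illustration under assumptions: the table is reduced to the fields used by the sensor, the sample rows are invented, and an unpaid invoice is modelled with a NULL clearing date, whereas how an empty date is stored depends on the source system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BSEG (MANDT TEXT, BUKRS TEXT, BELNR TEXT, "
    "GJAHR TEXT, BUZEI TEXT, AUGDT TEXT)"
)
conn.executemany(
    "INSERT INTO BSEG VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("100", "1000", "1900000001", "2020", "001", "2020-01-09"),  # paid
        ("100", "1000", "1900000002", "2020", "001", None),          # not yet paid
    ],
)

# I: concatenated key fields, P: fixed step name, T: clearing date AUGDT.
invoice_paid = conn.execute(
    """
    SELECT MANDT || BUKRS || BELNR || GJAHR || BUZEI AS id,
           'Invoice paid' AS step,
           AUGDT AS ts
    FROM BSEG
    WHERE AUGDT IS NOT NULL
    """
).fetchall()
print(invoice_paid)
```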
Since all fields are within the same table (BSEG) no joins were needed. The resulting output would then look like this:
The “Change of content sensor” senses all process steps which belong to a change of the content of an invoice. All fields which are analysed by the source system in change logs can then be used as a change process step. The data which is sent to the event log is:
Assuming this change of content sensor is used in a standard SAP FI environment, the formal definition I, P, T, V would be:
I = ⟨Δ, λ⟩ = ⟨BSEG, {MANDT, BUKRS, BELNR, GJAHR, BUZEI}⟩
P = ⟨Π, π, d, Δ ⋈ … ⋈ Π⟩ = ⟨CDPOS, FNAME, “Change of:”, BSEG ⋈ CDPOS⟩
T = ⟨Ω, ω, Δ ⋈ … ⋈ Ω⟩ = ⟨CDHDR, UDATE, BSEG ⋈ CDPOS ⋈ CDHDR⟩
V = −
In this more complex process sensor three tables are involved: the invoice table “BSEG” and two change log tables “CDPOS” and “CDHDR”. Since the unique ID is retrieved from the invoice table “BSEG”, the process step from the table “CDPOS” and the timestamp from the table “CDHDR” these three tables have to be related within this process sensor.
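The relation of the three tables can be sketched as a query with two joins, again using an in-memory SQLite database. The join conditions are simplifying assumptions of this sketch: the change log position is linked to the invoice line through a key column holding the concatenated key fields, and to the change log header through the change number. The tables are reduced to the columns needed here and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE BSEG (MANDT TEXT, BUKRS TEXT, BELNR TEXT, GJAHR TEXT, BUZEI TEXT);
    CREATE TABLE CDPOS (CHANGENR TEXT, TABNAME TEXT, TABKEY TEXT, FNAME TEXT);
    CREATE TABLE CDHDR (CHANGENR TEXT, UDATE TEXT);
    INSERT INTO BSEG VALUES ('100', '1000', '1900000001', '2020', '001');
    INSERT INTO CDPOS VALUES ('0000000042', 'BSEG', '100100019000000012020001', 'ZTERM');
    INSERT INTO CDPOS VALUES ('0000000043', 'BSEG', '100100019000000012020001', 'ZLSPR');
    INSERT INTO CDHDR VALUES ('0000000042', '2020-01-03');
    INSERT INTO CDHDR VALUES ('0000000043', '2020-01-05');
    """
)

# I from BSEG, P from CDPOS.FNAME with the fixed prefix, T from CDHDR.UDATE.
changes = conn.execute(
    """
    SELECT BSEG.MANDT || BSEG.BUKRS || BSEG.BELNR || BSEG.GJAHR || BSEG.BUZEI AS id,
           'Change of: ' || CDPOS.FNAME AS step,
           CDHDR.UDATE AS ts
    FROM BSEG
    JOIN CDPOS
      ON CDPOS.TABNAME = 'BSEG'
     AND CDPOS.TABKEY = BSEG.MANDT || BSEG.BUKRS || BSEG.BELNR || BSEG.GJAHR || BSEG.BUZEI
    JOIN CDHDR
      ON CDHDR.CHANGENR = CDPOS.CHANGENR
    ORDER BY ts
    """
).fetchall()
print(changes)
```

Each changed field yields its own process step, so a single sensor of this kind can contribute many different steps to the event log.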
The resulting output would then look like this:
To answer business process questions, the event log and the further tables are not sufficient. The business process professional requires a non-technical way to approach a process mining system. To bridge the gap from the technical event log to the business process perspective, the analytics package provides prepared analyses to the end user.
Based upon the process event log and the data model as a description of the process event log an analytics package is defined as a set of:
Such an analytics package is pre-configured for a process mining system to give an insight into a specific business process. The perspective of the view on the process is defined in such a way, that a particular business process question is tackled and the analytics package is therefore useful for a process professional.
Each analytics package has the following further properties:
Since all non-static components of the analytics package (e.g. plots, charts, process visualization) require data to be displayed, each analytics package is built on top of an event log package. By means of a data model, the event log package provides a comprehensive description of the event log and all connected tables. Since each event log package can contain information for multiple analytics packages, the relation between the event log package and the analytics packages is one (event log package) to many (analytics packages). On the other hand, an analytics package can only be operated when the specified event log package is present.
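The one-to-many relation and the dependency check can be sketched with a small data structure. The class `AnalyticsPackage`, the helper functions and the package names are hypothetical and only illustrate the relation described above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class AnalyticsPackage:
    name: str
    requires: str  # name of the event log package it is built on


def is_operable(package: AnalyticsPackage, installed: Set[str]) -> bool:
    """An analytics package can only be operated if its event log package is present."""
    return package.requires in installed


def group_by_event_log_package(packages: Iterable[AnalyticsPackage]) -> Dict[str, List[str]]:
    """One event log package can serve multiple analytics packages."""
    index: Dict[str, List[str]] = {}
    for package in packages:
        index.setdefault(package.requires, []).append(package.name)
    return index


packages = [
    AnalyticsPackage("P2P throughput time", requires="P2P event log"),
    AnalyticsPackage("P2P process conformance", requires="P2P event log"),
    AnalyticsPackage("O2C throughput time", requires="O2C event log"),
]
installed_event_log_packages = {"P2P event log"}

index = group_by_event_log_package(packages)
operable = [p.name for p in packages if is_operable(p, installed_event_log_packages)]
print(index)
print(operable)
```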
Analytics package Examples
Two examples for analytics packages are given here for the “Purchase-to-Pay” (P2P) Process: A throughput time analytics package and a process conformance analytics package.
This analytics package gives the process professional the opportunity to determine the time which has passed between two process steps. It contains:
With this analytics package the process professional is able to:
This analytics package gives a process professional the opportunity to analyze the process conformance of the purchase to pay process. It contains:
With this analytics package the process professional is able to:
The package store (shown in
Since the event log packages provide the necessary technical foundation for the analytics package, one can:
To make use of the event log packages for the business process professional with analytics packages, one can use the platform to:
The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The invention can be implemented as a computer program product, that is, a computer program tangibly embodied in an information carrier, for example, in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, for example, a programmable processor, a computer, portable computer, smartphone, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the invention can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example, magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, for example, EPROM, EEPROM, and flash memory devices; magnetic disks, for example, internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in special purpose logic circuitry. The data can be stored in a database management system, e.g. a relational database management system, object oriented database management system, or hierarchical database management system.
The invention can be implemented in a computing system that includes a back-end component, for example, as a data server, or that includes a middleware component, for example, an application server, or that includes a front-end component, for example, a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, for example, a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The invention has been described in terms of particular embodiments. Other embodiments are within the scope of the following claims. For example, the steps of the invention can be performed in a different order and still achieve desirable results.