This section is intended to introduce the reader to various aspects of art, which could be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light and not as admissions of prior art.
Business processes within or across an enterprise are often partially or totally automated. This automation can be provided by many different systems, from legacy applications, such as an Enterprise Resource Planning (“ERP”) system, to more modern applications, such as Java, web, or workflow applications. Such systems and applications can be heterogeneous, distributed, and independently managed by different entities across the enterprise.
This decentralization and distribution often makes it difficult to get a coherent picture of what processes are actually being performed across the enterprise. For many reasons, however, getting an understanding of what processes are actually performed across the enterprise is advantageous. First, it allows the enterprise to understand its own business operations, which can be helpful to improve those business operations. Second, understanding the processes simplifies the deployment of process monitoring tools. Third, having a process model simplifies fully automating business processes. Conventional process discovery systems are either very simplistic (e.g., they consider only simple cases such as sequential processes or loop-free processes) or limited to considering tasks that are instantaneous.
Advantages of one or more disclosed embodiments will become apparent upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments of the present technique will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific goals, such as compliance with system-related and business-related constraints, can vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
The term process discovery refers to the extraction of a business process model from events, messages, or other data collected by computer systems. Embodiments of the present invention enable process discovery by extracting a process model from a log file or database comprising events corresponding to the execution of operations by one or more applications. This technique is able to discover complex process models, not just simple models that are sequential or loop-free. In addition, the present technique can be used with operations that are not instantaneous and thus can be characterized by both a start time and a stop time. In one embodiment, the present technique is employed by a system that monitors business processes and links business processes to Information Technology (“IT”) resource
In one embodiment of the invention, reading the process data comprises reading business process data from a data warehouse or database. In this embodiment, the data warehouse receives log files or database files from a plurality of systems distributed across an enterprise or network. These systems include, but are not limited to, web servers, application servers, ERP systems, message brokers, or other business process management and monitoring systems.
As described above, an entry in the log file contains information regarding the start time, stop time, and context of the steps in a business process. For example, the log file can include the start times and stop times for a set of business tasks represented by T1, T2, . . . , Tn. Specifically, for a particular task, Ti, the log file can include a start time, Tis, and a stop time, Tie. The start and the stop of a task are each referred to as an event. In one embodiment, T1, T2, . . . , Tn comprise customer orders and Tis and Tie comprise the start and completion times for particular parts of a customer order.
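The relationship between log entries and events described above can be illustrated with a minimal Python sketch. The row layout (case identifier, task, start, stop), the `Event` type, and the function name `events_from_log` are assumptions made for illustration only; the disclosure does not prescribe a particular log format.

```python
from typing import NamedTuple

class Event(NamedTuple):
    case_id: str   # hypothetical identifier of one process execution, e.g. an order
    task: str      # task name, e.g. "T1"
    kind: str      # "s" for a start event, "e" for a stop event
    time: float    # timestamp taken from the log

def events_from_log(rows):
    """Turn (case_id, task, start, stop) rows into one start and one stop event each."""
    events = []
    for case_id, task, start, stop in rows:
        if stop < start:
            raise ValueError(f"task {task} stops before it starts")
        events.append(Event(case_id, task, "s", start))
        events.append(Event(case_id, task, "e", stop))
    return events

# Two illustrative log rows for the same (hypothetical) customer order.
rows = [("order-1", "T1", 5.0, 6.0), ("order-1", "T2", 1.0, 2.0)]
events = events_from_log(rows)
```

Each task Ti thus contributes two events, corresponding to Tis and Tie.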
Once the business process data has been read from the log file or database, the process 10 continues by creating a trace, as indicated in block 14. The trace comprises a collection of events corresponding to the execution of a business process. In one embodiment, the events in the trace are partially ordered by time. For example, the sequence T3sT2sT2eT3eT1sT1e is a trace, where T3s≦T2s≦T2e≦T3e≦T1s≦T1e. The postfix s and e in this example denote the start and completion of a task Tx. In one embodiment, the trace is created using computer software, such as a set of Structured Query Language (“SQL”) scripts. In one embodiment, the trace comprises a sequence of events that a customer order goes through during an order fulfillment process.
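By way of illustration only, trace creation could be sketched in Python rather than with the SQL scripts mentioned above. The sketch assumes that each event carries a case identifier (a hypothetical field naming the process execution, such as an order number) and groups and orders the events accordingly; the function name `create_traces` is likewise an assumption.

```python
from collections import defaultdict

def create_traces(events):
    """Group (case_id, task, kind, time) events by case and order each group by time."""
    by_case = defaultdict(list)
    for case_id, task, kind, time in events:
        by_case[case_id].append((task, kind, time))
    # sorted() is stable, so events with equal timestamps keep their log order.
    return {case: sorted(evts, key=lambda e: e[2]) for case, evts in by_case.items()}

# Events reproducing the example trace T3s T2s T2e T3e T1s T1e.
events = [
    ("order-1", "T1", "s", 5), ("order-1", "T1", "e", 6),
    ("order-1", "T2", "s", 2), ("order-1", "T2", "e", 3),
    ("order-1", "T3", "s", 1), ("order-1", "T3", "e", 4),
]
traces = create_traces(events)
labels = "".join(task + kind for task, kind, _ in traces["order-1"])
```

Here `labels` spells out the events of the single trace in time order, using the same postfix notation as the example above.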
After the traces have been created, it is advantageous to reorganize the traces, as indicated by block 16. Reorganizing the traces is advantageous because it is not uncommon for the end of one event and the start of another to occur simultaneously. This typically occurs when the scheduling is so fast that the granularity of the log file does not distinguish between the end of one event and the start of the next event, or when the transaction monitoring system, if one is present, logs the same timestamp for both events. For this reason, the trace can be reorganized so that stop events are placed before start events if the time stamps for both events are the same. For example, if time stamp (T1e)=time stamp (T2s), then a subset of a trace of the form T2sT1e would be rearranged to T1eT2s. This is important because it helps organize the events in the trace in a way that corresponds to the actual execution.
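The tie-breaking rule described above can be sketched as a sort with a two-part key. This is a minimal sketch that assumes each event is a (task, kind, time) tuple with kind "s" or "e"; the representation and the name `reorganize` are illustrative assumptions.

```python
def reorganize(trace):
    """Order events by time, placing stop events ("e") before start events ("s") on ties."""
    return sorted(trace, key=lambda ev: (ev[2], 0 if ev[1] == "e" else 1))

# T2s was logged before T1e, but both carry the same timestamp.
trace = [("T2", "s", 10), ("T1", "e", 10)]
reorganized = reorganize(trace)
```

In this example the result places T1e ahead of T2s. One caveat of this simple key is that a task whose own start and stop share a timestamp would also have its stop placed first; an implementation might need to treat that case separately.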
Once the traces have been reorganized as described above, model detection begins, as indicated in block 18. The model detection process will be described in greater detail below with regard to
One example of a process structure is a sequence. In a sequence, a task Y is enabled in the process structure after the completion of another task X. In such case, there exists a directed link from X to Y, which is denoted by Seq(X, Y).
Another example of a process structure is a split. In a split, a single process splits into multiple branches. For example, suppose that task X splits into tasks Y1, Y2, . . . , Yn. That is, there exist n directed links from X to Y1, Y2, . . . , Yn, respectively. There are three main types of splits: (1) an XOR-Split, wherein exactly one of the branches is chosen to execute. The XOR-Split is denoted by XOR-Split(X; Y1, Y2, . . . , Yn); (2) an AND-Split, in which all of the branches are executed in parallel (i.e., all the tasks are conducted simultaneously). The AND-Split is denoted by AND-Split(X; Y1, Y2, . . . , Yn); and (3) an OR-Split, which encompasses the remaining split process types that do not belong to the XOR-Split or the AND-Split. The OR-Split is denoted by OR-Split(X; Y1, Y2, . . . , Yn).
Yet another example of a process structure is a join. In a join, multiple process branches merge into a single process branch. For example, the tasks X1, X2, . . . , Xn could join into task Y. Similar to the split, there are three types of joins: (1) an XOR-Join, wherein exactly one of the branches merges with another branch. The XOR-Join is denoted by XOR-Join(X1, X2, . . . , Xn; Y); (2) an AND-Join wherein every one of the branches needs to be executed before the merging into a single merged flow. The AND-Join is denoted by AND-Join(X1, X2, . . . , Xn; Y); and (3) an OR-Join, which encompasses join structures not belonging to XOR-Join or AND-Join. The OR-Join is denoted by OR-Join(X1, X2, . . . , Xn; Y).
As indicated in block 32, the first step in model detection is to derive the ImmedFollow Set. Given tasks X and Y, X will be in the set ImmedFollow(Y) if (1) the sequence of events YeXs is contained in the trace (note that there is no event between Ye and Xs), and (2) Xs−Ye is relatively small. The ImmedFollow set aims at identifying the possible causal relations between the tasks.
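The derivation of the ImmedFollow set can be sketched as a scan over adjacent event pairs. In this minimal sketch, events are assumed to be (task, kind, time) tuples, and the parameter `max_gap` is a hypothetical stand-in for "relatively small", which the description does not quantify.

```python
from collections import defaultdict

def immed_follow(traces, max_gap):
    """Map each task Y to the set of tasks X whose start directly follows Y's stop."""
    result = defaultdict(set)
    for trace in traces:
        for (y, y_kind, y_time), (x, x_kind, x_time) in zip(trace, trace[1:]):
            # Condition (1): Ye is immediately followed by Xs.
            # Condition (2): the gap Xs - Ye is small (max_gap is an assumed bound).
            if y_kind == "e" and x_kind == "s" and x_time - y_time <= max_gap:
                result[y].add(x)
    return dict(result)

# A finishes, then B and C start; B happens to be logged first.
trace = [("A", "s", 0), ("A", "e", 1), ("B", "s", 1), ("C", "s", 1),
         ("B", "e", 2), ("C", "e", 3)]
sets = immed_follow([trace], max_gap=1)
```

In this trace only B directly follows the stop of A, so C is absent from ImmedFollow(A) even though A may split into both B and C.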
Generally, if X∈ImmedFollow(Y), the link between X and Y can be a Sequence, XOR-Split, or XOR-Join. If, on the other hand, X∉ImmedFollow(Y), the link between X and Y can still be of the AND/OR-Split or AND/OR-Join type. This is the case because the order of subsequent events cannot be determined from the ImmedFollow set alone. For example, given AND-Split(A; B, C), suppose that each time A is completed, B always starts before C. In that case, AeCs never occurs; therefore C∉ImmedFollow(A), and yet A can still split into B and C. To handle such cases, the Follow set is derived, as indicated by block 34. Given tasks X and Y, X will be in the set Follow(Y) if (1) the sequence Ye*Xs appears in at least one trace, wherein the asterisk denotes that there could be zero or more start events between Ye and Xs, and (2) Xs−Ye is relatively small. From this, the reader will appreciate that for any task X, ImmedFollow(X) ⊂ Follow(X).
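The Follow set can be sketched in the same style. The sketch below assumes (task, kind, time) event tuples and a hypothetical `max_gap` bound standing in for "relatively small"; after each stop event it scans forward across consecutive start events and stops at the first event that is not a start.

```python
from collections import defaultdict

def follow(traces, max_gap):
    """Map each task Y to the tasks X whose start follows Y's stop with only starts between."""
    result = defaultdict(set)
    for trace in traces:
        for i, (y, y_kind, y_time) in enumerate(trace):
            if y_kind != "e":
                continue
            for x, x_kind, x_time in trace[i + 1:]:
                if x_kind != "s":
                    break  # the pattern Ye*Xs allows only start events in between
                if x_time - y_time <= max_gap:
                    result[y].add(x)
    return dict(result)

# The AND-Split(A; B, C) situation: B always starts before C.
trace = [("A", "s", 0), ("A", "e", 1), ("B", "s", 1), ("C", "s", 1),
         ("B", "e", 2), ("C", "e", 3)]
sets = follow([trace], max_gap=1)
```

Here both B and C appear in Follow(A), although only B would appear in ImmedFollow(A).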
After the Follow Set has been derived, process structure detection proceeds, as indicated in block 36. The process structures are detected by employing the following heuristic rules. First, the process structure Seq(X, Y) is discoverable if (1) |ImmedFollow(X)|=1 and (2) Pr(XeYs|Xe) is high. In one embodiment, a high probability is defined to be greater than about 0.9. The first condition reflects that Seq(X,Y) is discoverable if there is only one task in the set ImmedFollow(X). The second condition reflects that the probability of the event Ys occurring immediately after Xe is high. The second condition is included because if it is always the case that whenever task X finishes, it is immediately followed by the start of task Y, then it is plausible that X causes Y (i.e., there is a directed link from X to Y). In one embodiment, the Seq(X,Y) is discovered if (1) |ImmedFollow(X)|=1 and (2) Pr(XeYs|Xe)>0.9.
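The sequence rule can be illustrated with a minimal sketch. It assumes that traces are lists of (task, kind) pairs, that the ImmedFollow sets are supplied as a dictionary, and that Pr(XeYs|Xe) is estimated as the fraction of Xe occurrences immediately followed by Ys; that estimator and the function name are assumptions, since the description does not fix how the probability is computed.

```python
def detect_sequences(traces, immed_follow_sets, threshold=0.9):
    """Return (X, Y) pairs satisfying |ImmedFollow(X)| = 1 and Pr(XeYs | Xe) > threshold."""
    sequences = []
    for x, followers in immed_follow_sets.items():
        if len(followers) != 1:          # condition (1)
            continue
        (y,) = tuple(followers)
        stops = hits = 0
        for trace in traces:
            for i, event in enumerate(trace):
                if event == (x, "e"):
                    stops += 1
                    # Count the occurrence if the very next event is Ys.
                    if trace[i + 1:i + 2] == [(y, "s")]:
                        hits += 1
        if stops and hits / stops > threshold:   # condition (2)
            sequences.append((x, y))
    return sequences

# Ten identical executions in which Y always starts right after X stops.
trace = [("X", "s"), ("X", "e"), ("Y", "s"), ("Y", "e")]
sequences = detect_sequences([trace] * 10, {"X": {"Y"}})
```

The default threshold of 0.9 mirrors the embodiment described above and can be adjusted.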
Second, the process structure XOR-Split(X; Y1, Y2, . . . , Yn) is discovered if (1) one of Y1, Y2, . . . , Yn is in the set ImmedFollow(X) and (2) ∀i, j ∈[1, n], Pr(co-occurrence(Xe, Yis, Yjs)|Xe) is low. In one embodiment, a low probability is defined to be less than about 0.05. The second condition states that for every i and j ranging from 1 to n, the probability of Xe, Yis, Yjs occurring together in the same trace is low. This condition is needed to ensure that XOR-Split will only be discovered where only one of the branches Y1, Y2, . . . , Yn can be chosen. In one embodiment, XOR-Split (X; Y1, Y2, . . . , Yn) is discovered if (1) one of Y1, Y2, . . . , Yn is in the set ImmedFollow(X) and (2) ∀i, j ∈[1, n], Pr(co-occurrence(Xe, Yis, Yjs)|Xe)<0.05.
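As a hedged illustration of the XOR-Split rule, the sketch below assumes that traces are lists of (task, kind) pairs, that condition (2) is applied to pairs of distinct branches (i ≠ j), and that the co-occurrence probability is estimated as the fraction of traces containing Xe that also contain both Yis and Yjs. These choices, and the function name, are assumptions rather than part of the description.

```python
from itertools import combinations

def is_xor_split(traces, x, branches, immed_follow_x, threshold=0.05):
    """Check the XOR-Split(X; Y1..Yn) heuristic against a collection of traces."""
    # Condition (1): at least one branch is in ImmedFollow(X).
    if not any(y in immed_follow_x for y in branches):
        return False
    with_x = [set(t) for t in traces if (x, "e") in t]
    if not with_x:
        return False
    # Condition (2): every pair of distinct branches rarely co-occurs after Xe.
    for yi, yj in combinations(branches, 2):
        both = sum(1 for t in with_x if (yi, "s") in t and (yj, "s") in t)
        if both / len(with_x) >= threshold:
            return False
    return True

# Each execution takes exactly one of the two branches.
traces = [
    [("X", "s"), ("X", "e"), ("Y1", "s"), ("Y1", "e")],
    [("X", "s"), ("X", "e"), ("Y2", "s"), ("Y2", "e")],
]
exclusive = is_xor_split(traces, "X", ["Y1", "Y2"], {"Y1", "Y2"})
```

The default threshold of 0.05 follows the embodiment described above.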
Third, the process structure AND-Split(X; Y1, Y2, . . . , Yn) is discovered if (1) Y1, Y2, . . . , Yn∈Follow(X) and (2) Pr(co-occurrence(Xe, Y1s, Y2s, . . . , Yns)|Xe) is high. The second condition states that the probability of Xe and all of Y1s, Y2s, . . . , Yns occurring together in the same trace is high. This condition is needed to ensure that AND-Split will only be discovered where all of the branches Y1, Y2, . . . , Yn are executed once X finishes. In one embodiment, the process structure AND-Split(X; Y1, Y2, . . . , Yn) is discovered if (1) Y1, Y2, . . . , Yn∈Follow(X) and (2) Pr(co-occurrence(Xe, Y1s, Y2s, . . . , Yns)|Xe)>0.95.
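The AND-Split rule can be sketched similarly. The sketch assumes traces given as lists of (task, kind) pairs and estimates the co-occurrence probability as the fraction of traces containing Xe in which every branch also starts; the estimator and the name `is_and_split` are illustrative assumptions.

```python
def is_and_split(traces, x, branches, follow_x, threshold=0.95):
    """Check the AND-Split(X; Y1..Yn) heuristic against a collection of traces."""
    # Condition (1): every branch is in Follow(X).
    if not all(y in follow_x for y in branches):
        return False
    with_x = [set(t) for t in traces if (x, "e") in t]
    if not with_x:
        return False
    # Condition (2): all branches start in nearly every trace that contains Xe.
    all_started = sum(1 for t in with_x if all((y, "s") in t for y in branches))
    return all_started / len(with_x) > threshold

# Every execution runs both branches after A completes.
trace = [("A", "s"), ("A", "e"), ("B", "s"), ("C", "s"), ("B", "e"), ("C", "e")]
parallel = is_and_split([trace] * 20, "A", ["B", "C"], {"B", "C"})
```

The default threshold of 0.95 follows the embodiment described above.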
Fourth, the process type OR-Split(X; Y1, Y2, . . . , Yn) is discovered if the trace contains a split process structure that is neither XOR-Split nor AND-Split. Lastly, those skilled in the art will appreciate that it is possible to compensate for any noise in a system by adjusting the particular thresholds employed.
Those skilled in the art will recognize that the heuristic rules to identify join process structures are symmetrical to the rules described above for split process structures. For example, AND-Join(X1, X2, . . . , Xn; Y) is discovered if (1) Y∈Follow(X1, X2, . . . , Xn) and (2) Pr(co-occurrence(Ys, X1e, X2e, . . . , Xne)|Ys) is high. The heuristic rules for the remainder of the join process structures discussed above can be derived from the split process structures in a similar fashion.
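The symmetrical AND-Join rule can be sketched as follows. The sketch reads Y∈Follow(X1, X2, . . . , Xn) as requiring Y∈Follow(Xi) for every i, reuses the 0.95 threshold from the AND-Split embodiment, and estimates the probability per trace; all three are assumptions, because the description states only that the probability must be high.

```python
def is_and_join(traces, branches, y, follow_sets, threshold=0.95):
    """Check the AND-Join(X1..Xn; Y) heuristic against a collection of traces."""
    # Condition (1): Y is in Follow(Xi) for every incoming branch Xi (assumed reading).
    if not all(y in follow_sets.get(x, set()) for x in branches):
        return False
    with_y = [set(t) for t in traces if (y, "s") in t]
    if not with_y:
        return False
    # Condition (2): all branches have completed in nearly every trace containing Ys.
    all_done = sum(1 for t in with_y if all((x, "e") in t for x in branches))
    return all_done / len(with_y) > threshold

# X1 and X2 both complete, in either order, before Y starts.
traces = [
    [("X1", "s"), ("X2", "s"), ("X1", "e"), ("X2", "e"), ("Y", "s"), ("Y", "e")],
    [("X1", "s"), ("X2", "s"), ("X2", "e"), ("X1", "e"), ("Y", "s"), ("Y", "e")],
]
follow_sets = {"X1": {"Y"}, "X2": {"Y"}}
joined = is_and_join(traces, ["X1", "X2"], "Y", follow_sets)
```

The traces are given as lists of (task, kind) pairs, and the Follow sets are supplied as a dictionary keyed by task.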
Further, as stated above, those skilled in the art will also appreciate that sequence, split, and join are only three possible examples of process structures. In alternate embodiments, additional process structures can be discovered. In those cases, heuristic rules similar to those stated above could be developed to permit the discovery of those additional process structures.
The modules (blocks 54, 56, 58, and 60) are hardware, software, or some combination of hardware and software. Additionally, an individual module does not necessarily solely comprise each module function as illustrated. In other words, the modules shown in the blocks 54, 56, 58, and 60 are merely one example and other embodiments can be envisaged wherein the functions are split up differently or wherein some modules are not included or other modules are included. The illustrated modules (blocks 54, 56, 58, and 60) comprise a process data extraction module (block 54) that extracts data from a log file or database, a trace creation module (block 56) that creates a trace based on the extracted data, a trace reorganization module (block 58) that reorganizes the trace, and a model detection module (block 60) that detects the process models within the trace. In this embodiment and in other envisaged computer system embodiments, a user incorporates the functionality of the computer 52 to enhance the performance of the process discovery technique previously discussed. For example, the computer (block 52) can discover process models, as described above, by utilizing the modules represented by blocks 54, 56, 58, and 60.
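One possible software arrangement of the four modules is sketched below as four functions applied in order. The function names, the (task, start, stop) row format, and the simplified detection step, which reports only direct stop-to-start pairs as a stand-in for full model detection, are assumptions made for illustration.

```python
def extract(rows):
    """Block 54 (sketch): turn (task, start, stop) rows into start and stop events."""
    events = []
    for task, start, stop in rows:
        events += [(task, "s", start), (task, "e", stop)]
    return events

def create_trace(events):
    """Block 56 (sketch): order the events by time."""
    return sorted(events, key=lambda e: e[2])

def reorganize(trace):
    """Block 58 (sketch): on equal timestamps, place stop events before start events."""
    return sorted(trace, key=lambda e: (e[2], e[1] != "e"))

def detect(trace):
    """Block 60 (stand-in): report task pairs where a stop is directly followed by a start."""
    return [(a, b) for (a, a_kind, _), (b, b_kind, _) in zip(trace, trace[1:])
            if a_kind == "e" and b_kind == "s"]

def discover(rows):
    """Run the four modules in order."""
    return detect(reorganize(create_trace(extract(rows))))

# B starts at the same timestamp at which A stops, and B is logged first.
links = discover([("B", 2, 3), ("A", 1, 2)])
```

In this example the link from A to B is found only because the reorganization step moves the stop of A ahead of the start of B.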
In one embodiment, a process discovery engine 106, such as the computer system 50 described above, extracts data from the log files 104a-104e, creates one or more traces based on the extracted data, reorganizes the traces, and detects the process models within the traces. For example,
While the invention can be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.
Number | Date | Country | |
---|---|---|---|
20060167923 A1 | Jul 2006 | US |