This application claims priority under 35 U.S.C. 119 from Japanese Application 2010-148316, filed Jun. 29, 2010, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a business process analysis method, system, and program for extracting business processes by analyzing work logs recorded in a computer-readable medium.
2. Description of Related Art
In recent years, the inevitable globalization of business and the widespread adoption of cloud computing services have made it more and more difficult for interested parties to keep track of their business process procedures. Meanwhile, business process management (BPM) has been drawing increasing attention from corporate executive officers. For example, improving business processes is one of the top priorities of corporate chief information officers.
Conventional commercial tools for BPM solutions mainly support structured business processes, i.e., workflows based on routine and specific rules. Such tools are suitable for automating workflows with fixed formats, such as expense management and purchasing processes. These BPM technologies enable visualization of the actual operational situation by analyzing the event logs generated by such routine workflows.
There are, however, many application fields where it is difficult to build routine workflow models of their business processes. That is, business processes are hardly or not at all structured; rather, they are extremely dynamic, highly dependent on workers, and have an ad-hoc aspect.
The concept of case management or adaptive workflow represents a solution for an agile process that allows the user to dynamically change a process and create a new process in a desired form. For example, various risk evaluations in businesses, medical underwriting, and insurance assessment are typical real-world business processes that require dynamic and human-oriented determination by persons in various roles, such as a risk manager, an on-site assessor, an examiner, a doctor, a lawyer, and an assessor.
One of the major problems related to a process that is hardly or not at all structured is that it is difficult to visualize what is actually happening, e.g., who is performing which task in which order. If such a process is managed by a centralized operation engine, the visualization is not very difficult. In reality, however, people tend to cooperate with one another by using email, chat, and individual business tools, which makes it more difficult to visualize what is actually happening in business processes.
A conventional process mining technique such as the α-algorithm is effective for visualizing a business process which has been structured based on given event logs, but is not so effective for an unstructured business process. That is, applying the process mining to an unstructured business process only provides a complicated and disorganized result, which is far from what the analyst expects.
In view of such circumstances, a process mining technique called Heuristic Miner has recently been proposed by A. J. M. M. Weijters, W. M. P. van der Aalst, and A. K. Alves de Medeiros (Process mining with the HeuristicsMiner algorithm, Research School for Operations Management and Logistics, 2006).
In addition, a technique called Fuzzy Mining has recently been proposed by Christian W. Gunther and Wil M. P. van der Aalst (Fuzzy mining—adaptive process simplification based on multi-perspective metrics, In Proceedings of the 5th International Conference on Business Process Management, 2007), and Wil M. P. van der Aalst and Christian W. Gunther (Finding structure in unstructured processes: The case for process mining, In Proceedings of the 7th International Conference on Application of Concurrency to System Design, 2007).
Algorithms provided by these techniques use measures, such as dependence probability, importance, and correlation, to collect nodes and disconnect links so as to impose structure on an unstructured process. While these algorithms can efficiently handle exceptions and noise included in logs, they achieve only limited effects in certain types of actual applications.
The following patent literatures will now be described as they relate to the present invention:
Japanese Patent Application Publication No. 2003-108574 discloses the following purchase rule model construction system: Specifically, from a database in which purchase records are recorded, the purchase records of customers are transformed into symbol strings by using another database containing a symbol list in which purchased goods are associated with specific symbols. The symbol strings obtained by the transformation are then substituted with the same or a fewer number of symbols so as to index the symbol strings. On the other hand, multiple regular expression candidates are generated by appropriately combining some of the symbols used in the symbol strings. Then, the indexed symbol strings are evaluated as to which candidates among the multiple regular expression candidates are included in the indexed symbol strings so that a useful purchase rule and pattern that exist in the purchase records may be found. In this way, an accurate purchase rule model can be constructed without relying on experts' abilities.
Japanese Patent Application Publication No. 2006-236262 discloses a system that allows general users to take out and utilize text contents holding useful information without analyzing tags or creating extraction rules. Specifically, the system includes: a recording unit that records a pattern format having a regular expression; an extraction rule generating unit that generates an extraction rule for taking out, from an HTML page, a text content that matches the pattern format; and a format transforming unit that performs transformation into a predetermined format on the basis of the extraction rule.
Nonetheless, neither of these patent literatures discloses a technique for extracting a meaningful rule from a log of an unstructured business process.
To overcome these deficiencies, the present invention provides a method of creating a workflow including: creating a work graph on the basis of a work log, wherein the work log is recorded through a series of operations performed by an operator; identifying and removing a redundant graph in the created work graph; simplifying the work log by deleting an entry corresponding to the removed redundant graph from the work log; reading a set of constraints to be satisfied by log entries, wherein each of the constraints defines an expression including a regular expression having a variable; changing a prepared regular expression by applying one of the constraints to an initial value of the prepared regular expression; determining whether the changed regular expression is appropriate for the simplified log; and creating a graph of a workflow by creating a finite state transition system on the basis of the changed regular expression in response to a determination that the changed regular expression is appropriate.
According to another aspect, the present invention provides an article of manufacture tangibly embodying computer readable instructions which, when executed, cause a computer to carry out the steps of a method for creating a workflow, the method including: creating a work graph on the basis of a work log, wherein the work log is recorded through a series of operations performed by an operator; identifying and removing a redundant graph in the created work graph; simplifying the work log by deleting an entry corresponding to the removed redundant graph from the work log; reading a set of constraints to be satisfied by log entries, wherein each of the constraints defines an expression including a regular expression having a variable; changing a prepared regular expression by applying one of the constraints to an initial value of the prepared regular expression; determining whether the changed regular expression is appropriate for the simplified log; and creating a graph of a workflow by creating a finite state transition system on the basis of the changed regular expression in response to a determination that the changed regular expression is appropriate.
According to yet another aspect, the present invention provides a system for creating a workflow including: means for creating a work graph on the basis of a work log, wherein the work log is recorded through a series of operations performed by an operator; means for identifying and removing a redundant graph in the created work graph; means for simplifying the work log by deleting an entry corresponding to the removed redundant graph from the work log; means for reading a set of constraints to be satisfied by log entries, wherein each of the constraints defines an expression including a regular expression having a variable; means for changing a prepared regular expression by applying one of the constraints to an initial value of the prepared regular expression; means for determining whether the changed regular expression is appropriate for the simplified log; and means for creating a graph of a workflow by creating a finite state transition system on the basis of the changed regular expression in response to a determination that the changed regular expression is appropriate.
Hereinbelow, an embodiment of the present invention will be described by referring to the drawings. Reference numerals that are the same across the drawings represent the same components unless otherwise noted. It is to be understood that what is described below is just one mode for carrying out the present invention and is not intended to limit the present invention to the contents described in the embodiment.
Referring to
The hard disk drive 108 stores, in advance, an operating system therein, though it is not illustrated here. This operating system may be any operating system that is compatible with the CPU 104, such as Linux®, Windows® 7, Windows® XP, or Windows® 2000 of Microsoft Corporation, or Mac OS® of Apple Inc.
The hard disk drive 108 further stores the following to be described later in detail: an operation log file; a group of log processing modules aimed at simplifying a log; a group of log pattern refinement modules for acquiring an appropriate regular grammar on the basis of the simplified log; a module for transforming the acquired regular grammar into a finite state transition system; a module for generating a workflow from the finite state transition system; and the like. These modules can be created with a programming language processing system of any known programming language, such as C, C++, C#, or Java®. With the help of the operating system, these modules are loaded into the main memory 106 and executed as appropriate. Operations of the modules will be described later in more detail by referring to a functional block diagram in
The keyboard 110 and the mouse 112 are used for activating the following: the operation log file; the group of log processing modules aimed at simplifying a log; the group of log pattern refinement modules for acquiring an appropriate regular grammar on the basis of the simplified log; the module for transforming the acquired regular grammar into a finite state transition system; the module for generating a workflow from the finite state transition system; and the like. The keyboard 110 and the mouse 112 are also used for typing characters, and the like.
The display 114 is preferably a liquid crystal display. One with any resolution, e.g., XGA (resolution: 1024×768) or UXGA (resolution: 1600×1200), may be used. The display 114 is used to display a graph generated from an operation log.
Further, the system in
The server (not illustrated) is connected to a client system (not illustrated) manipulated by an operator of a given work. When the operator manipulates the client system, an operation log file stored in the server is collected through the network into the system in
Next, by referring to
In
As shown in
Referring back to
The noise detection submodule 208 recognizes, as a noise, a node of an exceptional process in the graph created by the graph creation submodule 206.
The log deletion submodule 210 deletes an entry of a log that corresponds to a node recognized as a noise by the noise detection submodule 208. To show this in the example in
The score calculation submodule 212 has a function to apply various variations to the graph re-created by the graph creation submodule 206 from the operation log with a noise deleted therefrom, and to calculate a score for each variation. The processing by the score calculation submodule 212 will be described later in more detail.
The display submodule 214 has a function to display, on the display 114, the graph created by the graph creation submodule 206 or the graph with the variation applied thereto by the score calculation submodule 212.
The log processing module 204 transfers a simplified log, which is the result of the above processing, to a log pattern refinement module 216.
The log pattern refinement module 216 includes a refinement submodule 218, an examination submodule 220, a substitution submodule 222, and a transformation submodule 224. The log pattern refinement module 216 has a function to output a regular grammar based on the received simplified log by using data containing constraints 226 that are defined by the user and stored in the hard disk drive 108 or the main memory 106. The processing by the log pattern refinement module 216 will be described later in more detail.
A finite state transition system generation module 228 has a function to receive the regular grammar outputted from the log pattern refinement module 216 and to transform the regular grammar into a finite state transition system.
A workflow transformation module 230 has a function to generate a workflow from data of the finite state transition system received from the finite state transition system generation module 228.
Next, an overview of the processing according to the present invention will be described by referring to a flowchart in
In step 404, the graph creation submodule 206 reads the log 402 and creates a graph.
In step 406, the noise detection submodule 208 performs noise detection on the basis of the graph created by the graph creation submodule 206.
In step 408, the log deletion submodule 210 deletes an entry of a log recognized as a noise by the noise detection submodule 208.
In step 410, the graph creation submodule 206 reads the log 402 with the entry deleted therefrom and creates a new graph.
In step 412, the score calculation submodule 212 calculates scores of different variations for the graph. In step 414, the log processing module 204 displays the variations and their scores, which are calculated by the score calculation submodule 212, on the display 114 and allows the user to select one of the variations.
If the user's determination in step 416 is such that the user accepts and selects one of the variations, a log 418 simplified in accordance with the result of such selection is sent to a log refinement step that follows. If the user's determination in step 416 is such that further simplification is determined to be necessary, the processing returns to the noise detection in step 406.
If the user's determination in step 416 is such that the user desires to manually select a log to be deleted, then in step 420, the log processing module 204 displays the graph on the display 114 and allows the user to select a node to be deleted in the graph through operations of the mouse 112 or the like. After that, in step 408, an entry of a log corresponding to the selected node in the graph is deleted, followed by the processing in and after step 410.
When the simplified log 418 is finally established, then in step 422, the log pattern refinement module 216 provides an initial log pattern which is defined by the user or scheduled in advance by the system.
In step 424, the log pattern refinement module 216 reads φ being one of the constraints 226 defined by the user.
In step 426, the log pattern refinement module 216 determines whether there is any unprocessed constraint φ. If there is, the log pattern refinement module 216 calls the refinement submodule 218 in step 428 to refine the log pattern. The log pattern refinement module 216 then calls the examination submodule 220 in step 430 to determine whether the traces, i.e., sequences of processes acquired from the simplified log 418, are valid. If the traces are determined to be valid, the log pattern refinement module 216 accepts the resultant log pattern. If not, the log pattern refinement module 216 rejects the resultant log pattern.
The processing returns to step 426. If it is determined in step 426 that there is no unprocessed constraint φ, the processing proceeds to step 432 with the resultant log pattern as an output regular grammar. There, the finite state transition system generation module 228 transforms the regular grammar into a finite state transition system. Next, in step 434, the workflow transformation module 230 transforms the finite state transition system thus acquired into a workflow.
Next, the function of the noise detection submodule 208 in
A pattern shown in
Processing to detect a graph of the N-N node type as above will be described by referring to a flowchart in
A series of steps from step 704 to step 712 is performed sequentially on the elements i of N for i=1 to max_node. Here, max_node refers to the number of nodes to be processed.
In step 706, a function get_in(i) is called, and the number of input links of the node i is assigned to inNum variable.
In step 708, a function get_out(i) is called, and the number of output links of the node i is assigned to outNum variable.
In step 710, in accordance with vi=min(inNum,outNum), a value of either inNum or outNum, whichever is smaller, is assigned to vi.
By the time of the exit from the loop in step 712, the values of the variables vi are prepared for i=1 to max_node. Then, in step 714, the noise detection submodule 208 sorts V in descending order. Thereafter, in step 716, the noise detection submodule 208 outputs V. Of the nodes with values obtained by min(inNum,outNum), the node with the greatest value appears at the top in V.
The node at the top in V is recognized as a node to be deleted, and the log deletion submodule 210 actually deletes the corresponding entry from the operation log 202.
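As an illustration only, the N-N node detection just described can be sketched in Python along the following lines; the graph representation (a dictionary mapping each node to its lists of incoming and outgoing links) and the function name detect_nn_noise are assumptions made for this sketch, not part of the flowchart.

    # Minimal sketch of the N-N node detection (steps 704-716); representation assumed.
    def detect_nn_noise(graph):
        # graph: {node: (in_links, out_links)}, where in_links/out_links are lists of edges
        scores = {}
        for node, (in_links, out_links) in graph.items():
            in_num = len(in_links)               # get_in(i), step 706
            out_num = len(out_links)             # get_out(i), step 708
            scores[node] = min(in_num, out_num)  # vi = min(inNum, outNum), step 710
        # Steps 714-716: sort V in descending order; the top node is the deletion candidate.
        return sorted(scores, key=scores.get, reverse=True)

    # Example: node "b" has 3 inputs and 3 outputs and is therefore ranked first.
    example = {"a": ([], ["e1", "e2", "e3"]),
               "b": (["e1", "e4", "e5"], ["e6", "e7", "e8"]),
               "c": (["e6"], [])}
    # detect_nn_noise(example)[0] == "b"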
Some other types of graphs which the noise detection submodule 208 recognizes as a deletion target include a subroutine type shown in
Processing to detect these types of graphs will be described by referring to flowcharts in
getMerge( ) detects a pattern in which the number of links outputted from a node is smaller than the number of links inputted to the node as shown in
getBranch( ) detects a pattern in which the number of links outputted from a node is larger than the number of links inputted to the node as shown in
A series of steps from step 1204 to step 1212 is performed sequentially on the elements i of N for i=1 to max_node. Here, max_node refers to the number of nodes to be processed.
In step 1206, the function get_in(i) is called, and the number of input links of the node i is assigned to inNum variable.
In step 1208, the function get_out(i) is called, and the number of output links of the node i is assigned to outNum variable.
In step 1210, in accordance with mi=inNum/outNum, a value obtained by dividing inNum by outNum is assigned to mi.
By the time of the exit from the loop in step 1212, the values of the variables mi are prepared for i=1 to max_node. Then, in step 1214, the noise detection submodule 208 sorts M in descending order. Thereafter, in step 1216, the noise detection submodule 208 outputs M. Of the nodes, the one with the greatest ratio mi appears at the top in M.
A series of steps from step 1304 to step 1312 is performed sequentially on the elements i of N for i=1 to max_node. Here, max_node refers to the number of nodes to be processed.
In step 1306, the function get_in(i) is called, and the number of input links of the node i is assigned to inNum variable.
In step 1308, the function get_out(i) is called, and the number of output links of the node i is assigned to outNum variable.
In step 1310, in accordance with bi=outNum/inNum, a value obtained by dividing outNum by inNum is assigned to bi.
By the time of the exit from the loop in step 1312, the values of the variables bi are prepared for i=1 to max_node. Then, in step 1314, the noise detection submodule 208 sorts B in descending order. Thereafter, in step 1316, the noise detection submodule 208 outputs B. Of the nodes, the one with the greatest ratio bi appears at the top in B.
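Purely as an illustrative sketch, the two detectors can be written as follows; the graph representation mirrors the one assumed above, and the direction of the ratio in get_branch follows the reading of the branch pattern given earlier (an assumption, since the flowchart details are in the figures).

    # Sketch of getMerge()/getBranch() (steps 1204-1216 and 1304-1316); representation assumed.
    def get_merge(graph):
        # Nodes with more inputs than outputs score highly (merge type).
        ratio = {n: len(i) / len(o) for n, (i, o) in graph.items() if len(o) > 0}
        return sorted(ratio, key=ratio.get, reverse=True)

    def get_branch(graph):
        # Nodes with more outputs than inputs score highly (branch type);
        # the out/in ratio is an assumption chosen to match that description.
        ratio = {n: len(o) / len(i) for n, (i, o) in graph.items() if len(i) > 0}
        return sorted(ratio, key=ratio.get, reverse=True)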
Next, processing for getDistance(node1,node2) will be described by referring to
In step 1406, variables are set such that d_all=0, d_new=0, and target=0.
A series of steps from step 1408 to step 1430 is performed sequentially on cases of Case for i=1 to caseMax.
In step 1410, setting is performed such that d_new=0 and flag=false.
Next, a series of steps from step 1412 to step 1426 is performed sequentially for a variable j from j=1 to logMax on the pieces of log trace data Lj of Log.
In step 1414, it is determined whether getNode(Lj)=node1, i.e., whether Lj includes the node given as the first argument in getDistance( ).
If so, flag=true is set in step 1416.
In step 1418, it is determined whether or not flag=true. If so, d_new is incremented in accordance with d_new=d_new+1 in step 1420.
In step 1422, it is determined whether getNode(Lj)=node2, i.e., whether Lj includes the node given as the second argument in getDistance( ). If so, target is incremented in accordance with target=target+1 and flag=false is set in step 1424.
After exiting from the j loop in step 1426, d_new is added to d_all in accordance with d_all=d_all+d_new in step 1428.
After exiting from the i loop in step 1430, d is calculated as d=d_all/target, and in step 1434 getDistance(node1,node2) returns the value d thus calculated.
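Read as a whole, the flowchart computes an average distance in the log traces from occurrences of node1 to the next occurrence of node2. A hedged Python sketch, with the trace data assumed to be given as a list of lists of node names, might look like this:

    # Sketch of getDistance(node1, node2) over the cases (steps 1406-1434).
    def get_distance(node1, node2, cases):
        d_all, target = 0, 0                 # step 1406 (d_new is reset per case)
        for trace in cases:                  # i loop over Case, steps 1408-1430
            d_new, flag = 0, False           # step 1410
            for entry in trace:              # j loop over the log trace data, steps 1412-1426
                if entry == node1:           # step 1414
                    flag = True              # step 1416
                if flag:                     # step 1418
                    d_new += 1               # step 1420
                if entry == node2:           # step 1422
                    target += 1              # step 1424
                    flag = False
            d_all += d_new                   # step 1428
        # Steps 1432-1434; the zero guard is an addition not present in the flowchart.
        return d_all / target if target else float("inf")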
Next, processing to detect a subroutine type graph by use of getMerge( ), getBranch( ), and getDistance( ) will be described by referring to a flowchart in
In step 1502, values are read for variables in advance. To be specific, L is a set that stores all pieces of log trace data. M is a set of outputs obtained from the merge-type detection algorithm. B is a set of outputs obtained from the branch-type detection algorithm. Dij is a distance between a node ni and a node nj. T is a threshold on the number of top candidate nodes examined when filtering target subroutine nodes.
In step 1504, with M=getMerge( ) and B=getBranch( ), the processing in the flowcharts in
A series of steps from step 1506 to step 1518 is performed on the elements of M for i=1 to T.
A series of steps from step 1508 to step 1516 is performed on the elements of B from j=1 to T.
In step 1510, with ni=getNode(M,i), the i-th node of M is taken out as ni.
In step 1512, with nj=getNode(B,j), the j-th node of B is taken out as nj.
In step 1514, with Dij=getDistance(ni,nj), a distance from the node ni to the node nj is calculated and assigned to Dij.
After exiting from the j loop in step 1516 and exiting from the i loop in step 1518, D including Dij as its element is sorted in the descending order in step 1520.
In step 1522, D is outputted.
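A compact sketch of this subroutine-type detection is given below; M, B, and the distance function are passed in so that the sketch stays self-contained (they would come from the get_merge, get_branch, and get_distance sketches above), and T is the number of top candidates examined.

    # Sketch of the subroutine-type detection (steps 1502-1522).
    def detect_subroutine(M, B, distance, T):
        D = []
        for i in range(min(T, len(M))):          # loop over the top T merge candidates
            for j in range(min(T, len(B))):      # loop over the top T branch candidates
                ni = M[i]                        # step 1510
                nj = B[j]                        # step 1512
                D.append(((ni, nj), distance(ni, nj)))   # step 1514: Dij
        # Steps 1520-1522: sort the pairs by distance in descending order and output.
        return sorted(D, key=lambda item: item[1], reverse=True)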
Next, processing to detect a switch type graph by use of getMerge( ), getBranch( ), and getDistance( ) will be described by referring to a flowchart in
In step 1602, values are read for variables in advance. To be specific, L is a set that stores all pieces of log trace data. M is a set of outputs obtained from the merge-type detection algorithm. B is a set of outputs obtained from the branch-type detection algorithm. Dij is a distance between a node ni and a node nj. T is a threshold on the number of top candidate nodes examined when filtering target switch nodes.
In step 1604, with M=getMerge( ) and B=getBranch( ), the processing in the flowcharts in
A series of steps from step 1606 to step 1618 is performed on the elements of B for i=1 to T.
A series of steps from step 1608 to step 1616 is performed on the elements of M from j=1 to T.
In step 1610, with ni=getNode(B,i), the i-th node of B is taken out as ni.
In step 1612, with nj=getNode(M,j), the j-th node of M is taken out as nj.
In step 1614, with Dij=getDistance(ni,nj), a distance from the node ni to the node nj is calculated and assigned to Dij.
After exiting from the j loop in step 1616 and exiting from the i loop in step 1618, D including Dij as its element is sorted in descending order in step 1620.
In step 1622, D is outputted.
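The switch-type detection differs only in that the branch candidates drive the outer loop; a matching sketch under the same assumptions is:

    # Sketch of the switch-type detection (steps 1602-1622); roles of B and M exchanged.
    def detect_switch(M, B, distance, T):
        D = []
        for i in range(min(T, len(B))):
            for j in range(min(T, len(M))):
                ni, nj = B[i], M[j]              # steps 1610-1612
                D.append(((ni, nj), distance(ni, nj)))   # step 1614
        return sorted(D, key=lambda item: item[1], reverse=True)   # steps 1620-1622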
The processing in the flowchart in
Preferably, one of the above-described noise detection algorithms is used such that one loop of the steps would delete only one node in the graph. In this case, the operator may interactively select which one of the noise detection algorithms to use. Alternatively, one of the noise detection algorithms may be selected and used randomly. Still alternatively, by taking into consideration the effects of using the noise detection algorithms, the algorithm that offers the greatest effect may be used. For example, in a case of the N-N node type detection shown in
In particular, in the case of the subroutine type noise detection shown in
In step 1802, Pi is defined as a variable representing a pattern obtained as a result of the i-th execution. Moreover, S is defined as a set of all calculation scores.
A series of steps from step 1804 to step 1816 is iterated for S for i=1 to max_iteration.
In step 1806, i1=getLinkNum(Pi) is calculated. getLinkNum(Pi) is a function that returns the number of links of Pi.
In step 1808, i0=getLinkNum(Pi-1) is calculated.
In step 1810, s_1i=(i0−i1)/i1 is calculated.
In step 1812, c=getCaseCoverage(Pi) is calculated. Here, getCaseCoverage(Pi) is a function that returns the number of cases in Case which the nodes remaining in Pi can cover.
In step 1814, s_2i=c/max_iteration is calculated, and in step 1816, si=normalize(s_1i)*normalize(s_2i) is calculated. Here, normalize(s_1i) is a value obtained by summing s_1j (j=1 to max_iteration) and dividing s_1i by the sum. normalize(s_2i) is calculated similarly.
After exiting from the i loop in step 1818, S is sorted in the descending order in step 1820. In step 1822, S is outputted.
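As a non-authoritative sketch, the score calculation can be expressed as follows; the link counts and case coverages are assumed to have been precomputed for each pattern P0 … Pmax_iteration, which replaces the getLinkNum and getCaseCoverage calls of the flowchart.

    # Sketch of the score calculation (steps 1802-1822).
    def score_variations(link_counts, case_coverages):
        # link_counts[i]    = number of links of pattern P_i (index 0 is the initial pattern P_0)
        # case_coverages[i] = number of cases covered by the nodes remaining in P_i
        max_iteration = len(link_counts) - 1
        s1, s2 = [], []
        for i in range(1, max_iteration + 1):
            i1 = link_counts[i]                            # step 1806
            i0 = link_counts[i - 1]                        # step 1808
            s1.append((i0 - i1) / i1)                      # step 1810: s_1i
            s2.append(case_coverages[i] / max_iteration)   # steps 1812-1814: s_2i
        def normalize(values):                             # each value divided by the sum of all values
            total = sum(values)
            return [v / total for v in values]
        n1, n2 = normalize(s1), normalize(s2)
        scores = [a * b for a, b in zip(n1, n2)]           # step 1816: s_i
        return sorted(scores, reverse=True)                # steps 1820-1822: sort S and output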
Next, the log pattern refinement step will be described by referring to
First of all, by taking the work logs in
{“start-claim-processing”, “complete-preprocessing”, “start-checking”, “complete-checking”, “start-machine-based-claim-examination”}
Next, a regular grammar r is as follows:
r ::= e | x | r·r | r* | r∩r′ | r∪r′ | r^c
Here, e denotes an element of Σ; x, a variable; r·r, a concatenation of regular grammars; r*, zero or more repetitions of r; r∩r′, the intersection of 2 regular grammars r and r′, i.e., the set of words that belong both to r and r′; r∪r′, the union of 2 regular grammars r and r′, i.e., the set of words that belong to either r or r′; and r^c, the complement of r, i.e., the set of words that do not belong to r.
For example, a regular grammar of {“start-claim-processing”}.*{“start-machine-based-claim-examination”} represents traces where {“start-machine-based-claim-examination”} will necessarily occur sometime after {“start-claim-processing”}.
Next, a constraint φ will be described. The constraint φ determines a condition which the regular grammar should satisfy.
The constraint φ is defined as follows:
φ0 ::= x=r | φ0∧φ0
φ ::= φ0 | φ0⇒φ
Here, φ0, a basic constraint, is defined to be either ‘x=r’ (valuation of a variable x) or the conjunction of 2 basic constraints. In the second line, φ is defined to be either a basic constraint φ0 or an implication φ0⇒φ.
For example, a constraint may be described as:
x=y·{“start-machine-based-claim-examination”}.* ⇒ y=.*{“complete-preprocessing”}.*
This constraint represents a condition that if {“start-machine-based-claim-examination”} is present, {“complete-preprocessing”} must be present before it.
A constraint other than the above is given as:
x=y·{“start-machine-based-claim-examination”} ⇒ y=[^{“complete-checking”}]+
This constraint represents a condition that {“complete-checking”} is not included if the assessment ends in {“start-machine-based-claim-examination”}.
Still another example of the constraint is given as:
x=y·z ∧ (y=.*{“inquire-code”}.* ⇒ z=.*{“inquire-code”}.*)
Taking the decomposition x=y·z into consideration, this constraint represents a condition that if the assessment consists of the issuing of a document followed by checking, and a code inquiry is made during the issuing of the document, then the code inquiry is also made during the checking.
These constraints are described in advance by the user and stored in the main memory 106 or the hard disk drive 108 in such a manner that they can be called by the log pattern refinement module 216, as the constraints 226 in
The constraints are created by finding a certain rule through looking at and analyzing past operation logs of the same type.
Next, processing by the log pattern refinement module 216 will be described by referring to a flowchart in
The simplified log 418 is formed of multiple log traces. The log traces here form flows starting at one process and ending at another process. A set of such log traces T is formed of the following six elements:
T={t1, t2, t3, t4, t5, t6}
In addition, the contents of these elements are as follows:
t1={“start-claim-processing”}{“complete-preprocessing”}{“start-checking”}{“start-machine-based-claim-examination”}{“register-completion”}
t2={“start-claim-processing”}{“start-checking”}{“start-machine-based-claim-examination”}{“complete-checking”}
t3={“inquire-code”}{“complete-preprocessing”}{“start-machine-based-claim-examination”}
t4={“start-checking”}{“complete-checking”}{“start-machine-based-claim-examination”}
t5={“inquire-code”}{“complete-preprocessing”}{“inquire-code”}{“start-machine-based-claim-examination”}
t6={“start-checking”}{“inquire-code”}{“start-machine-based-claim-examination”}
In step 2102 in
In step 2104, the log pattern refinement module 216 reads one constraint φ out of the constraints 226 prepared in advance by the user.
In step 2106, whether the constraint φ has been successfully read is determined, and if so, the log pattern refinement module 216 calls the refinement submodule 218 and in step 2108, refines the regular grammar r on the basis of the constraint φ.
To be specific, a function refine( ) is called and r′=refine(r,{φ}) is executed. Processing for the function refine( ) being the refinement submodule 218 will be described later by referring to a flowchart in
r′ is obtained as a result of the processing in step 2108. Then, in step 2110, the log pattern refinement module 216 calls the examination submodule 220 to examine the regular grammar r′ on the basis of the trace set T. To be specific, with r′ and T as arguments, a function examine(r′,T) is called. Processing for the function examine( ) being the examination submodule 220 will be described later by referring to a flowchart in
In step 2110, if examine(r′,T) returns true, r is substituted with r′. On the other hand, if examine(r′,T) returns false in step 2110, r is not substituted.
The processing returns to step 2104. If the determination in step 2106 is such that there is not any constraint φ left, the log pattern refinement module 216 returns r in step 2114. This regular grammar r is transferred to the finite state transition system generation module 228.
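A minimal sketch of this control flow is shown below, with the refinement and examination submodules passed in as functions (they are sketched separately further on) so that nothing outside the snippet is required; the function names are assumptions of the sketch.

    # Sketch of the log pattern refinement loop (steps 2102-2114).
    def refine_log_pattern(r, constraints, traces, refine, examine):
        # r: initial regular grammar (step 2102); constraints: the user-defined constraints 226;
        # traces: the trace set T from the simplified log 418.
        for phi in constraints:                  # steps 2104-2106
            r_candidate = refine(r, [phi])       # step 2108
            if examine(r_candidate, traces):     # step 2110
                r = r_candidate                  # accept the refined grammar
            # otherwise r is left unchanged
        return r                                 # step 2114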
Next, the processing for refine(r,Φ) executed by the refinement submodule 218 will be described by referring to the flowchart in
In step 2204, the refinement submodule 218 extracts the equality x=r0 that appears first in φ, as a pair (x,r0).
In step 2206, the refinement submodule 218 calls transform(φ,x,r0,empty set) and assigns the return value thereof to rφ. transform( ) is executed by the transformation submodule 224. The processing therefor will be described later in detail by referring to a flowchart in
In step 2208, with r=r∩rφ, the refinement submodule 218 narrows the regular grammar r.
After a predetermined number of iterations, the refinement submodule 218 leaves step 2210, and returns r in step 2212.
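The structure of refine(r,Φ) can be sketched as below; extracting the first equality, the transform call, and the intersection of grammars are passed in as helpers, since their concrete forms depend on how the regular grammars are encoded (an assumption of this sketch).

    # Sketch of refine(r, Phi) (steps 2204-2212).
    def refine(r, phis, extract_first_equality, transform, intersect):
        for phi in phis:
            x, r0 = extract_first_equality(phi)   # step 2204: first equality x = r0 in phi
            r_phi = transform(phi, x, r0, set())  # step 2206
            r = intersect(r, r_phi)               # step 2208: r = r ∩ r_phi
        return r                                  # step 2212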
Next, the processing for examine(r,T) executed by the examination submodule 220 will be described by referring to the flowchart in
A series of steps from step 2304 to step 2312 is iterated for each element of T (
In step 2306, it is determined whether match(r,
If it is determined in step 2306 that r accepts
Then, in step 2314, a logical value of nacc/(nacc+nrej)>threshold is returned. That is, if nacc/(nacc+nrej)>threshold, the ratio of the accepted traces is regarded as being larger than the threshold, and examine(r,T) returns true. If not, examine(r,T) returns false.
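The examination can be sketched as a simple acceptance-ratio test; the match predicate and the threshold value of 0.8 are assumptions made for illustration.

    # Sketch of examine(r, T) (steps 2304-2314).
    def examine(r, traces, match, threshold=0.8):
        nacc = nrej = 0
        for t in traces:
            if match(r, t):      # step 2306: does r accept the trace?
                nacc += 1        # accepted
            else:
                nrej += 1        # rejected
        # Step 2314: accept r when the ratio of accepted traces exceeds the threshold.
        return nacc / (nacc + nrej) > threshold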
Next, the processing for transform(φ,x,r0,Γ) executed by the transformation submodule 224 will be described by referring to the flowchart in
In step 2402, the transformation submodule 224 determines whether φ=(y=r). If so, the pair (y,r) is added to the correspondence table Γ in accordance with Γ=Γ∪{(y,r)} in step 2404. Then, in step 2406, the transformation submodule 224 returns substr(r0,empty set)^c∪substr(x,Γ). Note that processing for substr( ) will be described later in detail by referring to a flowchart in
On the other hand, if the transformation submodule 224 does not determine in step 2402 that φ=(y=r), the processing proceeds to step 2408, where whether φ=((y=r)⇒ψ) is determined. If so, the pair (y,r) is added to the correspondence table Γ in step 2410 in accordance with Γ=Γ∪{(y,r)}. Then, in step 2412, the transformation submodule 224 recursively calls transform(ψ,x,r0,Γ) and returns a result thereof.
If determining in step 2408 that φ=((y=r)⇒ψ) is not true, the transformation submodule 224 returns r in step 2414.
Next, the processing for the function substr(r,Γ) executed by the substitution submodule 222 will be described by referring to the flowchart in
In step 2502, the substitution submodule 222 determines whether a variable x is included in r. If so, the substitution submodule 222 determines in step 2504 whether (x,s)∈Γ, i.e., whether a pair (x,s) is included in Γ. If so, a regular grammar, which is obtained by substituting x in r with s, is assigned to r′ in step 2506. If not, a regular grammar, which is obtained by substituting x in r with .*, is assigned to r′ in step 2508. In either case, substr(r′,Γ) is recursively called, and the return value thereof is returned.
If determining in step 2502 that x is not included in r, the substitution submodule 222 simply returns r in step 2512.
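To make the interplay of transform( ) and substr( ) concrete, the following purely symbolic sketch keeps regular grammars as plain strings in which variables appear as the tokens <x>, <y>, <z>, and encodes a constraint as nested tuples; this string/tuple encoding, the variable tokens, and the use of a union in the returned expression (read off from the worked examples below) are all assumptions of the sketch, not the claimed implementation.

    # Symbolic sketch of transform() and substr() (steps 2402-2414 and 2502-2512).
    VARIABLES = ("<x>", "<y>", "<z>")   # assumed variable tokens

    def substr(r, gamma):
        # Replace each variable in r by its binding in gamma if present,
        # otherwise by the wildcard ".*" (steps 2502-2512), then recurse.
        bindings = dict(gamma)
        for v in VARIABLES:
            if v in r:
                return substr(r.replace(v, bindings.get(v, ".*")), gamma)
        return r

    def transform(phi, x, r0, gamma):
        # phi is ("eq", y, r) or ("implies", ("eq", y, r), psi) -- an assumed encoding.
        if phi[0] == "eq":                               # step 2402: phi is y = r
            _, y, r = phi
            gamma = gamma | {(y, r)}                     # step 2404
            # Step 2406: complement of substr(r0, {}) joined with substr(x, gamma);
            # the union matches the r_phi formulas in the worked examples.
            return "(" + substr(r0, set()) + ")^c ∪ (" + substr(x, gamma) + ")"
        if phi[0] == "implies":                          # step 2408: phi is (y = r) => psi
            _, antecedent, psi = phi
            _, y, r = antecedent
            gamma = gamma | {(y, r)}                     # step 2410
            return transform(psi, x, r0, gamma)          # step 2412: recurse on the consequent
        return r0                                        # step 2414 (read here as returning r0)

    # Constraint (1) below, encoded with the assumed tokens:
    phi1 = ("implies",
            ("eq", "<x>", '<y>{“start-machine-based-claim-examination”}.*'),
            ("eq", "<y>", '.*{“complete-preprocessing”}.*'))
    # transform(phi1, "<x>", '<y>{“start-machine-based-claim-examination”}.*', set()) yields the
    # union formula shown for (1) in the worked examples that follow.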
For a more thorough understanding of the processing by the above function, the aforementioned constraints are used again.
Now, for the initial value of grammar r=.*, refine(r,{φ}) is executed with φ as the constraint. Then, the following are obtained:
x=y·{“start-machine-based-claim-examination”}.* ⇒ y=.*{“complete-preprocessing”}.* (1)
This means rφ=(.*{“start-machine-based-claim-examination”}.*)^c∪(.*{“complete-preprocessing”}.*{“start-machine-based-claim-examination”}.*).
x=y·{“start-machine-based-claim-examination”} ⇒ y=[^{“complete-checking”}]+ (2)
This means rφ=(.*{“start-machine-based-claim-examination”}.*)^c∪(.*[^{“complete-checking”}]+{“start-machine-based-claim-examination”}).
x=y·z ∧ (y=.*{“inquire-code”}.* ⇒ z=.*{“inquire-code”}.*) (3)
This means rφ=(.*{“inquire-code”}.*)^c∪(.*{“inquire-code”}.*{“inquire-code”}.*).
Here, it should be noted that the variables x, y, and z are eliminated and thus rφ contains no variable.
Meanwhile, the aforementioned log traces are cited again as follows.
T={t1, t2, t3, t4, t5, t6}
t1={“start-claim-processing”}{“complete-preprocessing”}{“start-checking”}{“start-machine-based-claim-examination”}{“register-completion”}
t2={“start-claim-processing”}{“start-checking”}{“start-machine-based-claim-examination”}{“complete-checking”}
t3={“inquire-code”}{“complete-preprocessing”}{“start-machine-based-claim-examination”}
t4={“start-checking”}{“complete-checking”}{“start-machine-based-claim-examination”}
t5={“inquire-code”}{“complete-preprocessing”}{“inquire-code”}{“start-machine-based-claim-examination”}
t6={“start-checking”}{“inquire-code”}{“start-machine-based-claim-examination”}
Then, the following can be found:
rφ in (1) accepts
rφ in (2) accepts
rφ in (3) accepts
The role of the log pattern refinement module 216 is to apply such constraints, examine the acceptance rate for the log traces T, and refine the regular grammar in a stepped fashion. In this event, the transformation submodule 224 and the substitution submodule 222 are called by the refinement submodule 218 for the refinement processing.
The regular grammar finally obtained is transferred to the finite state transition system generation module 228.
In the following, the terms for describing the processing by the finite state transition system generation module 228 are defined again.
Specifically, Σ=set of alphabets, and Σ*=set of words obtained by joining an arbitrary number of alphabets.
The regular expression r is defined as r ::= ε | a | r∪r | r∩r | r^c | r·r | r*, where a is an arbitrary element of the alphabet set Σ, and ε is a special symbol not belonging to Σ. Note that the regular expression r may also be called the regular grammar.
Moreover, a nondeterministic finite state transition machine including ε-transition (ε-NFA)M is defined as follows:
Q=set of states={q0, q1, q2 . . . }
Σ=set of alphabets
ε=special transition not belonging to Σ
Δ=set of state transitions (Δ⊂Q×(Σ∪{ε})×Q)
q0=initial state
F=set of final states
L(M)=set of words accepted by ε-NFA M
Now, assume that M1=(Q1,Σ∪{ε},Δ1,q1,F1) and M2=(Q2,Σ∪{ε},Δ2,q2,F2). With M1 and M2 as above, functions to be used are defined as follows:
disj(M1,M2)=ε-NFA accepting L(M1)∪L(M2), i.e., an ε-NFA defined such that it branches to M1 or M2 by an ε-transition;
conj(M1,M2)=ε-NFA accepting L(M1)∩L(M2), defined over the direct product of the state sets Q1×Q2 such that ((q1,q2),a,(q′1,q′2)) is a transition of conj(M1,M2) when (q1,a,q′1)∈Δ1 and (q2,a,q′2)∈Δ2;
neg(M1)=ε-NFA accepting Σ*\L(M1), i.e., an ε-NFA in which the accepting and non-accepting (rejecting) states are reversed;
concat(M1,M2)=ε-NFA accepting {w1·w2|w1∈L(M1),w2∈L(M2)}, i.e., an ε-NFA in which M1 and M2 are joined by adding an ε-transition from F1 to q2; and
rep(M1)=ε-NFA accepting {w*|w∈L(M1)}, i.e., an ε-NFA in which an ε-transition from F1 to q1 and an ε-transition that ends without passing through M1 are added.
Pseudo code which the finite state transition system generation module 228 uses for processing a function RE_to_eNFA(r), which transforms the regular expression into an equivalent ε-NFA (nondeterministic finite automaton) by using these functions, is described as follows. As can be seen, this is recursive processing:
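As the pseudo code itself does not appear in this text, the following Python sketch is offered only as an illustration of the recursion; the tuple encoding of regular expressions and the ENFA class are assumptions, and only the constructions spelled out above (single symbols, disj, concat, and rep) are implemented here, with intersection and complement left to the conj/neg constructions described earlier.

    # Illustrative sketch of RE_to_eNFA(r); the encoding and the ENFA class are assumptions.
    from itertools import count

    _ids = count()

    class ENFA:
        # states: set of ints; delta: set of (q, a, q2) with a=None meaning an epsilon-transition
        def __init__(self, states, delta, start, finals):
            self.states, self.delta = set(states), set(delta)
            self.start, self.finals = start, set(finals)

    def _sym(a):
        s, f = next(_ids), next(_ids)
        return ENFA({s, f}, {(s, a, f)}, s, {f})

    def disj(m1, m2):        # branch to M1 or M2 by epsilon-transitions
        s = next(_ids)
        return ENFA(m1.states | m2.states | {s},
                    m1.delta | m2.delta | {(s, None, m1.start), (s, None, m2.start)},
                    s, m1.finals | m2.finals)

    def concat(m1, m2):      # join M1 and M2 by epsilon-transitions from F1 to q2
        return ENFA(m1.states | m2.states,
                    m1.delta | m2.delta | {(f, None, m2.start) for f in m1.finals},
                    m1.start, m2.finals)

    def rep(m1):             # zero or more repetitions of M1
        s = next(_ids)
        return ENFA(m1.states | {s},
                    m1.delta | {(s, None, m1.start)} | {(f, None, s) for f in m1.finals},
                    s, {s})

    def RE_to_eNFA(r):
        # r is ('eps',), ('sym', a), ('union', r1, r2), ('concat', r1, r2), ('star', r1),
        # ('inter', r1, r2) or ('comp', r1); the recursion mirrors the expression structure.
        tag = r[0]
        if tag == "eps":
            return _sym(None)
        if tag == "sym":
            return _sym(r[1])
        if tag == "union":
            return disj(RE_to_eNFA(r[1]), RE_to_eNFA(r[2]))
        if tag == "concat":
            return concat(RE_to_eNFA(r[1]), RE_to_eNFA(r[2]))
        if tag == "star":
            return rep(RE_to_eNFA(r[1]))
        raise NotImplementedError("intersection/complement use the conj/neg constructions above")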
Next, another function of the finite state transition system generation module 228 is to transform the ε-NFA (nondeterministic finite automaton) acquired by RE_to_eNFA(r) into a DFA (deterministic finite automaton).
Here, definitions are given for the nondeterministic finite state transition machine including ε-transitions, (ε-NFA) M=(Q,Σ∪{ε},Δ,q0,F), where:
Q=set of states={q0, q1, q2 . . . }
Σ=set of alphabets
ε=special transition not belonging to Σ
Δ=set of state transitions (Δ⊂Q×(Σ∪{ε})×Q)
q0=initial state
F=set of final states
Meanwhile, a deterministic finite state transition machine is defined as (DFA) M=(Q,Σ,Δ,q0,F).
Here, functions to be used are defined as follows:
ε-closure(q)=set of states that are reachable from q by using ε-transitions only. That is, q∈ε-closure(q), and (q,ε,q′)∈Δ implies ε-closure(q′)⊂ε-closure(q).
t(q,a)=set of states that are reachable from q by one a-transition combined with ε-transitions (each of which may be performed an arbitrary number of times)=∪{ε-closure(q″)|q′∈ε-closure(q),(q′,a,q″)∈Δ}.
Next, the processing to transform an ε-NFA into a DFA will be described by referring to a flowchart in
In step 2602 in
In step 2604, the finite state transition system generation module 228 searches for a transition destination of X through a which has not yet been checked. Specifically, the finite state transition system generation module 228 searches for X∈Q′ and a∈Σ such that (X,a,Y) is not an element of Δ′ for any Y∈Q′.
In step 2606, it is determined whether the above are found. If not, the processing ends.
If it is determined in step 2606 that they are found, Y=∪{t(q,a)|q∈X}, Q′=Q′∪{Y}, and Δ′=Δ′∪{(X,a,Y)} are set in step 2608, and the processing returns to step 2604.
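The subset construction just outlined can be sketched as a self-contained Python function; the worklist formulation below is equivalent in spirit to the flowchart's search for an unchecked pair (X,a), and the data representation (transitions as triples with None as ε) is an assumption.

    # Sketch of the epsilon-NFA to DFA transformation (steps 2602-2608).
    def enfa_to_dfa(states, delta, q0, finals):
        def eps_closure(qs):
            stack, seen = list(qs), set(qs)
            while stack:
                q = stack.pop()
                for (p, a, p2) in delta:
                    if p == q and a is None and p2 not in seen:
                        seen.add(p2)
                        stack.append(p2)
            return frozenset(seen)

        def t(X, a):   # states reachable by one a-transition plus epsilon-closures
            step = {p2 for q in X for (p, b, p2) in delta if p == q and b == a}
            return eps_closure(step)

        alphabet = {a for (_, a, _) in delta if a is not None}
        start = eps_closure({q0})                 # step 2602: Q' = {eps-closure(q0)}
        Qp, Dp, unchecked = {start}, set(), [start]
        while unchecked:                          # steps 2604-2606: any unchecked (X, a)?
            X = unchecked.pop()
            for a in alphabet:
                Y = t(X, a)                       # step 2608: Y = U{t(q, a) | q in X}
                Dp.add((X, a, Y))
                if Y not in Qp:
                    Qp.add(Y)
                    unchecked.append(Y)
        Fp = {X for X in Qp if X & set(finals)}   # DFA final states (assumed: contain an NFA final state)
        return Qp, Dp, start, Fp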
The function of the finite state transition system generation module 228 is to generate a DFA from the regular expression r in the above manner. In the following, a description will be given of the function of the workflow transformation module 230 that generates a workflow from the generated DFA.
Due to its algorithm, the workflow transformation module 230 does not directly generate a workflow from the DFA, and instead generates a pseudo-workflow first.
In the following, variables and functions are defined for the purpose of describing the algorithm:
deterministic finite state machine DFA M=(Q,Σ,Δ,q0,F)
Q=set of states={q0,q1,q2, . . . }
Σ=set of alphabets
Δ=set of state transitions (Δ⊂Q×Σ×Q)
q0=initial state
F=set of final states
pseudo-workflow pWF=(N,E), a directed graph taking a transition a(∈Σ) of the DFA as a node and being used as a stage before generating a workflow
task node n=a(i,j), N=set of task nodes
a=element of Σ
i=number given to the entrance of task node n
j=number given to the exit of task node n
e=edge, E=set of edges
Functions to be used are defined as follows:
count(a)=the number of task nodes in N that are in the form of a(______,______)
init(e)=initial point of edge e (initial node)
term(e)=terminal point of edge e (terminal node)
Next, processing to generate a pseudo-workflow from the DFA will be described by referring to a flowchart in
In step 2702 in
In step 2704, the workflow transformation module 230 processes N=N∪{a(i,j)} for all the elements (qi,a,qj) of Δ to thereby generate a node set N.
In step 2706, the workflow transformation module 230 processes E=E∪{(a(i,j),b(j,k))} for all the elements a(i,j) and b(j,k) of N to thereby generate an edge set E.
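As a brief illustration, the two steps can be written as set comprehensions; a task node a(i,j) is represented here as the tuple (a, i, j), which is an assumption of the sketch.

    # Sketch of the pseudo-workflow generation (steps 2702-2706).
    def dfa_to_pseudo_workflow(delta):
        # delta: the DFA transition set with elements (qi, a, qj)
        N = {(a, qi, qj) for (qi, a, qj) in delta}                    # step 2704: task nodes a(i, j)
        E = {(n1, n2) for n1 in N for n2 in N if n1[2] == n2[1]}      # step 2706: edges a(i,j) -> b(j,k)
        return N, E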
Next, processing to generate a workflow from the pseudo-workflow will be described.
workflow WF=(N,E,X)
Here, the workflow is determined as a flowchart-like structure. The workflow is associated with a set of variables X, and may have update nodes for x∈X (x:= . . . ) and branch nodes dependent on the values of x.
The node n is any one of the following:
update(x,v): updating the value of the variable x to v.
label(a): providing a as a label (a is an alphabet symbol of the DFA). Note that in the workflow, there are at most two nodes that have the label a.
branch.
The edge e connects nodes n and n′ and thereby represents the flow of the processing.
In particular, an edge exiting from a branch node is associated with a condition “x=v” (that edge is selected when the value of x is v).
combine(A) creates WF nodes and edges corresponding to nodes gathered by A={a(i1,j1),a(i2,j2), . . . , a(im,jm)} among nodes in the pseudo-workflow.
Next, processing to generate a workflow from the pseudo-workflow will be described by referring to a flowchart in
In step 2802 in
In step 2804, the workflow transformation module 230 processes the following for all a in Σ.
A={a(i1,j1),a(i2,j2), . . . ,a(im,jm)}
(N″,E″)=combine(A)
N′=N′∪N″
E′=E′∪E″
Then, the workflow transformation module 230 ends the processing. After data of the workflow (N′,E′,{st}) is acquired in the above manner, appropriate drawing processing may be performed using the data to display the workflow on the display 114.
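A heavily hedged sketch of this outer loop follows; combine(A) itself, which introduces the branch and update nodes described above, is passed in by the caller because its construction is specified in the drawings rather than in the text, and the start node "st" is taken from the workflow (N′,E′,{st}) notation.

    # Sketch of the pseudo-workflow to workflow loop (steps 2802-2804); combine() is supplied by the caller.
    def pseudo_workflow_to_workflow(N, alphabet, combine, start_node="st"):
        Np, Ep = {start_node}, set()           # step 2802: initialization (assumed form)
        for a in alphabet:                     # step 2804: for every symbol a in Sigma
            A = {n for n in N if n[0] == a}    # A = {a(i1,j1), ..., a(im,jm)}
            N2, E2 = combine(A)                # WF nodes and edges for the nodes labelled a
            Np |= N2
            Ep |= E2
        return Np, Ep, {start_node}            # the workflow (N', E', {st})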
As an example, a regular expression r=([^{“start-machine-based-claim-examination”}]*)^c∪([^{“start-machine-based-claim-examination”}]*{“complete-preprocessing”}[^{“start-machine-based-claim-examination”}]*.*{“start-machine-based-claim-examination”}.*) is considered.
The present invention has been hereinabove described based on a particular embodiment. However, the present invention is not limited to a particular operating system or platform, and can be carried out on any computer system.
Moreover, the operation log that serves as the base of the analysis is not limited to a particular operation log such as an insurance operation log. The present invention is applicable to any type of log as long as the log has operation contents, work contents, or IDs thereof arranged in a time-series manner and is stored in a computer-readable manner.
According to the present invention, the processing is performed in which a simplified log is first prepared by removing a node recognized as a noise from a log of a business process, and subsequently a regular grammar is refined based on constraints so that the regular grammar may be compatible with the simplified log. As a result, the log is fitted into the regular grammar. Accordingly, an advantageous effect can be achieved which allows the generation of a suitable workflow even from a log of an unstructured business process.