A process may be described as a series of nodes or steps (e.g., actions, changes, or functions) that bring about a result. Processes may be used to define a wide range of activities such as the steps in a computer program, procedures for combining ingredients, manufacturing of an apparatus, and so forth. Further, metrics or process measurements may be defined to allow for process monitoring and data retrieval.
Specifically, metrics may be defined as properties of a process or business that are pertinent or that a user finds interesting. For example, business metrics may reflect business goals and include such things as cost, quality, outcome, and/or duration. Additionally, service level agreements (SLAs) inherently have underlying metrics. For example, a duration metric underlies an SLA requiring delivery of items no more than twenty-four hours after an order is placed. The “no more than twenty-four hours” requirement is merely a condition on a duration metric. Further, values for metrics may be computed using process execution data.
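The relationship between a duration metric and an SLA condition can be illustrated with a minimal sketch. The function names, the sample time stamps, and the default twenty-four hour limit are assumptions chosen only to mirror the example above; they do not describe any particular tool.

```python
from datetime import datetime, timedelta


def duration_metric(order_placed: datetime, delivered: datetime) -> timedelta:
    """Duration metric: elapsed time between order placement and delivery."""
    return delivered - order_placed


def meets_sla(duration: timedelta, limit: timedelta = timedelta(hours=24)) -> bool:
    """SLA check: a condition ("no more than `limit`") on the duration metric."""
    return duration <= limit


# Hypothetical time stamps taken from process execution data.
placed = datetime(2024, 1, 1, 9, 0)
delivered = datetime(2024, 1, 2, 8, 30)

d = duration_metric(placed, delivered)
print(d, meets_sla(d))
```

The metric itself is only the measured value; the SLA is expressed separately as a predicate over that value, so the same metric could support several conditions.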
Process execution data may be defined as information or data related to a process instance. Executions or execution results in a process instance may be recorded using monitoring equipment, thus creating process execution data. Examples of process execution data include time stamps, orders, starting time, and ending time. A process definition may be composed of nodes (steps in the process), and arcs (connectors that define an order of execution among the nodes). During a process instance (i.e., an execution of a process definition), a certain node or string of nodes in the process may be executed zero, one, or many times. Accordingly, when a process instance is active (i.e., during execution), the availability of node execution data from that particular instance may be limited. This limited data may be referred to as partial process execution data. Further, the number of node executions (e.g., zero, one, or many) may depend on a process definition or formal description of a business process.
Existing tools, systems, and techniques may allow for the defining and computing of business metrics on top of business process execution data. For example, a tool may allow a user to define metrics, which may then be used to provide reports and/or monitoring of execution data associated with the metrics. Additionally, methods and systems may exist for deriving explanations and predictions regarding such metrics. These techniques may contemplate computing prediction models using process execution data acquired from active process instances (i.e., partial process execution data). For example, a tool may contemplate using a data mining technique to provide, at the very start of a process instance, a prediction for the value of one or more metrics. Further, the tool may provide an updated prediction as the execution proceeds based on the more current execution data. While existing techniques may be useful, a method to address the problem of computing a point or stage in a process execution where it makes sense to collect data and generate a prediction may provide a desirable additional benefit. The present disclosure may address the above issues and provide other advantages.
One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
The present disclosure concerns a system and method for prediction of generic, user-defined metrics at different points during a process. Embodiments of the present invention are related to computing process execution stages, which may be important or necessary to make effective predictions for active process instances. Specifically, given a process and past process execution data, embodiments of the present invention may identify a set of stages and their corresponding nodes, and consequently a set of information that can be mined to generate prediction models. Embodiments of the present invention can use, for example, the start and end times of the identified nodes as features for generating the prediction model. Embodiments of the present invention deal with the problem of computing a point or stage in a process execution where it makes sense to collect data and generate a prediction. In particular, embodiments of the present invention address developing a set of executions whose data should be included in the computation of prediction models that correspond to different execution stages. Further, embodiments of the present invention address computing the current stage for a running process instance.
Additionally,
While there may be several types of traces, two general types are considered here. First, there is a start-time based trace, in which the nodes appear in the trace ordered by node activation time. Second, there is an end-time based trace, in which nodes appear in the trace ordered by node completion time. In both start-time and end-time based traces, the time order is ascending (i.e., in some embodiments, nodes that start or complete first also appear first). For example, ABCD and ABCDBCE are two possible traces of the process represented in the diagram 10. Generally, different instances of the same process may have different traces, and the number of different traces that a process can generate may be unbounded.
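The two trace types can be sketched as follows. The NodeExecution record, the function names, and the sample times are hypothetical; in particular, the overlapping executions of B and C are invented to show how the two orderings can differ and are not taken from diagram 10.

```python
from typing import List, NamedTuple


class NodeExecution(NamedTuple):
    node: str     # node identifier
    start: float  # activation time
    end: float    # completion time


def start_time_trace(executions: List[NodeExecution]) -> str:
    """Trace with nodes ordered by ascending activation time."""
    return "".join(e.node for e in sorted(executions, key=lambda e: e.start))


def end_time_trace(executions: List[NodeExecution]) -> str:
    """Trace with nodes ordered by ascending completion time."""
    return "".join(e.node for e in sorted(executions, key=lambda e: e.end))


# Hypothetical execution data for one process instance.
executions = [
    NodeExecution("A", 0, 1),
    NodeExecution("B", 1, 5),
    NodeExecution("C", 2, 3),  # starts after B but completes before it
    NodeExecution("D", 5, 6),
]

print(start_time_trace(executions))  # ABCD
print(end_time_trace(executions))    # ACBD
```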
In one embodiment of the present invention, a separate model may be derived for every possible subtrace or substring of a given trace to make the best possible prediction. For example, in reference to diagram 10, separate models may be derived for subtraces AB, ABC, ABCDB, and so on. If it is desirable to make a prediction on a running process instance, the current subtrace should be examined and its corresponding model used for making a prediction. This approach may guarantee that all possible available information is used for the prediction, and that a model specifically derived for that very same portion of execution (i.e., same executed nodes) is used. However, this approach may not always be practical because the number of possible subtraces may be unbounded, making it difficult, if not impossible, to compute the very large or infinite number of models. Accordingly, embodiments of the present invention address problems with deriving process prediction models that result from the potentially unlimited number of process definition traces.
In one embodiment of the present invention, the notion of a stage may be introduced to address the model derivation problems associated with the potentially unlimited number of process traces. Like a trace, a stage may be a string of node identifiers. However, unlike a trace, a stage may not necessarily reflect each one of the nodes executed up to a given point. Stages may be derived from traces by pruning repetitions of consecutive nodes (i.e., loops) and replacing them by a representative node or set of nodes as determined by a particular strategy. Accordingly, a limit may be placed on the number of stages for which it is practical to infer prediction models.
The algorithm illustrated in
Embodiments of the present invention may apply various strategies. In one strategy, for each substring in which all elements are the same (e.g., AAAAA), only one occurrence of the node in the substring is kept in the trace (e.g., A). This strategy may have additional substrategies such as keeping only the first occurrence, only the last occurrence, only a randomly picked occurrence, or some other policy. A second exemplary strategy is keeping a maximum designated number of occurrences, n, where n is a user-designated loop threshold. This strategy may incorporate different options depending on which n occurrences of a plurality of occurrences are chosen, such as the first n occurrences, the last n occurrences, or some other policy for designating occurrences.
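A minimal sketch of these single-node strategies follows. It assumes a trace is a string of one-character node identifiers; the function names and the "first"/"last" policy labels are hypothetical. Because repeated identifiers are indistinguishable in the string, the sketch also returns the positions that are kept, which is where the first-n and last-n policies differ (they select different node executions).

```python
from itertools import groupby
from typing import List


def kept_indices(trace: str, n: int = 1, policy: str = "first") -> List[int]:
    """Positions kept when each run of one repeated node is cut to n occurrences."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if policy not in ("first", "last"):
        raise ValueError("policy must be 'first' or 'last'")
    kept: List[int] = []
    pos = 0
    for _, run in groupby(trace):
        length = len(list(run))
        indices = list(range(pos, pos + length))
        # Keep the first n or the last n executions of the run.
        kept.extend(indices[:n] if policy == "first" else indices[-n:])
        pos += length
    return kept


def prune(trace: str, n: int = 1, policy: str = "first") -> str:
    """Trace with consecutive repetitions of a node limited to n occurrences."""
    return "".join(trace[i] for i in kept_indices(trace, n, policy))


print(prune("AAAAAB"))                    # AB
print(prune("AAAAAB", n=2))               # AAB
print(kept_indices("AAAAAB", 2, "first"))  # [0, 1, 5]
print(kept_indices("AAAAAB", 2, "last"))   # [3, 4, 5]
```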
Additionally, strategies similar to those presented above for single nodes may be generalized for node substrings instead of individual nodes. For example, a case may involve a sequence of K nodes, wherein a substring is characterized by consecutive repetition of a certain pattern (e.g., ABABABAB). A strategy may apply to the repeated substring AB of the larger substring as applied to single nodes in the previously presented strategies. Such a strategy may also incorporate policies similar to those discussed above. Further, other strategies and other policies may be utilized regarding repeated node substrings in embodiments of the present invention.
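The generalization to repeated substrings might be sketched as below. The function name, the greedy left-to-right scan, and the parameter k (the pattern length) are assumptions made for illustration; other scanning orders or policies are equally possible.

```python
def collapse_pattern(trace: str, k: int, n: int = 1) -> str:
    """Keep at most n consecutive copies of any repeated pattern of length k."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be at least 1")
    out = []
    i = 0
    while i < len(trace):
        pattern = trace[i:i + k]
        reps = 1
        # Count how many times the pattern repeats back to back from position i.
        while len(pattern) == k and trace[i + reps * k:i + (reps + 1) * k] == pattern:
            reps += 1
        if reps > 1:
            out.append(pattern * min(reps, n))
            i += reps * k
        else:
            out.append(trace[i])
            i += 1
    return "".join(out)


print(collapse_pattern("ABABABAB", k=2))       # AB
print(collapse_pattern("ABABABAB", k=2, n=2))  # ABAB
print(collapse_pattern("CABABD", k=2))         # CABD
```

With k set to 1 the same routine reduces to the single-node case (e.g., AAAB becomes AB).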
Next, the algorithm 100 may proceed to a transition block (block 110). Block 110 may direct traces having a certain number of loops (e.g., 1, 2, or more) to a loop removal block (block 115), where loops are removed as described above. Block 115 may represent designating a strategy and removing loop repetitions that exceed a loop threshold based on the designated strategy. For example, once all the traces are determined, each trace may be processed in block 115 to remove all but one random occurrence of a node in accordance with a chosen strategy. This removal may be significant in embodiments of the present invention because, as previously discussed, problems associated with infinite traces and impractically high numbers of traces result from loops.
Block 120 represents removal of repeated nodes in a broken loop situation. In embodiments of the present invention, this removal is similar to block 115. However, it may be different in that it involves removal of repeated but non-consecutive appearances of a node or of a substring. For example, in the trace ABCAD, node A appears twice and hence there is a loop. However, while there is a loop, there is no consecutive appearance of any substring in the trace ABCAD. Block 120 may represent removal of such a loop in accordance with defined strategies, such as those discussed above (e.g., keep only the first occurrence, keep only the last occurrence, and so forth). Additionally, the generalization to substrings rather than single nodes also applies.
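A minimal sketch of broken-loop removal for single nodes follows, assuming one-character node identifiers and the keep-first and keep-last policies named above; the function name and policy labels are hypothetical.

```python
def remove_broken_loops(trace: str, policy: str = "first") -> str:
    """Remove repeated, possibly non-consecutive, appearances of a node."""
    if policy == "first":
        seen = set()
        out = []
        for node in trace:
            if node not in seen:  # keep only the first appearance of each node
                seen.add(node)
                out.append(node)
        return "".join(out)
    if policy == "last":
        # Keeping the last appearance equals keeping the first in the reversed trace.
        return remove_broken_loops(trace[::-1], "first")[::-1]
    raise ValueError("policy must be 'first' or 'last'")


print(remove_broken_loops("ABCAD", "first"))  # ABCD
print(remove_broken_loops("ABCAD", "last"))   # BCAD
```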
If no loops exist in a trace or once loops have been removed for each trace, the transition block 110 may direct the algorithm 100 to proceed with sorting the traces, as illustrated by block 125. Sorting the traces (block 125) may comprise defining an order for the node identifiers such as a lexicographic order. Further, sorting the traces (block 125) may comprise ordering the trace strings for the different process instances and creating an ordered list, which may comprise an array of sorted traces.
Next, the algorithm 100 may proceed to count all subtraces (block 130) and remove duplicate traces (block 135). First, regarding counting all subtraces (block 130), each left subtrace of every trace may be determined. A left subtrace is a left substring, that is, a substring starting from the leftmost node identifier in the trace string. For example, for the trace ABCD of diagram 10, AB is a left subtrace and BC is not. Accordingly, a left subtrace may contain the node identifiers of a process instance at some point before the execution is completed (i.e., during the execution). In block 130, for each left subtrace of every trace, the number of instances that have that particular left subtrace may be counted. Further, in block 135 duplicate subtraces may be removed from the ordered list.
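Counting left subtraces can be sketched as follows. The function names and the sample traces are hypothetical, and the sketch assumes the traces have already had loops removed. The full trace is counted as its own longest left subtrace.

```python
from collections import Counter
from typing import Iterable, List


def left_subtraces(trace: str) -> List[str]:
    """All non-empty left substrings (prefixes) of a trace, shortest first."""
    return [trace[:i] for i in range(1, len(trace) + 1)]


def count_left_subtraces(traces: Iterable[str]) -> Counter:
    """For each left subtrace, the number of instances whose trace begins with it."""
    counts: Counter = Counter()
    for trace in traces:
        counts.update(left_subtraces(trace))
    return counts


# One trace per past process instance (hypothetical data).
traces = ["ABCD", "ABCD", "ABCE", "ABD"]
counts = count_left_subtraces(traces)
print(counts["AB"], counts["ABC"], counts["ABCD"], counts["ABD"])  # 4 3 2 1
```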
After sorting traces (block 125), counting subtraces (block 130), and removing duplicate subtraces (block 135), the algorithm may proceed to a determination block 140. The determination block 140 may represent determining whether the array of traces contains traces that should be processed in the remainder of the algorithm 100 or the array is ready for storage. If there are more traces for processing, the algorithm 100 may proceed in block 145 by defining a stage as the first distinct left subtrace. In other words, the first left subtrace that differs (i.e., subtraces are not equal) from any previous subtraces may be considered. Further, it should be noted that subtraces may be considered equal if they have the same nodes and the nodes are in the same order. For example, if the current trace is ABACD and stages A and AB are already in the list of stages, then ABA is the first distinct left subtrace.
Block 150 represents another determination block. In one embodiment of the present invention, block 150 represents a determination of whether there are more left subtraces to process. If there are more, the algorithm 100 may continue to block 155. Otherwise, the algorithm 100 may return to block 140. Block 155 may be a determination block wherein a determination is made as to whether a count (i.e., the number of past instances that produced that subtrace) for the subtrace being processed is greater than a count threshold. This count threshold may be a user-defined limiting factor. If the count for the subtrace is greater than the count threshold, the subtrace (e.g., ABA in the previous example) may be added to the set of stages as represented by block 160. In other words, the stage may be added to the set of computed stages. This addition in block 160 may assure a sufficient number of instances are present to allow computation of an accurate prediction model.
After block 160, the algorithm may proceed to a conditional redefining of the stage in block 165. Alternatively, if the count for the subtrace is not greater than the threshold, block 155 may direct the algorithm 100 to bypass block 160 and proceed directly to block 165. Block 165 may represent a redefinition of the stage dependent upon, or conditioned on, whether a particular determination is made. For example, a determination may be made as to whether the left subtrace being considered is smaller than the whole trace (i.e., whether the trace has more elements). Next, if the trace has more elements, the next node to the right of the subtrace (e.g., node C in the previous example) may be added and considered the new left subtrace (e.g., ABAC based on the previous example) and the algorithm 100 may then return to the determination block 150. Alternatively, if the trace does not have more elements (i.e., the subtrace is equal to the full trace), the algorithm 100 may directly return to block 150. However, it should be noted that in other embodiments, different implementations may apply. For example, instead of basing the procedures on a left subtrace, a right subtrace may be incorporated.
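The stage computation described for blocks 125 through 165 can be condensed into a short sketch. This is a simplified flattening of the block flow, not a block-by-block implementation: it assumes the input traces have already had loops removed, and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import List


def compute_stages(traces: List[str], count_threshold: int) -> List[str]:
    """Left subtraces shared by more than count_threshold past instances."""
    # Count every left subtrace across all instances (block 130).
    counts = Counter(t[:i] for t in traces for i in range(1, len(t) + 1))
    stages: List[str] = []
    seen = set()
    # Sort the traces and drop duplicates (blocks 125 and 135).
    for trace in sorted(set(traces)):
        # Grow the left subtrace one node at a time (blocks 145 to 165).
        for i in range(1, len(trace) + 1):
            sub = trace[:i]
            if sub in seen:  # not distinct: already considered earlier
                continue
            seen.add(sub)
            if counts[sub] > count_threshold:  # block 155
                stages.append(sub)             # block 160
    return stages


traces = ["ABCD", "ABCD", "ABCE", "ABD"]
print(compute_stages(traces, count_threshold=1))  # ['A', 'AB', 'ABC', 'ABCD']
print(compute_stages(traces, count_threshold=2))  # ['A', 'AB', 'ABC']
```

Raising the count threshold trades coverage for reliability: fewer stages qualify, but each remaining stage is backed by more past instances.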
As
In one embodiment of the present invention, completion of the algorithm 100 corresponds with a set of stages having been defined. Further, each stage of the set may be characterized by a set of node identifiers that, along with the definition of the strategies for eliminating or reducing repeated appearances, identifies a certain set of node executions whose data can be used to compute prediction models.
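As one possible illustration, the node executions identified by a stage could be turned into a feature record using their start and end times. The function name, the feature naming scheme, and the sample times are assumptions; the sketch expects the executions to be those retained after loop removal, in the same order as the stage.

```python
from typing import Dict, List, Tuple


def stage_features(stage: str,
                   executions: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Start and end times of the node executions that make up a stage."""
    if len(executions) < len(stage):
        raise ValueError("not enough node executions for this stage")
    features: Dict[str, float] = {}
    for position, (node, start, end) in enumerate(executions[:len(stage)]):
        if node != stage[position]:
            raise ValueError("executions do not match the stage")
        # Position is part of the key so a node kept more than once stays distinct.
        features[f"{position}_{node}_start"] = start
        features[f"{position}_{node}_end"] = end
    return features


# Hypothetical (node, start, end) records for a running instance.
executions = [("A", 0.0, 1.0), ("B", 1.0, 4.0), ("C", 4.0, 6.5)]
print(stage_features("AB", executions))
```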
Specifically,
As discussed above, loops may be the source of problems with infinite and impractically high numbers of traces. Accordingly, much like the loop removal in algorithm 100, block 210 in the illustrated embodiment may represent removing loop repetitions that exceed a loop threshold based on a loop handling strategy. Similarly, block 215 may represent removing repeated nodes in broken loop situations based on the strategy. The activity in block 215 may also be analogous to similar activity in the computation phase (algorithm 100). However, it may differ from the previous phase (algorithm 100) in that it only applies to the single process instance being considered. In one embodiment, the strategy forming the basis for removal in blocks 210 and 215 is the same as the strategy designated in block 115 of the stage computation phase (algorithm 100).
Block 220 represents defining the trace in its current state as the whole trace. In some embodiments, this may enable discernment of a stage that matches the whole trace. Further, the whole trace may have been cleaned of repetitions as described previously. Accordingly, full use of the information present in the trace may be achieved using a data mining model developed with the whole trace. Further, the computed model may be more accurate because it may be based on a large number of features.
Block 225 represents searching for an existing stage equal to the current trace. Specifically, in one embodiment, block 225 comprises searching for stages computed in the earlier phase (algorithm 100) that match the current trace. Additionally, block 225 may represent determining whether the current trace is a stage or not. For example, the current trace may not be promoted to the role of a stage because the current trace is not present in enough instances to compute a prediction model. Further, block 230 may represent determining whether a match was found in block 225 or not.
The algorithm may then proceed to either block 235 or block 240 depending on whether the current trace matches an existing stage (block 230). If the current trace matches a stage as determined in block 225, the model corresponding to the match may be applied in block 235. This application (block 235) may be effective because the model will be based on information that is available from the current trace. Alternatively, if the current trace does not match a stage (block 225), the algorithm may remove the rightmost element from the trace, thus creating a new trace (block 240), and, beginning with block 225, the algorithm 200 may be repeated using the newly created trace (i.e., attempt to match the newly generated trace with a stage). This repetition or process loop may end (block 240) upon finding a matching stage. In one embodiment, the repetition or process loop ends when the matching stage is an empty stage (i.e., a stage that corresponds to the beginning of the process, where only information available at the start of the process is used to generate the predictive model).
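The matching loop of blocks 225 through 240 can be sketched as follows. The function name and sample stages are hypothetical; the sketch assumes the running instance's trace has already had loops removed with the same strategy used when the stages were computed, and it represents the empty stage as an empty string.

```python
from typing import Set


def find_matching_stage(trace: str, stages: Set[str]) -> str:
    """Longest left subtrace of the current trace that is a known stage."""
    current = trace
    # Drop the rightmost node until a stage matches or nothing is left.
    while current and current not in stages:
        current = current[:-1]
    return current  # "" denotes the empty stage (start of the process)


stages = {"A", "AB", "ABC"}
print(find_matching_stage("ABCEF", stages))        # ABC
print(find_matching_stage("AB", stages))           # AB
print(repr(find_matching_stage("XYZ", stages)))    # ''
```

The returned stage would then select the prediction model to apply, for example as a key into a mapping from stages to trained models.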
Each of the phases (algorithm 100 and algorithm 200) presented above may operate together or independently. In one embodiment, the algorithms 100, 200 cooperate to identify a set of stages and a set of candidate features to be considered when generating a plurality of predictive models. Additionally, the algorithms 100, 200 may cooperate to facilitate a determination of which of the plurality of predictive models will be most effectively used on a running process.
While the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.
Number | Date | Country | |
---|---|---|---|
20050278705 A1 | Dec 2005 | US |