1. Technical Field
Embodiments of the present disclosure are directed to predicting the potential risks of a new opportunity in terms of the observed root causes of similar historical contracts.
2. Discussion of the Related Art
Information technology (IT) service contract risk prediction is a major challenge facing IT service providers today. Service providers need to know about the potential risks for a given new opportunity ahead of contract signing to make educated decisions about whether to undertake the IT operations of a potential client, how to be proactive about mitigation planning if they are willing to take on a risky opportunity, and to price the contract accordingly to cover for risks that cannot be mitigated.
Existing risk management processes have limitations. Service providers often need to decide on whether to undertake a contract with limited access to the client's IT environment and without thoroughly understanding potential risks. In addition, there is a lack of a quantitative approach to objectively evaluate risks and prioritize risk management tasks.
It is, therefore, useful to have reliable risk prediction algorithms that can take into account the performance of similar historical contracts to expose all relevant potential risks in a systematic manner.
According to an embodiment of the disclosure, there is provided a method for predicting risks for information technology (IT) service contracts, including calculating a probability of occurrence of each of one or more target risks in a target contract, constructing one or more clusters of root causes observed in historical contracts similar to the target contract, where two root causes are in the same cluster if both root causes occur in one or more contracts in the set of historical contracts, where two root causes co-occur if both root causes are in the same cluster, for each of the one or more clusters, identifying root causes that co-occur with one or more target contract risks by searching each cluster for root causes of similar historical contract risks such that the identified root causes represent additional new contract risks, and calculating the probability of occurrence of each new target risk identified for the target contract based on root causes identified in the similar historical contract risks.
According to a further embodiment of the disclosure, calculating a probability of occurrence of each of the one or more target risks in the target contract includes calculating a similarity between the target contract and each historical contract, and for each historical contract whose similarity with the target contract is above a similarity threshold, and for each risk associated with the target contract, summing the similarity for each historical contract in which the risk occurs, and dividing by a sum of the similarities of all historical contracts in the set of similar historical contracts.
According to a further embodiment of the disclosure, constructing one or more clusters of root causes of the one or more target contract risks includes constructing a graph of the root causes for the one or more target contract risks, and forming root cause co-occurrence clusters from the graph. Two root causes are connected by an edge if the two root causes frequently co-occur in the set of similar historical contracts, where the two root causes are defined to frequently co-occur if each of the two root causes occurs for a same subset of the set of similar historical contracts, and a size of the subset with respect to the size of the set of similar historical contracts is greater than a predetermined threshold.
According to a further embodiment of the disclosure, forming root cause co-occurrence clusters from the graph includes computing a Laplacian matrix L∈ℝ^{n×n} of the graph, where n is a number of root causes, computing a first k eigenvalues of the Laplacian matrix, where k<n, computing a reduced dimensional matrix T∈ℝ^{n×k} from the eigenvectors corresponding to the k eigenvalues, clustering points (yi), i=1, . . . , n, that correspond to rows of the reduced dimensional matrix into k clusters Ci, and generating co-occurrence clusters Si, i=1, . . . , k, from the point clusters where Si={j|yj∈Ci}.
According to a further embodiment of the disclosure, the method includes using a k-means algorithm to cluster points (yi), i=1, . . . , n, into k clusters Ci.
According to a further embodiment of the disclosure, calculating the probability of occurrence of each new target risk includes calculating a weighted average of a number of occurrences of each new target risk across historical contracts whose similarity may or may not exceed the similarity threshold, where a weight is determined by the contract similarity.
According to a further embodiment of the disclosure, the method includes adjusting the probability of occurrence of each target risk identified for the target contract based on additional root causes identified through co-occurrence clusters in the similar historical contract risks by adding an adjustment weight to the occurrence probability.
According to a further embodiment of the disclosure, the adjustment weight for each target risk based on root causes identified through co-occurrence clusters in the similar historical contract risks is calculated based on business logic.
According to a further embodiment of the disclosure, the adjustment weight for each target risk based on root causes identified through co-occurrence clusters in the similar historical contract risks is calculated by multiplying the occurrence probabilities of each target risk in a chain of target risks, where each successive target risk in the chain is dependent upon a preceding target risk in the chain.
According to a further embodiment of the disclosure, the method includes predicting a set of risks that impact profitability of a new services contract from the one or more target risks in the target contract and the new target risk identified in the similar historical contract risks, and predicting the overall aggregated risk impact on contract profitability in terms of an achieved gross profit percentage compared to a planned gross profit percentage.
According to a further embodiment of the disclosure, the method includes eliminating target risks before contract signing.
According to a further embodiment of the disclosure, the method includes predicting other co-occurring risks based on risks observed during a post contract-signature delivery phase.
According to another embodiment of the disclosure, there is provided a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for predicting risks for information technology (IT) service contracts.
a)-(d) illustrate several kinds of clusters around observed root causes, according to an embodiment of the disclosure.
Exemplary embodiments of the invention as described herein generally include systems and methods for predicting risks of troubled contracts in terms of the observed root causes of similar historical contracts. Accordingly, while embodiments of the invention are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit embodiments of the invention to the particular forms disclosed, but on the contrary, embodiments of the invention cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.
Embodiments of the present disclosure focus on predicting the potential risks of a new opportunity in terms of the observed root causes of similar historical contracts by using co-occurrence algorithms. While there are several previous works on risk management of information technology (IT) contracts, they are either specific to the post-contract signature phase or do not focus on risk prediction in terms of the root causes observed in similar historical contracts. Although financial risk analytics (FRA), disclosed in “Financial Risk Analytics for Service Contracts”, U.S. application Ser. No. 13/685,362, filed on Nov. 26, 2012, the contents of which are herein incorporated by reference in their entirety, does perform risk prediction in terms of the root causes observed in similar historical contracts, the underlying algorithms do not leverage co-occurrence. Algorithms according to embodiments of the present disclosure extend the FRA algorithms.
Methods according to embodiments of the disclosure for risk prediction rely on co-occurrence algorithms. According to embodiments of the disclosure, co-occurrence can be used for risk prediction as follows.
The risks of a given new opportunity can be predicted by keeping track of the observed root causes and their frequency in similar historical contracts. While this method does provide a way to predict risks for a given new opportunity, it does not leverage the inter-relationships or dependencies of root causes. Embodiments of the disclosure can use root cause co-occurrence clusters in a pre-contract signature (engagement) phase to strengthen the contract similarity-based prediction by identifying additional potential risks that may be missed by a contract similarity model. Embodiments of the disclosure can also use root cause co-occurrence clusters in a post-contract signature (delivery) phase to predict likely risks in terms of observed root causes for a service contract for pro-active mitigation given the materialization of root causes residing in the co-occurrence clusters. Delivery risks result from activities after contract signing or after projects start, such as a failure to meet targeted Service Level Agreements (SLAs) or a project manager leaving in the middle of a project, whereas engagement risks result from activities before contract signing, such as under-estimating the number of resources needed to complete a project during the contract design phase, not allocating enough time to complete a project, etc.
As disclosed above, according to embodiments of the disclosure, it is possible to build several different kinds of clusters around root causes, such as temporal (root cause A occurs after root cause B), shown in
To form a cluster according to an embodiment of the disclosure, start with a set of contracts C and a contract c in C. Let RC be the set of all possible root causes, and let RC(c) be the subset of root causes for the contract c. This relationship may be denoted symbolically as RC(c)⊂RC. Two root causes r1, r2∈RC are said to co-occur if r1∈RC(c) and r2∈RC(c) for some c∈C. A co-existence cluster is shown in
Two root causes r1 and r2 are said to “frequently” co-occur if r1∈RC(c) and r2∈RC(c) for every contract c in some set of contracts X⊆C, and |X|/|C| is greater than some threshold, where |X| is the size of the set X, and |C| is the size of set C. Given RC and C, a co-occurrence graph CoG(V,E) can be constructed, where V is a set of root causes and E is a set of edges such that (r1, r2)∈E if r1 and r2 “frequently” co-occur. Given a co-occurrence graph CoG(V,E), there exist graph clustering algorithms that can perform clustering. Given a co-occurrence graph G, a cluster forming algorithm according to an embodiment of the disclosure can construct k clusters.
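By way of non-limiting illustration, the construction of the co-occurrence graph described above may be sketched as follows. The function name, the dictionary layout, the contract data, and the use of the ratio |X|/|C| as the edge weight are assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(root_causes_by_contract, threshold):
    """Return {(r1, r2): weight} for root-cause pairs that frequently co-occur.

    root_causes_by_contract maps a contract id c to its set RC(c).
    A pair is kept when |X| / |C| > threshold, where X is the set of
    contracts in which both root causes were observed.
    """
    n_contracts = len(root_causes_by_contract)
    if n_contracts == 0:
        return {}
    pair_counts = Counter()
    for causes in root_causes_by_contract.values():
        # sorted() gives every unordered pair a single canonical key
        for pair in combinations(sorted(causes), 2):
            pair_counts[pair] += 1
    return {
        pair: count / n_contracts  # assumed edge weight: the ratio |X| / |C|
        for pair, count in pair_counts.items()
        if count / n_contracts > threshold
    }


# Hypothetical contracts and root causes, for illustration only
contracts = {
    "C1": {"RC1", "RC2"},
    "C2": {"RC1", "RC2", "RC3"},
    "C3": {"RC3", "RC5"},
    "C4": {"RC3", "RC5"},
}
edges = build_cooccurrence_graph(contracts, threshold=0.4)
```

In this sketch only pairs observed together in more than 40% of the contracts become edges; pairs seen together in a single contract are dropped.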
where w(r1, r2) is a weight of edge (r1, r2), and d(r1) is a degree of each node, which is the sum of edge weights incident on node r1. The weight of an edge (r1, r2) may be a measure of co-occurrence of the root causes r1 and r2.
Let u1, u2, . . . , uk be the corresponding eigenvectors from U with U∈ℝ^{n×k}. Next, at step 33, a matrix T∈ℝ^{n×k} may be constructed as follows:
This matrix T contains reduced dimensional data upon which clustering will be performed. Then, for i=1, . . . , n, let yi∈ℝ^k be the vector corresponding to the i-th row of T. Next, at step 34, cluster the points (yi), i=1, . . . , n, into clusters C1, . . . , Ck. An exemplary, non-limiting algorithm for forming clusters C1, . . . , Ck is a k-means algorithm. Finally, generate the clusters S1, . . . , Sk with Si={j|yj∈Ci} at step 35.
Each cluster is a root cause co-occurrence cluster. Let D={d1, d2, . . . , dn} be a set of RC clusters. If two root causes frequently co-occur, then they belong to the same cluster. Note that D induces an equivalence relation on the root causes.
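By way of non-limiting illustration, the cluster forming steps may be sketched as follows. Because the exact forms of the Laplacian matrix and of the matrix T are not reproduced in this text, the sketch assumes a symmetric normalized Laplacian, eigenvectors of the k smallest eigenvalues, and row normalization of U to obtain T. The farthest-point initialization of k-means is likewise a choice made for this sketch so that the result is deterministic.

```python
import numpy as np


def spectral_clusters(W, k, n_iter=100):
    """Cluster the n nodes of a weighted graph into k groups S_1..S_k.

    W is a symmetric n x n matrix of edge weights w(r1, r2).
    Returns a list of k sorted lists of node indices.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    d = W.sum(axis=1)  # degree d(r): sum of edge weights incident on r
    inv_sqrt = np.zeros(n)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    # Assumed form: symmetric normalized Laplacian I - D^(-1/2) W D^(-1/2)
    L = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    U = vecs[:, :k]  # eigenvectors of the k smallest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    T = U / np.where(norms > 0, norms, 1.0)  # assumed row normalization

    # k-means on the rows y_i of T, with farthest-point initialization
    centers = [T[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(T - c, axis=1) for c in centers], axis=0)
        centers.append(T[int(nearest.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(T[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            T[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # S_i = {j | y_j in C_i}
    return [sorted(np.flatnonzero(labels == c).tolist()) for c in range(k)]


# Hypothetical graph: root causes 0-2 co-occur with each other, as do 3 and 4
W_example = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
clusters_example = spectral_clusters(W_example, k=2)
```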
The accuracy of a risk prediction can be improved based on contract similarity and co-occurrence clusters. For a given new opportunity, for which contract risks are to be predicted in terms of historically observed root causes, one first determines a set of similar historical contracts. Contract similarity is determined by calculating a distance between each historical contract and the new opportunity using several contract fingerprints, such as geography, total contract value (TCV), risk assessment surveys, etc. Once a subset of similar historical contracts is determined, embodiments may keep track of which observed root causes from similar historical contracts occur with what frequency to determine how likely it is for a given root cause to also occur in the new opportunity.
While this method does provide one way of predicting root causes for a given new opportunity, it does not leverage the inter-relationships and/or dependencies of root causes.
According to an embodiment of the disclosure, root cause co-occurrence clusters described above may be used to strengthen the contract similarity determination by predicting additional risks that may be missed by the original determination.
A risk prediction method according to an embodiment of the disclosure is based on measuring a similarity between a given new opportunity and a set of historical contracts based on their fingerprints. Two contracts are similar if they have similar contract fingerprints. In a data set for testing embodiments of the invention, there are more than 300 features in a contract fingerprint, but not all features are equally important or useful for risk predictions. To ensure that more significant features provide a greater contribution to the similarity measure, higher weights are assigned to them. Since a goal of determining contract similarity is to predict risks, weights are assigned to features based on their correlation with the actual similarity between a pair of contracts, in terms of their reported root causes. The higher the correlation, the higher the weight.
Based on the weighted fingerprint, which is a vector of weighted features, one may calculate the Euclidean distance between the new opportunity and each historical contract. The contract similarity Sim(i,j) between the new opportunity i and each historical contract j can then be calculated as Sim(i, j)=1−Dist(i, j) where Dist(i, j) is the Euclidean distance between the new opportunity i and historical contract j.
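By way of non-limiting illustration, the similarity calculation may be sketched as follows. The text does not state how Dist(i, j) is kept within a range that makes 1−Dist(i, j) meaningful, so this sketch assumes that features are scaled to [0, 1] and divides the distance by its largest possible value. The function and parameter names are hypothetical.

```python
import math


def contract_similarity(fingerprint_i, fingerprint_j, weights):
    """Sim(i, j) = 1 - Dist(i, j) on weighted fingerprint vectors.

    Assumes every feature is already scaled to [0, 1]. Dist is the
    Euclidean distance between the weighted vectors, divided by its
    maximum possible value so that the similarity stays within [0, 1].
    """
    if not (len(fingerprint_i) == len(fingerprint_j) == len(weights)):
        raise ValueError("fingerprints and weights must have equal length")
    max_dist = math.sqrt(sum(w * w for w in weights))
    if max_dist == 0:
        raise ValueError("at least one weight must be non-zero")
    dist = math.sqrt(sum(
        (w * (a - b)) ** 2
        for a, b, w in zip(fingerprint_i, fingerprint_j, weights)
    ))
    return 1.0 - dist / max_dist


# Hypothetical two-feature fingerprints; the second feature is weighted higher
sim_example = contract_similarity([0.0, 0.0], [1.0, 0.0], weights=[3.0, 4.0])
```

A difference in a highly weighted feature lowers the similarity more than the same difference in a lightly weighted feature.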
A final step is predicting risks for the new opportunity based on its similarity to historical contracts by considering how often certain root causes occurred in similar historical contracts. In other words, one may calculate the probability of a given risk occurring for the new opportunity by taking a weighted average of its number of occurrences across all similar contracts such that the weight is determined by the degree of contract similarity. A risk prediction algorithm according to an embodiment of the disclosure is illustrated in
Note that the formula for r_probability_k in statement 5 of the algorithm indicates that if root cause rk occurs in all historical contracts j, then the probability r_probability_k=1. However, root cause rk does not necessarily occur in all historical contracts, so the probability is calculated based on the historical contracts that observe this root cause rk.
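By way of non-limiting illustration, the similarity-weighted probability described above (the sum of the similarities of the similar historical contracts in which a root cause occurs, divided by the sum of the similarities of all similar historical contracts) may be sketched as follows. The function name, data layout, and sample values are hypothetical.

```python
def predict_risk_probabilities(similarities, observed, threshold):
    """Return {root_cause: probability} for the new opportunity.

    similarities maps a historical contract to Sim(new, contract);
    observed maps a historical contract to its set of root causes.
    Only contracts whose similarity exceeds the threshold are used.
    """
    similar = {c: s for c, s in similarities.items() if s > threshold}
    total = sum(similar.values())
    if total == 0:
        return {}
    scores = {}
    for contract, sim in similar.items():
        for root_cause in observed.get(contract, ()):
            scores[root_cause] = scores.get(root_cause, 0.0) + sim
    # Most likely root causes first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {root_cause: score / total for root_cause, score in ranked}


# Hypothetical similarities and observations, for illustration only
similarities = {"C1": 0.9, "C2": 0.8, "C3": 0.8, "C4": 0.5}
observed = {"C1": {"R1", "R2"}, "C2": {"R1"}, "C3": {"R1", "R3"}, "C4": {"R5"}}
probabilities = predict_risk_probabilities(similarities, observed, threshold=0.75)
```

With these sample values, R1 occurs in every similar contract and so receives probability 1, while R5 is not predicted at all because C4 falls below the threshold.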
The concept of contract similarity can ensure that risks for a new opportunity are predicted/determined based on using only very similar historical contracts' observed root causes. This means that, depending on a similarity threshold, the original model may miss some risks, which can be caught by the extended algorithm's co-occurrence component.
For example, assume a similarity threshold of 0.75, and assume there are 7 historical contracts, 4 of which are similar to the new opportunity by having a similarity measure above the threshold. Assume the following contracts (C) and their observed risks (R):
Since the similarity of contracts C5, C6, and C7 with the new opportunity is less than the threshold of 0.75, these contracts would not be used in the original algorithm calculation. The original algorithm would only use contracts C1 through C4 in the calculations and yield predicted risks for new opportunity as: R1, R2, R3, and R4 in that order with decreasing probability. The original algorithm would, however, miss the fact that, in less similar contracts C5 through C7, R5 always co-occurs with R3 and is therefore highly likely to happen to contracts where R3 occurs.
The extension identifies other likely risks through co-occurrence clusters, such as Risk 5, and calculates their probabilities by also considering the relatively less similar 3 historical contracts they may occur in. Those 3 historical contracts that had observed Risk 5 were not originally part of the initial risk prediction algorithm as their similarity did not meet the threshold. The extension implies that just because the historical contracts that had observed Risk 5 are not very similar to the new opportunity does not mean that Risk 5, which is observed to always follow Risk 3, which is observed in the similar contracts, will not materialize in the new opportunity.
According to further embodiments of the disclosure, the above algorithm can be extended to include a co-occurrence algorithm according to an embodiment of the disclosure as illustrated in
For example, if k==RC3, and RC5 is in a dependency cluster of k, include RC5 as a predicted risk, if it is not already among predicted risks, as RC5 will tend to follow RC3 based on historical data. The algorithm of
As can be seen from
The probability of any additional risk identified by the extension, such as Risk 5 in the right hand side list, may be calculated by taking a weighted average of its number of occurrences across less-similar contracts such that the weight is determined by the degree of contract similarity. Less-similar means it did not meet the similarity threshold of the algorithm, but still has a similarity value assigned to it.
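By way of non-limiting illustration, the co-occurrence extension may be sketched as follows: any root cause sharing a cluster with an already predicted root cause is added, and its probability is a similarity-weighted average over the less-similar contracts. The function name, the data layout, and the sample values are hypothetical, and normalizing by the total similarity of all less-similar contracts is an assumption of this sketch.

```python
def extend_with_cooccurrence(predicted, clusters, similarities, observed, threshold):
    """Add root causes that share a co-occurrence cluster with a predicted one.

    predicted maps root causes already found by contract similarity to
    their probabilities; those values are carried over unchanged.
    """
    # Less-similar contracts: below the threshold but still carrying a similarity
    less_similar = {c: s for c, s in similarities.items() if s <= threshold}
    total = sum(less_similar.values())
    extended = dict(predicted)
    for cluster in clusters:
        if not any(root_cause in predicted for root_cause in cluster):
            continue  # this cluster is unrelated to the predicted risks
        for root_cause in cluster:
            if root_cause in extended or total == 0:
                continue
            weight = sum(
                sim for contract, sim in less_similar.items()
                if root_cause in observed.get(contract, ())
            )
            extended[root_cause] = weight / total
    return extended


# Hypothetical values, for illustration only
predicted = {"R1": 1.0, "R3": 0.32}
clusters = [{"R3", "R5"}, {"R7", "R8"}]
similarities = {"C5": 0.6, "C6": 0.5, "C7": 0.4}
observed = {"C5": {"R3", "R5"}, "C6": {"R3", "R5"}, "C7": {"R6"}}
extended = extend_with_cooccurrence(
    predicted, clusters, similarities, observed, threshold=0.75
)
```

Here R5 is added because it shares a cluster with the predicted R3, whereas R7 and R8 are ignored because their cluster contains no predicted root cause.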
Calculating the probability of the newly identified risks through the co-occurrence extension by leveraging less similar contracts has now been described. However, risks already identified through the initial similar contract algorithm may also be identified by the co-clustering. The probabilities of the risks already identified with the original algorithm may be directly used by the extension. Sometimes, those probabilities may need to be updated.
For example, if RC3 in the above diagram had an arrow pointing to RC4 (or Risk 4) instead of RC5, that means Risk 4 is not only identified by the contract similarity algorithm but also through the co-occurrence extension. Therefore it should be emphasized over other risks that were identified through the similarity or extension algorithms alone. According to an embodiment of the disclosure, to address this, the probability of RC4 occurring for the new opportunity is boosted by adding an adjustment weight to the probability calculated through the contract similarity algorithm. So the final probability would be 0.7+adjustment_weight, where adjustment_weight could be defined through business logic or by multiplying the respective probabilities of RC3×RC4.
Once co-occurrence clusters have been identified, they can be used to predict other co-occurring risks that may materialize having observed a given risk during the post contract-signature (delivery) phase. According to further embodiments of the disclosure, contract profiles, contract similarity and co-occurrence algorithms can be used to create a predictive model that can predict a set of key risks that impact profitability of a new services contract, and predict the overall aggregated risk impact on contract profitability in terms of achieved gross profit (GP) percentage compared to the planned GP percentage. The output of such a predictive model can be used to proactively eliminate predicted target risks defined before contract signing and to generate other risk assessment and mitigation insights.
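By way of non-limiting illustration, the chain-product form of the adjustment weight may be sketched as follows. The function name and the sample probability of RC3 are hypothetical, and capping the result at 1.0 is an assumption of this sketch, since the text does not state how a sum exceeding 1 would be handled.

```python
import math


def adjusted_probability(base_probability, chain_probabilities, cap=1.0):
    """Boost a risk found by both contract similarity and co-occurrence.

    chain_probabilities holds the occurrence probabilities of the risks
    in the dependency chain (for example RC3 followed by RC4); their
    product is used as the adjustment weight.
    """
    adjustment_weight = math.prod(chain_probabilities)  # Python 3.8+
    # Assumed safeguard: keep the boosted value a valid probability
    return min(cap, base_probability + adjustment_weight)


# Hypothetical values: P(RC3) = 0.3 and P(RC4) = 0.7
boosted_rc4 = adjusted_probability(0.7, [0.3, 0.7])
```

An adjustment weight defined through business logic could be substituted for the product without changing the rest of the calculation.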
System Implementations
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system”. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer system 91 also includes an operating system and micro instruction code. The various processes and functions described herein can either be part of the micro instruction code or part of the application program (or combination thereof) which is executed via the operating system. In addition, various other peripheral devices can be connected to the computer platform such as an additional data storage device and a printing device.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the present invention has been described in detail with reference to exemplary embodiments, those skilled in the art will appreciate that various modifications and substitutions can be made thereto without departing from the spirit and scope of the invention as set forth in the appended claims.