TEMPORAL EXPLANATIONS OF MACHINE LEARNING MODEL OUTCOMES

TECHNICAL FIELD

The subject matter described herein relates to machine learning models, and more particularly to a system and method for providing temporal explanations of machine learning model outcomes in transactional systems.

BACKGROUND

Machine learning models are designed to learn decision boundaries based on input data, and they find widespread applications in all types of business use cases. A machine learning model, M, takes an input vector x, and produces a score, y. The input x, is a set of input variables associated with the entity that is being scored by the model, M. The score, y, represents a particular outcome. For example, in payment card fraud detection systems, M could be a neural network model, x could be the values of the various variables associated with a given transaction, and y could be the score representing how likely it is that the transaction is fraudulent.

Often in reality, for transactional systems, the above description is too simplistic and would yield very poor detection of fraud. This is due to the fact that the likelihood of fraud is not only a function of the current transaction, but also (and many times, more importantly) the context in which the transaction happened. This often means evaluating the current transaction in the context of everything else that has transpired on that payment card's account. Thus, the input variables are computed in such a manner that they can reflect the contextual aspect of the transactions. The historical context of impact of past transactions on a current score is critically important to provide human understandable explanations that isolate the past actions. This has become even more important for regulations such as the General Data Protection Regulation (GDPR), in which a human analyst needs to be able to speak to impacted customers of an automated decisioning system, and provide specific events in the pre-history that led to the automated decision. The term “transaction” is used throughout this document to refer to an event associated with an entity that is being evaluated or monitored.

In the above-mentioned use case of fraud detection for payment cards, this context could be provided by using variables that incorporate the information from the past transactions along with the current transaction, to define the input x. Thus, the input may comprise variables such as, for example: number of transactions in the last 7 days; amount spent in the last one day as a proportion of the average amount spent in the last 30 days; etc. In more advanced systems, such as FICO's Falcon® fraud detection system, more sophisticated techniques including Recursive Bayesian Estimation, time and event decayed averages, and other means are employed to incorporate the historical information in x. The downside is that this type of “summarization” loses the granularity of the information present in the historical transactions.
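As a minimal sketch of the two example variables named above, the following computes them from a raw transaction history. The function name, the `(timestamp, amount)` tuple layout, and the reading of "average amount spent in the last 30 days" as an average daily spend are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def context_features(history, now):
    """Summarize one account's history into contextual input variables.

    history: list of (timestamp, amount) pairs, all at or before `now`
             (hypothetical layout, not taken from the source).
    """
    count_7d = sum(1 for t, _ in history if now - t <= timedelta(days=7))
    spend_1d = sum(a for t, a in history if now - t <= timedelta(days=1))
    spend_30d = sum(a for t, a in history if now - t <= timedelta(days=30))

    # Assumption: "average amount spent in the last 30 days" is read as
    # the average spend per day over the 30-day window.
    avg_daily_30d = spend_30d / 30.0
    ratio = spend_1d / avg_daily_30d if avg_daily_30d > 0 else 0.0

    return {"txn_count_7d": count_7d, "spend_ratio_1d_30d": ratio}
```

Note that the resulting dictionary no longer says which individual transaction produced a high ratio, which is the loss of granularity described above.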

The current state of the art for explaining the score, S, generated by a model, M, deploys a variety of explanation systems, E, that explain the score, S, as a function of the input components of x. While this works well in systems where x represents instance features, such as an image recognition system, it often does not explain the specific precursors in the transaction history that drove the score outcome. To understand this, consider the above-mentioned use case, where a high score indicative of fraud can be attributed to an unusually high value of a variable, say, amount spent in the last one day as a proportion of the average amount spent in the last 30 days. Conventional explanation systems would fail to identify which of the past transactions drove the value of this ratio of last-day spend to 30-day average spend, and thus led to the high fraud score.

Consider another example of a machine learning model driven detection of cyber security breaches. In such a system, a sudden surge in activity on an uncommon port could be indicative of a breach and, consequently, produce a high score representing such a breach. But the reason that the explanation system focuses on could be the volume through that port, rather than the first opening and use of the port, which are the core events in the prehistory that merit investigation and explain the outcome. Note that the vector x comprises various input variables, {x₁, x₂, x₃, . . . }, each of which is a scalar, with the number of variables bounded by the dimensionality of the vector x.

Accordingly, there is a need for an explanation system that can identify relevant transactions in the past that led to the eventual high score by a transactional analytics system configured to address threats, such as stopping payment card fraud, detecting cyber security threats, assessing credit risk, and identifying money laundering activities, to name a few. There is also a need for such transactional analytics systems to be aware of the past transactions, to isolate the relevant offending past transactions that are the cause of the current high score, and to isolate those transactions that drive the main predictor variables in x and consequently the score being explained.

SUMMARY

This document describes an explanation system and method that can identify relevant transactions in the past that led to a high score in a transactional analytics system, for analyzing transactions for various purposes, such as stopping payment card fraud, detecting cyber security threats, assessing credit risk, and identifying money laundering activities, to name a few. Such a system and method looks not only at the predictor variables, x, that define the decision model, M, but is also configured to be aware of the past transactions. It can isolate the relevant offending past transaction(s) that is/are the cause of the current high score, and isolate those transactions that drive the main predictor variables in x, and consequently the score being generated and explained.

In one aspect, a system and computer program product are disclosed, as well as a method executed by the system and computer program product. The method includes receiving transactional data of an entity over a period of time, the transactional data representing a plurality of transactions of the entity. The method further includes deriving an input vector from the plurality of transactions. The method further includes generating, by a scoring model of the transactional analytics system, a score based on the input vector derived from the plurality of transactions, the scoring model generating the score based on the transactional data as an input. The method further includes generating, by an explanation model of the transactional analytics system, a weighted reason vector and associated top ranked reasons based on the input vector derived from the plurality of transactions, the weighted reason vector and top ranked reasons providing a set of top contributor variables in the input vector and latent features of the scoring model that explain the score.

The method further includes using the scoring model, recursively omitting selected transactions of the plurality of transactions from the input to determine a maximal effect of at least one of the plurality of transactions on the score, the weighted reason vector, and/or the associated top ranked reasons. The method further includes generating, based on the omission of at least one of the plurality of transactions having the maximal effect on the score, the weighted reason vector, and/or associated top ranked reason, an importance measure that is a function of a change in the score and a change in the weighted reason vector and/or the top ranked reasons. The method further includes using an importance measure and based on the at least one of the plurality of transactions, determining the at least one of the plurality of transactions that has the maximal importance measure. The method further includes outputting the at least one of a plurality of transactions in an output file to a computer to enable review of the temporal events and transactions most responsible for the entity's current score and reason vector and/or top ranked reasons.

Implementations of the current subject matter can include, but are not limited to, methods consistent with the descriptions provided herein as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations implementing one or more of the described features. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a non-transitory computer-readable or machine-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to a system and method for providing temporal explanations of machine learning model outcomes in transactional systems, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.

DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,

FIG. 1 shows a series of scores of an entity in a transaction analytics system, where the profile or state variables are impacted by current as well as past transactions;

FIG. 2 shows a series of scores of an entity in a transaction analytics system, showing the impact of removing a past transaction, T_n−2, on the subsequent profile variables;

FIG. 3 illustrates an impact of each transaction, k, on score, S₁₀, measured by quantifying the score, S^k₁₀, as a result of eliminating the k^th transaction;

FIG. 4 illustrates a computation of change in energy of the reasons R^k_n after elimination of transaction T_k with respect to a reference set of reasons with all the transactions, R_n, such that if only the top N reasons are reported then the delta vector is represented for the top reasons in R^k_n only;

FIG. 5 illustrates an impact of each transaction, k, on score, S₁₀, and reason codes R₁₀ being measured as a change in energy O_k,10 as well as a change in rank order P_k,10 as a result of eliminating the k^th transaction;

FIG. 6 illustrates an impact of each transaction, k, on score, S₁₀, and reasons R₁₀, being measured by I_k,10 as a result of eliminating the k^th transaction; and

FIG. 7 is a schematic representation of computations required to determine the impact of the last 3 transactions, N=3, on the most recent score and reasons, where the last 3 transactions, T_n−1, T_n−2, T_n−3, along with the current transaction, T_n, and the 4^th past profile, x_n−4, are used, such that recursively each of the last 3 transactions is dropped to compute the impacted current profile variable and the corresponding score.

When practical, similar reference numbers denote similar structures, features, or elements.

DETAILED DESCRIPTION

This document describes an explanation system and method that is designed to identify relevant past transactions that led to a high score in a transactional analytics system for analyzing transactions for various purposes, such as stopping payment card fraud, detecting cyber security threats, assessing credit risk, and identifying money laundering activities, to name a few. Such a system is designed not only to look at the predictor variables, x, that define the decision model, M, but also to be aware of the past transactions, so that it can isolate the relevant offending past transaction that is the cause of the current high score, and isolate those past transactions that drive the main predictor variables in x, and consequently the score being explained.

In some implementations, a transactional analytics system and method is designed to analyze a number of past transactions that contribute to a score generated by a model M, and identify a past transaction that is most responsible for the current high score of the model, M. In doing so, the system and method can also leverage any instance based explanation system to provide the drivers of the score. Furthermore, the system and method can also quantify impact of each of the past transactions on such instance based explanation system, along with their impact on the scores. Accordingly, the system and method disclosed herein implement a sophisticated methodology and framework that can isolate the impact of each of the past N transactions on a current score generated by an arbitrary model, M. It is worth noting that in a transactional analytics system, each past transaction impacts the values of the current input variables, x, in complex ways that are not discernable by a human, in the mind or using pen and paper. With so many scores, and infinitely greater number of transactions that support the scores, the present system and method can only be executed by a computer. As part of determining the impact of the past transaction on the current score, the relationship between the past transaction and the current values of the input variables, x, must also be reproduced in the analysis.

The temporality of the explanation is reflected by the system being designed to determine which of the past transactions maximally influence the current score. Such a determination can be made based on each individual transaction, or as an aggregate of multiple transactions. Each past event or transaction impacts the values of the input variables, x, in an intricate manner, defined by the variable transformation function, F, that computes and updates the input variables, x, as new transactions unfold. The impact of a particular transaction T is assessed by measuring the change in the values of x when the said transaction T is missing. This yields a modified set of input variables, say, x′, and allows the impact of the transaction T on the current values of the input variables to be accurately quantified. If the modified input x′ is then scored using the model M, the score S′ that it produces could differ from the original score S, as can the driving model reasons. The quantum of change in the score (S−S′) is the impact of the transaction T on the score S. Such ‘what if’ analysis of historical transactions is important in order to provide human understandable explanations that isolate the past actions and their impact on the score. This is important for regulations such as GDPR, where a human analyst needs to be able to speak to impacted customers of an automated decisioning system and provide specific events in the pre-history that led to the automated decision.

More than one past transaction can be the reason for the current score, and can be used to explain the score S. Thus, more than one temporal value can be identified as the driver of the explanation. Accordingly, a system and method can include a mechanism for arbitration based on the quantum of change in the score (S−S′). As the quantum of change in a score, (S−S′), is measured, the changes in the reasons of S′ as described by the instance based explanation system, E, are recorded. For example, if two different past transaction removals yield the same score difference, yet the first transaction does not produce any change in one or more of the original reasons whereas the second one does, then the two past transactions are considered to have a different impact on S, irrespective of the identical quantum of change in the score, S−S′. The identified transactions provide additional narrative to the explanations from the instance based explanation system, E, already in use in transactional analytic systems, and allow these systems to meet a standard set by regulation where specific human understandable explanations are necessary for automated decision systems.

In some implementations, for instance, such as in a case of detecting money laundering, if the identified transaction represents a large deposit 4 days ago, and the explanation model, E, identifies the variables, amount transferred out of the account in the last one day and amount deposited into the account in the last one week, both as top explanations, then an understanding of why the account has been flagged as a potential money laundering case in terms of the specific driving transaction or transactions can facilitate the correct event-based explanations. Further, if the explanation associated with the modified score S′ changes to merely amount transferred out of the account in the last one day, it provides additional details about the reason for score S. Thus, not only the change in the score points to the impact of the past transaction on the current score, but the change in the reasons from the instance based explanation system, E, provides additional insight into the nature of the impact.

Machine Learning Models for Transactional Analytics

A machine learning model is trained by presenting a training dataset to a learning algorithm. In cases where it is learning based on supervised information, it learns relationships between the predictor variables, x, and the outcome variable, t, and encapsulates them as learnt model parameters, θ. Thus, an arbitrary model, M, is represented by an underlying function, Φ, driven by the machine learning model's architecture, and the learnt model parameters, θ.

M(x)=Φ(x,θ) (1)

where,

- M: an arbitrary machine learning model
- x: an input instance represented as a vector comprising the constituent variables
- θ: learnt parameters of the model based on historical data
- Φ: a function representing the machine learning model's underlying architecture

Such a model, M, generates a score, S as a function of an instance of input variables, x. This can be represented as follows:

S=M(x) (2.a)

where,

- S: score generated by the model, M for the instance of the input, x.

Often, an instance based explanation system, or model, E, is used to identify the constituent variables of the input vector, x, or their groupings, latent features, or a set of reasons, {r₁, r₂, r₃, . . . }, distinct from the input variables or latent features, that explain the score, S. Without loss of generality, we will use the notation of {r₁, r₂, r₃, . . . } when talking of reasons, even when we are using the individual or groupings of input variables {x₁, x₂, x₃, . . . , x_N} or associated latent features as reasons. Such systems also provide a rank ordering of the importance of each of the reasons in terms of their explanation of the score. This can be represented as follows:

R=E(x,S) (2.b)

R is an ordered set of reasons {r_p, r_q, r_r, . . . } generated by the explanation system, or model, E, that explains the score S, of the input instance, x. Reason r_p has a rank of 1, reason r_q has a rank of 2, and so on. Often only a subset of the top m reasons with the highest ranks is reported. An “energy” associated with each reason is dependent on the nature of the explanation system, E, and the internal mechanism by which each reason is generated:

L=L(x,S,E)={l_p,l_q,l_r, . . . } (2.c)

- where l_p is the energy associated with r_p, and so on. The rank ordering of the reasons is based on the values of l_p, l_q, l_r, . . ., with l_p>l_q>l_r> . . . and so on.
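The relationship between energies and ranks in (2.b) and (2.c) can be sketched as a simple sort. This is an illustrative sketch only: the dictionary layout and the function name are assumptions, and how an explanation system E actually produces the energies is not specified here.

```python
def rank_reasons(energies, m=None):
    """Order reasons by descending energy, optionally keeping the top m.

    energies: dict mapping a reason label to its energy l (hypothetical
              layout; the energies themselves would come from E).
    Returns a list of (reason, energy) pairs; position 0 holds rank 1.
    """
    ranked = sorted(energies.items(), key=lambda item: item[1], reverse=True)
    return ranked if m is None else ranked[:m]
```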

Where transactional analytics differ from other types of analytics is the nature of the input variable vector x. Many non-transactional analytics systems consider past events, but only by pre-processing past transactions into event summarization. The systems and methods described herein deal with transaction streams, as opposed to event summarization. Where event summarization is employed, analysis similar to that described below would have to be constructed based on versions of the event summary, such as missing only two payments versus three payments in the last 12 months. It is worth noting that in transactional systems, the input vector is a function of not only the current transaction, T_n, but all the past transactions, T₁, T₂, . . . , T_n−3, T_n−2, T_n−1, as well, where the subscripts represent the enumeration of the transactions, or events, of the entity being analyzed. Thus, a series of transactions for an entity, T₁, T₂, . . . , T_n−3, T_n−2, T_n−1, T_n, are recorded, with T_n being the most current transaction. This is distinct from non-transactional systems, where the system gets to see only event summarization but not the actual transactions, and where retro or ‘what-if’ analysis would need to be done on the events that lead to the specific important event summarization variables driving reasons.

FIG. 1 illustrates a series of scores of an entity in a transaction analytics system, where the profile or state variables are impacted by current as well as past transactions. The corresponding states that fully describe the entity are represented by a sequence of instance vectors, x₁, x₂, . . . , x_n−3, x_n−2, x_n−1, x_n. The state is often referred to as a profile in a transaction analytics system. A profile vector is a multidimensional representation of the entity's state, and each of its dimensions is called a state variable or profile variable. The profile variables are functions of the prior state and the current transaction:

x_n=F(x_n−1,T_n) (3.a)

Equation 3.a is equivalent to considering some or all of the prior transactions using an alternative but computationally equivalent transformation function, F′:

x_n=F′(T_n,T_n−1,T_n−2, . . . ) (3.b)

While using either of the two approaches, 3.a or 3.b, the value of x_n is impacted if one or more of the past transactions are missing. In some implementations, the recursive version F in equation 3.a can be used as a reference method for profile computation and update.
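To make the equivalence of (3.a) and (3.b) concrete, the following sketch uses a one-dimensional profile updated as an event-decayed average of transaction amounts. The decay weight, the scalar profile, and the function names are assumptions chosen for illustration; an actual profile would be a vector of many such variables.

```python
from functools import reduce

ALPHA = 0.5  # hypothetical decay weight for the event-decayed average


def F(prev_profile, txn_amount):
    """Recursive update of equation (3.a): x_n = F(x_{n-1}, T_n)."""
    return (1 - ALPHA) * prev_profile + ALPHA * txn_amount


def F_prime(transactions, initial_profile=0.0):
    """Batch form of equation (3.b): replay all transactions through F.

    transactions are given in chronological order (oldest first).
    """
    return reduce(F, transactions, initial_profile)


history = [10.0, 20.0, 40.0]
x_n = F_prime(history)                  # profile with every transaction present
x_n_missing = F_prime([10.0, 40.0])     # same history with the second one missing
```

Replaying the history through `F_prime` gives the same value as applying `F` one transaction at a time, and leaving one transaction out of the replay changes the final profile, which is the property the temporal explanation relies on.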

If a machine learning model is used to predict an outcome, such as a likelihood of a transaction T_n of an entity being fraudulent, the relationship is manifested in terms of the entity's current state. In all such models, the score, S_n, generated by a machine learning model, M, is represented as follows:

S_n=M(x_n) (4.a)

and the scores are explained by a set of reasons, R_n. Consider the instance based explanation system, E, being used to explain the drivers of a score instance. Let R_n={r_np, r_nq, r_nr, . . . } be a set of reasons identified by the explanation system as the reasons for the score. Let {r_np, r_nq, r_nr, . . . }⊆{r₁, r₂, r₃, . . . }, a superset of reasons identified or learned during the training or creation of the instance based explanation system, or model, E. In an alternative implementation, {r_np, r_nq, r_nr, . . . }⊆{x₁, x₂, x₃, . . . , x_m}, i.e., they could be the scalar components of the input vector x. In yet other alternative implementations, groupings of input variables and latent features can be used in lieu of individual variables as explanations. In such cases, {r_p, r_q, r_r, . . . } represent a set of input variable groups and equations of features instead of literal input variables. Without loss of generality, the notation of {r₁, r₂, r₃, . . . } is used herein when describing reasons, even when the individual input variables or their groupings or latent features are used as reasons.

Thus:

R_n=E(x_n,M(x_n))={r_np,r_nq,r_nr, . . . } (4.b)

where M represents the learnt relationship in the context of the specific machine learning architecture and the learnt model parameters or weights based on the historical training data. The corresponding energies associated with the reasons are given by:

L_n=L(x_n,M(x_n),E)={l_np,l_nq,l_nr, . . . } (4.c)

- where, l_np>l_nq>l_nr> . . .

Note that these are the transactional analytics equivalents of the equations (2.a), (2.b) and (2.c) respectively. The benefit of the above equations is that a scoring system does not have to have access to the entire transaction history to be able to provide a meaningful score S_n, as it utilizes the state updated at T_n.

Temporal Score Explanations For Machine Learning Models

The importance of each of the past transactions, T_k, is assessed for the current score, S_n, where k=(n−1, n−2, . . . ). This goal is achieved by using the counterfactual scenario of the same sequence of transactions, but with T_k missing. Such a sequence would look like T₁, T₂, . . . , T_k−1, T_k+1, . . . , T_n−1, T_n. The effect of the missing transaction, T_k, is felt on each of the subsequent states, due to (3.a) and (3.b). Thus, the subsequent states or profiles would change from x_k+1, x_k+2, . . . , x_n−1, x_n to x^k_k+1, x^k_k+2, . . . , x^k_n−1, x^k_n respectively, due to equations (3). The superscript of k indicates that the k^th transaction has been eliminated. It follows from (4) that the subsequent scores would also change from S_k+1, S_k+2, . . . , S_n−1, S_n to S^k_k+1, S^k_k+2, . . . , S^k_n−1, S^k_n respectively. FIG. 2 illustrates a series of scores of an entity in a transaction analytics system, showing the impact of removing a past transaction, T_n−2, on the subsequent profile variables. In FIG. 2, the T_n−2 transaction is represented as missing, and hence n−2 is the superscript in subsequent transactions representing the missing transaction for the impacted states and scores.

Thus, the contribution of a transaction, T_k, can be assessed on a particular score, S_n, by measuring the quantum of change in the score, when that transaction, T_k, is missing. This change also affects the set of reasons associated with the modified input vector, x^k_n, and the modified score, S^k_n, where:

S^k_n=M(x^k_n) (5.a)
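A minimal sketch of the counterfactual score in (5.a) follows, assuming a scalar decayed-average profile and a toy clipped-linear scoring model. Both are stand-ins chosen so the arithmetic is easy to follow; neither the profile update nor the model is taken from the source, and list positions here are 0-based whereas the text numbers transactions from 1.

```python
def update_profile(profile, amount, alpha=0.5):
    """Stand-in for F: event-decayed average (alpha is hypothetical)."""
    return (1 - alpha) * profile + alpha * amount


def score(profile):
    """Stand-in for M: a toy score in [0, 1] that grows with the profile."""
    return max(0.0, min(1.0, profile / 100.0))


def profile_without(transactions, k=None, start=0.0):
    """Replay the history, skipping position k, to obtain x^k_n."""
    profile = start
    for i, amount in enumerate(transactions):
        if i == k:
            continue  # counterfactual: transaction k never happened
        profile = update_profile(profile, amount)
    return profile


def score_deltas(transactions):
    """Return {k: S_n - S^k_n} for every past transaction k < n."""
    n = len(transactions) - 1  # position of the current transaction T_n
    s_n = score(profile_without(transactions))
    return {k: s_n - score(profile_without(transactions, k)) for k in range(n)}


deltas = score_deltas([10.0, 20.0, 200.0, 30.0])
most_influential = max(deltas, key=deltas.get)
```

In this toy history the unusually large third amount accounts for most of the current score, so removing it produces the largest drop.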

The impact of the transactions, T_k, on the reasons as explained by explanation system, E, can be identified by this mechanism as well. Accordingly, a new set of explanations corresponding to each impactful transaction, T_k, can be generated as follows:

R^k_n={r^k_p,r^k_q,r^k_r, . . . }←E(x^k_n,M(x^k_n)) (5.b)

The corresponding energies associated with the reasons are given by:

L^k_n=L(x^k_n,M(x^k_n),E)={l^k_np,l^k_nq,l^k_nr, . . . } (5.c)

- where, l^k_np>l^k_nq>l^k_nr> . . .

Thus, an importance measure, I_k,n, quantifying the contribution of a transaction, T_k, to a particular score, S_n, and set of reasons, R_n, can be identified as follows:

I_k,n=Q(ΔS_k,n,O_k,n,P_k,n) (5.d)

- where, k<n,

x^k_n=F(x_n−1,T_n),

- where T_k is missing, i.e., the prior profile x_n−1 is recomputed from the transaction sequence without T_k

ΔS_k,n=S_n−S^k_n

O_k,n=O(R_n,R^k_n)

P_k,n=P(R_n,R^k_n)

where:

- O is a function to quantify the impact of the change in energy of reason codes due to the absence of the transaction T_k,
- P is a function to quantify the impact of the change in rank order of reason codes due to the absence of the transaction T_k, and
- Q is a function to quantify the cumulative impact of change in score and change in reason codes due to the absence of the transaction T_k.

Thus, the top-most influential transaction, T_k′, impacting the current score S_n, is given by:

T_k′=top_transaction(S_n)=argmax_k(I_k,n) (6.a)

- where the contributions to I_k′,n are:

{ΔS_k′,n, O(R_n,R^k′_n), P(R_n,R^k′_n)} (6.b)
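The selection in (6.a) can be sketched as an argmax over precomputed components. The combining function Q is not specified at this point in the text, so the weighted sum below is purely an assumption for illustration, as are the function names and the example numbers.

```python
def importance(delta_s, o, p, weights=(1.0, 1.0, 1.0)):
    """Hypothetical Q: a weighted sum of score change, energy change, and
    rank-order change. The real form of Q is a design choice."""
    w_s, w_o, w_p = weights
    return w_s * delta_s + w_o * o + w_p * p


def top_transaction(components):
    """Equation (6.a): return the k that maximizes I_k,n.

    components: dict mapping k to a (delta_s, O_k_n, P_k_n) tuple.
    """
    return max(components, key=lambda k: importance(*components[k]))


# Transactions 7 and 8 change the score equally, but only 8 also
# changes the reasons, so 8 wins the arbitration.
components = {7: (0.30, 0.1, 0), 8: (0.30, 0.4, 3), 9: (0.05, 0.0, 0)}
winner = top_transaction(components)
```

With equal weights, the rank-order points can dominate the sum because they are on a larger numeric scale than the score change; in practice the weights or the form of Q would need to be chosen with that in mind.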

Comparing R^k′_n with R_n, along with the change in score, ΔS_k′,n, provides insights into the nature of the influence of T_k′ on S_n. By carefully choosing the nature of the functions O, P and Q, the influence score, I_k′,n, provides a mechanism for arbitration between two values of k′ where the respective R^k′_n may demonstrate different degrees of change with respect to R_n even though the corresponding scores, S^k′_n, are the same, or vice versa. These are described more fully below.

Three distinct scenarios arise for S_n−S^k_n: 1) The value remains range bound. This means that there is not much difference in S_n−S^k_n, and indicates that the transaction T_k has minimal impact on the score S_n; 2) The value is positive and reasonably large. This means that the absence of the transaction T_k leads to a reduction in the score, S_n, indicative of influence on the score S_n; and 3) The value is negative. This means that the absence of the transaction T_k leads to an increase in the score, S_n. To understand this, consider the case of fraud detection on a payment card, such as a debit card, credit card, or the like. If a high score indicates a higher likelihood of fraud, then in such a scenario, T_k represents very normal behavior on the card in the presence of other fraudulent transactions. Hence, its presence could depress the score, whereas its absence makes the offending transactions look even more suspicious and therefore yields a higher score, S^k_n.
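The three scenarios can be sketched as a simple classification of ΔS_k,n. The width of the "range bound" band is an assumption, since the text only says the value should be "reasonably large" to count as influential, and the labels are hypothetical.

```python
def classify_impact(delta_s, band=0.05):
    """Label a transaction by the sign and size of S_n - S^k_n.

    band is a hypothetical tolerance for "range bound" changes.
    """
    if abs(delta_s) <= band:
        return "range_bound"       # scenario 1: minimal impact
    if delta_s > 0:
        return "raises_score"      # scenario 2: removing T_k lowers the score
    return "suppresses_score"      # scenario 3: removing T_k raises the score
```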

FIG. 3 shows three types of transactions in terms of their impact on the score S_n. FIG. 3 illustrates the impact of each transaction, k, on score, S₁₀, which is measured by quantifying the score, S^k₁₀, as a result of eliminating the k^th transaction. Note that as per equation (5.d), k<10. Score S₁₀ is shown as a dotted horizontal line, and ΔS_8,10=S₁₀−S⁸₁₀, the impact of the 8^th transaction, T₈, on score S₁₀, is shown as a dotted vertical line. The impact is largely positive for transactions 6, 8 and 9, as shown by smaller circles. The impact is negative for transaction 2, shown by a larger circle. The impact remains range bound for the other transactions.

Comparing {r_p, r_q, r_r, . . . } against {r^k_p, r^k_q, r^k_r, . . . } corresponding to each of the identified impactful transactions, T_k, provides additional insight into the reasons behind the score S_n. This is critical, as the impact of the most influential transactions, T_k, on the score S_n is due not only to the quantum of the influence on the score, S^k_n, but also to the impact on the reasons from the instance based explanation system itself. This also comes in handy to quantify the impact on reasons when the impact of two different transactions, T_k′ and T_k″, on the overall score S_n needs to be compared in terms of how they impact the reasons. The importance measure, I_k,n, of equations (5.d) and (6.a) thus incorporates not only the impact on the score itself, but also the nature of that impact in terms of influence on the reason codes.

In some implementations, to quantify the impact of the change in energy of reason codes due to the absence of the transaction T_k, O_k,n, via function O, the energy associated with each reason code for R_n and the corresponding energies for R^k_n are determined. Then, the change in energy of each of the reasons in R^k_n is measured, and this change is represented as a vector whose length provides a quantification of the change. The function O can usually be defined as follows:

O_k,n=O(R_n,R^k_n)=∥L^k_n−L_n∥₂

O_k,n=O(R_n,R^k_n)=∥(l^k_np−l_np, l^k_nq−l_nq, l^k_nr−l_nr, . . . )∥₂ (6.c)

which is the L₂ norm of the delta vector L^k_n−L_n with rank ordering on L^k_n. Rank ordering on L^k_n means that l^k_np>l^k_nq and l^k_nq>l^k_nr and so on. The indices p, q, r etc. denote specific reasons. Thus, l^k_np−l_np represents the change in the energy of reason r_p between R^k_n and R_n, and so on.

In alternative implementations, the maximum change in energy of any of the reasons can be determined and examined. This is defined as follows:

O_{k,n} = O(R_n, R^k_n) = max(l^k_{np} − l_{np}, l^k_{nq} − l_{nq}, l^k_{nr} − l_{nr}, . . . ) (6.d)

If only the top few reasons are reported, then the change in energy of only the reported reasons in R^k_n is considered. For example, if only the top 3 reasons are reported, with the top three reasons being r^k_{np}, r^k_{nq} and r^k_{nr}, then equation (6.c) gives way to the following:

O_{k,n} = O(R_n, R^k_n) = ∥(l^k_{np} − l_{np}, l^k_{nq} − l_{nq}, l^k_{nr} − l_{nr})∥₂ (6.e)

Similarly, in such a scenario, equation (6.d) gives way to the following:

O_{k,n} = O(R_n, R^k_n) = max(l^k_{np} − l_{np}, l^k_{nq} − l_{nq}, l^k_{nr} − l_{nr}) (6.f)

FIG. 4 shows the computation of the change in energies, O, as an illustrative example. FIG. 4 illustrates a computation of the change in energy of the reasons R^k_n after elimination of transaction T_k, with respect to the reference set of reasons with all the transactions, R_n. If only the top N reasons are reported, then the delta vector is represented for the top reasons in R^k_n only.
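As a minimal, non-limiting sketch, the two variants of the function O in equations (6.c) through (6.f) could be computed as follows. The function names, the dictionary representation of reason energies, and the choice to treat a reason that is absent from R_n as having zero reference energy are assumptions made for illustration only and are not prescribed by this description.

```python
import math


def _ranked_deltas(ref_energy, dropped_energy, top_n=None):
    """Per-reason deltas l^k - l, rank ordered on the energies in R^k_n."""
    # Rank order on R^k_n: largest energy first.
    ordered = sorted(dropped_energy, key=dropped_energy.get, reverse=True)
    if top_n is not None:
        # Only the reported reasons are considered, as in (6.e) and (6.f).
        ordered = ordered[:top_n]
    # Assumption: a reason missing from R_n has zero reference energy.
    return [dropped_energy[r] - ref_energy.get(r, 0.0) for r in ordered]


def energy_change_l2(ref_energy, dropped_energy, top_n=None):
    """L2 norm of the delta vector, as in equations (6.c) and (6.e)."""
    deltas = _ranked_deltas(ref_energy, dropped_energy, top_n)
    return math.sqrt(sum(d * d for d in deltas))


def energy_change_max(ref_energy, dropped_energy, top_n=None):
    """Maximum signed change in energy, as in equations (6.d) and (6.f)."""
    return max(_ranked_deltas(ref_energy, dropped_energy, top_n))
```

Here `ref_energy` stands for the energies of the reasons in R_n and `dropped_energy` for those in R^k_n. Passing `top_n` restricts the delta vector to the reported reasons.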

In some implementations, to quantify the impact of the change in rank order of reason codes due to the absence of the transaction T_k via function P, a simple heuristic is used to determine the impact of transaction T_k on the rank order of the reasons. The heuristic yields a reason impact point, P_{k,n}, which quantifies the difference between R^k_n and R_n:

- 1. Set P_{k,n} = 0
- 2. Let L = number of reasons = |R^k_n| = |R_n|
- 3. For each r^k in R^k_n that is not in R_n
  - a. Update P_{k,n} = P_{k,n} + (L − rank + 1), where rank of r^k is its position in the ordered set R^k_n
- 4. For each r^k in R^k_n that is also in R_n
  - a. Assign |rank^k_n − rank_n| points, where rank^k_n of r^k is its position in the ordered set R^k_n and rank_n is its position in the ordered set R_n
  - b. Update P_{k,n} by adding the assigned points
- 5. Report P_{k,n}
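The steps above can be sketched as follows, as one possible illustration rather than a definitive implementation. The function name and the list representation of the ordered reason sets are hypothetical, and the sketch assumes that step 4.b adds the points assigned in step 4.a to P_{k,n}.

```python
def reason_impact_points(ref_ranked, dropped_ranked):
    """Reason impact point P_{k,n} between ordered sets R_n and R^k_n.

    ref_ranked:     reasons of R_n, most important first.
    dropped_ranked: reasons of R^k_n, most important first.
    """
    if len(ref_ranked) != len(dropped_ranked):
        raise ValueError("R_n and R^k_n must contain the same number of reasons")

    num_reasons = len(dropped_ranked)  # L in step 2
    # 1-based rank of every reason in the reference set R_n.
    ref_rank = {reason: i for i, reason in enumerate(ref_ranked, start=1)}

    points = 0  # step 1
    for rank_k, reason in enumerate(dropped_ranked, start=1):
        if reason not in ref_rank:
            # Step 3: a new reason scores more the higher it ranks in R^k_n.
            points += num_reasons - rank_k + 1
        else:
            # Step 4 (assumed accumulation): absolute shift in rank position.
            points += abs(rank_k - ref_rank[reason])
    return points  # step 5
```

Identical ordered sets yield zero points. A reason that newly appears at the top of R^k_n contributes the maximum of L points.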

Comparing P_{k,n} for two transactions T_k′ and T_k″ also allows the impact on the reasons to be understood by way of the points calculated above, P_{k′,n} and P_{k″,n}. Other ways of determining the quantum of impact on reasons can be implemented as well. Thus, the above heuristic can be used to define P(R_n, R^k_n) in equation (6) as follows:

P(R_n,R^k_n)=P_k,n (6.g)

With this equation, the impact of two important transactions, T_k′ and T_k″, can be ascertained by comparing their importance measures, I_{k′,n} and I_{k″,n}. Alternative implementations of I_{k,n} are possible. Top transactions and associated reasons can be identified by rank ordering I_{k,n} and identifying the corresponding transactions, as in equation (6.a). It should be noted that, irrespective of the quantum of impact of the prior transactions on the score, they may have significant impact on the reasons, R^k_n, as well.

This nuanced aspect of the impact of a prior transaction on the current score is shown in FIG. 5, which illustrates that the impact of each transaction, k, on score S₁₀ and reason codes R₁₀ is measured as the change in energy O_{k,10} as well as the change in rank order P_{k,10} as a result of eliminating the k^th transaction. For transactions 1, 3, 4, 5 and 7, there is a change in the energy of the reasons without any change in the rank ordering of the reasons. Note also that a lower change in energy may be associated with a higher change in rank order and vice versa. Hence, it is important to consider both, along with the quantum of change in score, to determine the most impactful transaction, T_k′.

The importance measure, I_{k,n}, can be decomposed into the score impact, S_n − S^k_n, and the reason impact, P_{k,n}. This decomposition provides richness in explanation. While I_{k,n} is used for rank ordering transactions to determine the most impactful ones, the impact on the score, S_n − S^k_n, and the reason impact, P_{k,n}, aid in understanding the nuanced difference when I_{k′,n} and I_{k″,n} are equal for transactions T_k′ and T_k″. In some cases, though, where the focus is on determining the impact on the reasons more than on the score itself, P_{k,n} can act as the primary metric for rank ordering to determine the most impactful transactions. The nature of the function Q often derives from the nuances of explanations expected in a particular system.

In some preferred exemplary implementations, a replay method can be used. In a replay method, the model M and the available transactions are used to explain the current score, S_n. One transaction T_k is dropped at a time, and the remaining transactions are processed through F, in the same sequence as the original, to generate the profile x^k_n and the score S^k_n.

From an implementation perspective, retaining a pre-history of transactions and past state profiles requires storage and computational time that is linear in the number of transactions. For automated decisioning systems, the analysis of the offending transactions responsible for driving the reasons is computed at the time of investigation by a human analyst or an automated communication system.

The impactful transactions need to be identified at the time of case generation and customer dialogue in the case management phase, and not during streaming real-time score production. As such, generation of the transactions driving the explanation should not slow down real-time scoring systems, and only a small fraction of accounts may need such detailed causal transaction analysis. Thus, the computational requirements of the causal transaction explanation analysis do not impact the production use of the transactional analytics system, but operate post score-generation in a case management environment or automated customer communication system where explanations are generated. Further, given the often-decreasing value of older transactions in explanation, the following restriction can be applied to the value of k in equation (5.a) to limit the number of past transactions that are identified as having impact on the current score, S_n:

n − N ≤ k < n (7)

where N is a system parameter determined at the time of system design.

Further restrictions can be applied based on the value of I_{k,n}. In one approach, a threshold Δ is applied to I_{k,n}. Δ can be absolute or relative. FIG. 6 shows the most influential past transactions, rank ordered on I_{k,n}, under two different thresholds. The impact of each transaction, k, on score S₁₀ and reasons R₁₀ is measured by I_{k,10} as a result of eliminating the k^th transaction. When a threshold of Δ1 is applied, 3 transactions are identified as impactful on score S₁₀: the 8^th, 9^th and 6^th transactions, in that order, marked by circles. On the other hand, when a threshold of Δ2 is applied, only the 8^th transaction is identified as impactful, which is marked with two circles. Note that transaction number 2 has an impact which is not considered as per equation (5). Just as in the case of FIG. 4, the reason impact, P_{k,n}, can be used to further understand the nature of the impact.
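As a hedged illustration of the window restriction (7) combined with a threshold Δ on I_{k,n}, the selection of impactful transactions might be sketched as below. The function name, its parameters, and the interpretation of a relative threshold as a fraction of the largest importance in the window are assumptions made for this sketch. This description states only that Δ can be absolute or relative.

```python
def select_impactful(importance, n, window, threshold, relative=False):
    """Return indices k of impactful past transactions, most impactful first.

    importance: mapping from transaction index k to importance I_{k,n}.
    n:          index of the current transaction.
    window:     N in restriction (7), i.e. n - N <= k < n.
    threshold:  the threshold (delta) applied to I_{k,n}.
    relative:   if True, treat threshold as a fraction of the largest
                importance in the window (an assumed interpretation).
    """
    # Restriction (7): keep only the last N transactions before T_n.
    in_window = {k: v for k, v in importance.items() if n - window <= k < n}
    if not in_window:
        return []

    cutoff = threshold
    if relative:
        largest = max(in_window.values())
        if largest <= 0:
            return []  # nothing has a positive impact to be relative to
        cutoff = threshold * largest

    kept = [k for k, v in in_window.items() if v >= cutoff]
    # Rank order on importance, highest first.
    return sorted(kept, key=in_window.get, reverse=True)
```

With a lower threshold more transactions qualify, and with a higher threshold only the dominant transaction remains, mirroring the two thresholds discussed for FIG. 6. A transaction with negative importance never passes a positive threshold.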

Note that, as per equation (3.a), the last profile value and the last transaction are required to compute the current profile variable. Consider the sequence of transactions T_{n−N}, T_{n−N+1}, . . . , T_{k−1}, T_{k+1}, . . . , T_{n−1}, T_n with T_k missing, where n−N ≤ k < n. Equation (3.a) can be leveraged in an iterative fashion to compute x^k_n as follows:

x_{n−N} = F(x_{n−N−1}, T_{n−N})

x_{n−N+1} = F(x_{n−N}, T_{n−N+1})

. . .

x_{k−1} = F(x_{k−2}, T_{k−1})

x^k_{k+1} = F(x_{k−1}, T_{k+1})

. . .

x^k_{n−1} = F(x^k_{n−2}, T_{n−1})

x^k_n = F(x^k_{n−1}, T_n) (8)

The set of equations (8) provides an easy way to compute the value of x^k_n. Then, using equation set (5), S^k_n, R^k_n and I_{k,n} are computed. Note that x_n, S_n and R_n have to be persisted from production as well. To use equation set (8), the past N transactions, T_{n−N}, T_{n−N+1}, . . . , T_{n−2}, T_{n−1}, are retained, as well as the current transaction, T_n, in a database, such as a NoSQL database. The profile value prior to the transaction T_{n−N}, that is x_{n−N−1}, is also retained in the NoSQL database. Implicit here is the need to compute O_{k,n} and P_{k,n} based on R_n and R^k_n to determine the value of I_{k,n}. Based on I_{k,n}, the transactions T_k′ that are most impactful are identified based on equation set (6). FIG. 7 shows a schematic of the computation required to quantify the impact of each of the past transactions, where N=3.
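A minimal sketch of the replay computation in equation set (8) is given below, under stated assumptions. The helper names `replay_profile` and `score_deltas` are hypothetical, and `F` and `M` are passed in as plain callables standing in for the profile update function and the model. The reason codes R^k_n and the measures O_{k,n} and P_{k,n} are omitted for brevity.

```python
def replay_profile(x_start, transactions, F, drop=None):
    """Fold F over the transactions in order, skipping the one at index `drop`.

    x_start:      retained profile x_{n-N-1}.
    transactions: [T_{n-N}, ..., T_{n-1}, T_n] in their original sequence.
    drop:         list position of the transaction T_k to leave out, or None.
    """
    x = x_start
    for i, transaction in enumerate(transactions):
        if i == drop:
            continue  # T_k is missing from the replayed sequence
        x = F(x, transaction)
    return x


def score_deltas(x_start, transactions, F, M):
    """Map each droppable list position to the score impact S_n - S^k_n."""
    s_n = M(replay_profile(x_start, transactions, F))
    # The last element is the current transaction T_n, which is never dropped.
    past_positions = range(len(transactions) - 1)
    return {
        i: s_n - M(replay_profile(x_start, transactions, F, drop=i))
        for i in past_positions
    }
```

Each replay starts from the same retained profile and touches at most N+1 transactions, so the cost per dropped transaction is linear in N.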

FIG. 7 is a schematic representation of the computations required to determine the impact of the last 3 transactions, N=3, on the most recent score and associated reason(s). Using the last three transactions, T_{n−1}, T_{n−2}, T_{n−3}, along with the current transaction, T_n, and the 4^th past profile, x_{n−4}, each of the last 3 transactions is dropped in turn to compute the impacted current profile variable and the corresponding score. Corresponding reason codes for the current impacted score are also generated using the instance based explanation system. This, in conjunction with the current transaction and the reasons for the current (un-impacted) score, provides a comprehensive picture of what is driving the score in a transactional analytics system. I_{k,n} is used for determining the most impactful transactions. In each iteration, only x_{n−4} along with the subsequent transactions, excluding the dropped transaction, is required to compute the updated x^k_n, S^k_n and R^k_n.

As the transactions unfold, the persisted transactions and past input vector that have been stored in the NoSQL database need to be updated. In some implementations, this is managed by storing the transactions in a queue data structure, TS, and flushing out the oldest transaction when adding the most recent one. Simultaneously, the persisted input vector is updated by a more recent copy of the input vector.
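One way the queue TS and the persisted input vector could be kept in step is sketched below, purely as an illustration. The class name `TransactionStore` is hypothetical, an in-memory `deque` stands in for the NoSQL database, and advancing the persisted profile by applying F to the flushed transaction is an assumption consistent with equation (3.a), not a statement of how a production system must do it.

```python
from collections import deque


class TransactionStore:
    """Holds the last N past transactions plus the current one, and the
    profile value that precedes the oldest retained transaction."""

    def __init__(self, base_profile, F, window):
        self.base_profile = base_profile  # x_{n-N-1}
        self.F = F                        # profile update function
        self.capacity = window + 1        # N past transactions + current T_n
        self.ts = deque()                 # the queue TS, oldest first

    def add(self, transaction):
        if len(self.ts) == self.capacity:
            # Flush the oldest transaction and roll the persisted profile
            # forward so that it again precedes the oldest retained one.
            oldest = self.ts.popleft()
            self.base_profile = self.F(self.base_profile, oldest)
        self.ts.append(transaction)
```

After every call to `add`, the pair (`base_profile`, `ts`) contains what the replay of equation set (8) needs for the current score.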

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” Use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

	Number	Date	Country
Parent	16460934	Jul 2019	US
Child	17746802		US
