The present disclosure generally relates to machine learning and more specifically to first-to-saturate single modal latent feature activation to explain and interpret machine learning models.
Machine learning models, such as neural networks, may be used in critical applications such as in the healthcare, manufacturing, transportation, financial, and information technology industries, among others. In these and other applications, explanations to a user related to why the model generated a specific prediction for a particular input, what data, models, and processing have been applied to generate that prediction, and/or the like can be useful, and in some instances, required. However, conventional explainable machine learning methods inefficiently and/or inaccurately demonstrate the properties of the hidden units of the models as they fail to address the multi-modal nature of unconstrained latent feature activation. As a result, explainability methods on conventional models provide inconsistent and unreliable explanations associated with the model output.
Methods, systems, and articles of manufacture, including computer program products, are provided for generating explanations for single modal latent feature activation using first-to-saturate latent features in machine learning. In one aspect, there is provided a system. The system may include at least one processor and at least one memory. The at least one memory may store instructions that result in operations when executed by the at least one processor. The operations may include: training, based at least on a plurality of training examples including a plurality of input features, a first machine learning model including at least one hidden node. The operations further include determining, for each of the plurality of training examples and the at least one hidden node and based on the first machine learning model, a plurality of subsets of the plurality of input features including a minimum combination of the plurality of input features first to cause saturation of the at least one hidden node. The operations further include determining, for the at least one hidden node and based on the plurality of subsets of the plurality of input features for each of the plurality of training examples, a hidden node ordered saturation list including a subset of the plurality of subsets. The operations further include generating a sparsely trained machine learning model to determine an output for a training example of the plurality of training examples based on at least one input feature of the subset included in the hidden node ordered saturation list corresponding to the at least one hidden node. The at least one input feature first causes saturation of the at least one hidden node for the training example.
In another aspect, a computer-implemented method includes training, based at least on a plurality of training examples including a plurality of input features, a first machine learning model including at least one hidden node. The method further includes determining, for each of the plurality of training examples and the at least one hidden node and based on the first machine learning model, a plurality of subsets of the plurality of input features including a minimum combination of the plurality of input features first to cause saturation of the at least one hidden node. The method further includes determining, for the at least one hidden node and based on the plurality of subsets of the plurality of input features for each of the plurality of training examples, a hidden node ordered saturation list including a subset of the plurality of subsets. The method further includes generating a sparsely trained machine learning model to determine an output for a training example of the plurality of training examples based on at least one input feature of the subset included in the hidden node ordered saturation list corresponding to the at least one hidden node. The at least one input feature first causes saturation of the at least one hidden node for the training example.
In another aspect, there is provided a computer program product including a non-transitory computer readable medium storing instructions. The instructions may cause operations when executed by at least one data processor. The operations may include: training, based at least on a plurality of training examples including a plurality of input features, a first machine learning model including at least one hidden node. The operations further include determining, for each of the plurality of training examples and the at least one hidden node and based on the first machine learning model, a plurality of subsets of the plurality of input features including a minimum combination of the plurality of input features first to cause saturation of the at least one hidden node. The operations further include determining, for the at least one hidden node and based on the plurality of subsets of the plurality of input features for each of the plurality of training examples, a hidden node ordered saturation list including a subset of the plurality of subsets. The operations further include generating a sparsely trained machine learning model to determine an output for a training example of the plurality of training examples based on at least one input feature of the subset included in the hidden node ordered saturation list corresponding to the at least one hidden node. The at least one input feature first causes saturation of the at least one hidden node for the training example.
In some variations, one or more features disclosed herein including the following features can optionally be included in any feasible combination of the system, method, and/or non-transitory computer readable medium.
In some aspects, an explanation corresponding to at least one training example of the plurality of training examples is generated. The explanation includes an input feature-level contribution to the output.
In some aspects, generating the explanation includes: determining the at least one input feature of the subset first causing saturation of the at least one hidden node for the training example. The generating also includes determining, for the at least one hidden node of the sparsely trained machine learning model, a hidden node weight contribution to the output. The hidden node weight contribution corresponds to the at least one input feature. The generating also includes determining, for the at least one hidden node of the sparsely trained machine learning model, a relative importance of the at least one input feature of the subset based on the hidden node ordered saturation list, the hidden node weight contribution, and a weight corresponding to the at least one input feature. The generating also includes defining the input feature-level contribution to the output by at least aggregating a list of most important input features based on the relative importance of the at least one input feature for each subset of the plurality of subsets.
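As a non-limiting illustrative sketch of the explanation-generation steps above, the following Python code attributes the model output to input features via each hidden node's first-to-saturate subset. The function names, shapes, and the proportional-magnitude attribution rule are assumptions for illustration only, not part of the disclosed claims.

```python
import numpy as np

def feature_contributions(x, W_in, w_out, sat_lists):
    """Sketch: attribute the output to input features via each hidden
    node's first-to-saturate subset (all names are illustrative).

    x         : input vector, shape (n_features,)
    W_in      : input-to-hidden weights, shape (n_hidden, n_features)
    w_out     : hidden-to-output weights, shape (n_hidden,)
    sat_lists : per-hidden-node first-to-saturate feature-index lists
    """
    contrib = np.zeros_like(x, dtype=float)
    for i, features in enumerate(sat_lists):
        y_i = np.tanh(W_in[i] @ x)        # hidden node activation
        node_contrib = w_out[i] * y_i     # hidden node weight contribution
        # Split the node's contribution among its first-to-saturate
        # features, proportional to each feature's weighted-input magnitude
        # (an assumed relative-importance rule).
        mags = np.array([abs(W_in[i, j] * x[j]) for j in features])
        if mags.sum() > 0:
            for j, m in zip(features, mags):
                contrib[j] += node_contrib * m / mags.sum()
    return contrib
```

The resulting vector can be sorted to yield the aggregated list of most important input features described above.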
In some aspects, when saturation of the at least one hidden node for the training example occurs prior to reaching an end of the hidden node ordered saturation list, at least one remaining input feature of the subset is ignored.
In some aspects, when saturation of the at least one hidden node for the training example fails to occur prior to reaching an end of the hidden node ordered saturation list, the at least one input feature includes all input features of the subset.
In some aspects, determining the ordered hidden node saturation list for the at least one hidden node includes determining a most frequently occurring subset of the plurality of subsets of the plurality of input features causing saturation of the at least one hidden node. Determining the ordered hidden node saturation list also includes defining the ordered saturation list as the most frequently occurring subset of input features of the plurality of subsets of the plurality of input features.
In some aspects, the plurality of subsets of the plurality of input features causes hidden node saturation of the at least one hidden node when a weight contribution of at least one of the plurality of subsets of the plurality of input features is greater than a predetermined saturation threshold.
In some aspects, determining the hidden node ordered saturation list of the at least one hidden node further includes ranking each input feature of the plurality of subsets of the plurality of input features based on at least one of a weight assigned to the input feature and a frequency of the input feature.
In some aspects, the weight is assigned during the training of the first machine learning model.
In some aspects, the training includes inputting the plurality of input features for each of the plurality of training examples in a predetermined order or a random order.
In some aspects, a hidden node of the at least one hidden node is determined to be antipolarized based on a first proportion of the plurality of training examples meeting a positive saturation threshold and a second proportion of the plurality of training examples meeting a negative saturation threshold.
In some aspects, the at least one antipolarized hidden node is replaced with a first newly created hidden node and a second newly created hidden node. Determining the hidden node ordered saturation list of the at least one hidden node includes: determining, for the first newly created hidden node, a first hidden node ordered saturation list of the plurality of input features causing positive saturation of the at least one hidden node. Determining the hidden node ordered saturation list of the at least one hidden node also includes determining, for the second newly created hidden node, a second hidden node ordered saturation list of the plurality of input features causing negative saturation of the at least one hidden node.
In some aspects, each of the plurality of training examples includes an input vector containing the plurality of input features.
In some aspects, the subset includes one or more input features of the plurality of input features.
Implementations of the current subject matter can include methods consistent with the descriptions provided herein as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations implementing one or more of the described features. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a non-transitory computer-readable or machine-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including, for example, to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to generating explanations for single modal latent feature activation using first-to-saturate latent features in machine learning, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.
The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,
When practical, like labels are used to refer to same or similar items in the drawings.
Explainable machine learning models provide users with explanations regarding the predictions and outputs generated by the machine learning models. However, conventional methods for providing explanations generally have significant weaknesses. A common approach is to perturb the input data of the machine learning model to understand which features are most sensitive for a particular model output. These methods may suffer from unpredictable behavior where the perturbed input data behaves poorly due to model extrapolation effects, and so may not be satisfactory. Another explainable machine learning approach is to construct simpler models, which are intended to explain the true model in a limited region. However, explanations based on a simplified local model, instead of the actual model, do not generally accurately reflect the decision function of the original complex machine learning model. These simplified models also often do not meet the regulatory needs for explaining the decision model.
Further complicating explanation generation in machine learning is that the latent features, which are calculated at hidden nodes, are often the true quantities requiring explanation in terms of attribution to inputs. In the context of artificial neural networks, because of the nonlinearities used in the activation function (generally a sigmoid or tanh), hidden units often reach saturation. In densely connected networks, a large number of permutations of inputs can cause a hidden unit to be in the same saturated state. In other words, conventional neural networks, with dense connections and many free parameters, generally reach saturation with multiple different subsets of input features, referred to herein as multi-modal saturation. This means that for a single input vector for an example record, multiple modes of saturation (and different subsets of features) exist, removing certainty from explanation assignment. In these situations, finding a unique unambiguous deterministic single modal explanation for a hidden unit is generally impossible because a single latent feature or hidden unit could have multiple behaviorally different and merged groups of inputs from often many different overlapping groups of inputs that might be responsible for saturation. These different configurations of inputs per hidden unit imply that there are multiple different explanations based on the features active for a particular input sample. Accordingly, conventional models are not capable of providing a single unambiguous deterministic single modal explanation for saturation of a hidden unit, and are thus unable to provide a single, accurate, and reliable explanation.
Consistent with implementations of the current subject matter, the explainable machine learning system described herein generates highly interpretable machine learning models (e.g., neural networks) and a specific unambiguous single modal method for producing explanations from those models. For example, the explainable machine learning system generates explanations by at least determining which input features to the machine learning models are minimally sufficient to drive hidden units of the models into saturation.
Further, the explainable machine learning system generates an accurate and consistent explanation based at least on a determined first-to-saturate subset of input features to each hidden unit. This prevents generation of multiple modes of saturation for an example, which would otherwise obscure accurate explanations for the output of the model. For example, the explainable machine learning system consistent with implementations of the current subject matter generates an ordered saturation list of input features into each hidden unit of the machine learning model. In this approach, the input features may be included in the saturation list of input features until saturation is first met. The list of input features may define a unique ordered set of input features attributed to the saturation of the hidden unit, and subsequently the relative importance of each feature to that first saturation. Accordingly, the explainable machine learning system consistent with implementations of the current subject matter applies a first-to-saturate principle, which is applied during both training and inference.
As a result, the explainable machine learning system described herein excludes multi-modal saturation and the associated ambiguity in providing explanations, improving the accuracy, reliability, and consistency of explanations provided by the explainable machine learning system. Moreover, the explainable machine learning system described herein generates a highly interpretable sparse machine learning model based on the ordered set of input features so that the output of the model can be directly explained in terms of either a set of latent feature modes or a set of ranked input features that are significant contributors to the output decision. Additionally and/or alternatively, the explainable machine learning model consistent with implementations of the current subject matter reduces the computational burden of providing an accurate and reliable explanation at least because the explainable machine learning system may only consider the subset of input features and/or hidden nodes based on the first-to-saturate principle. Therefore, the explainable machine learning system described herein produces unambiguous explanations by at least applying the first-to-saturate principle, generating highly interpretable hidden units, and generating an unambiguous deterministic single modal explanation based on a deterministic ordered set of features in priority order to constrain and determine the minimum set of features to saturate the hidden units. The explainable machine learning system applies internal weights, activations, and saturation states of the machine learning model. As such, the explainable machine learning system provides a direct unambiguous explanation of the model used for generating outputs.
As noted herein, the generated explanations are determined using a unique ordered list (e.g., per hidden unit per training example) of input features to the hidden unit ranked by their contribution to the network output value. Further, by consolidating and rank ordering the input features of the ordered list of input features driving saturation of each hidden unit over the entire training batch, the explainable machine learning system determines overall rank-ordered lists, which can be used for feature selection, as well as for enforcing simplified neural network structure (e.g., by masking the weight matrix to only allow feature combinations already proven relevant based on their subset of the input features driving hidden units to saturation).
The machine learning engine 110 includes at least one data processor and at least one memory storing instructions, which when executed by the at least one data processor, perform one or more operations as described herein. The machine learning engine 110 trains the machine learning model 120 based on one or more training examples including one or more input features. As described herein, the one or more training examples may each include an input vector containing a plurality of input features.
In some implementations, the machine learning engine 110 trains the machine learning model 120 based on all of the plurality of input features. In this example, the machine learning engine 110 trains the machine learning model 120 as a dense model or network. Additionally and/or alternatively, the machine learning engine 110 trains the machine learning model 120 based on a subset of the input features, such as the subset of input features included in the ordered saturation list described in more detail below. In this example, the machine learning engine 110 trains the machine learning model 120 as a sparse model or network, since the machine learning model 120 is trained based on only a subset of the input features. In some implementations, the input features included in the subset of the input features may be assigned a non-zero weight, while the input features of the plurality of input features not included in the subset of the input features may be assigned a zero weight. In this way, only the input features included in the subset of the input features may contribute to the output of the machine learning model 120, such as the sparsely trained machine learning model 120.
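As a non-limiting illustrative sketch of the sparse weighting described above, the following Python code zeroes the weights of input features outside each hidden node's retained subset, so that only those features contribute to the model output. The function name and array shapes are assumptions for illustration only.

```python
import numpy as np

def apply_sparsity_mask(W, allowed):
    """Zero out weights for features outside each hidden node's retained
    subset of input features (a sketch; names are illustrative).

    W       : dense input-to-hidden weight matrix, shape (n_hidden, n_features)
    allowed : list of per-hidden-node feature-index subsets
    """
    mask = np.zeros_like(W)
    for i, features in enumerate(allowed):
        mask[i, list(features)] = 1.0  # keep only the listed features
    return W * mask                    # excluded features get zero weight
```

A dense weight matrix masked this way yields the sparse model or network described above.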
The machine learning model 120 may include a neural network, and/or the like.
The machine learning engine 110 may train the machine learning model 120 to generate the output (shown as z) 306 by, for example, inputting the plurality of input features (and corresponding training examples) and/or the subset of the plurality of input features to the one or more hidden nodes. For example, to train the dense machine learning model 120, the machine learning engine 110 may input all of the plurality of input features (and corresponding training examples), and/or assign weights to all of the plurality of input features, to the one or more hidden nodes. Additionally and/or alternatively, to train the sparse machine learning model 120, the machine learning engine 110 may input the subset of the plurality of input features and/or assign non-zero weights to only the subset of the plurality of input features, to the one or more hidden nodes.
The one or more hidden nodes (shown as y) 304 may be positioned between the input 302 and the output 306 of the machine learning model 120. Each hidden node 304 may produce a defined output based on the inputted one or more input features. For example, each hidden node may be associated with an output for which a desired explanation is provided. As an example, a hidden node may be associated with existence of a medical condition, non-existence of a medical condition, fraudulent behavior, non-fraudulent behavior, and/or the like.
At the one or more hidden nodes, weights (e.g., zero or non-zero weights) are applied to the input features. The weighted input features are directed through an activation function as an output of the one or more hidden nodes. In other words, the one or more hidden nodes perform linear or nonlinear transformations of the one or more input features of the machine learning model 120. The one or more hidden nodes may be arranged into one or more hidden layers. As an example of a single hidden layer, the following notation may be used: the activation function may be denoted as f(⋅) and g(⋅). Referring to
Consistent with implementations of the current subject matter, the activation function is generally described herein as tanh(⋅) nonlinearity, which has natural concepts of positive and negative saturation. However, the hidden node 304 may include a rectified linear unit (ReLU) activation function. The ReLU activation function does not generally have an upper limit, and so the corresponding hidden node does not generally positively saturate (unlike sigmoid non-linearities). Generally, once hidden nodes reach saturated values, they can become more difficult to train, due to the gradients becoming smaller the further the node is pushed into saturation. Techniques like ReLU and batch normalization may be used to avoid this issue. Batch normalization with ReLU units does, in effect, provide a type of squashing function. However, tanh(⋅) (or other saturating nonlinearities) may be implemented to improve the stability and robustness of training. The first-to-saturate principle (with saturating non-linearities) may also address the vanishing gradient problem, in that the activations will be limited as units first reach saturation.
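The saturation and vanishing-gradient behavior described above can be observed numerically. The following short Python sketch (illustrative values only) shows that tanh activations approach ±1 while their gradients shrink, whereas ReLU has no upper limit and does not positively saturate:

```python
import numpy as np

# tanh has natural positive/negative saturation: |tanh(z)| -> 1 as |z| grows,
# and its gradient 1 - tanh(z)**2 shrinks, making further training harder.
z = np.array([0.5, 1.9, 4.0])
activation = np.tanh(z)
gradient = 1.0 - activation ** 2

# ReLU, by contrast, has no upper limit, so it never positively saturates.
relu = np.maximum(0.0, z)
```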
Referring back to
At 202, the machine learning engine 110 may train a first machine learning model, such as a dense neural network, based on a plurality of training examples including a plurality of input features, such as a plurality of input features contained within an input vector. The dense neural network may be trained based on a full set of the input features. In other words, the machine learning engine 110 may train the dense neural network based on a dense weight matrix including the plurality of input features and/or corresponding weights assigned to the plurality of input features, such that any hidden node of the dense neural network can receive any one or more of the plurality of input features. Thus, by at least training the machine learning model 120, the machine learning engine 110 may collect the saturation properties and activation modes for saturating each hidden node of the machine learning model 120.
The machine learning engine 110 may train the dense neural network to minimize misclassification using a loss function. The machine learning engine 110 may also regularize the weight matrix by adding a penalty on W to the cost function to control complexity. The loss function may include an L1 loss, which is the sum of the absolute values of the weights Σ|wij|, and an L2 loss, which is the sum of squared weights Σ|wij|2. The L1 and L2 constraints (e.g., loss functions) may be implemented as additional terms to be minimized during the gradient descent training of the machine learning model 120 (e.g., the dense neural network). Training the machine learning model to minimize the regularized loss functions may result in fewer input features of the one or more input features having significant contributions to activation or saturation of a hidden node of the machine learning model and removes unnecessary spurious multi-modal behaviors. The incorporation of the regularized loss functions may further improve transparency within the machine learning model 120 and may further improve the interpretable and single modal explainable machine learning model described herein.
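The regularized cost function described above can be sketched as follows in Python. The function name and penalty coefficients are assumptions for illustration only; the L1 and L2 terms match the Σ|wij| and Σ|wij|2 penalties described above.

```python
import numpy as np

def regularized_loss(base_loss, W, l1=0.0, l2=0.0):
    """Add L1 and L2 penalties on the weight matrix W to the base
    misclassification loss (a minimal sketch of the description above)."""
    return base_loss + l1 * np.abs(W).sum() + l2 * (W ** 2).sum()
```

Minimizing this combined quantity during gradient descent drives many weights toward zero, leaving fewer input features with significant contributions to hidden node saturation.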
Referring again to
The subset of input features may include one or more features of the plurality of input features inputted to the hidden node. The machine learning engine 110 determines the subset causes saturation of the hidden node when a weight contribution of the subset meets (e.g., is greater than or equal to) a predetermined saturation threshold. In some implementations, the weight contribution is determined by at least aggregating (e.g., totaling) an absolute value of a weight assigned to each of the plurality of input features of the subset of the plurality of input features. For example, the saturation threshold may be 0.95 such that the hidden node is considered saturated when |yi|>0.95. In other implementations the saturation threshold may be 0.85, 0.90, 0.97, 0.99, and/or other ranges therebetween, greater, or lesser. The absolute value of the weight contribution is used so that the input features having the largest magnitude weights are included as part of the subset, regardless of the sign (e.g., negative or positive) of the weights. However, the sign (e.g., positive or negative) may still be considered depending on whether the machine learning engine 110 determines the hidden node is antipolarized, or is positively or negatively saturated.
At 406, the hidden node yi receives the input vector of the plurality of input features and the corresponding weight vector including the sorted corresponding preactivation terms. In this example, the hidden node yi is represented by activation function yi=f(wi·x+bi), where bi=0, and wi·x is the preactivation term and input matrix. Further, in this example, the predetermined saturation threshold is yi>0.95. As a result, the hidden node is saturated with five input features. For example, the aggregated absolute value of the weights corresponding to the first five input features in the input vector is 1.9. Applying the aggregated absolute value to the activation function, yi=tanh(1.9)=0.9567. Thus, as shown at 408, the first-to-saturate list for the hidden node yi is {x1, x2, x3, x4, x5}. This subset of input features was minimally sufficient for the hidden node to saturate since its weight contribution of 1.9 meets the saturation threshold (1.8318) needed for |yi|>0.95 (e.g., the predetermined saturation threshold). In some implementations, the aggregated total of the activations may reach a negative saturation first (for activation yi<−0.95), in which case the unit is considered to be negatively saturated, and the first-to-saturate list would contain those input features needed to reach that negative saturation.
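The first-to-saturate computation worked through above can be sketched in Python as follows. The function name is an assumption for illustration; the threshold 0.95 and the arctanh(0.95) ≈ 1.8318 requirement mirror the example above.

```python
import numpy as np

def first_to_saturate(x, w, threshold=0.95):
    """Return the minimal ordered subset of feature indices whose
    aggregated |weighted input| first drives tanh past the saturation
    threshold (|yi| > threshold), per the worked example above."""
    # Sort features by the magnitude of their preactivation contribution.
    contrib = w * x
    order = np.argsort(-np.abs(contrib))
    needed = np.arctanh(threshold)   # e.g. arctanh(0.95) ≈ 1.8318
    total, subset = 0.0, []
    for j in order:
        subset.append(int(j))
        total += abs(contrib[j])
        if total >= needed:          # saturation first reached here
            return subset
    return subset                    # never saturated: all features remain
```

With unit inputs and weights whose five largest magnitudes total 1.9, the function returns a five-feature subset, matching the {x1, x2, x3, x4, x5} example above.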
In some implementations, the machine learning engine 110 determines an ordered saturation list of the plurality of input features causing the saturation of the hidden node for each hidden node of the machine learning model 120 and based on the subset of the plurality of input features for each of the plurality of training examples. To determine the overall ordered saturation list, the machine learning engine 110 may determine a most frequently occurring subset of the plurality of input features causing saturation of the hidden node. In such an instance, the machine learning engine 110 would define the overall ordered saturation list as the most frequently occurring subset of the plurality of input features. Referring to the example shown in
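The aggregation described above, selecting the most frequently occurring per-example subset as the hidden node's overall ordered saturation list, can be sketched as follows. The function name is an assumption for illustration only.

```python
from collections import Counter

def overall_saturation_list(per_example_subsets):
    """Pick the most frequently occurring first-to-saturate subset across
    training examples as the hidden node's overall ordered saturation list."""
    counts = Counter(tuple(s) for s in per_example_subsets)
    most_common_subset, _ = counts.most_common(1)[0]
    return list(most_common_subset)
```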
Referring back to
For example, the machine learning engine 110 may determine that a hidden node has significant saturation at both positive and negative values across the plurality of training examples. These cases are often the result of disjoint subsets of input features (e.g., polarized saturation modes), each leading to saturation of two polarities. As described herein, having multiple saturation modes can be problematic for explainability, transparency, and interpretability of the machine learning model 120. As an example, a single hidden node may respond strongly to input features {x1, x2} or {x3, x4}, and assigning a human-interpretable meaning to such a hidden node is generally more difficult—these two saturation modes could be paths both leading to positive saturation, both leading to negative saturation, or one positive and one negative saturation.
The machine learning engine 110, consistent with implementations of the current subject matter, prevents such explainability difficulties caused by antipolarized hidden nodes. For example, the machine learning engine 110 may determine the hidden node is antipolarized based on a first proportion of the plurality of training examples meeting a positive saturation threshold and a second proportion of the plurality of training examples meeting a negative saturation threshold. In other words, the machine learning engine 110 determines the hidden node is antipolarized if it saturates at both the positive and negative extreme for more than a defined percentage threshold (e.g., an antipolarized threshold) of training examples. For antipolarized hidden nodes, the machine learning engine 110 helps resolve explainability issues by, for example, splitting the hidden node into two newly created hidden nodes and/or determining two ordered saturation lists—for a first newly created hidden node, a first ordered saturation list of the plurality of input features causing positive saturation of the hidden node (corresponding to one of the newly created nodes) and for a second newly created hidden node, a second ordered saturation list of the plurality of input features causing negative saturation of the hidden node (corresponding to the other one of the newly created nodes). In other words, based on a determination a hidden node is antipolarized, the machine learning engine 110 creates two new hidden nodes (e.g., a positive hidden node and a negative hidden node) corresponding to the antipolarized hidden node. As noted, the newly created positive and negative polarized hidden nodes each have single or multiple modes of saturation.
As an example, in some implementations, after training of the dense network is complete, the machine learning engine 110 determines the per-example saturation lists for each example per hidden node and combines those lists to form an aggregate ordered saturation list. As described herein, the ordered saturation list may be per hidden unit, as aggregated over all the training examples. In some implementations, however, the machine learning engine 110 determines a hidden node is antipolarized, such as when a first proportion or ratio (e.g., a percentage) of training examples of the plurality of training examples meets (e.g., is greater than or equal to) a positive saturation threshold, and a second proportion or ratio (e.g., a percentage) of training examples of the plurality of training examples meets (e.g., is greater than or equal to) a negative saturation threshold.
If the machine learning engine 110 determines both the first proportion of training examples meets the positive saturation threshold and the second proportion of training examples meets the negative saturation threshold, the machine learning engine 110 determines the first ordered saturation list including the plurality of input features causing positive saturation of the hidden node and the second ordered saturation list including the plurality of input features causing negative saturation of the hidden node. To do so, the machine learning engine 110 may apply the first-to-saturate principle, as described herein.
Additionally and/or alternatively, if the machine learning engine 110 determines the first proportion of training examples meets the positive saturation threshold and the second proportion of training examples fails to meet the negative saturation threshold, or, alternatively, the first proportion of training examples fails to meet the positive saturation threshold and the second proportion of training examples meets the negative saturation threshold, the machine learning engine 110 compares the first proportion to the second proportion. When the machine learning engine 110 determines, based on the comparison, that the first proportion (associated with positive saturation) is greater than the second proportion (associated with negative saturation), the machine learning engine 110 generates only the first ordered saturation list including the plurality of input features contributing to positive saturation of the hidden node. Otherwise, when the machine learning engine 110 determines, based on the comparison, that the first proportion (associated with positive saturation) is less than the second proportion (associated with negative saturation), the machine learning engine 110 generates only the second ordered saturation list including the plurality of input features contributing to negative saturation of the hidden node.
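The threshold-and-comparison logic above can be sketched as follows. This is a minimal illustration, not the implementation: the function name, the default threshold values, and the fallback behavior when neither threshold is met (keeping the dominant polarity) are assumptions not specified above.

```python
def saturation_lists_to_build(pos_count, neg_count, n_examples,
                              pos_threshold=0.10, neg_threshold=0.10):
    """Decide which ordered saturation list(s) to build for one hidden node.

    pos_count / neg_count: number of training examples for which the node
    reached positive / negative saturation.  The default thresholds are
    illustrative assumptions.
    """
    pos_prop = pos_count / n_examples
    neg_prop = neg_count / n_examples
    if pos_prop >= pos_threshold and neg_prop >= neg_threshold:
        # Antipolarized: build both lists (one per newly created hidden node).
        return ("positive", "negative")
    # Otherwise keep only the dominant polarity (assumed fallback when
    # neither threshold is met; ties resolve to negative in this sketch).
    return ("positive",) if pos_prop > neg_prop else ("negative",)
```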
In some implementations, the machine learning engine 110 determines an average number of input features needed to saturate (e.g., positively saturate and/or negatively saturate). In other words, the machine learning engine 110 determines the subset of the plurality of input features including the minimum combination of input features that causes saturation (e.g., positive and/or negative saturation). The machine learning engine 110 determines the ordered saturation list (e.g., the first ordered saturation list and/or the second ordered saturation list) as the most frequently occurring minimum combination of input features causing saturation across the plurality of training examples. Additionally and/or alternatively, the machine learning engine 110 determines the ordered saturation list using one or more other techniques for ranking and filtering the ordered saturation lists for each training example, such as aggregating the preactivation values of the subset of input features, implementing a binary classifier (e.g., single output node) by backpropagating the weighted contribution to the output node to provide additional evidence-based weighting to the ordered saturation list of input features, and/or the like.
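The "most frequently occurring minimum combination" aggregation can be sketched as follows; representing each per-example subset as a collection of feature indices, and canonicalizing by sorting, are assumptions of this sketch.

```python
from collections import Counter

def aggregate_ordered_saturation_list(per_example_subsets):
    """Return the most frequently occurring minimal saturating subset.

    per_example_subsets: iterable of feature-index collections, one minimal
    first-to-saturate subset per training example.  Subsets are canonicalized
    by sorting so that the order of discovery does not affect the count.
    """
    counts = Counter(tuple(sorted(s)) for s in per_example_subsets)
    subset, _ = counts.most_common(1)[0]
    return list(subset)
```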
Referring back to
For example, the resulting sparsely trained network may have a sparse input feature-to-hidden node weight matrix, e.g., with only about 10% of the weights being non-zero. The sparsely trained network retains a large amount of the predictive power of the original dense network, while providing a high level of explainability and transparency at the per-example level. Accordingly, the determined ordered saturation list provides input feature importance in capturing the behaviors that drive the outcomes of a dense network, and thus provides a natural way to construct highly explainable sparsely trained neural networks.
For example, in training (e.g., retraining) the sparsely trained neural network, each hidden node of the sparse network may receive a different number and/or order of allowed input features compared to the dense neural network, and may be restricted to the determined subset of the plurality of input features. In some implementations, a binary sparse matrix M of the same size as W ∈ R^{m×n} is used for masking, to ensure that only weights in the ordered saturation lists are allowed to be non-zero during training of the sparse network. During training of the sparse network by the machine learning engine 110, the forward pass uses the first-to-saturate principle, which for each example only considers those input features needed to saturate a hidden node. If an input feature is not needed to saturate a hidden node, the machine learning engine 110 assigns the input feature a zero value weight for further determinations (e.g., forward-pass activations and/or the gradient updates). In turn, the output unit activation is also found by implementing the first-to-saturate principle, and so hidden nodes of the sparse neural network not needed to saturate the output unit are similarly set to zero.
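The mask construction can be sketched as follows, assuming (the orientation of W is not specified above) that W is stored with input features as rows and hidden nodes as columns:

```python
import numpy as np

def apply_sparsity_mask(W, saturation_lists):
    """Build the binary mask M with the same shape as W and zero out every
    weight not listed in a hidden node's ordered saturation list.

    W: (m, n) weight matrix -- m input features by n hidden nodes (assumed
    orientation).
    saturation_lists: {hidden_node_index: [allowed feature indices]}.
    Returns the masked weights and the mask itself.
    """
    M = np.zeros_like(W)
    for node, feats in saturation_lists.items():
        M[feats, node] = 1.0  # only these weights may be non-zero
    return W * M, M
```

During retraining, reapplying the mask after each gradient step keeps the disallowed weights at zero.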
To evaluate the detection performance of the generated sparse network, an experiment was performed using a transactional fraud detection data set with several million transactions (e.g., training examples) across time for many users, and from each transaction, in this experiment, 144 input features were constructed. The performance metric used in this experiment is left-area-under-curve (LAUC), which is the area under the receiver operating characteristic curve to the left of a threshold. The LAUC metric is used in rare-event problems, such as fraud detection, disease identification, and/or the like, because the operating point needs to be at a fairly low false positive rate, to avoid impacting large numbers of legitimate customers or healthy patients, respectively. In this example, the threshold of non-fraud false positive rate < 1% was used for the region where LAUC is calculated.
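The LAUC metric can be sketched as a partial integration of the ROC curve. This is a simplified sketch, not the experiment's implementation; in particular, tied scores are not grouped (an assumption), and the curve is integrated with the trapezoidal rule only up to the false positive rate cap.

```python
import numpy as np

def lauc(scores, labels, fpr_cap=0.01):
    """Area under the ROC curve restricted to false positive rate <= fpr_cap.

    scores: model outputs (higher = more likely positive); labels: 0/1.
    Simplified sketch: examples are sorted by descending score and tied
    scores are not grouped.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)
    labels = labels[order]
    tps = np.cumsum(labels)        # true positives at each cut point
    fps = np.cumsum(1 - labels)    # false positives at each cut point
    tpr = tps / max(tps[-1], 1)
    fpr = fps / max(fps[-1], 1)
    keep = fpr <= fpr_cap          # region left of the cap
    x = np.concatenate(([0.0], fpr[keep]))
    y = np.concatenate(([0.0], tpr[keep]))
    # trapezoidal rule over the retained ROC segment
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```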
As shown in the table 600, the “sparse with antipolarized node splitting” network shown in the column 606 generally performed best, though it generally selected more input features than the sparse network without node splitting shown in the middle column of the same row. The last row shows the performance of a logistic regression model, which can be considered a baseline explainable model, trained on the full set of 144 input features. For certain configurations of the sparse networks (the first four rows), these networks outperform logistic regression on LAUC while using many fewer input features.
Further, as shown in the table 600, the total number of input features used in the sparse cases (e.g., the second column 604 and the third column 606) is much lower than the total number of input features (e.g., shown in the first column 602). This means that isolating modes of behavior helps isolate driving features. For example, with 20 hidden nodes, only 32 of the 144 input features from the dense network were used (e.g., a 78% reduction in input features), while still providing a large fraction (0.516 vs. 0.611 LAUC, or about 84%) of the fully-connected detection performance on an in-time in-sample evaluation. Thus, by allowing antipolarized node splitting (16 nodes were added), the third column shows 0.537 LAUC vs. 0.611 LAUC, or about 88% of the fully-connected detection performance.
Referring again to
Again referring to
For example,
Referring to
At 804, for each hidden node of the sparsely trained machine learning model 120 (e.g., the sparse network), the machine learning engine 110 may back propagate the weighted contribution of the hidden nodes to the input vector including the plurality of input features (e.g., the subset of the plurality of input features or the ordered saturation list of the plurality of input features). For example, only a subset of input features is used until saturation is met, and only those input features are attributed importance corresponding to the hidden node that the input features saturate. Hidden nodes not used in the output due to the first-to-saturate principle have no contributions. For hidden nodes that do contribute to the output node activation, the contributions flow from the hidden nodes backward to the relevant input features. This results in a set of weighted input features, such as a hidden node weight contribution to the output, per each relevant hidden node. Explainability could stop at the hidden nodes, as each hidden node provides some explanation of the cause of saturation, such as for the sparsely trained machine learning model described herein. In these cases, the observed saturation modes are assigned a reason and those reasons are then provided in a set ordered by highest importance. In other instances, explanations will also flow to input features, as shown in
Referring to
At 802, for each of the relevant input features, the machine learning engine 110 determines a sum of the contributions from each of the selected hidden nodes. The machine learning engine 110 ranks (e.g., sorts) the corresponding input features by the sums, which are weighted contributions traced back (e.g., back propagated) from the output at 806, and which directly explain the output value of the sparsely trained machine learning model 120 (e.g., the sparse network). In other words, these identified input features constitute the input features responsible for the relevant hidden nodes and ultimately the output unit value of the sparsely trained machine learning model 120 (e.g., the sparse network).
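The per-feature summation and ranking above can be sketched as follows; the dictionary shapes, and ranking by absolute magnitude of the summed contribution, are assumptions of this sketch.

```python
def rank_feature_contributions(hidden_contribs, feature_weights):
    """Sum back-propagated contributions per input feature and rank them.

    hidden_contribs: {hidden_node: weighted contribution to the output},
    covering only the relevant (saturating) hidden nodes.
    feature_weights: {hidden_node: {feature: weight used for this example}};
    features not needed for saturation are omitted (zero weight).
    Returns (feature, total contribution) pairs, largest magnitude first.
    """
    totals = {}
    for node, contrib in hidden_contribs.items():
        for feat, weight in feature_weights.get(node, {}).items():
            totals[feat] = totals.get(feat, 0.0) + contrib * weight
    return sorted(totals.items(), key=lambda kv: -abs(kv[1]))
```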
For example, the machine learning engine 110 may determine a relative importance of the identified input features based on the hidden node ordered saturation list, the hidden node weight contribution, and a weight corresponding to the at least one input feature. The machine learning engine 110 may determine the relative importance of the at least one input feature, for the hidden node. The machine learning engine 110 may define the input feature-level contribution to the output by at least aggregating (e.g., summing) a list of most important input features based on the relative importance of the input features. Accordingly, the machine learning engine 110 reliably and consistently provides accurate explanations for the sparsely trained machine learning model 120.
The input feature relevance can be summed over any of the hidden nodes that rely on that input feature, and then can be ranked and filtered to provide the final explanations to the user. For example, as shown in
Referring again to
At 1002, the machine learning engine 110 may train a first machine learning model (e.g., the machine learning model 120) including at least one hidden node. The at least one hidden node may include one or more hidden nodes. The first machine learning model may include a neural network or the like. The machine learning engine 110 may train the first machine learning model based at least on a plurality of training examples including a plurality of input features. Thus, the machine learning engine 110 may train the first machine learning model to generate a densely trained machine learning model based on all or a batch of input features of each of the plurality of training examples. In some implementations, each of the plurality of training examples includes an input vector containing the plurality of input features. As at least a part of training the machine learning model 120, the machine learning engine 110 may input, to the hidden node, the plurality of input features for each of the plurality of training examples in a predetermined order or a random order. The predetermined order may be based on a value of a weight assigned to each of the plurality of input features.
At 1004, the machine learning engine 110 may determine a plurality of subsets (e.g., a subset, one or more subsets, etc.) of the plurality of input features including a minimum combination of the plurality of input features first to cause saturation of the at least one hidden node. For example, the machine learning engine 110 may determine the plurality of subsets of the plurality of input features based at least on the dense network (e.g., the first machine learning model). The machine learning engine 110 may determine the plurality of subsets of the plurality of input features for each of the plurality of training examples and/or for the at least one hidden node.
The plurality of subsets of the plurality of input features causes saturation of the at least one hidden node when a weight contribution (e.g., a total weight contribution) of at least one of the plurality of subsets of the plurality of input features meets (e.g., is greater than or equal to) a predetermined saturation threshold. The predetermined saturation threshold may be 0.95, 0.90, 0.85, or the like. The predetermined saturation threshold indicates a threshold at which the hidden node is considered to be sufficiently saturated. The weight contribution may be determined by at least aggregating an absolute value of a weight assigned to each of the plurality of input features of the subset of the plurality of input features.
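The minimum saturating combination can be sketched as follows. Interpreting the saturation threshold (e.g., 0.95) as a fraction of the hidden node's total absolute weight mass is an assumption of this sketch; the description above specifies only that the aggregated absolute weights must meet the threshold.

```python
def minimal_saturating_subset(weights, order, sat_threshold=0.95):
    """Smallest prefix of `order` whose aggregated absolute weight meets the
    saturation threshold.

    weights: per-feature weights for one hidden node; order: feature indices
    in the order they are considered (e.g., descending absolute weight).
    If the threshold is never met, all considered features are returned,
    matching the no-early-saturation case described herein.
    """
    total = sum(abs(w) for w in weights) or 1.0  # guard against all-zero rows
    acc = 0.0
    subset = []
    for j in order:
        subset.append(j)
        acc += abs(weights[j])
        if acc / total >= sat_threshold:
            break  # saturated: remaining features are ignored
    return subset
```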
At 1006, the machine learning engine 110 determines a hidden node ordered saturation list including a subset of the plurality of subsets. The machine learning engine 110 may determine the hidden node ordered saturation list for the at least one hidden node and based on the plurality of subsets of the plurality of input features for each of the plurality of training examples. The machine learning engine 110 may determine the hidden node ordered saturation list by at least determining a most frequently occurring subset of the plurality of subsets of the plurality of input features causing saturation of the at least one hidden node. Additionally and/or alternatively, the machine learning engine 110 may define the hidden node ordered saturation list as the most frequently occurring subset of input features of the plurality of subsets of the plurality of input features. In some implementations, determining the hidden node ordered saturation list of the at least one hidden node further includes ranking each input feature of the plurality of subsets of the plurality of input features based on at least one of a weight assigned to the input feature (e.g., assigned during the training of the first machine learning model) and a frequency of the input feature appearing within the subset of the plurality of input features across the plurality of training examples.
In some implementations, the machine learning engine 110 may determine a hidden node of the at least one hidden node is antipolarized. The machine learning engine 110 may determine the at least one hidden node is antipolarized based on a first proportion or ratio of the plurality of training examples meeting (e.g., is greater than or equal to) a positive saturation threshold and/or a second proportion of the plurality of training examples meeting (e.g., is greater than or equal to) a negative saturation threshold. This indicates that a sufficient quantity of the training examples (including subsets of the input features) positively and negatively saturate the hidden node.
In some implementations, based on determining the hidden node of the at least one hidden node is antipolarized, the machine learning engine 110 may replace the at least one antipolarized hidden node with a first newly created hidden node and a second newly created hidden node. In some implementations, such as when the hidden node of the at least one hidden node is antipolarized, the machine learning engine 110 determines the ordered saturation list of the at least one hidden node by at least determining, for the first newly created hidden node, a first hidden node ordered saturation list of the plurality of input features causing positive saturation of the at least one hidden node, and determining, for the second newly created hidden node, a second hidden node ordered saturation list of the plurality of input features causing negative saturation of the at least one hidden node. This may more accurately indicate an explanation for saturating the hidden node.
At 1008, the machine learning engine 110 generates a sparsely trained machine learning model. For example, the machine learning engine 110 may train the first machine learning model to predict an output for a training example of the plurality of training examples based on at least one input feature of the subset included in the hidden node ordered saturation list corresponding to the at least one hidden node. The at least one input feature first causes saturation of the at least one hidden node for the training example.
In some implementations, the machine learning engine 110 generates the sparsely trained machine learning model by at least retraining the first machine learning model based on the ordered saturation list of the plurality of input features. The sparsely trained machine learning model may be a sparse neural network. In some implementations, the remaining input features that are not included in the ordered saturation list may be assigned a zero weight or may otherwise not contribute to predicting the output of the sparsely trained machine learning model.
In some implementations, the machine learning engine 110 may generate an explanation corresponding to at least one training example of the plurality of training examples. The explanation may include an input feature-level contribution to the output.
For example, the machine learning engine 110 may determine the at least one input feature of the subset first causing saturation of the at least one hidden node for the training example. Additionally and/or alternatively, the machine learning engine 110 may determine, for the at least one hidden node of the sparsely trained machine learning model, a hidden node weight contribution to the output, corresponding to the at least one input feature. This may indicate the contribution of the hidden node of the sparsely trained machine learning model.
Additionally and/or alternatively, the machine learning engine 110 may determine a relative importance of the at least one input feature of the subset based on the hidden node ordered saturation list, the hidden node weight contribution, and a weight corresponding to the at least one input feature. The machine learning engine 110 may determine the relative importance of the at least one input feature, for the at least one hidden node of the sparsely trained machine learning model.
Additionally and/or alternatively, the machine learning engine 110 may define the input feature-level contribution to the output by at least aggregating a list of most important input features based on the relative importance of the at least one input feature for each subset of the plurality of subsets.
In some implementations, when saturation of the at least one hidden node for the training example occurs prior to reaching an end of the hidden node ordered saturation list, at least one remaining input feature of the subset is ignored. Additionally and/or alternatively, when saturation of the at least one hidden node for the training example fails to occur prior to reaching an end of the hidden node ordered saturation list, the at least one input feature includes all input features of the subset.
The conventional explanation method in this example produces example-level explanations. The conventional explanation method is agnostic to the classifier type used and can provide explanations for arbitrary classifiers (e.g., neural networks, support vector machines, etc.). This baseline method has two phases, one at training time and the other during evaluation. After classifier training, the method bins the output values and input feature values, and stores those for lookup during production evaluation. At evaluation, the method uses the correlation of output values with the feature values, and selects as the explanation those input features which are most correlated with the output value at that score range based on the representation learned on the training data. However, this method does not consider the classifier's internal calculation used to arrive at the output, in contrast with the explainable machine learning system 100, which may be directly driven by the structure of the machine learning model 120 and application of the first-to-saturate method to allow only one mode of saturation when a hidden node is saturated.
First, the internal consistency between time steps t and t+1 was compared. For example,
As shown in column 1102, the explainable machine learning system 100 has on average 67% intersection (e.g., approximately three of the top five explanations are the same) from t to t+1. As shown in column 1104, using the first-to-saturate network with the explanations from the conventional method, there is only a 45% mean intersection, so the explainable machine learning system 100 provides more internal consistency between time steps. Further, as shown in column 1106, comparing to a standard dense network with explanations from the conventional method, there is only 41% intersection, or approximately two of the top five input features. This shows that the explainable machine learning system 100 provides more internally consistent and accurate explanations across time compared with the conventional methods.
As shown in
The memory 1220 is a computer readable medium, such as volatile or non-volatile memory, that stores information within the computing system 1200. The memory 1220 can store data structures representing configuration object databases, for example. The storage device 1230 is capable of providing persistent storage for the computing system 1200. The storage device 1230 can be a floppy disk device, a hard disk device, an optical disk device, a tape device, or other suitable persistent storage means. The input/output device 1240 provides input/output operations for the computing system 1200. In some implementations of the current subject matter, the input/output device 1240 includes a keyboard and/or pointing device. In various implementations, the input/output device 1240 includes a display unit for displaying graphical user interfaces.
According to some implementations of the current subject matter, the input/output device 1240 can provide input/output operations for a network device. For example, the input/output device 1240 can include Ethernet ports or other networking ports to communicate with one or more wired and/or wireless networks (e.g., a local area network (LAN), a wide area network (WAN), the Internet).
In some implementations of the current subject matter, the computing system 1200 can be used to execute various interactive computer software applications that can be used for organization, analysis and/or storage of data in various (e.g., tabular) format (e.g., Microsoft Excel®, and/or any other type of software). Alternatively, the computing system 1200 can be used to execute any type of software applications. These applications can be used to perform various functionalities, e.g., planning functionalities (e.g., generating, managing, editing of spreadsheet documents, word processing documents, and/or any other objects, etc.), computing functionalities, communications functionalities, etc. The applications can include various add-in functionalities or can be standalone computing products and/or functionalities. Upon activation within the applications, the functionalities can be used to generate the user interface provided via the input/output device 1240. The user interface can be generated and presented to a user by the computing system 1200 (e.g., on a computer screen monitor, etc.).
One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs, field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
These computer programs, which can also be referred to as programs, software, software applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example, as would a processor cache or other random access memory associated with one or more physical processor cores.
To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. Other possible input devices include touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive track pads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. For example, the logic flows may include different and/or additional operations than shown without departing from the scope of the present disclosure. One or more operations of the logic flows may be repeated and/or omitted without departing from the scope of the present disclosure. Other implementations may be within the scope of the following claims.