This disclosure relates generally to the field of explainable artificial intelligence, and, more particularly, to a method and system to explain a cause of a mistake in a machine learning model using a diagnostics artificial intelligence model.
A machine learning model may be a mathematical representation and/or an algorithm that is trained on data to make predictions (or decisions) without being explicitly programmed. The machine learning model may be designed to learn patterns and relationships from input data, which could be numerical values, text, images, and/or any other type of structured or unstructured data. During the training process, the machine learning model may be presented with a set of labeled examples, known as the training data, and may adjust its internal parameters to find patterns and correlations in the data.
Unfortunately, the machine learning model may not always be accurate. When the machine learning model makes mistakes, negative consequences, such as making wrong decisions in business processes and/or choosing sub-optimal solutions to real-life problems can arise. For example, a medical diagnosis model making false negatives (failing to identify a disease) or false positives (incorrectly identifying a disease) can have serious implications for patient health. In addition, mistakes made by the machine learning model can lead to financial losses, particularly in industries where automated decision-making is prevalent. For instance, an algorithmic trading model making incorrect predictions can result in significant financial losses for investors or institutions.
Moreover, if the machine learning model is trained on biased data or exhibits biases in its predictions, it can lead to unfair treatment or discrimination against certain individuals or groups. This can perpetuate existing social biases and exacerbate societal inequalities. In addition, the machine learning model can inadvertently reveal sensitive or personal information through its mistakes. For instance, a recommender system recommending inappropriate or sensitive content to users can compromise privacy and cause harm.
Repeated mistakes by the machine learning model can erode trust and confidence in the system. Organizers and sponsors of online data science competitions may find it difficult to detect all errors and gain insight into the performance of solutions submitted by contestants. Therefore, they may have difficulties in suggesting improvements to the models and in facilitating their deployment in production-ready environments. Users may lose faith in the technology, leading to a reluctance to adopt or rely on it. This can hinder the widespread acceptance and utilization of machine learning solutions. In addition, mistakes made by the machine learning model can raise legal and ethical concerns. If mistakes result in harm to individuals or violate regulations, there can be legal consequences for the organization responsible for deploying the model. In the end, mistakes made by the machine learning model can have broader societal impacts. For example, an autonomous vehicle's incorrect decision-making could result in accidents, injuries, or loss of life.
Disclosed are a method and/or a system to explain a cause of a mistake in a machine learning model using a diagnostics artificial intelligence model.
In one aspect, a method includes forming a diagnostic model using machine learning by ingesting trusted operational data. The trusted operational data may be a supply chain data, a sales data, a purchase data, a fulfillment data, a sensory capture data, an observation data, an empirical data, a historical data, an industrial data, and/or a financial data. The method may determine that a predictive data produced by an evaluated model for a period of time does not match a known trusted data for the period of time. The method analyzes whether the predictive data produced by the evaluated model for the period of time may be an optimal result of the evaluated model. A determination may be made that a mistake occurred when the predictive data is not the optimal result of the evaluated model. A cause of the mistake may be explained using the diagnostic model. Lastly, fine-tuning the diagnostic model may be performed based on learnings from a past predictive data for different periods of time when compared with past known trusted data for the different periods of time using a processor and a memory.
The method may compare the predictive data of the evaluated model with a simulated prediction of the diagnostic model when the mistake is observed. The diagnostic model may explain the cause of the mistake. Explaining the cause of the mistake may be applied to diagnose a black-box model without requiring any knowledge about technical specifications of a specific machine learning algorithm of the black-box model and without having direct access to it. The diagnostic model may be applied on a diagnosed dataset and an output of the evaluated model comprising at least one of a prediction and a classification, without using predictions on a training set.
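By way of example and not limitation, the following sketch illustrates how the mismatch check, the optimality check, and the request for an explanation could be wired together. The function names, the numeric tolerance, and the use of the diagnostic model's simulated prediction as a proxy for the best result achievable by the evaluated model are assumptions made only for this illustration, not a prescribed implementation.

```python
def detect_and_explain(evaluated_predict, diagnostic_simulate, diagnostic_explain,
                       inputs, trusted_targets, tolerance=0.0):
    """Compare an evaluated (possibly black-box) model against known trusted data for a
    period of time and, when a mistake is found, ask the diagnostic model for a cause."""
    reports = []
    for x, trusted in zip(inputs, trusted_targets):
        predicted = evaluated_predict(x)            # output of the evaluated model
        if abs(predicted - trusted) <= tolerance:   # prediction matches the trusted data
            continue
        simulated = diagnostic_simulate(x)          # simulated prediction of the diagnostic model
        # The mismatch alone is not treated as a mistake; a mistake is registered only when
        # the prediction is not the optimal result of the evaluated model, approximated here
        # by checking whether the simulated prediction would have been closer to the truth.
        if abs(simulated - trusted) < abs(predicted - trusted):
            reports.append({"input": x, "predicted": predicted, "trusted": trusted,
                            "cause": diagnostic_explain(x, predicted, trusted)})
    return reports
```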
The method may generate a most probable explanation of the cause of the mistake as a natural language text. The method may further determine the cause of the mistake may be because a labeling of a historical training data set on which the evaluated model was formed was erroneous. The method may further determine the cause of the mistake may be because an external condition changed which caused the predictive data produced by the evaluated model to no longer conform to predictive trends. The method may further determine the cause of the mistake may be because of an error in an input data to the evaluated model which may be caused by any one of an inaccurate sensor reading, a human error, and an anomaly. The method may further determine the cause of the mistake may be because an input data may be a novel scenario different from previous input scenarios, and the evaluated model may be unprepared in the novel scenario.
The method may further determine the cause of the mistake may be because of concept drift in a relationship between an input data and the predictive data caused because a property of a target variable has changed over time. The method may further determine the cause of the mistake may be because the evaluated model may be underfitted because, while input data similar to the current input data happened in the past, the evaluated model was not sufficiently fitted to the input data. The method may further determine the cause of the mistake may be because the evaluated model may be overfitted because the diagnosed machine learning model may be unable to generalize away from a narrow band of deep optimizations to extrapolate to a general case.
The method may further determine the cause of the mistake may be because the evaluated model may be based on an anomaly meaning that normally the evaluated model would be correct and that a human decision maker would most likely make the same mistake in this special case because of a unique condition of an input data now received and this mistake may be caused by a non-determinism of a problem. The method may further determine the cause of the mistake may be because an input data to the evaluated model may be based on an intentional attack caused by malignant actors attempting to undermine an integrity of the evaluated model and this intentional attack may be an intentional modification of the input data to the model.
The method may further determine the cause of the mistake and a corresponding fix recommendation, which suggests how to improve performance of the evaluated model. The method may then generate a visual report emphasizing a ranked set of important findings based on an order of importance, which may comprise relevant statistics related to the evaluated model, the quality of its approximator, and the distributions of a diagnostic attribute. Furthermore, the visual report may include an interactive plot to help to explore diagnoses for individual instances and analyze their statistics for specific groups. The visual report may provide insights on the importance of original attributes, approximated by significance of attributes in the diagnostic model.
The method may further generate reports containing relevant statistics related to the diagnostic model, the quality of its approximator, and the distributions of diagnostic attributes. Furthermore, the diagnostic model may be a system that may be responsible for making a diagnosis of causes of errors made by the evaluated model that may be being diagnosed and whose prediction is already a concrete cause of error, and the approximator may be encapsulated within the diagnostic model comprising an ensemble of rough-set models for determining approximations and neighborhoods.
The method may further generate interactive plots to explore diagnoses for individual instances and analyze their statistics for specific groups. The method may further determine the importance of original attributes, approximated by the significance of attributes in the diagnostic model, and the diagnostic model may be a surrogate model. The method may further generate a set of historical neighborhoods comprising a set of historical instances that were processed in a similar way to the current instance on which mistakes of the diagnosed machine learning model may be observable. The method may further form a set of diagnostic attributes which describe a current instance through analysis of contents of the historical neighborhoods. The method may form the diagnostic model as a decision model which obtains vectors of the set of diagnostic attributes as an input data and delivers a most probable cause of the mistake as an output data of the diagnostic model. The method may further form the diagnostic model based on an analysis of mistakes registered in the set of neighborhoods.
The method may further form the diagnostic model based on the trusted operational data and the past predictive data for different periods of time when compared with past known trusted data for the different periods of time using rough set-based models in which intelligent systems may be characterized by insufficient and incomplete information. The method may further compute accurate approximations of past predictive data with rough set-based surrogate models and a heuristic optimization method.
The method may further base a surrogate machine learning model on the trusted operational data produced by the evaluated model, automatically apply a method of discretization, apply an algorithm to determine high-quality approximations, obtain trusted neighborhoods of each current instance by looking for trusted instances that were processed in a similar way by the surrogate machine learning model, and train the surrogate machine learning model as a model approximator.
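A minimal, non-limiting sketch of one way to realize this step is shown below, assuming tabular numeric data, a simple quantile-based discretization, a classification-type evaluated model, and a decision tree standing in for the surrogate approximator; co-membership in a tree leaf is used as a stand-in for "processed in a similar way". All function and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def discretize(X, n_bins=5):
    """Automatically applied quantile discretization of every numeric column."""
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1]) for j in range(X.shape[1])]
    return np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])]), edges

def train_approximator(X_trusted, evaluated_predictions, n_bins=5):
    """Train a surrogate on the evaluated model's own predictions (not on ground truth),
    so that it mimics how the diagnosed model processes instances."""
    X_disc, edges = discretize(X_trusted, n_bins)
    surrogate = DecisionTreeClassifier(min_samples_leaf=5, random_state=0)
    surrogate.fit(X_disc, evaluated_predictions)
    return surrogate, edges

def trusted_neighborhood(surrogate, edges, X_trusted_disc, x_new):
    """Trusted instances that the surrogate processes the same way as the new instance,
    approximated here by membership in the same decision-tree leaf."""
    x_disc = np.array([np.digitize(x_new[j], edges[j]) for j in range(len(x_new))]).reshape(1, -1)
    leaf = surrogate.apply(x_disc)[0]
    return np.where(surrogate.apply(X_trusted_disc) == leaf)[0]
```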
The method may further obtain a set of neighborhoods using the model approximator comprising an ensemble of approximate reducts known from the theory of rough sets and determine a specific neighborhood of a diagnosed instance through a decision process of the model approximator, wherein the neighborhood for a diagnosed instance relative to a single reduct may be a subset of instances from the historical training dataset which belong to the same indiscernibility class. The final neighborhood may be the sum of neighborhoods computed for all reducts in the ensemble. The instances from neighborhoods may have weights that express how representative they may be for a given neighborhood.
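The neighborhood computation described above can be sketched as follows, assuming each approximate reduct is given simply as a tuple of attribute indices over already-discretized data; the actual derivation of approximate reducts from rough set theory is assumed to be available elsewhere, and the names used are illustrative.

```python
from collections import defaultdict

def build_indiscernibility_index(X_disc, reducts):
    """For every reduct, group historical instances by their value vector on that reduct,
    i.e., by indiscernibility class."""
    index = []
    for reduct in reducts:
        classes = defaultdict(list)
        for i, row in enumerate(X_disc):
            classes[tuple(row[j] for j in reduct)].append(i)
        index.append(classes)
    return index

def neighborhood(x_disc, reducts, index):
    """Union over all reducts of the indiscernibility class the diagnosed instance falls into.
    The weight of each neighbor counts in how many reducts it co-occurred with the instance,
    which expresses how representative it is for the neighborhood."""
    weights = defaultdict(int)
    for reduct, classes in zip(reducts, index):
        key = tuple(x_disc[j] for j in reduct)
        for i in classes.get(key, []):
            weights[i] += 1
    return dict(weights)   # instance index -> representativeness weight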
The method may further approximate how many reducts in the ensemble of approximate reducts may be able to process a given pair of instances in the same way, count the number of reducts in the ensemble for which the given pair of instances may be processed in the same way, and determine a similarity measure between the instances through that count. The method may further analyze the specific neighborhood to determine characteristics comprising at least one of consistency of ground truth labels, consistency of original model predictions, consistency of approximations, neighborhood size, and uncertainty of predictions. The method may then determine a set of characteristics through analyzing the specific neighborhood to determine consistency of labels comprising at least one of ground truth labels, original model predictions, approximations, size, and uncertainty of predictions.
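The following sketch illustrates the reduct-counting similarity measure and the neighborhood characteristics listed above. The arrays of ground truth labels, original model predictions, and approximator outputs are assumed to be aligned with the historical training data, and the particular way uncertainty is proxied here is an assumption for illustration only.

```python
import numpy as np

def reduct_similarity(a_disc, b_disc, reducts):
    """Similarity of two instances = number of reducts that process them the same way,
    i.e., that place them in the same indiscernibility class."""
    return sum(all(a_disc[j] == b_disc[j] for j in reduct) for reduct in reducts)

def neighborhood_characteristics(neigh_weights, y_true, y_model, y_approx):
    """Characteristics of a specific neighborhood: label/prediction/approximation
    consistency, neighborhood size, and a simple prediction-uncertainty proxy."""
    idx = np.array(list(neigh_weights.keys()), dtype=int)
    if idx.size == 0:
        return {"size": 0}
    w = np.array([neigh_weights[i] for i in idx], dtype=float)
    def consistency(values):
        vals = np.asarray(values)[idx]
        top = max(np.sum(w[vals == v]) for v in np.unique(vals))
        return top / w.sum()
    return {
        "size": int(idx.size),
        "label_consistency": consistency(y_true),
        "prediction_consistency": consistency(y_model),
        "approximation_consistency": consistency(y_approx),
        # uncertainty proxy: 1 - dominance of the most frequent model prediction
        "prediction_uncertainty": 1.0 - consistency(y_model),
    }
```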
The method may further specify diagnostic attributes that can be derived from contents of computed neighborhoods through analyzing the specific neighborhood to determine characteristics and through determination of the set of characteristics. The method may further provide meaningful information on model operations by including the set of characteristics as diagnostic attributes that constitute an input in diagnostic rules and may link values of the diagnostic attributes to a set of possible causes of mistakes. Furthermore, when a neighborhood of a particular current instance is in at least one of a null and a minimal condition, then a probable cause of the mistake of the evaluated model on the particular current instance may be that this is a totally new dissimilar case to historic cases and the evaluated model was unprepared for such cases.
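One possible, purely illustrative set of diagnostic rules linking values of the diagnostic attributes to the causes of mistakes discussed in this disclosure is sketched below; it assumes the characteristics dictionary from the earlier sketch, and the thresholds and the particular attribute-to-cause mapping are arbitrary assumptions rather than a prescribed rule base.

```python
def diagnose_instance(attrs, min_neighborhood=3):
    """attrs is the characteristics dictionary computed for a mistaken current instance."""
    if attrs.get("size", 0) <= min_neighborhood:
        # Null or minimal neighborhood: the case is dissimilar to historic cases,
        # so the evaluated model was unprepared for it.
        return "novel scenario unlike previous input scenarios"
    if attrs["label_consistency"] < 0.5:
        return "possible labeling error or non-determinism in the neighborhood"
    if attrs["prediction_consistency"] > 0.9 and attrs["label_consistency"] > 0.9:
        return "external change or concept drift (model consistent, data no longer conforms)"
    if attrs["prediction_uncertainty"] > 0.5:
        return "model under-fitted on this region of the input space"
    return "cause undetermined by these rules"
```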
In another aspect, a system includes a processing system comprising a bank of computation processors and associated memory, a network, and a diagnostic module coupled with the processing system through the network. The diagnostic module further comprises an ingestion module to form a diagnostic model using machine learning by ingesting trusted operational data. The trusted operational data may be any one of a supply chain data, a sales data, a purchase data, a fulfillment data, a sensory capture data, an observation data, an empirical data, a historical data, an industrial data, or a financial data. The system further contains a matching module to determine that a predictive data produced by an evaluated model for a period of time does not match a known trusted data for the period of time, an optimization module to analyze whether the predictive data produced by the evaluated model for the period of time may be an optimal result of the evaluated model, a mistake-identification module to determine that a mistake occurred when the predictive data may not be the optimal result of the evaluated model, an explanation module to explain a cause of the mistake using the diagnostic model, and a tuning module to fine-tune the diagnostic model based on learnings from a past predictive data for different periods of time when compared with past known trusted data for the different periods of time using the processing system.
The system may further comprise an explanation module to compare the predictive data of the evaluated model with a simulated prediction of the diagnostic model when the mistake may be observed to explain the cause of the mistake using the diagnostic model. Furthermore, explaining the cause of the mistake may be applied to diagnose a black-box model without requiring any knowledge about technical specifications of a specific machine learning algorithm of the black-box model and without having direct access to it, and the diagnostic model may be applied on a diagnosed dataset and an output of the evaluated model comprising at least one of a prediction and a classification, without using predictions on a training set.
The system may further comprise a natural language module to generate a most probable explanation of the cause of the mistake as a natural language text. The system may further comprise a label-analysis module to determine the cause of the mistake may be because a labeling of a historical training data set on which the evaluated model was formed was erroneous. The system may further comprise an external-change module to determine the cause of the mistake may be because an external condition changed that caused the predictive data produced by the evaluated model to no longer conform to predictive trends.
In yet another aspect, a method includes determining that a predictive data produced by an evaluated model for a period of time does not match a known trusted data for the period of time, analyzing whether the predictive data produced by the evaluated model for the period of time may be an optimal result of the evaluated model, determining that a mistake occurred when the predictive data may not be the optimal result of the evaluated model, and explaining a cause of the mistake using a diagnostic model. Furthermore, the method compares the predictive data of the evaluated model with a simulated prediction of the diagnostic model when the mistake is observed to explain the cause of the mistake using the diagnostic model.
The methods and systems disclosed herein may be implemented in any means for achieving various aspects, and may be executed in a form of a machine-readable medium embodying a set of instructions that, when executed by a machine, cause the machine to perform any of the operations disclosed herein. Other features will be apparent from the accompanying drawings and from the detailed description that follows.
The embodiments of this invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.
Example embodiments, as described below, may be used to provide a method and/or a system to explain a cause of a mistake in a machine learning model using a diagnostics artificial intelligence model.
The diagnostic model 102 may be taught through a machine learning algorithm to recognize patterns, make predictions, or perform tasks related to identifying causes of future mistakes based on the provided data.
Image recognition applications, such as facial recognition, object detection, and self-driving cars, may deploy the diagnostic model 102.
Natural language processing (NLP) applications, such as the ability to understand and process human language in use cases such as spam filtering, machine translation, and/or question answering, may deploy the diagnostic model 102.
Speech recognition applications, such as voice assistants, dictation software, and/or call centers, may deploy the diagnostic model 102.
Fraud detection applications, in which machine learning models identify patterns that humans might not be able to perceive, may deploy the diagnostic model 102.
Recommendation systems that recommend products, services, or content to users may deploy the diagnostic model 102.
Predictive analytics applications that predict future events may use the diagnostic model 102.
The evaluated model 302 may create the predictive data 304A (e.g., a simulated data, which is the output of the evaluated model 302) in response to the input data 300. Predictive data 304 may be produced from different input data 300 such as stock price prediction data, inventory prediction data, risk assessment data, or customer churn data, according to one embodiment. When the predictive data 304 is not the same as the known trusted data 306 (e.g., a new sales actuals data that later comes in), the embodiments described herein may determine whether a mistake 310 occurred and explain its cause 312.
A cause 312 of the mistake 310 may be explained using the diagnostic model 102. The method determines that a predictive data 304 produced by an evaluated model 302 for a period of time 350 does not match a known trusted data 306 for the period of time 350. The method analyzes whether the predictive data 304 produced by the evaluated model 302 for the period of time 350 is an optimal result of the evaluated model 302. A determination may be made that a mistake 310 occurred when the predictive data 304 is not the optimal result of the evaluated model 302. The method may compare the predictive data 304 of the evaluated model 302 with a predictive data 304A (e.g. a simulated data, which is the output of the evaluated model 302) of the diagnostic model 102 when the mistake 310 is observed to explain the cause 312 of the mistake 310 using the diagnostic model 102.
The method may further determine the cause 312 of the mistake 310 may be because of concept drift in a relationship between an input data 300 and the predictive data 304 caused because a property of a target variable has changed over time. The method may further determine the cause 312 of the mistake 310 may be because the evaluated model 302 is underfitted because, while input data similar to the current input data 300 happened in the past, the evaluated model 302 was not sufficiently fitted to the input data 300. The method may further determine the cause 312 of the mistake 310 may be because the evaluated model 302 is overfitted because the diagnosed evaluated model 302 may be unable to generalize away from a narrow band of deep optimizations to extrapolate to a general case.
The method may further determine the cause 312 of the mistake 310 may be because the evaluated model 302 is based on an anomaly meaning that normally the evaluated model 302 would be correct and that a human decision maker would most likely make the same mistake in this special case because of a unique condition of an input data 300 now received and this mistake may be caused by a non-determinism of a problem. The method may further determine the cause 312 of the mistake 310 may be because an input data 300 to the evaluated model 302 is based on an intentional attack caused by malignant actors attempting to undermine an integrity of the evaluated model 302 and this intentional attack may be an intentional modification of the input data 300 to the model.
The method may further determine the cause 312 of the mistake 310 and a corresponding fix recommendation, which suggests how to improve performance of the evaluated model 302. The method may then generate a visual report 314 emphasizing a ranked set of important findings based on an order of importance, which may comprise relevant statistics related to the evaluated model 302, the quality of its approximator, and the distributions of a diagnostic attribute. Furthermore, the visual report 314 may include an interactive plot to help to explore diagnoses for individual instances and analyze their statistics for specific groups. The visual report 314 may provide insights on the importance of original attributes, approximated by significance of attributes in the diagnostic model 102.
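By way of example, the ranked set of findings for the visual report 314 could be assembled as in the sketch below from attribute significance scores exposed by the diagnostic model (for instance, feature importances of its surrogate); the structure of the returned report is an assumption made for illustration, and any plotting or interactive rendering is omitted.

```python
import numpy as np

def ranked_findings(attribute_names, importances, cause_counts):
    """Order findings by attribute importance and attach the distribution of diagnosed causes."""
    order = np.argsort(importances)[::-1]
    findings = [{"attribute": attribute_names[i], "importance": float(importances[i])}
                for i in order]
    total = sum(cause_counts.values())
    return {"findings": findings,
            "cause_distribution": {c: n / total for c, n in cause_counts.items()}}
```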
The method may further base a surrogate evaluated model on the trusted operational data 104 produced by the evaluated model 302, automatically apply a method of discretization, apply an algorithm to determine high-quality approximations, obtain trusted neighborhoods of each current instance by looking for trusted instances that were processed in a similar way by the surrogate evaluated model, and train the surrogate model as the approximator of the evaluated model.
In an alternate embodiment, an additional model may be used to train the diagnostic model 102, called the "global diagnostic model". The global diagnostic model may be pre-trained on multiple historical prediction problems and related diagnostic data (values of diagnostic attributes associated with the historical prediction models diagnosed in the past). This global diagnostic model may classify the new diagnosed model (as a whole) into one of three categories (underfit, overfit, regular fit). This classification may later be used by the diagnostic algorithm to assign diagnoses to particular samples (errors made by the diagnosed model).
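A hedged sketch of such a global diagnostic model is given below, using a random forest classifier over aggregated diagnostic attributes as one plausible choice of pre-trained classifier; the aggregation of per-instance diagnostic attributes into a single vector per diagnosed model is assumed to happen beforehand, and all names are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier

def fit_global_diagnostic_model(historical_attribute_vectors, historical_labels):
    """historical_attribute_vectors: one row of aggregated diagnostic attributes per
    previously diagnosed model; historical_labels: its known fit category, one of
    'underfit', 'overfit', or 'regular_fit'."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(historical_attribute_vectors, historical_labels)
    return clf

def classify_diagnosed_model(clf, attribute_vector):
    """The resulting global diagnosis is later used when assigning per-sample causes."""
    return clf.predict([attribute_vector])[0]
```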
The method may further specify diagnostic attributes that can be derived from contents of computed neighborhoods through analyzing the specific neighborhood to determine characteristics and through determination of the set of characteristics. The method may further provide meaningful information on model operations by including the set of characteristics as diagnostic attributes that constitute an input in diagnostic rules and may link values of the diagnostic attributes to a set of possible causes of mistakes. Furthermore, when a neighborhood of a particular current instance is in at least one of a null and a minimal condition, then a probable cause of the mistake 310 of the evaluated model 302 on the particular current instance may be that this is a totally new dissimilar case to historic cases and the evaluated model 302 was unprepared for such cases.
The method may further form the diagnostic model 102 based on the trusted operational data 104 and the past predictive data 402 for different periods of time 450 when compared with past known trusted data 406 for the different periods of time 450 using rough set-based models in which intelligent systems may be characterized by insufficient and incomplete information. The method may further compute accurate approximations of past predictive data 402 with rough set-based surrogate models and a heuristic optimization method.
The method may generate a most probable explanation of the cause 312 of the mistake 310 as a natural language text. The method may further determine the cause 312 of the mistake 310 may be because a labeling of a historical training data set 500 on which the evaluated model 302 was formed was erroneous. The method may further determine the cause 312 of the mistake 310 may be because an external condition changed which caused the predictive data 304 produced by the evaluated model 302 to no longer conform to predictive trends. The method may further determine the cause 312 of the mistake 310 may be because of an error in an input data 300 to the evaluated model 302 which may be caused by any one of an inaccurate sensor reading, a human error, and an anomaly. The method may further determine the cause 312 of the mistake 310 may be because an input data 300 is a novel scenario different from previous input scenarios, and the evaluated model 302 may be unprepared in the novel scenario.
The training data set 500 may be a labeled set of data examples that may be used to train the evaluated model 302 and may comprise input data (features) and corresponding output labels or target variables, according to one embodiment. The black-box model 502 may be deep neural networks, ensemble methods like random forests or gradient boosting, support vector machines (SVMs), or other sophisticated algorithms that may be designed to capture intricate patterns and relationships in the data that may be challenging to interpret due to their complex structures and high dimensionality, according to one embodiment. The diagnosed dataset 506 may be a dataset that is labeled or annotated with diagnoses or labels related to a particular domain or problem, according to one embodiment.
The method may further generate reports containing relevant statistics related to the diagnostic model, the quality of its approximator, and the distributions of diagnostic attributes. Furthermore, the diagnostic model 102 may be a system that is responsible for making a diagnosis of causes of errors made by the evaluated model 302 that is being diagnosed and whose prediction 510 is already a concrete cause of error, and the approximator may be encapsulated within the diagnostic model 102 comprising an ensemble of rough-set models for determining approximations and neighborhoods.
The method may further generate interactive plots to explore diagnoses for individual instances and analyze their statistics for specific groups. The method may further determine the importance of original attributes, approximated by the significance of attributes in the diagnostic model 102, and the diagnostic model 102 may be a surrogate model. The method may further generate a set of historical neighborhoods comprising a set of historical instances that were processed in a similar way to the current instance on which mistakes of the diagnosed evaluated model 302 may be observable. The method may further form a set of diagnostic attributes which describe a current instance through analysis of contents of the historical neighborhoods. The method may form the diagnostic model 102 as a decision model which obtains vectors of the set of diagnostic attributes as an input data 300 and delivers a most probable cause of the mistake 310 as an output 504 data of the diagnostic model 102. The method may further form the diagnostic model 102 based on an analysis of mistakes registered in the set of neighborhoods.
The method may further obtain a set of neighborhoods using the model approximator comprising an ensemble of approximate reducts known from the theory of rough sets and determine a specific neighborhood of a diagnosed instance through a decision process of the model approximator, wherein the neighborhood for a diagnosed instance relative to a single reduct is a subset of instances from the historical training dataset which belong to the same indiscernibility class. The final neighborhood may be the sum of neighborhoods computed for all reducts in the ensemble. The instances from neighborhoods may have weights that express how representative they may be for a given neighborhood.
The method may further approximate how many reducts in the ensemble of approximate reducts may be able to process a given pair of instances in the same way, count the number of reducts in the ensemble for which the given pair of instances may be processed in the same way, and determine a similarity measure between the instances through that count. The method may further analyze the specific neighborhood to determine characteristics comprising at least one of consistency of ground truth labels, consistency of original model predictions (e.g., prediction 510), consistency of approximations, neighborhood size, and uncertainty of predictions (e.g., prediction 510). The method may then determine a set of characteristics through analyzing the specific neighborhood to determine consistency of labels comprising at least one of ground truth labels, original model predictions (e.g., prediction 510), approximations, size, and uncertainty of predictions (e.g., prediction 510).
In operation 1110, the diagnostic model 102 determines whether the cause 312 of the mistake 310 was a concept drift in a relationship between an input data and the predictive data caused because a property of a target variable has changed over time. In operation 1112, the diagnostic model 102 determines whether the cause 312 of the mistake 310 was because the evaluated model may be underfitted because, while input data similar to the current input data happened in the past, the evaluated model was not sufficiently fitted to the input data. In operation 1114, the diagnostic model 102 determines whether the cause 312 of the mistake 310 was because the evaluated model may be overfitted because the diagnosed machine learning model is unable to generalize away from a narrow band of deep optimizations to extrapolate to a general case.
In operation 1116, the diagnostic model 102 determines whether the cause 312 of the mistake 310 was because the evaluated model may be based on an anomaly meaning that normally the evaluated model would be correct and that a human decision maker would most likely make the same mistake in this special case because of a unique condition of an input data now received. In operation 1118, the diagnostic model 102 determines whether the cause 312 of the mistake 310 was a non-determinism of a problem. In operation 1120, the diagnostic model 102 determines whether the cause 312 of the mistake 310 may be that the input data to the evaluated model is based on an intentional attack caused by malignant actors attempting to undermine an integrity of the evaluated model.
The first, referred to in 1302, may be the surrogate model that approximates predictions of the evaluated model 302. The second, referred to in 1308, is a pre-trained model that takes as an input the values of diagnostic attributes and outputs something that may be called a global diagnosis. This diagnosis may then be used to give the most probable causes of errors for individual historical data samples, according to one embodiment.
In operation 1306, the diagnostic model 102 forms a set of diagnostic attributes which describe a current instance through analysis of contents of the historical neighborhoods. In operation 1308, the diagnostic model 102 uses a pre-trained classification model as a decision model which obtains vectors of the set of diagnostic attributes as an input data. In operation 1310, the diagnostic model 102 delivers a most probable cause 312 of the mistake 310 as an output data of the diagnostic model 102.
In operation 1512, the diagnostic model 102 determines a specific neighborhood of a diagnosed instance through a decision process of the model approximator. In operation 1514, the diagnostic model 102 approximates how many reducts in the ensemble of approximate reducts may be able to process a given pair of instances in the same way. In operation 1516, the diagnostic model 102 counts the number of reducts in the ensemble for which the given pair of instances may be processed in the same way.
In operation 1606, the diagnostic model 102 determines a set of characteristics through analyzing the specific neighborhood to determine consistency of labels comprising at least one of ground truth labels, original model predictions, approximations, size, and uncertainty of predictions. In operation 1608, the diagnostic model 102 specifies diagnostic attributes that can be derived from contents of computed neighborhoods through analyzing the specific neighborhood to determine characteristics and through determination of the set of characteristics. In operation 1610, the diagnostic model 102 provides meaningful information on model operations by including the set of characteristics as diagnostic attributes that constitute an input in diagnostic rules. In operation 1612, the diagnostic model 102 links the values of the diagnostic attributes to a set of possible causes of mistakes.
If the model was diagnosed as under-fitted, and for a given erroneously classified instance model's uncertainty was high, and its neighborhood was not small, then it may be that the mistake was caused by under-fitting. If the diagnosed instance has a very small neighborhood, i.e., is dissimilar to training instances, then it may be that the instance was an outlier.
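The two rules stated in the preceding paragraph could be expressed, for illustration only, roughly as follows; the uncertainty and neighborhood-size thresholds are arbitrary assumptions, and the dictionary of characteristics and the global diagnosis value follow the earlier sketches.

```python
def explain_error(global_diagnosis, attrs, small_neighborhood=3):
    """Combine the global diagnosis of the model with per-instance characteristics."""
    if attrs.get("size", 0) <= small_neighborhood:
        # Very small neighborhood: the instance is dissimilar to training instances.
        return "outlier"
    if (global_diagnosis == "underfit"
            and attrs.get("prediction_uncertainty", 0.0) > 0.5
            and attrs.get("size", 0) > small_neighborhood):
        return "mistake most likely caused by under-fitting"
    return "no rule fired for this instance"
```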
For each of the selected data sets, according to one embodiment, an expert may assign one of the following labels to the fitted model:
Near optimal fit—the performance of the model may be close to the best possible prediction performance reported in the literature for a given data set. Under-fitted model—the model may be over-generalized or may not be sufficiently fitted to the available training data. This may manifest in relatively low prediction quality on both training and validation data. Over-fitted model—the model may be closely fitted to the training data, however, its generalization quality (measured on a validation set) may be poor. A border case model—this label may be used if an expert could not decide which of the three previous labels should be assigned.
Predictions of the fitted models may be analyzed using the methodology described herein.
The first experiment may be aimed at the verification of the ability of the embodiments to perform a global diagnostic of prediction models using the diagnostic attributes.
Before the experiment, the diagnostic attributes may be linearly scaled to the [0, 1] interval. The nearest neighbor algorithm may use the Euclidean distance, and due to the imbalanced distribution of mistakes, its performance may be measured using three different metrics, i.e., standard accuracy, balanced accuracy, and Cohen's Kappa coefficient. The results may show that the global diagnostic of prediction models using our diagnostic attributes may be feasible. All considered measures may indicate that random forest may be able to distinguish between the three considered model classes significantly better than random or naive (majority) predictions.
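For illustration, the evaluation protocol described above might look roughly like the following, assuming scikit-learn is available; the variable names and the particular train/test split are placeholders, not part of the disclosure.

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, cohen_kappa_score

def evaluate_global_diagnostics(X_train, y_train, X_test, y_test):
    scaler = MinMaxScaler()                       # linear scaling to the [0, 1] interval
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)
    knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
    knn.fit(X_train_s, y_train)
    pred = knn.predict(X_test_s)
    return {"accuracy": accuracy_score(y_test, pred),
            "balanced_accuracy": balanced_accuracy_score(y_test, pred),
            "cohen_kappa": cohen_kappa_score(y_test, pred)}
```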
As in the previous experiment, the comparison of the performance between different scenarios may show that the global diagnostic of a prediction model may be most difficult when it is done on a completely new data set (the first scenario). For all metrics, the results for scenario 1 may be significantly lower than for scenario 2, i.e., the p-value of a paired, one-sided Wilcoxon rank test may be ≤0.01 for the accuracy and Cohen's Kappa measures, and it may be ≤0.02 for the balanced accuracy measure. The differences between scenarios 2 and 3 may be even greater.
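The paired, one-sided comparison between scenarios could be carried out, for example, with SciPy's Wilcoxon signed-rank test as sketched below; the per-data-set score arrays are placeholders.

```python
from scipy.stats import wilcoxon

def compare_scenarios(scores_scenario_1, scores_scenario_2):
    """Tests whether scenario 2 scores are significantly greater than scenario 1 scores
    on the same data sets (paired, one-sided)."""
    stat, p_value = wilcoxon(scores_scenario_2, scores_scenario_1, alternative="greater")
    return p_value
```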
In another embodiment, a system includes a processing system comprising a bank of computation processors and associated memory, a network, and a diagnostic module coupled with the processing system through the network. The diagnostic module further comprises an ingestion module to form a diagnostic model 102 using machine learning by ingesting trusted operational data 104. The trusted operational data 104 may be any one of a supply chain data, a sales data, a purchase data, a fulfillment data, a sensory capture data, an observation data, an empirical data, a historical data, an industrial data, or a financial data. The system further contains a matching module to determine that a predictive data 304 produced by an evaluated model 302 for a period of time 350 does not match a known trusted data 306 for the period of time 350, an optimization module to analyze whether the predictive data 304 produced by the evaluated model 302 for the period of time 350 may be an optimal result of the evaluated model 302, a mistake-identification module to determine that a mistake occurred when the predictive data 304 may not be the optimal result of the evaluated model 302, an explanation module to explain a cause 312 of the mistake 310 using the diagnostic model 102, and a tuning module to fine-tune the diagnostic model 102 based on learnings from a past predictive data 402 for different periods of time 450 when compared with past known trusted data 406 for the different periods of time 450 using the processing system.
The system may further comprise an explanation module to compare the predictive data 304 of the evaluated model 302 with a predictive data 304A (e.g. a simulated data, which may be the output of the evaluated model 302) of the diagnostic model 102 when the mistake 310 may be observed to explain the cause 312 of the mistake 310 using the diagnostic model 102. Furthermore, explaining the cause 312 of the mistake 310 may be applied to diagnose a black-box model 502 without requiring any knowledge about technical specifications of a specific machine learning algorithm 508 of the black-box model 502 and without having direct access to it, and the diagnostic model 102 may be applied on a diagnosed dataset 506 and an output 504 of the evaluated model 302 comprising at least one of a prediction 510 and a classification, without using predictions (e.g., prediction 510) on a training data set 500.
The system may further comprise a natural language module to generate a most probable explanation of the cause 312 of the mistake 310 as a natural language text. The system may further comprise a label-analysis module to determine the cause 312 of the mistake 310 may be because a labeling of a historical training data set 500 on which the evaluated model 302 was formed was erroneous. The system may further comprise an external-change module to determine the cause 312 of the mistake 310 may be because an external condition changed that caused the predictive data 304 produced by the evaluated model 302 to no longer conform to predictive trends.
In yet another embodiment, a method includes determining that a predictive data 304 produced by an evaluated model 302 for a period of time 350 does not match a known trusted data 306 for the period of time 350, analyzing whether the predictive data 304 produced by the evaluated model 302 for the period of time 350 may be an optimal result of the evaluated model 302, determining that a mistake occurred when the predictive data 304 may not be the optimal result of the evaluated model 302, and explaining a cause 312 of the mistake 310 using a diagnostic model 102. Furthermore, the method compares the predictive data 304 of the evaluated model 302 with a predictive data 304A (e.g. a simulated data, which may be the output of the evaluated model 302) of the diagnostic model 102 when the mistake 310 may be observed to explain the cause 312 of the mistake 310 using the diagnostic model 102.
Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments.
A number of embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the claimed invention. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.
It may be appreciated that the various systems, methods, and apparatus disclosed herein may be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and/or may be performed in any order.
The structures and modules in the figures may be shown as distinct and communicating with only a few specific structures and not others. The structures may be merged with each other, may perform overlapping functions, and may communicate with other structures not shown to be connected in the figures. Accordingly, the specification and/or drawings may be regarded in an illustrative rather than a restrictive sense.