The present invention relates to a technology for visualizing the reason for decision making by artificial intelligence.
Artificial Intelligence (AI), which has been used for such purposes as prediction and classification, has made significant progress in recent years. AI is a sort of function approximator that can handle large amounts of data at high speed compared with humans. However, the content of AI models created by machine learning (e.g., deep neural network (DNN) models created by deep learning) is inordinately complicated and basically constitutes a black box. It is thus difficult for users to know the reason for the prediction and classification performed by AI.
In view of this, the concept of Explainable AI (XAI) has been advocated. XAI refers to a whole group of technologies covering not only the cases where the processes leading to the results of prediction and classification performed by AI are themselves explainable but also the analysis of the reason for the results of prediction and classification by AI constituting a black box. Representative technologies of XAI include Local Interpretable Model-agnostic Explanations (LIME) and, as its extension, SHapley Additive exPlanations (SHAP) (see S. M. Lundberg and S. Lee, "A Unified Approach to Interpreting Model Predictions," NIPS 2017).
There are also known techniques for analyzing the relation between objective variables and explanatory variables with a view to identifying those explanatory variables that strongly affect changes in values of the objective variables. The explanatory variables in analogous relations to each other are then grouped in such a manner that their time-series data belong to the same group. From each of the groups, the time-series data of the explanatory variable representative of the group is extracted in order to analyze the data being represented by the explanatory variable (see WO2018/096683A1).
There is also known a methodology of searching for a causal relation between variables based on, for example, data distribution, such as a relation in which changing a variable A leads to a change in a variable B, the variable A being the cause and the variable B being the effect (i.e., a search for the causal direction from A to B and the magnitude of the effect in that direction) (see Shohei Shimizu et al., "A Linear Non-Gaussian Acyclic Model for Causal Discovery," Journal of Machine Learning Research 7 (2006) 2003-2030).
With LIME and SHAP, if a change in a particular input data item (feature variable) inverts or significantly varies the result output from AI, then that item is estimated to have a “high degree of importance in decision making.”
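By way of a non-limiting illustration, the following Python sketch shows how such per-feature contributions may be computed with the publicly available shap library. The toy data and the tree-based model are hypothetical placeholders (a tree model is used only because it keeps the sketch short), not part of the embodiments.

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: feature values and a target to be predicted.
X = pd.DataFrame({"humidity": [20, 55, 80, 30],
                  "time_slot": [10, 14, 22, 9]})
y = [0.9, 0.3, 0.1, 0.8]

model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to the input feature variables;
# one SHAP value per feature per row serves as that feature's contribution.
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X)
print(pd.DataFrame(contributions, columns=X.columns))
```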
However, with the existing examples mentioned above, there is a possibility that XAI may present an explanation incongruous with the findings in the field, which can detract from the reliability of the model involved. Such an eventuality can occur where a machine learning model is trained with emphasis on variables that are highly correlated with the variable considered inherently important in domain knowledge and that are in spurious correlation with the objective variables.
The inventors studied the cause of that eventuality and came to the view that, in a case where the training data includes multiple variables highly relevant to each other, a highly sophisticated learning model tends to learn by paying attention to as few variables as possible. The "highly relevant variables" are variables such that one variable permits estimation of the value of another, such as highly correlated variables.
In view of the above, although a given variable (e.g., time slot) may be an important variable from the viewpoint of the field, the model may learn by paying attention to another highly relevant variable (e.g., humidity) instead of the variable deemed inherently important. The contribution of the variable considered inherently important (time slot) is thus absorbed by the other highly relevant variable (humidity). This causes the inherently important variable to be underestimated while the contribution of an apparently irrelevant variable (humidity) is raised. That is, a variable considered irrelevant from the viewpoint of the field can be overestimated.
It is therefore an object of the present invention to provide an XAI technology for easily ensuring consistency with the findings in the field.
According to one preferred aspect of the present invention, there is provided an information processing system including a predictor, a contribution calculation section, and a supplemental reason generation section, the system being capable of accessing a feature variable relevance storage database that stores relevance between feature variables in case data and a case data contribution storage database that stores a contribution of a feature variable in the case data to a result of prediction by the predictor. The contribution calculation section receives the predictor and evaluation target data to be input to the predictor, calculates a contribution of each of the feature variables in the evaluation target data to output of the predictor, and outputs the calculated contributions and the evaluation target data as contribution data. The supplemental reason generation section receives the contribution data, extracts a group of data proximate to a value and a contribution of a first feature variable from the case data contribution storage database, identifies a second feature variable relevant to the first feature variable from the feature variable relevance storage database, generates supplemental reason data based on a distribution of the proximate data group within a distribution of the second feature variable by use of data in the case data contribution storage database, and outputs the generated supplemental reason data.
According to another preferred aspect of the present invention, there is provided an information processing method for generating supplemental information regarding a result of prediction output by a predictor upon receiving input of evaluation target data, the predictor having been trained by use of training data. The information processing method uses a feature variable relevance storage database that stores relevance between feature variables in the training data and a case data contribution storage database that stores a contribution of a feature variable in the training data to the result of prediction by the predictor. The method includes a first step of extracting a group of data proximate to a value and a contribution of a first feature variable from the case data contribution storage database; a second step of identifying a second feature variable relevant to the first feature variable from the feature variable relevance storage database; and a third step of generating information based on a distribution of the proximate data group within a distribution of the second feature variable by use of the data in the case data contribution storage database.
The invention thus provides an XAI technology for easily ensuring consistency with the findings in the field.
Some preferred embodiments of the present invention are described below. It is to be noted that the present invention should not be limited to the embodiments to be discussed below when interpreted. It will be understood by those skilled in the art that specific structures and configurations of the embodiments may be modified or altered within the spirit and scope of the present invention.
In the configurations of the embodiments to be described below, the parts having identical or similar functions are designated by the same reference signs across different drawings, and the explanations of such parts may be omitted where redundant.
In the case where there are multiple elements having identical or similar functions, these elements may be designated by the same reference signs furnished with different subscripts when described. However, where there is no need to distinguish between such multiple elements, the subscripts may be omitted from the description.
In this specification, the ordinal notations such as “first,” “second,” and “third” are provided to identify constituent elements and do not necessarily limit or determine the number, sequence, or details of these constituent elements. Also, the numeral for identifying a constituent element is used in each context; the numeral used in one context may or may not designate the same element in another context. Further, a constituent element identified by a given numeral may include a function or functions of another constituent element identified by another numeral.
In the drawings and elsewhere, the position, size, shape, and range of each configuration are provided to facilitate the understanding of the present invention and may not represent the position, size, shape, or range of the actual configuration. Thus, the present invention is not necessarily limited by the positions, sizes, shapes, or ranges disclosed in the drawings and elsewhere.
The publications, patents, and patent applications cited in this specification constitute part of the description of the present specification.
The constituent element represented in a singular form in this specification also includes its plural form unless otherwise specified explicitly in the context.
The embodiments below demonstrate examples in which, when XAI outputs, as a reason for model-based decision making, a contribution of an apparently irrelevant variable to the result of the decision, field personnel unfamiliar with AI technology are provided with information for supporting the interpretation and understanding of the reason for the decision making.
In one embodiment, given a feature variable A presented as the reason for decision making, the combination of its value in test data and the ratio of its contribution to model-based decision making is used to extract, from a database, past case data indicative of similar trends. From statistical information regarding the extracted range of data, supplemental information for interpreting the reason for the decision made is generated. As the statistical information, a range of values that can be taken by a different variable B highly relevant to the variable A is used, for example.
The computer system of the embodiment includes one or more computers 1.
The computer 1 includes a relevance calculation section 100, a contribution calculation section 200, a predictor 500, a supplemental reason generation section 700, and a result output section 800 as functional blocks for carrying out processing. Also included are an inter-feature-variable relevance storage section 300, a case data contribution storage section 400, and case data 600 as sets of data or databases (DB). A terminal 2 is further provided to control the functional blocks and to access the data.
A keyboard, a mouse, and/or a similar device may be used as the input device 11. A printer, an image display, and/or a similar device may be used as the output device 12. Any one of diverse central processing units (CPUs) may be used as the processor 13. Any one of diverse semiconductor memories may be used as the main storage device 14. A magnetic disk drive or a similar device may be used as the sub storage device 15. The network interface 16 permits wired or wireless communication over networks in accordance with various protocols. These structures may be implemented using known technology and thus will not be discussed further.
In the present embodiment, the inter-feature-variable relevance storage section 300, the case data contribution storage section 400, and the case data 600 are placed in the sub storage device 15. The relevance calculation section 100, the contribution calculation section 200, the predictor 500, the supplemental reason generation section 700, and the result output section 800 are implemented by the processor 13 loading and executing relevant software stored in the sub storage device 15 in coordination with other hardware.
In the present embodiment, it is to be noted that the functions equivalent to those implemented by software may also be implemented by hardware such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). The above configuration may be constituted by a single computer 1. Alternatively, some or all of the input device 11, the output device 12, the processor 13, the main storage device 14, the sub storage device 15, and the network interface 16 may be configured using other computers connected over networks. For example, the inter-feature-variable relevance storage section 300, the case data contribution storage section 400, and the case data 600 may be disposed remotely and accessed via the network interface 16.
Although the training data itself is assumed to be used as the case data in the above processing, other suitable data statistically equivalent to the training data may be used instead.
The relevance calculation section 100 calculates inter-feature-variable relevance data from the case data 600 and stores the calculated data into the inter-feature-variable relevance storage section 300 as a DB.
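A minimal sketch of such a relevance calculation, assuming the case data 600 is held in a pandas DataFrame and the DB is approximated by a flat table of absolute correlation coefficients (the measure the first embodiment names; see also the alternatives discussed later), may look as follows; the function name and table layout are illustrative assumptions.

```python
import pandas as pd

def build_relevance_table(case_data: pd.DataFrame) -> pd.DataFrame:
    """Compute pairwise relevance between feature variables.

    Relevance is approximated here by the absolute Pearson correlation
    coefficient between each pair of feature variables.
    """
    corr = case_data.corr().abs()
    # Flatten into (variable_a, variable_b, relevance) rows for DB storage.
    rows = [
        (a, b, corr.loc[a, b])
        for a in corr.columns
        for b in corr.columns
        if a != b
    ]
    return pd.DataFrame(rows, columns=["variable_a", "variable_b", "relevance"])
```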
The contribution calculation section 200 calculates contribution data from the case data 600 and from the predictor 500, and stores the calculated data as a DB into the case data contribution storage section 400.
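Likewise, a minimal sketch of the contribution calculation, assuming a tree-based predictor and the shap library, may look as follows; the `contrib_` column naming is an illustrative assumption reused in later sketches.

```python
import pandas as pd
import shap

def build_contribution_table(model, case_data: pd.DataFrame) -> pd.DataFrame:
    """Return one row per case (index) holding each feature variable's
    value alongside its contribution to the predictor's output."""
    explainer = shap.TreeExplainer(model)  # assumes a tree-based predictor
    contributions = explainer.shap_values(case_data)
    contrib = pd.DataFrame(
        contributions,
        columns=[f"contrib_{c}" for c in case_data.columns],
        index=case_data.index,
    )
    # Store feature values and contributions side by side, keyed by index.
    return pd.concat([case_data, contrib], axis=1)
```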
Generally, the prediction by the predictor 500 involves inputting evaluation target data 900 as explanatory variables and outputting prediction result data 1000 as objective variables.
Here, the predictor 500 is a black box that outputs the prediction result data 1000 solely as a result of prediction. It is thus difficult for the user to grasp the reason for the decision made. As discussed above, LIME and SHAP demonstrate the contribution of each item (feature variable) to the result of prediction. This helps the user understand the reason for decision making by the predictor.
For example, suppose that the predictor 500, furnished with a model for predicting the incidence of burglary, outputs the prediction result data 1000, and that an apparently irrelevant feature variable such as "humidity" is presented as making a large contribution to the predicted value.
The reason for the above decision making is hard to understand unless supplemented with an explanation taking spurious correlation and confounding factors into consideration, such as "humidity is low in the daytime; people are often not at home in the daytime; thus burglary tends to occur."
In the present embodiment, when the contribution of an apparently irrelevant feature variable is presented as the reason for decision making by the model, personnel in the field unfamiliar with AI technology are concomitantly given supplemental information to help them interpret and understand the reason for the decision made. For example, the embodiment extracts the finding "time slot is the daytime" as another factor affecting in common the two events "humidity is low" and "burglary occurs," and presents that additional factor to the personnel.
Explained below with reference to the conceptual diagram is an overview of the processing performed by the present embodiment.
In the zero-th step, "humidity" is extracted as the feature variable that contributes most to the reason for decision making with the evaluation target data 900, together with its contribution "+35%."
In the first step, from the information in the case data contribution storage section 400, pieces of peripheral data regarding the feature variable "humidity=20% and contribution=+35%" are acquired, and their indexes are extracted. In the present specification, the acquired peripheral data may be referred to as the "proximate data group" for reasons of descriptive convenience. An index refers to a data ID that uniquely identifies a group of data in the training data. A peripheral plot 1401 is selected from a relation diagram plotting the value of the apparently irrelevant variable "humidity" against its contribution.
In the second step, from the information in the inter-feature-variable relevance storage section 300, a feature variable “time slot” highly relevant to “humidity” is selected.
In the third step, with emphasis on the values of “time slot” in the information in the case data contribution storage section 400, an evaluation is made as to whether there is a significant difference between the range where the data with the extracted index (proximate data group) is distributed (called the distribution range hereunder) on one hand and the range where all the other pieces of data are distributed on the other hand.
In the case where there is a significant difference and where the values of “time slot” in explanation target data are included in the distribution range, the distribution range is presented concomitantly as the information supplementing the initially presented reason based on “humidity.” This example thus reveals that the data indicative of a high contribution around the humidity of 20% is concentrated in the time slot of 9 to 11. From this, the contribution of “humidity” is found to also include the contribution of “time slot” being “9 to 11” to the predicted value.
A specific example of the information processing system for implementing the above processing is explained below.
In step S1501, the supplemental reason generation section 700 acquires the contribution data 1100.
In step S1502, loop processing is started on the feature variables in the evaluation target data 900.
In step S1503, the value of a target feature variable in the evaluation target data 900 and its contribution are acquired from the contribution data 1100. The loop processing may be performed on all feature variables, or only on a portion of them as discussed below.
In step S1504, from the case data contribution storage section 400, one or more indexes having data proximate to the group of the feature variable and contribution acquired in step S1503 are extracted. The extracted case data constitutes the proximate data group. Whether given case data is proximate or not may be determined by verifying whether the feature variable and the contribution fall within their respective predetermined ranges, for example.
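A minimal sketch of this proximity determination, reusing the table layout assumed in the earlier sketches (feature values alongside `contrib_` columns), may look as follows; the tolerance parameters are illustrative assumptions.

```python
import pandas as pd

def proximate_indexes(table: pd.DataFrame, feature: str,
                      value: float, contribution: float,
                      value_tol: float, contrib_tol: float) -> pd.Index:
    """Return indexes of case data whose feature value and contribution
    both fall within predetermined ranges of the evaluation target's."""
    mask = ((table[feature] - value).abs() <= value_tol) & \
           ((table[f"contrib_{feature}"] - contribution).abs() <= contrib_tol)
    return table.index[mask]
```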
In step S1505, from the inter-feature-variable relevance storage section 300, a feature variable highly relevant to the target feature variable is acquired.
In step S1506, the value of the feature variable obtained in step S1505 is acquired from the case data contribution storage section 400 for comparison between the distribution range of the proximate data group and that of the other data. Known statistical techniques may be adopted as the algorithm for making the comparison.
In step S1507, it is determined whether there is a significant difference between the distribution ranges. What magnitude of difference counts as significant may be suitably defined beforehand using known statistical techniques.
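As one example of such a known statistical technique, a two-sample Kolmogorov-Smirnov test may be used; the sketch below, with an assumed significance level, is illustrative only.

```python
from scipy.stats import ks_2samp

def distributions_differ(proximate_values, other_values,
                         alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test: True when the proximate data
    group is distributed significantly differently from the other data."""
    statistic, p_value = ks_2samp(proximate_values, other_values)
    return p_value < alpha
```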
In the case where the difference is not significant, step S1508 is reached. In step S1508, a feature variable with the next highest relevance to the target feature variable is acquired from the inter-feature-variable relevance storage section 300 for use as the target feature variable. Steps S1506 and S1507 are then repeated.
In the case where the difference is significant, step S1509 is reached. In step S1509, supplemental reason data 1200 is generated from the distribution range of the proximate data group of the highly relevant feature variable.
In step S1510, the loop processing is repeated on all feature variables. In some cases, the processing may be performed only on a portion of the feature variables as discussed above.
In step S1511, the generated supplemental reason data 1200 is output to the result output section 800.
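Tying the preceding steps together, a compact sketch of the loop of steps S1501 to S1511, reusing the helper functions sketched above and treating the thresholds as configurable assumptions, may look as follows.

```python
def generate_supplemental_reasons(contribution_data, case_table,
                                  relevance_table,
                                  value_tol=5.0, contrib_tol=0.05):
    """Illustrative sketch of steps S1501 to S1511."""
    reasons = []
    # S1502/S1503: loop over (feature, (value, contribution)) pairs.
    for feature, (value, contribution) in contribution_data.items():
        # S1504: extract indexes of the proximate data group.
        idx = proximate_indexes(case_table, feature, value,
                                contribution, value_tol, contrib_tol)
        # S1505/S1508: candidate relevant variables, most relevant first.
        candidates = (relevance_table
                      .loc[relevance_table["variable_a"] == feature]
                      .sort_values("relevance", ascending=False)["variable_b"])
        for related in candidates:
            prox = case_table.loc[idx, related]
            rest = case_table.drop(index=idx)[related]
            # S1506/S1507: compare the distribution ranges.
            if distributions_differ(prox, rest):
                # S1509: record the distribution range as a supplemental reason.
                reasons.append((feature, related, prox.min(), prox.max()))
                break
    return reasons  # S1511: handed to the result output section
```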
In response to a request from the terminal 2, for example, the result output section 800 generates output that causes the supplemental reason data 1200 to be transmitted to the terminal 2 for display on a display device thereof. In this embodiment, for example, the terminal 2 instructs the computer 1 to transmit the output to the terminal 2. What follows is an explanation of a graphical user interface (GUI) that can be used for the above purpose. The terminal 2 may be an ordinary personal computer or mobile terminal, with its display implemented by use of an ordinary browser, for example.
The embodiment described above thus provides an XAI technology by which the value of a first variable largely contributing to a prediction result and its contribution are estimated; a group of data proximate to the estimated value is extracted from training data; a second variable different from (but relevant to) the first variable is identified; and a comparison is made, with respect to the values of the second variable, between the proximate data group and the group of other data, thereby making it easy to ensure consistency with the findings in the field.
In the processing flow of the first embodiment, the supplemental reason data is generated exhaustively for the feature variables. In an alternative method, a graph of the relation between a feature variable and its contribution may be presented to the user, and the supplemental reason data may be generated on demand only for a feature variable designated by the user on the graph.
When the supplemental reason data is generated not exhaustively but on demand, the cost of the processing can be reduced.
Explained below as another example of reducing the processing cost is one in which the feature variable for which the supplemental reason data is generated is selected automatically. In the loop processing of the first embodiment, the target feature variables may then be narrowed down instead of covering all feature variables.
In that case, a specific feature variable may be selected as the target feature variable on the basis of the strength of its causal relation with the objective variable as evaluated by known causal search techniques. This helps reduce the processing cost with regard to the variables that need no supplement.
For example, in order to find a noteworthy variable such as humidity, the strength of direct causal relation with the objective variable is measured by causal inference. The loop processing may then be limited to the feature variables found to have a strong direct causal relation with the objective variable.
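As one concrete possibility, the publicly available lingam package, which implements the method of Shimizu et al. cited above, may be used to estimate such direct causal strengths; the sketch below is illustrative, and the function name is an assumption.

```python
import pandas as pd
import lingam  # pip install lingam

def causal_strengths(data: pd.DataFrame, objective: str) -> pd.Series:
    """Estimate the strength of the direct causal effect of every feature
    variable on the objective variable with DirectLiNGAM (Shimizu et al.)."""
    model = lingam.DirectLiNGAM()
    model.fit(data)
    # adjacency_matrix_[i, j] holds the direct effect of column j on column i.
    row = list(data.columns).index(objective)
    effects = pd.Series(model.adjacency_matrix_[row], index=data.columns)
    return effects.drop(objective).abs().sort_values(ascending=False)
```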
What follows is an explanation of another example of the method of searching for the proximate data group having a peculiar distribution.
It has been explained that the relevance calculation section 100 of the first embodiment calculates the correlation coefficients between feature variables and stores the calculated coefficients into the inter-feature-variable relevance storage section 300 in the form of a DB. However, given that correlation coefficients are useful solely for evaluating the strength of linear relevance, the relevance calculation section 100 may, for example, calculate a regression formula, evaluate the degree of fit (level of error) with the regression formula as the degree of relevance, and store the evaluated degrees of relevance into the inter-feature-variable relevance storage section 300.
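A minimal sketch of such a regression-fit-based relevance measure, using the coefficient of determination from scikit-learn as the degree of fit, may look as follows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def regression_fit_relevance(x: np.ndarray, y: np.ndarray) -> float:
    """Degree of fit (coefficient of determination, R^2) of a regression
    of one feature variable on another, used as the degree of relevance."""
    x_col = x.reshape(-1, 1)
    return LinearRegression().fit(x_col, y).score(x_col, y)
```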
Alternatively, the Maximum Information Coefficient (MIC), which supports nonlinear relevance, or the causal strength explained in Shohei Shimizu et al., "A Linear Non-Gaussian Acyclic Model for Causal Discovery," Journal of Machine Learning Research 7 (2006) 2003-2030, may be adopted in representing the relevance between the variables.
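For MIC, the publicly available minepy package may be used, for example; the sketch below is illustrative only.

```python
import numpy as np
from minepy import MINE  # pip install minepy

def mic_relevance(x: np.ndarray, y: np.ndarray) -> float:
    """Maximum Information Coefficient, which also captures
    nonlinear relevance between two variables."""
    mine = MINE(alpha=0.6, c=15)  # parameters from the original MIC paper
    mine.compute_score(x, y)
    return mine.mic()
```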
Explained above in connection with the first embodiment is the example in which the supplemental reason data is generated and displayed for a single target feature variable (e.g., humidity). Alternatively, the processing may be expanded in such a manner that, in searching for supplemental information, the information is generated using not only one variable but multiple variables.
In the example of "humidity" for the first embodiment, the processing may be expanded so that the distribution of the proximate data group is evaluated not over a single relevant feature variable such as "time slot" alone but over a combination of multiple relevant feature variables. Likewise, the graph of relation presented as the supplemental reason data may plot the proximate data group against two or more relevant feature variables.
In this manner, more detailed studies are made possible by generating the supplemental reason data with use of the relations between multiple feature variables.
According to the above-described embodiments, given the contribution of a feature variable presented as the reason for decision making, the values of explanation target data and the contributions of their variables are matched against a group of contribution vectors with respect to previously stored training data. From the characteristics of a range of values that can be taken by another highly relevant feature variable, determined on the basis of the result of the matching, supplemental information is generated regarding the reason for the decision made with an apparently irrelevant feature variable.
According to WO2018/096683A1, the variables strongly correlated to each other are grouped by similarity. From the grouped variables, representative variables are extracted for factor analysis. This resolves the problem of multiple similar feature variables being output in the result of contribution analysis. This method, however, cannot be applied to XAI unless the model itself is altered. Further, useful feature variables may be neglected for the sake of ease of understanding the reason, which can worsen the accuracy of the model.
According to the configurations of the above-described embodiments, it is possible to find a feature variable that should be considered inherently important but of which the ratio of direct contribution is underestimated and to present the found feature variable as the supplemental information regarding the feature variable overestimated in the result of the decision made by the prediction model. As a result, on the screen presenting the contribution of each feature variable to model-based decision making, it is possible to display the characteristics of another strongly relevant feature variable as the supplemental information regarding the contribution of a specific feature variable.
Number | Date | Country | Kind
---|---|---|---
2020-180026 | Oct 2020 | JP | national