This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2021-208217, filed on Dec. 22, 2021, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an information processing technology.
With the popularization of artificial intelligence (AI) technology, there has been an increase in the demand for machine learning models that are capable of providing explanation, because the determination of a black-box-type machine learning model cannot be accepted without question and because the basis for a determination needs to be presented in a human-interpretable form. Hence, a white-box model such as a rule list, a decision tree, or a linear model is used from the start. However, merely using a white-box-type machine learning model does not ensure that the machine learning model is human-interpretable or is capable of providing explanation.
Hence, in recent years, an interactive approach has been implemented in which the generation of a machine learning model and the feedback to a person are carried out repeatedly, so that an accurate machine learning model that is acceptable to a person is generated. In the interactive approach, for example, a feature believed to be important is selected from among the features in the machine learning model; the user is asked whether the selected feature is truly important; and the questioning is repeated for each such feature until the user is satisfied. Meanwhile, the features in the machine learning model are also called explanatory variables, or simply variables.
Subsequently, according to the feedback, the parameters used at the time of optimizing the machine learning model are changed, and thus the machine learning model is updated. As a result of repeatedly performing such operations, an accurate machine learning model is generated that is acceptable to a person.
A machine learning model has a large number of features. Thus, taking into account the possibility that the user may discontinue the interaction midway, it is desirable to obtain as many features answered as important as possible with as few questions as possible.
In that regard, some methods are available, such as a method in which the questioning is performed in descending order of the value calculated for each feature using a statistic such as correlation, mutual information content, or chi-square value; or a method in which the impact of each feature of the machine learning model on the predicted distribution is measured, and the features having a relatively larger impact are selected for questioning.
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein an information processing program that causes a computer to execute a process including deciding on one or more variables, from among a plurality of variables, to be a target for a question regarding degree of importance, based on order of priority of the variables and an estimated amount, the order of priority of the variables being determined based on a plurality of patterns indicating ranking of the plurality of variables, the estimated amount indicating a possibility of a match with a predetermined condition regarding each of the patterns, and updating the estimated amount based on an answer result of the question about the decided variable.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
However, a particular statistic is not necessarily consistent with the on-the-spot knowledge that is satisfactory to the user, and there are times when a large number of questions need to be asked until the user is satisfied. Moreover, also in the case in which the features are selected by the machine learning model, even though the selection depends on a particular index of the machine learning model, that index is not necessarily consistent with the on-the-spot knowledge of the user. Hence, eventually, there are times when the number of questions becomes large.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. However, the present invention is not limited by the embodiments described below. Meanwhile, it is possible to combine the embodiments without causing contradictions.
Functional Configuration of Information Processing Device 10
Explained below with reference to
The communication unit 11 is a processing unit that controls the communication performed with other information processing devices.
The memory unit 12 is a memory device used to store a variety of data and to store computer programs to be executed by the control unit 13. The memory unit 12 is used to store a classification model 121, index data 122, and probability data 123.
The classification model 121 is a machine learning model that, for example, classifies input data into one of two values. For example, when an image in which a person is captured is input, the classification model 121 determines whether or not the person is wearing a uniform, and outputs the determination result indicating whether or not the person is wearing a uniform.
The index data 122 is related to the indexes that represent the patterns indicating the ranking of the features, that is, the order of priority of the features in a machine learning model.
In the example illustrated in
Meanwhile, an estimated amount α represents the weight coefficient of each index, and has its initial value set to 1.0. The information processing device 10 questions the user about the importance of each feature, and updates the estimated amount α based on the answer to each question. Thus, although the details are explained later, the information processing device 10 treats an index in which the important features are ranked high as the index that is more consistent with the on-the-spot knowledge. Hence, the features can be more easily selected from that index.
Each index in the index data 122 is generated by ranking the features using a statistic such as correlation, mutual information content, or chi-square value, or using an existing technology such as the predicted distribution with respect to each feature of the machine learning model.
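As an illustrative sketch only (assuming Python with scikit-learn, a feature matrix X, binary labels y, and a list of feature names; the embodiment does not prescribe any particular library), such indexes could be generated as follows:

```python
# Illustrative sketch: build one ranking ("index") per statistic.
# Assumes X is a (n_samples, n_features) array with non-negative values
# (required by chi2) and y holds binary labels.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, chi2

def build_indexes(X, y, feature_names):
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    mi = mutual_info_classif(X, y, random_state=0)
    chi_scores, _ = chi2(X, y)

    def rank(scores):
        return [feature_names[j] for j in np.argsort(-scores)]  # descending score order

    # Mapping the statistics to the index names "X", "Y", and "Z" is an assumption made
    # for illustration; the actual indexes of the embodiment are shown in the figures.
    return {"X": rank(corr), "Y": rank(mi), "Z": rank(chi_scores)}
```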
The probability data 123 is related to the probability at which the user is estimated to agree regarding each feature. That is, the probability data 123 is related to the feature-by-feature probability at which the concerned feature is estimated to be an important feature, in other words, the probability at which an affirmative answer (Yes) is estimated to be obtained from the user to the question about whether or not the concerned feature is an important feature. In the example illustrated in
Meanwhile, the abovementioned data stored in the memory unit 12 is only exemplary, and a variety of data other than the abovementioned data can also be stored in the memory unit 12.
The control unit 13 is a processing unit that controls the information processing device 10 in entirety; and includes a deciding unit 131, an output unit 132, and an updating unit 133.
The deciding unit 131 selects and decides on the target feature for the question about the degree of importance, based on the order of priority of the features in the machine learning model according to the index data 122 and on the estimated amount α indicating the possibility of a match with a predetermined condition regarding each set of index data 122.
For example, the deciding unit 131 decides on the target feature for questioning based on the order of priority of the features, which is decided on the basis of a statistic indicating at least one of the correlation, the mutual information content, and the chi-square value with respect to the features of the machine learning model, and based on the estimated amount α. Meanwhile, the features in a machine learning model are equivalent to the variables in the machine learning model, and the index data 122 is equivalent to a plurality of patterns indicating the ranking of a plurality of variables.
The output unit 132 outputs a question about the degree of importance of the feature that is decided by the deciding unit 131. For example, the output unit 132 outputs a question via an output device such as a display device connected to the information processing device 10. Alternatively, the output unit 132 can output a question to an information processing terminal (not illustrated) that is communicably connected via a network.
The updating unit 133 obtains the answer to the question about the degree of importance as output by the output unit 132, and updates the estimated amount α based on the answer.
For example, if the answer to the question about the degree of importance indicates that the feature decided by the deciding unit 131 is an important feature, then the updating unit 133 increases the estimated amount α of the index data 122 in which the order of priority of the concerned feature is equal to or higher than a predetermined threshold value. On the other hand, if the answer to the question about the degree of importance indicates that the feature decided by the deciding unit 131 is not an important feature, then the updating unit 133 increases the estimated amount α of the index data 122 in which the order of priority of the concerned feature is not equal to or higher than the predetermined threshold value. Herein, the predetermined threshold value is, for example, the value indicating the second rank from the top.
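As a minimal sketch of this update rule (the increment and decrement values chosen here are illustrative assumptions; the embodiment only requires that the preferred indexes be given larger estimated amounts), the processing of the updating unit 133 could look as follows:

```python
# Illustrative sketch of the update performed by the updating unit 133.
# `indexes` maps an index name to its ranked feature list, `alpha` maps an index
# name to its estimated amount, and `delta` is the rank threshold (e.g., top two ranks).
def update_alpha(alpha, indexes, feature, is_important, delta=2, step=0.15):
    for name, ranking in indexes.items():
        in_top = feature in ranking[:delta]
        # Give preference to indexes that agree with the user's answer: an answer of
        # "important" agrees with indexes that rank the feature within the top delta.
        if in_top == is_important:
            alpha[name] += step
        else:
            alpha[name] = max(alpha[name] - step, 0.0)
    return alpha
```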
Details of Functions
With reference to
Firstly, as the prerequisite, the total number of features in the machine learning model is assumed to be five, namely, the features "a" to "e". Moreover, it is assumed that the two features "a" and "b" are considered important by the user but are not known to the information processing device 10 to be important features, that is, are not set or stored in advance in the information processing device 10. Furthermore, although explained in detail later, it is assumed that a user parameter δ, which is used at the time of deciding the updating details of the estimated amount α, is equal to two (ranked second from the top). For example, the user parameter δ can be decided based on a predetermined ratio of the total number of features, and can be stored in the information processing device 10. Alternatively, the user parameter δ can be set to an arbitrary value by the user.
Regarding the feature deciding operation, as illustrated on the left side in
Subsequently, as illustrated on the right side in
For example, if there are three indexes “X”, “Y”, and “Z” that are referred to as the first index, the second index, and the third index, respectively; then the probability of the i-th index is calculated using Equation (1) given below.
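Judging from the explanation of its terms below, Equation (1) presumably normalizes the estimated amount of the i-th index over all indexes, that is, with p_i denoting the probability of the i-th index:

$$ p_i = \frac{\alpha_i}{\sum_{j} \alpha_j} \qquad (1) $$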
In Equation (1), Σ_j α_j represents the combined total of the estimated amounts α of all indexes. In the example illustrated in
The information processing device 10 calculates the probability of each index using Equation (1) and adds the feature-by-feature probabilities to obtain the probability of each feature as illustrated on the right side in
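The aggregation of the feature-by-feature probabilities is illustrated in the figures; as one plausible reading, each index contributes its probability to its highest-ranked feature that has not yet been asked about. A sketch under that assumption, with hypothetical rankings that are not the ones in the figures, is given below:

```python
# Illustrative sketch of the selection step. Each index votes for its highest-ranked
# feature not yet asked about, with a weight given by Equation (1).
def select_feature(alpha, indexes, asked):
    total = sum(alpha.values())
    prob = {}
    for name, ranking in indexes.items():
        p_index = alpha[name] / total                     # Equation (1)
        top = next(f for f in ranking if f not in asked)  # highest-ranked unasked feature
        prob[top] = prob.get(top, 0.0) + p_index
    return max(prob, key=prob.get), prob

# Hypothetical rankings, for illustration only.
indexes = {"X": ["a", "b", "c", "d", "e"],
           "Y": ["a", "e", "d", "b", "c"],
           "Z": ["e", "c", "b", "d", "a"]}
alpha = {"X": 1.0, "Y": 1.0, "Z": 1.0}
feature, prob = select_feature(alpha, indexes, asked=set())
# With all estimated amounts equal, each index has probability 1/3; in this
# illustrative data, "a" obtains 2/3 and is selected as the first question target.
```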
Subsequently, the information processing device 10 selects and decides on the feature “a”, which has the highest probability, as the target feature for the first round of questioning. Then, the information processing device 10 outputs, as the first round of questioning, a question to the user about whether or not the feature “a” is an important feature. Since the user considers the feature “a” to be an important feature, an affirmative answer (important feature) is obtained from the user in response to the question.
Since it becomes clear that the feature "a" is an important feature, the information processing device 10 updates the estimated amount α in such a way that the indexes X and Y, which include the feature "a" within the top two ranks as indicated by "2" in the example illustrated in
Moreover, the information processing device 10 can update the estimated amount α of the index Z, which does not include the important feature "a" within the top two ranks, from 1.0 to 0.85, for example. That makes it easier to select the target feature for questioning from among the indexes X and Y that include the important feature in the top ranks. Meanwhile, the amount of decrease in the estimated amount α can also be based on, for example, a preset value.
Given below is the explanation of the feature selection operation corresponding to the second round of questioning according to the present embodiment.
Firstly, as illustrated on the left side in
Then, as illustrated on the right side in
Then, the information processing device 10 selects and decides on the feature “e”, which has the highest probability, as the target feature for the second round of questioning. Subsequently, the information processing device 10 outputs, as the second round of questioning, a question to the user about whether or not the feature “e” is an important feature. Since the user does not consider the feature “e” to be an important feature, a negative answer (non-important feature) is obtained from the user in response to the question.
Since it becomes clear that the feature "e" is not an important feature, the information processing device 10 updates the estimated amount α in such a way that the index X, which does not include the feature "e" within the top two ranks, is given preference. More particularly, the information processing device 10 updates the estimated amount α of the index X from 1.15 to 2.0, for example.
Moreover, the information processing device 10 can update the estimated amount α of the indexes Y and Z, which include the non-important feature “e” within the top two ranks, from 1.15 to 1.1 and from 0.85 to 0.8, respectively, for example.
Given below is the explanation of the feature selection operation corresponding to the third round of questioning according to the present embodiment.
Firstly, as illustrated on the left side in
Then, as illustrated on the right side in
Then, the information processing device 10 selects and decides on the feature “b”, which has the highest probability, as the target feature for the third round of questioning. Subsequently, the information processing device 10 outputs, as the third round of questioning, a question to the user about whether or not the feature “b” is an important feature. Since the user considers the feature “b” to be an important feature, an affirmative answer (important feature) is obtained from the user in response to the question.
Since it becomes clear that the feature "b" is an important feature, the information processing device 10 updates the estimated amount α in such a way that the index X, which includes the feature "b" within the top two ranks, is given preference. Moreover, the information processing device 10 can reduce the estimated amount α of the indexes Y and Z that do not include the important feature "b" within the top two ranks.
In the example explained with reference to
Flow of Operations
Explained below with reference to
Firstly, as illustrated in
Then, the information processing device 10 initializes the estimated amount α in the index data 122 (Step S102). More particularly, in the initial stage, the information processing device 10 is not aware of the features considered important by the user, and is not aware of which index in the index data 122 is consistent with the on-the-spot knowledge of the user. Hence, the information processing device 10 sets the estimated amount α of each index to 1.0 without exception. Meanwhile, the initialization of the estimated amount α can be performed along with the generation of the index data 122.
Subsequently, as explained earlier with reference to
If an affirmative answer (important feature) is obtained from the user in response to the question asked at Step S103 (Yes at Step S104), then the information processing device 10 updates the estimated amount α in such a way that the indexes which include the target feature for questioning within the top δ ranks are given preference (Step S105).
On the other hand, if a negative answer (non-important feature) is obtained from the user (No at Step S104), then the information processing device 10 updates the estimated amount α in such a way that the indexes which do not include the target feature for questioning within the top δ ranks are given preference (Step S106).
Then, the information processing device 10 confirms, via a user interface, whether or not the user is satisfied. If the answer indicates that the user is satisfied (Yes at Step S107), then the information processing device 10 ends the estimated-amount updating operation illustrated in
On the other hand, if the answer indicates that the user is not satisfied (No at Step S107), then the system control returns to Step S103 and, until the user is satisfied, the information processing device 10 repeatedly asks questions about the degree of importance of the features not yet treated as the targets for questioning (Step S103 to Step S107).
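Putting the steps together, a self-contained sketch of the loop from Step S101 through Step S107 could look as follows (the rankings, the update step of 0.15, and the console prompts are illustrative assumptions, not part of the embodiment):

```python
# Illustrative sketch of the overall interactive flow (Steps S101 to S107).
indexes = {"X": ["a", "b", "c", "d", "e"],   # Step S101: index data (hypothetical rankings)
           "Y": ["a", "e", "d", "b", "c"],
           "Z": ["e", "c", "b", "d", "a"]}
alpha = {name: 1.0 for name in indexes}       # Step S102: initialize estimated amounts
delta, asked = 2, set()
n_features = len(next(iter(indexes.values())))

while len(asked) < n_features:
    # Step S103: decide on the target feature and ask about its importance.
    total = sum(alpha.values())
    votes = {}
    for name, ranking in indexes.items():
        top = next(f for f in ranking if f not in asked)
        votes[top] = votes.get(top, 0.0) + alpha[name] / total
    feature = max(votes, key=votes.get)
    important = input(f"Is feature '{feature}' important? [y/n] ").lower().startswith("y")
    asked.add(feature)
    # Steps S105/S106: prefer the indexes that agree with the answer.
    for name, ranking in indexes.items():
        agrees = (feature in ranking[:delta]) == important
        alpha[name] = max(alpha[name] + (0.15 if agrees else -0.15), 0.0)
    # Step S107: continue until the user reports being satisfied.
    if input("Are you satisfied? [y/n] ").lower().startswith("y"):
        break
```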
Effect
As explained above, the information processing device 10 decides on the target variable for the question about the degree of importance based on the order of priority of a plurality of variables, which is determined from a plurality of patterns indicating the ranking of the variables, and based on the estimated amount indicating the possibility of a match with a predetermined condition regarding each pattern; and the information processing device 10 updates the estimated amount based on the answer to the question regarding the decided variable.
In this way, based on the index and on the estimated amount that indicates the possibility of a match with the on-the-spot knowledge of the user, the information processing device 10 selects and decides on the target feature for the question about the degree of importance, and updates the estimated amount based on the answer. As a result, in order to generate a machine learning model capable of providing explanation, the information processing device 10 can select, in a more efficient manner, the features matching the on-the-spot knowledge of the user.
Meanwhile, when the answer indicates that the decided variable is important, the operation of updating the estimated amount includes increasing the estimated amount for the patterns in which the order of priority of the decided variable is equal to or higher than a predetermined threshold value.
As a result, it becomes easier for the information processing device 10 to select the feature that is consistent with the on-the-spot knowledge of the user.
On the other hand, when the answer indicates that the decided variable is not important, the operation of updating the estimated amount includes increasing the estimated amount for the patterns in which the order of priority of the decided variable is not equal to or higher than the predetermined threshold value.
As a result, it becomes easier for the information processing device 10 to select the feature that is consistent with the on-the-spot knowledge of the user.
The operation of deciding on the variable includes deciding on the variable based on the order of priority according to the patterns indicating the ranking decided on the basis of a statistic indicating at least one of the correlation, the mutual information content, and the chi-square value with respect to the variables, and based on the estimated amount.
As a result, it becomes easier for the information processing device 10 to select the feature that is consistent with the on-the-spot knowledge of the user.
System
The processing procedures, the control procedures, specific names, various data, and information including parameters described in the embodiments or illustrated in the drawings can be changed as required unless otherwise specified. Moreover, the specific examples, the distributions, and the numerical values explained in the working examples are only exemplary and can be arbitrarily changed.
The constituent elements of the information processing device 10 are merely conceptual, and need not be physically configured as illustrated. For example, the deciding unit 131 of the information processing device 10 can be divided into a plurality of processing units, or the deciding unit 131 and the output unit 132 of the information processing device 10 can be integrated into a single processing unit. Thus, the constituent elements, as a whole or in part, can be separated or integrated either functionally or physically based on various types of loads or use conditions. Furthermore, the process functions implemented in each device are entirely or partially implemented by a central processing unit (CPU) or by computer programs that are analyzed and executed by a CPU, or are implemented as hardware by wired logic.
The communication interface 10a is a network interface card that communicates with other information processing devices. The HDD 10b is used to store a computer program meant for implementing the functions illustrated in
The processor 10d is a CPU, a micro processing unit (MPU), or a graphics processing unit (GPU). Alternatively, the processor 10d can be implemented using an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). The processor 10d is a hardware circuit that reads a computer program, which executes operations identical to the operations of the processing units illustrated in
Alternatively, the information processing device 10 can read the computer program from a recording medium using a medium reading device, and execute the program so that the functions according to the embodiment can be implemented. Meanwhile, the computer program is not limited to be executed by the information processing device 10. For example, even when the computer program is executed by some other information processing device or when the computer program is executed in cooperation among devices, the embodiment can still be implemented in an identical manner.
The computer program can be distributed via a network such as the Internet. Alternatively, the computer program can be recorded in a recording medium such as a hard disk, a flexible disk (FD), a compact disc read only memory (CD-ROM), a magneto-optical (MO) disk, or a digital versatile disc (DVD) readable by an information processing device. Then, an information processing device can read the computer program from the recording medium, and execute it.
According to an aspect, in order to generate a machine learning model capable of providing explanation, the features that are consistent with the on-the-spot knowledge of the user can be selected in a more efficient manner.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.