The present disclosure relates to a technique for evaluating predictors.
In the operation of AI (Artificial Intelligence), adapting to environmental changes and improving performance requires relearning with new data and updating the AI. When the AI is updated, its accuracy after the update is required to be higher than its accuracy before the update. Patent Document 1 discloses a method for reducing the deterioration of a model when updating a model generated by machine learning. Further, Patent Document 2 discloses a technique for evaluating the closeness of the structures of the prediction models before and after the relearning as the closeness of the nature of the prediction models.
Even when the accuracy is improved by updating AI, the behavior of AI may differ before and after the updating. For example, there may occur such a phenomenon that the AI after the updating cannot correctly answer the question that the AI before the updating answered correctly. In this situation, AI operators may need to spend much effort and time to grasp the habits of AI after the updating, or they may need to change business operations for the prediction by AI.
It is an object of the present disclosure to provide a technique for evaluating compatibility of predictors.
According to an example aspect of the present disclosure, there is provided a compatibility evaluation device comprising:
According to another example aspect of the present disclosure, there is provided a compatibility evaluation method comprising:
According to still another example aspect of the present disclosure, there is provided a recording medium storing a program, the program causing a computer to:
According to the present disclosure, the compatibility of the predictors can be evaluated.
Preferred example embodiments of the present disclosure will be described with reference to the accompanying drawings.
<Compatibility Evaluation Index>
(Compatibility of Predictors)
When AI is updated (relearned) using new data, the update is carried out so that the accuracy may be improved. At that time, the compatibility of AI becomes an issue. Compatibility refers to the degree of coincidence between the correct/incorrect answers of the pre-update AI and the correct/incorrect answers of the post-update AI.
One of the indices showing compatibility is a Backward Trust Compatibility (hereinafter referred to as “BTC”) score. The BTC score is the percentage of the data that the pre-update AI answered correctly that the post-update AI also answers correctly. When the BTC score is high, compatibility is high.
As shown in
Another index of compatibility is a Backward Error Compatibility (hereinafter referred to as “BEC”) score. The BEC score is the percentage of the data that the post-update AI answers incorrectly that the pre-update AI also answered incorrectly. When the BEC score is high, compatibility is high.
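The two scores defined above can be sketched in Python as follows. This is a minimal illustration only: the function names and the toy data are assumptions introduced here and do not appear in the disclosure.

```python
def btc_score(y_true, pred_old, pred_new):
    """Share of samples the pre-update predictor got right
    that the post-update predictor also gets right."""
    kept = [p_new == y
            for y, p_old, p_new in zip(y_true, pred_old, pred_new)
            if p_old == y]
    if not kept:
        raise ValueError("pre-update predictor has no correct answers")
    return sum(kept) / len(kept)


def bec_score(y_true, pred_old, pred_new):
    """Share of samples the post-update predictor gets wrong
    that the pre-update predictor also got wrong."""
    shared = [p_old != y
              for y, p_old, p_new in zip(y_true, pred_old, pred_new)
              if p_new != y]
    if not shared:
        raise ValueError("post-update predictor has no incorrect answers")
    return sum(shared) / len(shared)


# Toy evaluation data (hypothetical).
y_true   = [1, 0, 1, 1, 0, 1]
pred_old = [1, 0, 0, 1, 1, 1]   # correct on samples 0, 1, 3, 5
pred_new = [1, 0, 1, 0, 1, 1]   # incorrect on samples 3, 4

print(btc_score(y_true, pred_old, pred_new))  # 0.75
print(bec_score(y_true, pred_old, pred_new))  # 0.5
```

In this toy data the post-update predictor is no less accurate than the pre-update predictor, yet it newly fails on sample 3, which is exactly the behavior change the BTC score exposes.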
Thus, when updating AI by relearning, not only accuracy but also compatibility with the pre-update AI must be considered. In the following, we propose a generalized backward compatibility index that can be applied to various tasks.
(Generalized Backward Compatibility Index)
The generalized backward compatibility index (hereinafter referred to as “GBC”) is an index that generalizes compatibility indices such as the aforementioned BTC and BEC. Examples of the generalized backward compatibility index are described below.
The first example is an example of the most basic generalized backward compatibility index. It is assumed that a predictor h and a pair of an input and an output are as follows.
The above Equation (1) includes four relational expressions CC(h1,h2), EC(h1,h2), IC1(h1,h2), IC2(h1,h2) that show the relation between the output of the predictor h1 and the output of the predictor h2 for the evaluation data. Each of “a0”, “a00”, “a01”, “a10”, “a11”, “b0”, “b00”, “b01”, “b10”, “b11” is a coefficient (weight).
The four relational expressions have the following meanings:
Specifically, the above four relational expressions are given as follows.
In Equation (1), when the coefficients a11, b10, b11 are set to “1” and the other coefficients are set to “0,” the GBC score of Equation (1) matches the BTC score. Thus, the above GBC includes the BTC.
Further, in Equation (1), when the coefficients a00, b00, b10 are set to “1” and the other coefficients are set to “0”, the GBC score in Equation (1) matches the BEC score. Thus, the above GBC includes the BEC.
Thus, by utilizing the above generalized backward compatibility index (GBC), an appropriate compatibility index can be defined according to the task of the predictor by changing the coefficients (weights) of Equation (1).
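The two special cases above can be written out explicitly. The following is a sketch under two assumptions that are consistent with the stated coefficient choices but are not a reproduction of Equation (1): that the GBC is a ratio of two weighted sums of the relational expressions, and that the two digits of each coefficient subscript record whether h1 and h2, in that order, are correct (1) or incorrect (0).

```latex
\begin{align*}
\text{BTC case } (a_{11}=b_{10}=b_{11}=1):\quad
\mathrm{GBC}(h_1,h_2)
  &= \frac{P\bigl(h_1(X)=Y,\ h_2(X)=Y\bigr)}
          {P\bigl(h_1(X)=Y,\ h_2(X)\neq Y\bigr) + P\bigl(h_1(X)=Y,\ h_2(X)=Y\bigr)} \\
  &= P\bigl(h_2(X)=Y \mid h_1(X)=Y\bigr), \\[6pt]
\text{BEC case } (a_{00}=b_{00}=b_{10}=1):\quad
\mathrm{GBC}(h_1,h_2)
  &= \frac{P\bigl(h_1(X)\neq Y,\ h_2(X)\neq Y\bigr)}
          {P\bigl(h_1(X)\neq Y,\ h_2(X)\neq Y\bigr) + P\bigl(h_1(X)=Y,\ h_2(X)\neq Y\bigr)} \\
  &= P\bigl(h_1(X)\neq Y \mid h_2(X)\neq Y\bigr).
\end{align*}
```

Under these assumptions the denominator of the BTC case is the probability that h1 is correct, and the denominator of the BEC case is the probability that h2 is incorrect, which matches the verbal definitions of the two scores.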
Next, an example of the equation for calculating the score using GBC of the first example is shown. Now, the inputs are set as follows.
The estimated value GBC^ of the GBC score is given by the following equation. For convenience, the symbol with “^” over the letter “X” is written as “X^”.
The relational expressions CC^, EC^, IC1^, IC2^ are given by the following equations, by replacing the expected value in Equations (2) to (5) with the sample mean.
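A sample-mean estimate of this kind can be sketched in Python as follows. The sketch assumes that the score is a ratio of two weighted sums, and it keys each weight by a two-character string that records whether h1 and h2, in that order, are correct (“1”) or incorrect (“0”). That keying, the function names, and the toy data are illustrative assumptions rather than the notation of the equations themselves.

```python
KEYS = ("11", "10", "01", "00")  # (h1 correct?, h2 correct?)


def relation_rates(y_true, pred1, pred2):
    """Sample means of the four joint correct/incorrect events."""
    counts = {k: 0 for k in KEYS}
    for y, p1, p2 in zip(y_true, pred1, pred2):
        counts[f"{int(p1 == y)}{int(p2 == y)}"] += 1
    n = len(y_true)
    return {k: c / n for k, c in counts.items()}


def gbc_estimate(y_true, pred1, pred2, a, b, a0=0.0, b0=0.0):
    """Ratio of two weighted sums of the sample rates.

    `a` and `b` map event keys to weights; missing keys count as 0.
    """
    rates = relation_rates(y_true, pred1, pred2)
    num = a0 + sum(a.get(k, 0.0) * rates[k] for k in KEYS)
    den = b0 + sum(b.get(k, 0.0) * rates[k] for k in KEYS)
    if den == 0:
        raise ZeroDivisionError("denominator is zero for this data")
    return num / den


# Weight settings corresponding to the BTC and BEC special cases.
BTC_A, BTC_B = {"11": 1.0}, {"10": 1.0, "11": 1.0}
BEC_A, BEC_B = {"00": 1.0}, {"00": 1.0, "10": 1.0}

# Toy evaluation data (hypothetical).
y_true = [1, 0, 1, 1, 0, 1]
pred1  = [1, 0, 0, 1, 1, 1]
pred2  = [1, 0, 1, 0, 1, 1]

btc = gbc_estimate(y_true, pred1, pred2, BTC_A, BTC_B)
bec = gbc_estimate(y_true, pred1, pred2, BEC_A, BEC_B)
print(round(btc, 3), round(bec, 3))
```

Keeping the weights in dictionaries means that a new compatibility index is defined by changing data only, which mirrors the idea of changing the coefficients of one common equation.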
In the first example described above, as shown in Equation (1), the coefficients (weights) are set for the four relational expressions CC, EC, IC1, IC2. In contrast, in the second example, the coefficients (weights) are set for each class y predicted by the predictors h1 and h2. The GBC score according to the second example is given by the following equation.
In addition, the four relational expressions are given as follows.
Incidentally, if the weights are set to be constant across the classes, for example a11=a11,1= . . . =a11,|y|, Equation (11) matches Equation (1) of the first example.
The GBC of the second example makes it possible to construct, in the context of backward compatibility, a variety of existing binary classification indices that can be expressed as linear fractional expressions. For example, the weights of the GBC shown in Equation (11) can be adjusted to construct a compatibility index that is effective for imbalanced binary classification. If compatibility is not considered, the F-value (where Y=1 is the positive class and Y=0 is the negative class) in the binary classification Y∈{0,1} is as follows:
This F-value is an index of the accuracy in the imbalanced binary classification, which emphasizes the positive class with less data.
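As a small illustration of why the F-value suits imbalanced data, the sketch below assumes that the F-value meant here is the standard F1 measure, 2TP / (2TP + FP + FN). The function names and the toy data are hypothetical.

```python
def f_value(y_true, y_pred):
    """Standard F1 measure with Y=1 as the positive class (assumed definition)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    if denom == 0:
        raise ValueError("no positive labels and no positive predictions")
    return 2 * tp / denom


def accuracy(y_true, y_pred):
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)


# Imbalanced toy data: 2 positives out of 10 samples.
y_true     = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
always_neg = [0] * 10                         # never predicts the positive class
some_pos   = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # one hit, one false alarm, one miss

print(accuracy(y_true, always_neg), f_value(y_true, always_neg))  # 0.8 0.0
print(accuracy(y_true, some_pos), f_value(y_true, some_pos))      # 0.8 0.5
```

Both predictors reach the same accuracy, but only the F-value distinguishes the one that actually finds a positive sample.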
On the other hand, the F-value considering the compatibility (referred to as “BC-F”) is as follows when a11,1=b11,1=2, b11,0=b00,1=1 and the remaining coefficients are “0” in the GBC.
This BC-F value is an index of compatibility in imbalanced binary classification, which emphasizes the positive class with less data. Thus, by adjusting the weights of the GBC, compatibility indices in various binary classifications can be generated.
The third example is a compatibility index that is not a linear fractional expression, unlike the first example and the second example. In the binary classification, we consider a task in which the score ranking of the pre-update predictor should coincide with the score ranking of the post-update predictor. Assuming that the predictor assigns real numbers to “−1” or “+1,” the following compatibility index is obtained.
This compatibility index includes a relational expression
1[g1(X)>g1(X′)]
showing the magnitude relation of the outputs of the pre-update predictor and a relational expression
1[g2(X)>g2(X′)]
showing the magnitude relation of the outputs of the post-update predictor, when the evaluation data X whose correct answer is “+1” and the evaluation data X′ whose correct answer is “−1” are inputted. With this compatibility index, the GBC score is obtained as the expected value that the magnitude relation between the outputs for X and X′ before the update is maintained after the update. In other words, the GBC score indicates whether the output tendency of the predictor for the inputs matches before and after the update. This compatibility index is expected to have an effect similar to that of AUC (Area Under the ROC Curve).
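One possible reading of this ranking-based index is sketched below in Python. The sketch counts, among all pairs of a “+1” sample and a “−1” sample that the pre-update predictor ranks in the right order, the share that the post-update predictor also ranks in the right order. This normalization is an assumption made for illustration, as are the function name and the toy scores; the exact form of the index may differ.

```python
def ranking_compatibility(scores_old, scores_new, labels):
    """Share of (+1, -1) pairs ordered correctly by the old scores
    whose order is maintained by the new scores (assumed normalization)."""
    pos = [i for i, y in enumerate(labels) if y == +1]
    neg = [i for i, y in enumerate(labels) if y == -1]
    total = kept = 0
    for i in pos:
        for j in neg:
            if scores_old[i] > scores_old[j]:      # 1[g1(X) > g1(X')]
                total += 1
                if scores_new[i] > scores_new[j]:  # 1[g2(X) > g2(X')]
                    kept += 1
    if total == 0:
        raise ValueError("old scores order no (+1, -1) pair correctly")
    return kept / total


# Toy scores (hypothetical): g1 before the update, g2 after the update.
labels     = [+1, +1, -1, -1]
scores_old = [0.9, 0.4, 0.5, 0.1]
scores_new = [0.8, 0.7, 0.6, 0.75]

print(ranking_compatibility(scores_old, scores_new, labels))
```

Here the pre-update predictor orders three of the four pairs correctly, and the post-update predictor maintains two of those three orderings.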
(Application to Regression Tasks)
In the first and second examples described above, the predictor is assumed to perform a classification task. However, the GBC can also be applied to a predictor performing a regression task. In that case, the GBC of the first example or the second example may be applied by regarding the predicted value as a correct answer if the difference between the predicted value that the predictor outputs for the evaluation data and the actual value corresponding to the evaluation data is equal to or smaller than a predetermined threshold value, and regarding the predicted value as an incorrect answer if the difference is larger than the predetermined threshold value.
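The thresholding described above can be sketched as follows. The sketch converts regression outputs into correct/incorrect flags and then computes a BTC-type score on those flags; the function names, the threshold value, and the toy data are assumptions made for illustration.

```python
def regression_correctness(y_true, y_pred, threshold):
    """True where the absolute error is within the threshold."""
    return [abs(p - y) <= threshold for y, p in zip(y_true, y_pred)]


def btc_from_flags(old_correct, new_correct):
    """BTC-type score computed directly from correct/incorrect flags."""
    kept = [n for o, n in zip(old_correct, new_correct) if o]
    if not kept:
        raise ValueError("pre-update predictor has no correct answers")
    return sum(kept) / len(kept)


# Toy regression data (hypothetical).
y_true   = [10.0, 20.0, 30.0, 40.0]
pred_old = [10.5, 23.0, 29.0, 41.0]
pred_new = [10.2, 20.5, 33.0, 40.5]

old_ok = regression_correctness(y_true, pred_old, threshold=1.0)
new_ok = regression_correctness(y_true, pred_new, threshold=1.0)

print(old_ok)  # [True, False, True, True]
print(new_ok)  # [True, True, False, True]
print(btc_from_flags(old_ok, new_ok))
```

The choice of threshold determines what counts as a behavior change, so it would normally be set from the tolerance of the task that consumes the predictions.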
[Overall Configuration]
The predictor h1 and the predictor h2 output the predicted values for the inputted evaluation data to the compatibility evaluation device 100. The compatibility evaluation device 100 uses the generalized backward compatibility index (GBC) described above to output the compatibility score indicating the compatibility between the output of the predictor h1 and the output of the predictor h2.
[Hardware Configuration]
The interface (IF) 101 receives the predicted values from the predictors h1, h2. The IF 101 outputs the compatibility score calculated by the compatibility evaluation device 100 to an external device. The IF 101 is an example of an acquisition means.
The processor 102 is a computer, such as a CPU, and controls the entire compatibility evaluation device 100 by executing a program prepared in advance. The processor 102 may be a GPU or an FPGA (Field-Programmable Gate Array). Specifically, the processor 102 performs the compatibility evaluation processing described below.
The memory 103 may be a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 103 stores information of the generalized backward compatibility index, the coefficients (weights) for each index number, and the like. The memory 103 is also used as a working memory during various processing operations by the processor 102.
The recording medium 104 is a non-volatile and non-transitory recording medium such as a disk-like recording medium, a semiconductor memory, or the like, and is configured to be detachable from the compatibility evaluation device 100. The recording medium 104 records various programs executed by the processor 102. When the compatibility evaluation device 100 executes the processing, the program recorded in the recording medium 104 is loaded into the memory 103 and executed by the processor 102.
The input unit 105 may be, for example, a keyboard, mouse, or the like, and is used by a user to perform various instructions and inputs. The display unit 106 is, for example, a liquid crystal display device, and displays various types of information to the user.
[Functional Configuration]
The index number is determined in advance in association with a combination of the coefficients (weights) included in Equation (1). For example, when the compatibility index number “1” corresponds to the BTC, the coefficient combination “a11=b10=b11=1, all other coefficients=0” is associated in advance with the compatibility index number “1”. Therefore, when the user designates the compatibility index number “1”, the evaluation index determination unit 110 substitutes the coefficients “a11=b10=b11=1, all other coefficients=0” into Equation (1) and generates an evaluation index indicating the BTC score.
The score calculation unit 120 calculates and outputs the compatibility score from the predicted values outputted by the predictors h1, h2 using the determined evaluation index. For example, the score calculation unit 120 calculates the values of the four relational expressions CC(h1,h2), EC(h1,h2), IC1(h1,h2), IC2(h1,h2) by substituting the predicted values outputted by the predictors into Equations (7) to (10), and substitutes the resulting values into the evaluation index such as Equation (6) to calculate and output the GBC score.
The evaluation index determination unit 110 is an example of an index determination means, and the score calculation unit 120 is an example of a calculation means.
[Compatibility Evaluation Processing]
First, the compatibility evaluation device 100 receives the designation of the index number by the user (step S11). Next, the evaluation index determination unit 110 determines the evaluation index based on the designated index number (step S12). For example, when the GBC of the first example or the second example described above is used as an evaluation index, the evaluation index determination unit 110 acquires the respective coefficients (weights) corresponding to the index number, and substitutes them into Equation (1) or Equation (11) to determine the evaluation index.
Next, the score calculation unit 120 acquires the predicted values that the predictors h1, h2 outputted for the evaluation data (step S13), and inputs them to the evaluation index determined in step S12 to calculate and output the compatibility score (the GBC score) (step S14). Thus, the compatibility score indicating the compatibility between the predictor h1 and the predictor h2 can be obtained. Then, the processing ends.
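The flow of steps S11 to S14 can be sketched in Python as follows. The table contents are partly assumed: index number 1 corresponds to the BTC as in the description, while index number 2 for the BEC, the key naming (whether h1 and h2, in that order, are correct), and all function names are hypothetical choices made for this illustration.

```python
# Index number -> coefficient combination, determined in advance.
INDEX_TABLE = {
    1: {"name": "BTC", "a": {"11": 1.0}, "b": {"10": 1.0, "11": 1.0}},
    2: {"name": "BEC", "a": {"00": 1.0}, "b": {"00": 1.0, "10": 1.0}},  # assumed number
}


def determine_evaluation_index(index_number):
    """Step S12: build the evaluation index from the stored coefficients."""
    entry = INDEX_TABLE[index_number]

    def evaluation_index(rates):
        num = sum(w * rates[k] for k, w in entry["a"].items())
        den = sum(w * rates[k] for k, w in entry["b"].items())
        if den == 0:
            raise ZeroDivisionError("score is undefined for this data")
        return num / den

    return evaluation_index


def calculate_score(evaluation_index, y_true, pred1, pred2):
    """Steps S13 and S14: take the predicted values and compute the score."""
    counts = {"11": 0, "10": 0, "01": 0, "00": 0}
    for y, p1, p2 in zip(y_true, pred1, pred2):
        counts[f"{int(p1 == y)}{int(p2 == y)}"] += 1
    rates = {k: c / len(y_true) for k, c in counts.items()}
    return evaluation_index(rates)


def evaluate_compatibility(index_number, y_true, pred1, pred2):
    """Step S11 supplies index_number; the rest follows S12 to S14."""
    index = determine_evaluation_index(index_number)
    return calculate_score(index, y_true, pred1, pred2)


# Toy evaluation data (hypothetical).
y_true = [1, 0, 1, 1, 0, 1]
pred1  = [1, 0, 0, 1, 1, 1]
pred2  = [1, 0, 1, 0, 1, 1]

print(round(evaluate_compatibility(1, y_true, pred1, pred2), 3))
print(round(evaluate_compatibility(2, y_true, pred1, pred2), 3))
```

Separating the determination of the index from the score calculation follows the division between the evaluation index determination unit 110 and the score calculation unit 120.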
[Use Case]
When a plurality of post-update predictors with different hyperparameters or seeds are generated at the time of updating the predictors, the GBC can be used as an index to evaluate their compatibility. By selecting a predictor having a high compatibility with the pre-update predictor from among the plurality of post-update predictors, it is possible to reduce the cost for procedural changes associated with the changed behavior of AI after the update.
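Such a selection among candidates can be sketched as follows, using a BTC-type score as the compatibility index. The candidate names, the function names, and the toy data are hypothetical, and a practical selection would normally also take accuracy into account.

```python
def btc(y_true, pred_old, pred_new):
    """Share of the old predictor's correct answers kept by the new predictor."""
    kept = [p_new == y
            for y, p_old, p_new in zip(y_true, pred_old, pred_new)
            if p_old == y]
    if not kept:
        raise ValueError("pre-update predictor has no correct answers")
    return sum(kept) / len(kept)


def select_most_compatible(y_true, pred_old, candidates):
    """Return the candidate name with the highest score, and all scores."""
    scores = {name: btc(y_true, pred_old, preds)
              for name, preds in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): two post-update candidates trained with different seeds.
y_true   = [1, 0, 1, 1, 0, 1]
pred_old = [1, 0, 0, 1, 1, 1]
candidates = {
    "seed_0": [1, 0, 1, 0, 0, 1],   # accuracy 5/6, breaks sample 3
    "seed_1": [1, 0, 0, 1, 0, 1],   # accuracy 5/6, keeps all old correct answers
}

best, scores = select_most_compatible(y_true, pred_old, candidates)
print(best, scores)
```

The two candidates have the same accuracy, so accuracy alone cannot separate them, while the compatibility score prefers the candidate that preserves every answer the pre-update predictor already got right.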
In addition, when seasonal changes occur in the data, the GBC can be used to look for a prediction model that is highly compatible with the current prediction model from among the past prediction models. When there is a past prediction model that is highly accurate and highly compatible with the current prediction model, switching from the current prediction model to that past prediction model achieves the switch to a prediction model appropriate for the season without the cost of relearning.
Also, when a KPI (Key Performance Indicator on the business side) changes during operation of AI, the GBC can be used to construct a compatibility index that emphasizes the items that the new KPI emphasizes (e.g., the class that the operators want to answer correctly), thereby realizing continuous AI operation.
[Construction of Predictor Using GBC]
In the above example, the GBC is used to evaluate the compatibility of the predictors at the time of updating. Instead, the GBC can be used in training the predictor. In this case, at the time of training the predictor, the GBC is added as a regularization term to the error function which is used in ordinary training. Specifically, the upper bound of the GBC can be constructed by replacing the indicator function with a loss function (squared loss or hinge loss) as in the conventional generalized binary classification index. Then, the prediction model is trained so as to minimize the combined error function of the constructed upper bound and the ordinary binary classification error. By inputting the pre-update predictors and the additionally collected data, and by using the GBC for the regularization, a new predictor suitable for the target task and having high backward compatibility can be constructed.
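A minimal sketch of such a combined error function is given below for one simple case. It assumes labels in {−1, +1}, uses the hinge loss as the surrogate for the indicator function, and penalizes only the samples that the pre-update predictor answered correctly, which corresponds to a BTC-type term. The function names, the weight `lam`, and the toy data are assumptions, and this is not presented as the exact construction of the disclosure.

```python
def hinge(margin):
    """Hinge loss; it upper-bounds the 0-1 error indicator 1[margin <= 0]."""
    return max(0.0, 1.0 - margin)


def combined_objective(scores_new, labels, old_correct, lam):
    """Ordinary hinge loss plus a compatibility regularization term.

    scores_new  : real-valued outputs of the predictor being trained
    labels      : correct answers in {-1, +1}
    old_correct : True where the pre-update predictor was correct
    lam         : weight of the regularization term (hypothetical parameter)
    """
    losses = [hinge(y * s) for s, y in zip(scores_new, labels)]
    task_loss = sum(losses) / len(losses)
    kept = [l for l, ok in zip(losses, old_correct) if ok]
    # Mean surrogate loss on the samples the old predictor got right:
    # an upper bound on the share of those samples the new predictor misses.
    compat_penalty = sum(kept) / len(kept) if kept else 0.0
    return task_loss + lam * compat_penalty


# Toy data (hypothetical).
scores_new  = [2.0, -0.5, 0.3]
labels      = [+1, +1, -1]
old_correct = [True, True, False]

print(combined_objective(scores_new, labels, old_correct, lam=0.5))
```

In actual training this objective would be minimized over the parameters of the new predictor with a gradient-based optimizer; increasing `lam` trades accuracy on the new data for preservation of the pre-update predictor's correct answers.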
Next, a second example embodiment of the present disclosure will be described.
According to the compatibility evaluation device 70 of the second example embodiment, it is possible to evaluate the compatibility of the predictors using an appropriate compatibility index according to the task of the predictors.
A part or all of the example embodiments described above may also be described as the following supplementary notes, but not limited thereto.
(Supplementary Note 1)
A compatibility evaluation device comprising:
(Supplementary Note 2)
The compatibility evaluation device according to Supplementary note 1, wherein the generalized backward compatibility index is represented by four arithmetic operations of a plurality of weighted relational expressions.
(Supplementary Note 3)
The compatibility evaluation device according to Supplementary note 2, further comprising a designation means configured to receive a designation of the compatibility index,
(Supplementary Note 4)
The compatibility evaluation device according to any one of Supplementary notes 1 to 3,
a third equation indicating a percentage that the output of the first predictor is incorrect and the output of the second predictor is correct; and
a fourth equation indicating a percentage that the output of the first predictor is correct and the output of the second predictor is incorrect.
(Supplementary Note 5)
The compatibility evaluation device according to Supplementary note 4,
(Supplementary Note 6)
The compatibility evaluation device according to Supplementary note 1,
(Supplementary Note 7)
A compatibility evaluation method comprising:
(Supplementary Note 8)
A recording medium storing a program, the program causing a computer to:
While the present disclosure has been described with reference to the example embodiments and examples, the present disclosure is not limited to the above example embodiments and examples. Various changes which can be understood by those skilled in the art within the scope of the present disclosure can be made in the configuration and details of the present disclosure.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/008149 | 3/3/2021 | WO |