The present invention relates to an evaluation of a machine learning model.
PTL 1 discloses a method of evaluating compatibility between a pre-update model and a post-update model when a machine learning model is updated.
According to the technique described in PTL 1, compatibility in overall performance is evaluated. However, it may be desirable to evaluate the performance of the machine learning model for each of a plurality of groups, such as a plurality of departments in a company, or from each of a plurality of viewpoints with respect to an evaluation target.
An object of the present invention is to provide an evaluation of a machine learning model in which a plurality of groups are considered.
An information processing device according to an aspect of the present invention includes a data acquisition means that acquires at least one condition for evaluation data of a machine learning model, a performance calculation means that calculates a performance index of the machine learning model and a performance index of the machine learning model after being updated using a data set specified for each of the at least one condition, and an index calculation means that calculates a deterioration index of a performance of the machine learning model based on the performance indexes before and after the machine learning model is updated.
An information processing device according to an aspect of the present invention includes a data acquisition means that acquires at least one condition for evaluation data of a machine learning model, a performance calculation means that calculates a performance index of the machine learning model using a data set specified for each of the at least one condition, and a learning means that causes the machine learning model to perform relearning based on the performance index.
An information processing method according to an aspect of the present invention includes acquiring at least one condition for evaluation data of a machine learning model, calculating a performance index of the machine learning model and a performance index of the machine learning model after being updated using a data set specified for each of the at least one condition, and calculating a deterioration index of a performance of the machine learning model based on the performance indexes before and after the machine learning model is updated.
A recording medium according to an aspect of the present invention records a program for causing a computer to execute acquiring at least one condition for evaluation data of a machine learning model, calculating a performance index of the machine learning model and a performance index of the machine learning model after being updated using a data set specified for each of the at least one condition, and calculating a deterioration index of a performance of the machine learning model based on the performance indexes before and after the machine learning model is updated.
According to the present invention, a machine learning model can be evaluated in consideration of a plurality of groups.
Hereinafter, example embodiments of the present invention will be described with reference to the drawings. However, the example embodiments are not limited to the description of the drawings.
A machine learning model that is a premise in the present example embodiment will be described. The machine learning model is information indicating a relationship between an explanatory variable and an objective variable. The machine learning model is, for example, a component for estimating a result of an estimation target by calculating an objective variable based on an explanatory variable.
The machine learning model is generated by executing a learning algorithm using learning data in which the value of the objective variable has already been obtained and a certain parameter as inputs. The machine learning model may output a variable that describes a probability distribution of an objective variable. The machine learning model may be described as a “learning model”, an “analysis model”, an “AI model”, a “trained model”, an “inference model”, a “prediction formula”, or the like.
The explanatory variable is a variable used as an input in the machine learning model. The explanatory variable may be described as a “feature amount”, a “feature”, or the like.
The learning algorithm for generating the machine learning model is not particularly limited, and may be an existing learning algorithm. For example, the learning algorithm may be a random forest, a support vector machine, Naive Bayes, a neural network, or a piecewise linear model using factorized asymptotic Bayesian (FAB) inference.
A method of a piecewise linear model using FAB inference is disclosed in, for example, US 2014/0222741 A1.
The performance of the machine learning model as described above may deteriorate due to a change in environment. When the accuracy of the machine learning model decreases, a user of the machine learning model re-trains the machine learning model (also referred to as a “pre-update model”) to obtain an updated machine learning model (also referred to as a “post-update model”) with improved accuracy.
However, users may evaluate the performance of the machine learning model from various perspectives. Take, as an example, a machine learning model that predicts purchases of products in a store. A store manager who operates the store evaluates the performance of the machine learning model using the accuracy of all purchases in the store. However, a person in charge on weekdays evaluates the performance of the machine learning model using the accuracy of purchases on weekdays rather than the accuracy of all purchases. On the other hand, a person in charge on weekends evaluates the performance of the machine learning model using the accuracy of purchases on weekends.
In this way, the users evaluate the performance of the machine learning model from various perspectives. Therefore, it is preferable to divide the data for evaluating the performance of the machine learning model into several groups and to evaluate the performance of the machine learning model for each group.
Accordingly, an information processing device 10 according to a first example embodiment outputs an index for evaluating the post-update model in consideration of the plurality of groups.
In the following description, the pre-update model, the post-update model, and the evaluation data are stored at any location, and a device for storing them is not limited. For example, the information processing device 10 may operate after a pre-update model and a post-update model are acquired and stored, or may operate using a pre-update model and a post-update model stored in a device that is not illustrated. Alternatively, the information processing device 10 may operate after a data set related to a group is stored in advance, or may operate while acquiring data.
The data acquisition unit 110 acquires a group condition for specifying a group in which each piece of data (also referred to as “evaluation data” or “data to be evaluated”) used for prediction by the model is included. The group condition is a condition for specifying a group (also referred to as a “data set”) in which each piece of evaluation data is included.
The evaluation data is data including an explanatory variable (x) and an objective variable (y). At least some of the evaluation data may belong to a plurality of groups. Take, as an example, the above-described machine learning model that predicts purchases of products in a store. The evaluation data of the machine learning model used in a certain store is a data set including all data measured in the store as elements. The group “weekday” is the data set of data measured at the store on weekdays. The group “midnight time zone” is the data set of data measured at the store in the midnight time zone. In this case, data measured in the midnight time zone on a weekday belongs to both the group “weekday” and the group “midnight time zone”.
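The relationship between group conditions and overlapping groups can be sketched as follows. This is only an illustration: the record layout, the field name `measured_at`, the helper names, and the hour range taken to mean the “midnight time zone” are all assumptions and do not appear in the specification.

```python
from datetime import datetime

# Hypothetical group conditions: each maps one evaluation record to True/False.
GROUP_CONDITIONS = {
    "weekday": lambda rec: rec["measured_at"].weekday() < 5,         # Monday to Friday
    "midnight time zone": lambda rec: rec["measured_at"].hour < 5,   # assumed 00:00-04:59
}

def groups_of(record):
    """Return the names of every group whose condition the record satisfies."""
    return [name for name, cond in GROUP_CONDITIONS.items() if cond(record)]

def split_into_groups(records):
    """Build one data set per group; a record may appear in several groups."""
    return {
        name: [rec for rec in records if cond(rec)]
        for name, cond in GROUP_CONDITIONS.items()
    }

# Toy evaluation data with an explanatory variable x and an objective variable y.
records = [
    {"measured_at": datetime(2023, 3, 6, 1, 30), "x": [3.0], "y": 1},   # Monday, 01:30
    {"measured_at": datetime(2023, 3, 6, 14, 0), "x": [5.0], "y": 0},   # Monday, 14:00
    {"measured_at": datetime(2023, 3, 11, 2, 0), "x": [2.0], "y": 1},   # Saturday, 02:00
]

groups = split_into_groups(records)
```

In this sketch the first record satisfies both conditions and therefore appears in both data sets, which corresponds to evaluation data belonging to a plurality of groups.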
The data acquisition unit 110 may acquire a performance index for evaluating the performance of the machine learning model. When the data acquisition unit 110 acquires the performance index, the performance calculation unit 120 to be described later calculates the acquired performance index. The data acquisition unit 110 may acquire a calculation method such as a performance calculation formula.
The performance index is an index for evaluating the performance of the machine learning model. An example of the performance index in the present example embodiment is an index whose value is higher as the performance is higher. For example, in the case of a discrimination model, the performance index is accuracy (correct answer rate), precision (matching rate), recall (reproduction rate), F-measure (F1 score), Fβ score, area under the receiver operating characteristic curve (ROC-AUC), or area under the precision-recall curve (PR-AUC). However, the performance index is not limited thereto. For example, in the case of a regression model, the performance index may be a coefficient of determination.
The performance calculation unit 120 calculates a performance index of the pre-update model and a performance index of the post-update model using the group defined according to the group condition acquired by the data acquisition unit 110. Hereinafter, the performance index of the pre-update model will be referred to as a “pre-update performance”, and the performance index of the post-update model will be referred to as a “post-update performance”.
In at least some of the groups, the performance calculation unit 120 may calculate a post-update performance by using some of the evaluation data included in the group, rather than using all the evaluation data included in the group. When some of the data is used, data used for calculating a pre-update performance may be different from data used for calculating a post-update performance in at least some of the groups.
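The per-group calculation of the pre-update performance and the post-update performance can be sketched as follows. This is a minimal illustration, assuming accuracy as the performance index and treating each model as a plain callable; the names `accuracy`, `group_performances`, `h1`, `h2`, and the toy data are hypothetical.

```python
def accuracy(model, data):
    """Fraction of (x, y) pairs for which model(x) equals y."""
    if not data:
        raise ValueError("cannot evaluate on an empty data set")
    return sum(1 for x, y in data if model(x) == y) / len(data)

def group_performances(model, groups, metric=accuracy):
    """Evaluate one model separately on the data set of each group."""
    return {name: metric(model, data) for name, data in groups.items()}

# Toy models: h1 stands in for the pre-update model, h2 for the post-update model.
def h1(x):
    return int(x >= 0)

def h2(x):
    return int(x >= 2)

# Toy evaluation data as (x, y) pairs, already divided into groups.
groups = {
    "group 0": [(-1, 0), (1, 0), (3, 1), (0, 0)],
    "group 1": [(1, 1), (2, 1), (3, 1), (-2, 0)],
}

pre_update = group_performances(h1, groups)
post_update = group_performances(h2, groups)
```

Passing a different `groups` mapping for each model would correspond to the case where the data used for the pre-update performance differs from the data used for the post-update performance.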
The index calculation unit 130 calculates a degradation index (also referred to as “deterioration index”) indicating a deterioration of the performance of the post-update model relative to the pre-update model based on the pre-update performance and the post-update performance for each of the groups. A low value of the degradation index indicates that the post-update model is a good model. For example, the index calculation unit 130 calculates the degradation index based on a difference between the pre-update performance and the post-update performance. More specifically, the index calculation unit 130 may calculate an index represented by the following Formula 1 as a degradation index.
The index of Formula 1 is a value obtained by summing up deteriorations in performance of the post-update model relative to the pre-update model for the plurality of groups and normalizing the sum using a normalization coefficient. In Formula 1, the second term of max (x,y) indicates a deterioration in performance. The function max (x,y) is a function that returns the larger of its two arguments (x,y). The coefficient Zg in Formula 1 is a normalization coefficient that normalizes the entire sum of the terms of the max function.
The second term of the max function in Formula 1 takes a positive value when the performance index of the post-update model deteriorates as compared with the pre-update performance. On the other hand, when the post-update performance improves as compared with the pre-update performance, the second term of the max function takes a negative value. Therefore, the max function becomes a positive value when the performance of the post-update model deteriorates as compared with the performance of the pre-update model, and becomes 0 when the performance of the post-update model improves as compared with the performance of the pre-update model.
The second term of the function max (x,y) in Formula 1 will be described in detail. The coefficient Zs included in the second term of the function max (x,y) of Formula 1 is a normalization coefficient for a deteriorating amount of the performance of the post-update model as compared with the pre-update model in the group. The function Ms(hn, {(x,y)}) is a function that outputs a performance index of a model hn for the set {(x,y)}. h1 refers to a pre-update model. h2 refers to a post-update model. D1 is a data set used to calculate a pre-update performance. D2 is a data set used to calculate a post-update performance. D1 and D2 may be the same data set or different data sets.
The function Ms(hn, {(x,y)}) in Formula 1 will be described in detail. The function s(f,x,y) in the function Ms(hn, {(x,y)}) of Formula 1 is an element included in the set S, and is a function that outputs 1 (True) when the set of f, x, and y satisfies the condition and outputs 0 (False) when the set of f, x, and y does not satisfy the condition.
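One form of the sum-type degradation index that is consistent with the description of the max function, the coefficients Zs and Zg, and the function Ms can be written as below. This is a sketch under stated assumptions rather than a definitive statement of Formula 1: the placement of 1/Zs inside the max function and the use of S as the index set of the sum are inferred from the surrounding description.

```latex
\mathrm{Degradation}(h_1, h_2)
  = \frac{1}{Z_g} \sum_{s \in S}
    \max\!\left( 0,\;
      \frac{1}{Z_s} \Bigl( M_s(h_1, D_1) - M_s(h_2, D_2) \Bigr)
    \right)
```

With this form, a group contributes a positive amount only when the post-update performance is lower than the pre-update performance, and contributes 0 otherwise.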
The index calculation unit 130 may use an index represented by Formula 2. Formula 2 is a formula for calculating a degradation index for each of the plurality of groups based on the difference between the pre-update performance and the post-update performance, and selecting the largest one of the calculated degradation indexes.
When the index calculation unit 130 uses a performance index of which a value is smaller as the performance is better, the index calculation unit 130 may use Formulas 3 and 4 in which the sign of the second term of the function max (x,y) in Formulas 1 and 2 is inverted.
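The sum-type index, the maximum-type index, and the sign inversion for performance indexes whose value is smaller as the performance is better can be sketched as follows. The function names, the dictionary-based interface, and the default of no normalization (Zs = Zg = 1) are assumptions made for illustration.

```python
def _per_group_deterioration(pre, post, z_s, higher_is_better):
    """Clipped, normalized deterioration max(0, sign * (pre - post) / Z_s) per group."""
    sign = 1.0 if higher_is_better else -1.0
    return {
        s: max(0.0, sign * (pre[s] - post[s]) / z_s.get(s, 1.0))
        for s in pre
    }

def degradation_sum(pre, post, z_s=None, z_g=1.0, higher_is_better=True):
    """Sum the per-group deteriorations and normalize the sum by Z_g."""
    terms = _per_group_deterioration(pre, post, z_s or {}, higher_is_better)
    return sum(terms.values()) / z_g

def degradation_max(pre, post, z_s=None, higher_is_better=True):
    """Select the largest per-group deterioration."""
    terms = _per_group_deterioration(pre, post, z_s or {}, higher_is_better)
    return max(terms.values())

# Toy per-group accuracies (higher is better).
pre_acc = {"group 0": 0.5, "group 1": 1.0}
post_acc = {"group 0": 1.0, "group 1": 0.75}

# The same situation expressed as error rates (lower is better).
pre_err = {"group 0": 0.5, "group 1": 0.0}
post_err = {"group 0": 0.0, "group 1": 0.25}

sum_index = degradation_sum(pre_acc, post_acc)
max_index = degradation_max(pre_acc, post_acc)
sum_index_err = degradation_sum(pre_err, post_err, higher_is_better=False)
```

In this toy case the improvement in group 0 does not offset the deterioration in group 1, because improvements are clipped to 0 before the groups are combined.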
As examples of the above-described normalization coefficients Zs and Zg, the index calculation unit 130 may use, but is not limited to, the following averages (A) to (C).
In a case where an average value weighted based on the pre-update performances is used as a degradation index, a normalization coefficient is as follows.
In a case where an average value weighted based on the number of pieces of data belonging to the group is used as a degradation index, a normalization coefficient is as follows.
In a case where an average value weighted based on the performance index and the evaluation of the pre-update model is used as a degradation index, a normalization coefficient is as follows.
In a case where an average value weighted based on an importance level designated by the user for each group s is used as a degradation index, 1/Zs is a weight proportional to the importance level of the group s designated by the user. Zg is a value adjusted so that the maximum value becomes 1.
The index calculation unit 130 may not perform normalization. In this case, both Zs and Zg are 1.
The index calculation unit 130 may use cross entropy, mean square error, mean absolute error, compatibility index (back trust compatibility (BTC)) of PTL 1, or the like, not limited to Formula 1 described above, as a degradation index.
Here, a case where the index calculation unit 130 calculates a degradation index using the compatibility index (back trust compatibility (BTC)) of PTL 1 will be described. The following Formula 9 represents a BTC, and the following Formula 10 represents a degradation index using the BTC.
Here, in Formula 9, hn (x)=y means that an output for an input x is y in a machine learning model hn. In Formula 10, D1 and D2 are as follows.
The output unit 140 outputs a degradation index on a display device. A method by which the output unit 140 displays a degradation index on the display device is not particularly limited. For example, the output unit 140 may output a degradation index for each group. The output unit 140 may output a plurality of types of degradation indexes.
Alternatively, the output unit 140 may output information on a group having a deterioration index lower than a predetermined threshold in comparison with the overall performance. In this case, the display device acquiring information from the output unit 140 may display the information in a state where a display form of a group condition associated with the degradation index lower than the threshold is changed based on the information.
The output unit 140 may output other information associated with the degradation index. For example, the output unit 140 may output the post-update model and the degradation index of the updated model in association with each other. Alternatively, the output unit 140 may output the post-update model, the degradation index, and the performance for each group in association with each other. For example, the output unit 140 may output the post-update model, the degradation index, and the performance of the post-update model for all the data in association with each other. In this case, the display device may display the post-update model, the degradation index, and the performance for all the evaluation data in association with each other. Hereinafter, the performance of the machine learning model for all the evaluation data will be referred to as an “overall performance”.
The user may select a post-update model satisfying a desired overall performance and a desired degradation index from among the plurality of post-update models displayed as illustrated in
(Example where Machine Learning Model is Evaluated Based on Degradation Index)
A white circle in
All of the accuracies of the machine learning models 1 to 3, which are post-update models, for all the evaluation data are 60%, which is improved over the accuracy of the pre-update model. Furthermore, the accuracies of the machine learning models 1 and 2 for the group 1 are 83%, which is improved over the accuracy of the pre-update model. In particular, the machine learning model 2 has a BTC of 80%, which is highest. Therefore, in terms of BTC, the machine learning model 2 is the best machine learning model. However, the accuracies of the machine learning models 1 and 2 for the group 0 are 25%, which deteriorates as compared with the accuracy of the pre-update model. On the other hand, the accuracy of the machine learning model 3 is improved for both the groups 0 and 1. Therefore, in a case where the groups are considered, the machine learning model 3 is the most desirable model. In addition, the degradation index calculated by the information processing device 10 is 0%, which is the lowest value, in the machine learning model 3. Therefore, the user of the information processing device 10 can select the machine learning model 3 of which the accuracy does not decrease in any group based on the degradation index.
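The selection of a post-update model based on the overall performance and the degradation index can be sketched as follows. The overall accuracy of 60% and the degradation index of 0% for the machine learning model 3 are taken from the example above; the degradation values for the machine learning models 1 and 2 are placeholders chosen only for illustration, and the function name and thresholds are hypothetical.

```python
candidates = [
    # Degradation values for models 1 and 2 are placeholders, not taken from the text.
    {"name": "machine learning model 1", "overall": 0.60, "degradation": 0.25},
    {"name": "machine learning model 2", "overall": 0.60, "degradation": 0.25},
    {"name": "machine learning model 3", "overall": 0.60, "degradation": 0.00},
]

def select_model(models, min_overall, max_degradation):
    """Return the eligible model with the lowest degradation index, or None."""
    eligible = [
        m for m in models
        if m["overall"] >= min_overall and m["degradation"] <= max_degradation
    ]
    if not eligible:
        return None
    # Prefer a lower degradation index, then a higher overall performance.
    return min(eligible, key=lambda m: (m["degradation"], -m["overall"]))

chosen = select_model(candidates, min_overall=0.60, max_degradation=0.10)
```

Because all three candidates have the same overall performance in this example, the degradation index is what distinguishes them.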
As described above, the information processing device 10 includes a data acquisition unit 110, a performance calculation unit 120, an index calculation unit 130, and an output unit 140. The data acquisition unit 110 acquires a group condition for specifying a group in which each piece of evaluation data is included and a performance index for evaluating performances of models. The performance calculation unit 120 calculates a pre-update performance evaluated according to the performance index using evaluation data included in the group in relation to a pre-update model and a post-update performance evaluated according to the performance index using evaluation data included in the group in relation to a post-update model. The index calculation unit 130 calculates a degradation index indicating a deterioration in performance of the post-update model relative to the pre-update model based on the pre-update performance and the post-update performance for each of the groups. The output unit 140 outputs the degradation index.
As described above, the degradation index is an index in which the group is considered. That is, the information processing device 10 provides a model evaluation method in which a plurality of groups are considered. As a result, the user can evaluate a model in consideration of a plurality of groups. For example, the user can select a post-update model with reference to the graph as illustrated in
The learning unit 150 executes relearning using the pre-update model and training data for relearning to create a post-update model. The training data used for relearning may be the same data as the evaluation data or may be other data. When performing relearning, the learning unit 150 uses a weight determined based on the pre-update performance for each group.
Specifically, the learning unit 150 uses a weight for a post-update performance for a group extracted based on one or more specific conditions designated in advance. For example, the learning unit 150 uses a weight wi of the following Formula 12. i is a subscript indicating an index of training data. That is, wi is a weight for i-th training data.
λ and τ are hyperparameters that take values greater than 0 and are adjusted depending on the purpose of the model. λ is a coefficient with respect to the sum of pre-update performances for each group. As λ increases, the weight of the evaluation data belonging to the group designated by the user increases. τ is a value adjusted according to a learning result. D1 represents a set of evaluation data.
The learning unit 150 may store the created post-update model in a device that is not illustrated or may output the created post-update model to the performance calculation unit 120. The performance calculation unit 120 may acquire the post-update model from the learning unit 150, or may acquire the post-update model from a device that is not illustrated, similarly to the first example embodiment.
A method in which the learning unit 150 sets weights will be described. First, the learning unit 150 gives a weight “1.0” to all the training data. Then, the learning unit 150 adds a weight proportional to the pre-update performance. Specifically, the learning unit 150 adds a weight proportional to the performance (0.5) to training data included in the group 0, and adds a weight proportional to the performance (0.8) to the training data belonging to the group 1. Then, the learning unit 150 adds a weight proportional to the value (1.3) obtained by adding the performance to training data belonging to both the groups 0 and 1. The learning unit 150 does not add a weight to data not belonging to any group.
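The weight setting described above can be sketched as follows. This is a minimal illustration that assumes a proportionality constant of 1.0 by default; the function name `training_weights`, the parameter `scale`, and the membership-list representation are hypothetical and are not the weight formula of the specification.

```python
def training_weights(memberships, pre_update_perf, scale=1.0):
    """Give each training sample a base weight of 1.0 plus a term proportional
    to the sum of the pre-update performances of the groups it belongs to."""
    return [
        1.0 + scale * sum(pre_update_perf[g] for g in sample_groups)
        for sample_groups in memberships
    ]

# Pre-update performances for each group, as in the description above.
pre_update_perf = {"group 0": 0.5, "group 1": 0.8}

# Group membership of four training samples: group 0 only, group 1 only,
# both groups, and no group.
memberships = [["group 0"], ["group 1"], ["group 0", "group 1"], []]

weights = training_weights(memberships, pre_update_perf)
```

A sample that belongs to no group keeps the base weight, while a sample belonging to both groups receives an added term proportional to 1.3, the sum of the two performances.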
Not limited to the weight setting in
(Relearning Method 2: Method Using Loss Function to which Degradation Index is Introduced)
The learning unit 150 may directly create a post-update model using the degradation index of the post-update model, not limited to the performances for the groups. For example, the learning unit 150 may create a post-update model in such a way as to optimize the loss function using Formula 13.
D2 represents training data for relearning. L is a normal loss function (including a regularization term and the like). L′s is a loss function related to a group satisfying the condition s, and is a function whose gradient can be calculated. Examples of L′s are shown below using Formula 14 and Formula 15. When cross entropy is used, L′s is formula 14.
In a case where the sigmoid function is used, L′s is Formula 15.
Here, α is a hyperparameter indicating a weight of the function h2(y|x). h2(y|x) is a function that outputs the probability that the machine learning model h2 outputs y for an input x.
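One way a normal loss and group-wise losses could be combined into a single objective is sketched below. The additive form, the use of cross entropy for both the normal loss and the group-wise loss, and the names `cross_entropy`, `total_loss`, and `penalty_weight` are assumptions made for illustration; the sketch omits regularization and is not the loss function of the specification.

```python
import math

def cross_entropy(prob_fn, data):
    """Mean negative log-probability that prob_fn(y, x) assigns to the true y."""
    return -sum(math.log(prob_fn(y, x)) for x, y in data) / len(data)

def total_loss(prob_fn, train_data, groups, penalty_weight=1.0):
    """Normal loss on all training data plus a weighted sum of group-wise losses."""
    base = cross_entropy(prob_fn, train_data)
    penalty = sum(cross_entropy(prob_fn, data) for data in groups.values() if data)
    return base + penalty_weight * penalty

# Toy stand-in for h2(y|x): assigns probability 0.5 to either label.
def uniform_model(y, x):
    return 0.5

train_data = [(0.1, 1), (0.2, 0)]
groups = {"group 0": [(0.1, 1)], "group 1": [(0.2, 0)]}

loss = total_loss(uniform_model, train_data, groups)
```

Because every term is a differentiable function of the model's output probabilities, an objective of this kind could be minimized with an ordinary gradient-based optimizer.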
The learning unit 150 may update the parameter of the machine learning model based on at least one of the performances for the group and the degradation index, not limited to the weight.
In this manner, the learning unit 150 of the information processing device 11 creates a post-update model by executing relearning based on at least one of the performances of the pre-update model for the group and the degradation index of the post-update model. Therefore, the information processing device 11 can create a post-update model that has executed appropriate relearning.
Here, an application example in a case where the present example embodiment is applied to the healthcare field will be described.
The information processing device 11 uses a machine learning model for performing healthcare for the user. In this case, the machine learning model is, for example, a model that predicts a health condition of the user based on biological data of the user acquired from a terminal device worn by the user.
The biological data of the user is, for example, data that is a blood oxygen concentration, a heart rate, a perspiration amount, a blood pressure, or other data affecting health condition of the user. The health condition of the user is, for example, an arrhythmia detection result, an atrial fibrillation detection result, a score indicating the quality of sleep, a score indicating the amount of exercise, or another index used to determine whether the user is healthy.
The information processing device 11 may create an action to be performed by the user or an action plan based on the prediction of the machine learning model. For example, the information processing device 11 collects biological data of the user from a wristwatch-type terminal device worn by the user. Then, the information processing device 11 displays a prediction value for the biological data on the terminal device of the user using the machine learning model. The information processing device 11 may calculate an action to be performed by the user or an action plan by applying a predetermined mathematical optimization calculation method to the prediction value, and display the action or the action plan on the terminal device.
The content output by the information processing device 11 is, for example, as follows, but is not limited thereto.
When the user wakes up in the morning, the information processing device 11 outputs a morning exercise to be performed by the user based on data acquired from the terminal device. Before breakfast, the information processing device 11 outputs a breakfast menu having an appropriate nutritional balance based on data acquired from the terminal device.
After work or school, the information processing device 11 outputs an appropriate exercise based on data acquired from the terminal device. Before lunch, the information processing device 11 outputs a lunch menu having an appropriate nutritional balance based on data acquired from the terminal device.
In the evening, the information processing device 11 outputs an appropriate exercise based on data acquired from the terminal device. Before dinner, the information processing device 11 outputs a dinner menu having an appropriate nutritional balance based on data acquired from the terminal device.
Before sleep, the information processing device 11 outputs an appropriate before-sleep stretching or breathing method based on data acquired from the terminal device.
In each use scene, the information processing device 11 may acquire an evaluation result as to whether the prediction result or the optimized proposal is appropriate from the user. The information processing device 11 calculates a degradation index of the machine learning model using the evaluation result of the user as evaluation data. Then, the information processing device 11 updates the machine learning model based on the deterioration index. In this manner, the information processing device 11 can update the machine learning model at all times by using the health data of the user and the evaluation result.
The data of the plurality of groups are data sets specified for the respective conditions. Therefore, the information processing device 10 may be configured as in a third example embodiment to be described below.
The index calculation unit 130 calculates a deterioration index based on a difference between the performance index of the machine learning model and the performance index of the machine learning model after being updated. Furthermore, the information processing device 12 may include an output unit 140. In this case, the output unit 140 may output the deterioration index and an overall performance of the machine learning model after being updated. Furthermore, the output unit 140 may output the machine learning model, the machine learning model after being updated, data for which the machine learning model is correct, data for which the machine learning model after being updated is correct, a performance for each condition, and an overall performance of the machine learning model after being updated.
The information processing device 11 may be configured as in a fourth example embodiment to be described below.
The information processing device 13 may include an index calculation unit 130. In this case, the performance calculation unit 120 further calculates a performance index of the machine learning model after being updated. Then, the index calculation unit 130 calculates a deterioration index of a performance of the machine learning model based on the performance indexes before and after the machine learning model is updated. Then, the learning unit 150 causes the machine learning model to perform relearning using a loss function related to the deterioration index of the machine learning model after being updated.
Next, a hardware configuration of each of the information processing devices 10 to 13 will be described. Each component of each of the information processing devices 10 to 13 may be configured by a hardware circuit. Alternatively, each component of each of the information processing devices 10 to 13 may be configured using a plurality of devices connected to each other via a network. For example, each of the information processing devices 10 to 13 may be configured using cloud computing. Alternatively, a plurality of components of each of the information processing devices 10 to 13 may be configured by one piece of hardware. Alternatively, each of the information processing devices 10 to 13 may be implemented as a computer device including a processor, a read-only memory (ROM), a random access memory (RAM), and a network interface card. As the processor, for example, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro processing unit (MPU), a floating point number processing unit (FPU), a physics processing unit (PPU), a tensor processing unit (TPU), a quantum processor, a microcontroller, or a combination thereof can be used.
The processor 610 reads a program from at least one of the ROM 620 and the storage device 640. Then, the processor 610 controls the RAM 630, the storage device 640, and the network interface 650 based on the read program. Then, the computer device 600 including the processor 610 controls these components to implement functions as the data acquisition unit 110, the performance calculation unit 120, the index calculation unit 130, the output unit 140, and the learning unit 150. As described above, the computer device 600 may implement functions as a combination of hardware and software. The processor 610 may read a program included in a recording medium 690, which stores the program in a computer readable manner, using a recording medium reading device that is not illustrated. Alternatively, the processor 610 may receive a program from an external device that is not illustrated via the network interface 650, store the program in the RAM 630 or the storage device 640, and operate based on the stored program.
The ROM 620 stores programs to be executed by the processor 610 and fixed data. The ROM 620 is, for example, a programmable ROM (P-ROM) or a flash ROM. The RAM 630 temporarily stores programs to be executed by the processor 610 and data. The RAM 630 is, for example, a dynamic RAM (D-RAM). The storage device 640 stores data and programs to be stored by the computer device 600 for a long period of time. The storage device 640 may operate as a temporary storage device of the processor 610. The storage device 640 is, for example, a hard disk device, a solid state drive (SSD), or a disk array device.
The ROM 620 and the storage device 640 are non-volatile (non-transitory) recording media. On the other hand, the RAM 630 is a volatile (transitory) recording medium. The processor 610 can operate based on programs stored in the ROM 620, the storage device 640, and the RAM 630. That is, the processor 610 can operate using either a non-volatile recording medium or a volatile recording medium. When implementing each function, the processor 610 may use at least one of the RAM 630 and the storage device 640 as a medium for temporarily storing a program and data.
The network interface 650 relays exchange of data with an external device that is not illustrated via a network. The network interface 650 is, for example, a local area network (LAN) card. Furthermore, the network interface 650 may be used in a wireless manner, not limited to the wired manner.
The computer device 600 configured as described above implements the functions of the information processing devices 10 to 13 by executing the operations of the components of the information processing devices 10 to 13.
Some or all of the above-described example embodiments may be described as in the following supplementary notes, but are not limited to the following supplementary notes.
An information processing device including:
The information processing device according to supplementary note 1, in which
The information processing device according to supplementary note 1 or 2, further including:
The information processing device according to supplementary note 3, in which
An information processing device including:
The information processing device according to supplementary note 5, in which
The information processing device according to supplementary note 5 or 6, in which
The information processing device according to any one of supplementary notes 1 to 7, in which
The information processing device according to any one of supplementary notes 1, 2, 3, and 7, in which
An information processing method including:
A recording medium recording a program for causing a computer to execute:
acquiring at least one condition for evaluation data of a machine learning model;
calculating a performance index of the machine learning model and a performance index of the machine learning model after being updated using a data set specified for each of the at least one condition; and
While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2023/008239 | 3/6/2023 | WO | |